Optimize your AML/FT tools with a simulator

Despite the considerable global effort made by governments to put an end to illegal activities such as drug trafficking, terrorism and corruption, criminal activity persists. And as long as it does, so will money laundering, the two being closely linked. For criminals, the aim of money laundering is to legitimize illegally earned money in order to integrate it into the financial system and profit from it.

According to the United Nations Office on Drugs and Crime, between 2% and 5% of the world's GDP is laundered every year, representing between 800 billion and 2 trillion US dollars.

Financial crime is constantly evolving, using the latest technologies. That's why regulatory compliance and the implementation of scalable and adapted AML systems remain a major challenge for financial institutions.

This must be achieved while maintaining a high level of relevance, because too many false alarms (false positives) generate a disproportionate financial cost.

There are many tools available to meet these challenges. The Actimize software package is a leader in this field. It offers advanced solutions for detecting money laundering and managing compliance risks.

This article will present the founding principles of these AML tools, but also discuss how reporting entities can develop them, and the possibility of assessing the impact of these developments without first having to modify the tools themselves.

These evaluation and simulation methods include the use of languages such as Java, Python and PL/SQL.

1. AML solutions

Financial institutions generally rely on tools offered by publishers or other suppliers involved in AML. These providers offer different types of solutions.

1.1. The different types of solution

  • Cloud solutions
    A number of vendors and RegTech firms offer cloud solutions, which come with both advantages and drawbacks. The main advantage is scalability, defined as the ability of an IT system to adapt to a company's changing needs and workloads.
    Cloud solutions also offer pay-as-you-go pricing, which helps save money when requirements vary.
    The major drawback of these solutions is concern over data confidentiality. The difficulty of later migrating to another solution can also be a problem.
    And of course, the chosen supplier must be resilient enough to avoid any service interruption.
  • On-premise solutions
    To avoid the disadvantages of cloud solutions, banks can use on-premise solutions, although these are not as scalable. Before deciding on the best option, it is important to study your needs carefully: if IT resource requirements are stable, an on-premise solution is preferable; if requirements fluctuate, a cloud solution will be more cost-effective.
  • In-house solutions
    The cost of licensing on-premise or cloud solutions can exceed hundreds of thousands of euros. That's why some banks develop their own software in-house, striking a balance between paid commercial licenses and open-source alternatives.

1.2 An example: AML-SAM Solution

AML-SAM (Anti-Money Laundering - Suspicious Activity Monitoring) is a tool published by NICE Actimize. It operates in the same way as tools from other AML vendors: an alert is generated when certain predefined conditions are met. A score is also calculated and attached to the alert, giving the compliance teams in charge an indicator of its estimated seriousness. Threshold breaches, deviations from norms and the level of suspicion already established against the entities concerned are all factors the tool takes into account when calculating this score.

On the basis of this and the information attached to the alert, AML-SAM analysts (using ACTIMIZE Risk Case Manager) can then decide whether the account and/or holder concerned should be investigated further.

The diagram below shows all the stages in the complete AML-SAM workflow.

The AML-SAM solution is designed to process and analyze a wide range of source data and generate meaningful, targeted alerts. The steps that make up the AML-SAM solution can be summarized as follows:

  1. The solution receives data from the organization's data sources.
  2. Filter rules eliminate irrelevant data to determine the scope.
  3. Detection models then process the data to generate alerts.
  4. Alerts are then consolidated to facilitate the work of compliance officers.
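The four steps above can be sketched in a few lines of Python. This is a minimal illustration of the filter → detect → consolidate flow, not AML-SAM's actual implementation; all field names, rule names and thresholds are hypothetical:

```python
# Minimal sketch of an AML alerting pipeline: filter -> detect -> consolidate.
# Field names, rule names and the 10,000 threshold are illustrative only.

def in_scope(txn):
    # Step 2: filter rules eliminate irrelevant data (e.g. internal movements)
    return txn["type"] != "internal_transfer"

def detect(txn, threshold=10_000):
    # Step 3: a detection model generates an alert when a condition is met
    if txn["amount"] >= threshold:
        return {"customer": txn["customer"], "rule": "large_amount", "amount": txn["amount"]}
    return None

def consolidate(alerts):
    # Step 4: group alerts by customer so compliance officers see one case each
    cases = {}
    for alert in alerts:
        cases.setdefault(alert["customer"], []).append(alert)
    return cases

transactions = [
    {"customer": "C1", "type": "wire", "amount": 12_500},
    {"customer": "C1", "type": "internal_transfer", "amount": 50_000},
    {"customer": "C2", "type": "wire", "amount": 900},
]
alerts = [a for t in transactions if in_scope(t) if (a := detect(t))]
cases = consolidate(alerts)
```

Here the 50,000 internal transfer never reaches detection because the filter step removed it from the scope, which is exactly the point of step 2: reducing the volume the detection models have to process.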

Detection models

An AML monitoring solution contains several detection models, each based on one or more rules. Each rule monitors a laundering scenario and is based on one or more parameters, such as thresholds.
By way of illustration, a concrete case could be where a scenario is configured to detect a customer who makes a transfer of an amount that is too high in relation to his profile and business activity. In this configuration, an alert will be triggered as soon as a transaction represents more than a certain percentage of the customer's "normal" annual or monthly activity.

A detection model is therefore typically based on three elements:

  • Detection rules, possibly modulated or defined by risk level,
  • Trigger criteria or thresholds,
  • A score associated with each alert.

Detection rules or thresholds can be defined differently for different population groups, corresponding to different customer profiles.
Detection models aim to identify unusual transactions or sets of transactions. To determine what is usual, the Compliance Officer separates customers into several groups, called population groups. Each group contains customers who share the same characteristics, such as comparable annual sales. There may be as many as ten or even a hundred groups. Each group will then have its own trigger thresholds to generate alerts relevant to the customer's usual profile.
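Combining the scenario described earlier with population groups might look like the following sketch. The group names and multipliers are invented for illustration; real configurations are defined by the Compliance Officer:

```python
# Illustrative per-group thresholds: each population group has its own trigger
# level, expressed as a multiple of the customer's usual monthly activity.
# Group names and multipliers are hypothetical.
GROUP_THRESHOLDS = {
    "retail":    2.0,   # alert if a transaction exceeds 2x usual monthly activity
    "small_biz": 3.0,
    "corporate": 5.0,
}

def is_unusual(amount, monthly_average, group):
    # Trigger when the transaction exceeds the group's multiple of "normal" activity
    multiplier = GROUP_THRESHOLDS[group]
    return amount > multiplier * monthly_average

# The same 2,500 EUR transfer is unusual for a retail customer with 1,000 EUR
# of usual monthly activity, but not for a corporate one with the same profile:
print(is_unusual(2_500, 1_000, "retail"))     # True: 2,500 > 2 x 1,000
print(is_unusual(2_500, 1_000, "corporate"))  # False: 2,500 <= 5 x 1,000
```

The same transaction can thus be suspicious in one group and perfectly normal in another, which is why relevant alerts require group-specific thresholds.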

Fine-Tuning Process

The ability to define trigger thresholds tailored to customer profiles meets a key need for compliance teams: minimizing the number of false positives, i.e. alerts whose analysis reveals no evidence of money laundering or terrorist financing.
False positives waste compliance officers' time and can even degrade their ability to analyze the most credible alerts. For this reason, compliance officers need to review thresholds periodically in order to minimize false positives and improve detection accuracy.
This process is called fine-tuning or calibration.

Fine tuning can be carried out using different approaches, or even a combination of them:

  • By examining historical transactions, we may be able to identify recurring patterns. We also gain a better understanding of each customer's activity. All this helps to set relevant thresholds or detection rules.
  • Statistical analysis. This plays an essential role in the fine-tuning process, by providing statistics that enable a better understanding of each customer's profile. The statistical measures most commonly used to analyze the trend and dispersion of historical transaction amounts and volumes are mean and standard deviation.
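As a sketch of the statistical approach, a per-customer threshold can be derived from the mean and standard deviation of historical amounts. The multiplier k and the sample history are hypothetical; in practice they would come from the calibration exercise itself:

```python
import statistics

# Hypothetical fine-tuning sketch: derive a per-customer threshold from the
# mean and standard deviation of that customer's historical amounts.
def tune_threshold(history, k=3.0):
    mean = statistics.mean(history)
    std = statistics.stdev(history)
    # Flag amounts more than k standard deviations above the customer's mean
    return mean + k * std

# Example history of monthly transaction amounts for one customer (EUR):
history = [200, 250, 180, 220, 210, 240, 190]
threshold = tune_threshold(history)
```

A transaction above this threshold is unusual relative to the customer's own history, so the same rule automatically adapts to both small and large customers.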

However, this fine-tuning often has to be done directly in the detection tool, even if it's in a dedicated environment such as a development or pre-production platform.
This approach has the disadvantage of relying solely on the chosen tool, and requires people with the necessary skills and expertise in the tool itself.
For this reason, some customers prefer to have a tool developed in-house to perform these fine-tuning simulations and estimate their impact on the number of alerts and the workload of Compliance teams.
Indeed, it may be more advantageous to work on the data used by the detection tool, but to set up a "simulator" in another environment, or even another language, to measure the effects of a threshold change or the implementation of a new rule.

2. Implementation of a simulator to optimize AML detection

2.1. General principles

The simulator is a system that reproduces the behavior of a detection tool in terms of alert generation, but is implemented in a completely different way.

It lets you simulate the volume of alerts under predefined conditions without affecting Actimize environments (for example, tests for other evolutions can continue while a simulation is running).

It provides a precise estimate of the number of alerts per model based on user-defined parameters, and can thus measure the number of alerts "avoided" or "added" by these new parameters and/or new detection rules.
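The simulator's core measurement can be illustrated as follows: replay the same data under current and candidate parameters, then compare alert counts. The data and thresholds below are invented for the example:

```python
# Sketch of the simulator's core measurement: count alerts under the current
# vs. a candidate threshold, then report alerts "avoided" or "added".
# Transaction amounts and thresholds are illustrative.

def count_alerts(transactions, threshold):
    # One alert per transaction at or above the threshold
    return sum(1 for amount in transactions if amount >= threshold)

transactions = [500, 1_200, 9_800, 10_500, 15_000, 52_000]

current = count_alerts(transactions, threshold=10_000)    # 3 alerts today
candidate = count_alerts(transactions, threshold=15_000)  # 2 alerts if raised
avoided = max(current - candidate, 0)
added = max(candidate - current, 0)
print(f"avoided: {avoided}, added: {added}")  # avoided: 1, added: 0
```

Run against real historical data, the same comparison directly quantifies the workload impact of a parameter change on the Compliance teams, before anything is touched in the production tool.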

Depending on the language and technologies used to build the simulator, it may be simpler and quicker to test several working hypotheses, to carry out backtesting (in particular to ensure that no alerts deemed relevant are generated), and to do so without mobilizing experts from the usual tool.

Also, depending on the environments available for the detection tool and those implemented for the simulator, shorter response times are generally observed when using a simulator, even on less powerful and less expensive environments.

For example, it's possible to build a simulator on a MySQL or PostgreSQL database and develop it in Java or Python: it is much easier to find developers who master these languages than experts in a solution like Actimize.

2.2. Simulator architecture

The simulator is generally composed of two modules, front-end and back-end. The front-end is the visual part of the application, dedicated to screen and user management. The back-end is the non-visual part of the application, responsible for managing processes, database connections and API calls.

The architecture of each part can be defined separately:

For the front-end, you could choose one of the three most widely used frameworks: Angular, React or Vue.js. These three frameworks offer a wide range of features, are fairly flexible and can meet the needs of different customers. As we'll see in the case study that concludes this article, Angular and its PrimeNG library offer many advantages.

For the backend, the following technologies seem to us to be the most appropriate:

2.3. PL/SQL

The main advantage of this technology is speed. The new version of Oracle 19c offers the possibility of loading tables/views in server memory, to eliminate I/O calls to disk and optimize processing.

Oracle also offers other possibilities for optimizing the process (for example, a mechanism for partitioning and sub-partitioning tables, or indexing, which makes it easy to locate records in the table).

What's more, adopting good PL/SQL development practices, such as avoiding FOR loops on large tables, results in optimized code. All these features make Oracle the best technology if performance and speed are the primary concerns.

However, all the above advantages come with two drawbacks: the difficulty of debugging PL/SQL programs and the scarcity of competent PL/SQL developers.

2.4. Python

Python is an easy language to learn, and deploying a program written in Python is straightforward. Python offers great flexibility and is also appreciated for its wealth of open source libraries and frameworks. In fact, the number of open source libraries written in Python exceeds a hundred thousand!

The number of Python developers is also quite high. Python is a popular language thanks to its easy syntax, which makes it attractive to young graduates. In short, it's much easier to find competent Python developers than it is for PL/SQL.

On the other hand, this language has one major drawback: it is an interpreted language and, like all languages of this type, its performance is inferior to that of compiled languages or languages with optimized runtimes (JIT compilation), such as Java.

2.5. Spark or Hive

Spark and Hive are part of the Big Data ecosystem, but serve different purposes and have different underlying architectures. Here are a few differences between the two frameworks:

1. Usage 

  • Spark is a distributed computing framework that provides an interface for programming entire clusters with parallel data and fault tolerance. Spark is renowned for its fast processing of large-scale data.
  • Hive is a data-warehousing infrastructure built on Hadoop. Hive provides a query language called HiveQL, which is similar to SQL and enables querying of data stored in HDFS (Hadoop Distributed File System).

2. Architecture

  • Spark saves data in memory between calculations, so you can quickly perform interactive queries.
  • Hive is built on Hadoop MapReduce and translates HiveQL queries into MapReduce jobs that are executed in the Hadoop cluster. This architecture makes it slower than Spark for interactive queries.

3. Query language

  • Spark provides APIs in several programming languages, including Python, Scala and Java. Developers can use these APIs to process data. Spark also provides a SparkSQL module for executing SQL queries on data stored in Spark.
  • Hive provides the HiveQL query language for executing SQL-like queries on HDFS data.

4. Ease of use

  • Spark is generally easier to use than Hive, thanks to its APIs in several programming languages.
  • Hive is designed for developers with SQL skills. It also eliminates the complexities of Hadoop's MapReduce programming.

5. Performance

  • Spark is faster than Hive thanks to its in-memory processing capabilities. Spark's runtime engine can launch iterative algorithms and interactive queries.
  • Hive is built on Hadoop MapReduce and suffers from the overhead of writing to and reading from disk, making it slower.

2.6. Java

Java is platform-independent, which means that once compiled, the Java program can be run on any platform running the JVM (Java Virtual Machine).

The Java ecosystem is rich in libraries and platforms for web development and database connectivity.

Java helps to ensure reliable code because its typing allows errors to be detected at compile time, especially in complex systems where accuracy and reliability are crucial.

To implement the detection logic of the AML model, it is advisable to apply filters and conditions, followed by aggregation and sorting operations: it is strongly inadvisable to recode these functionalities in Java from scratch. Rather than reinventing the wheel, it's best to use an open-source library such as Apache Commons Collections, which provides additional functionality to Java's standard collections. Its classes and methods are designed to simplify the process of manipulating data collections in Java.

3. Feedback

3.1. Python & PL/SQL

For one of our customers, who wanted to optimize their AML system and scenarios, we suggested setting up such a simulator, and one of the challenges was to compare the performance of 2 of the languages they were already using: Python and PL/SQL.

The data volume was around one million rows.

This comparison confirmed that PL/SQL is faster than Python: Oracle's engine is written in C, a compiled language, and is the fruit of nearly 50 years of work by top-flight developers and architects.

However, when the volume of data exceeds ten million rows, you should opt for a solution based on distributed computing in Python rather than PL/SQL.

There are several distributed computing solutions written in Python that can be used, such as Dask or Ray. These solutions make it possible to process large quantities of data quickly and efficiently.
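In practice Dask or Ray would handle the distribution; the standard-library sketch below only illustrates the idea they generalize, namely splitting the data into chunks and processing them in parallel worker processes. The chunking scheme and data are invented for the example:

```python
# Illustrates the chunked-parallel idea behind Dask/Ray using only the
# standard library: split transactions into chunks and count threshold
# breaches across worker processes.
from concurrent.futures import ProcessPoolExecutor

def count_breaches(chunk, threshold=10_000):
    # Count transactions at or above the alert threshold in one chunk
    return sum(1 for amount in chunk if amount >= threshold)

def parallel_count(transactions, n_chunks=4):
    size = max(len(transactions) // n_chunks, 1)
    chunks = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    with ProcessPoolExecutor(max_workers=n_chunks) as pool:
        return sum(pool.map(count_breaches, chunks))

data = [i * 100 for i in range(1_000)]  # amounts 0, 100, ..., 99,900

if __name__ == "__main__":
    print(parallel_count(data))  # 900 amounts are at or above 10,000
```

Dask and Ray apply the same map-and-reduce pattern across machines rather than local processes, which is what makes them worthwhile beyond the ten-million-row mark mentioned above.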

On the other hand, using a distributed architecture in Python for volumes such as those used in this study is of little interest compared with the use of PL/SQL.

3.2. Angular and PrimeNG

For another of our customers, we suggested using Angular/PrimeNG for the frontend. The frontend includes the visual elements (buttons, forms, menus, user interface, etc.).

Angular is an open-source framework developed and maintained by Google. PrimeNG is an open-source library for Angular. It offers ready-to-use components.

PrimeNG enables developers to create attractive interfaces without having to describe complex code. PrimeNG's main advantage is its technical support team, which responds within one working day. The team also makes regular releases to fix bugs and add new features.

This "in-house" development approach, based on proven components, has proved its worth in the rapid implementation of a simulator that can also be used to consult alerts.

3.3 Comparative study (Spark vs. Oracle)

Researchers at Telkom University have published a comparative performance study between Spark and Oracle in the International Journal of Computer Visualization. The results showed that there were differences in query processing times between the two tools. Apache Spark is considered the better choice, offering relatively faster query processing times than the Oracle database.

They also concluded that Oracle is more reliable for storing complex data models than for analyzing large volumes of data.

Table 1 shows the configurations used to run Spark and Oracle. Based on the table, we can conclude that Oracle requires higher CPU, storage and RAM resources.

To test scalability and detect possible performance problems, the researchers carried out tests by executing different queries, then running the same query on different volumes. The results in terms of execution time are presented in the following two tables. Spark is faster and more scalable than Oracle.

This table shows the query processing times for both tools. At this stage, Apache Spark is superior to Oracle.

This last table shows the execution time of these queries on three data volumes (1K , 10K and 100K records): Spark is consistently faster than Oracle.

In conclusion...

Our recent visits to customers, as well as our study of more theoretical documents, show that using off-the-shelf solutions to implement an AML system is still the preferred solution, but that using tools developed in-house to optimize this system and simulate the operation of the off-the-shelf solution offers clear advantages, both in terms of speed and cost.

In addition, as such simulators can be used to measure the quantitative as well as qualitative impacts of the changes envisaged for a detection model in the context of AML, this approach can also be advantageously used for other areas based on alert detection, such as the fight against fraud or market abuse.

An article written by Alain KHALIL, Consultant Manager in the Risk, Compliance and Regulatory practice.
