Fraud Detection

According to IBM, 72% of business leaders have cited fraud as a growing concern over the past 12 months. Worldwide losses due to fraud are projected to reach USD 44 billion by 2025, and 25% of declined e-commerce transactions have turned out to be false positives.

  • ~$300K Expected Savings due to Fraud Detection Model
  • 30.4% increase in the bottom line
  • 98.7% accuracy in predicting fraud

Business Implementation

Fraud detection is an umbrella term for all applications that find unusual patterns or unexpected behavior, such as identifying anomalous process patterns, detecting network intrusions, or flagging utterances with divergent meanings in text.

Threats, frauds, and outliers are growing challenges for almost all companies. Since they occur in every corner of operations, companies must be aware of intrusions and other abnormal or malicious activities on many levels.

Challenges Associated with Fraud Detection

Data issues

Digital transformation amplifies data problems. Data silos and data surplus can produce an incomplete view of risk exposure, obscuring the practices and behaviors needed for prediction.

Our solution

Our blueprints can help scale analysis across your entire organization. They speed up the modeling, training, and deployment cycle while facilitating collaboration and maintaining a robust governance and security stance.

Predicting fraud

Fraud creates imbalanced datasets. Today's rapid transaction cycles and ever-evolving fraudster tactics make it hard to identify, predict, compensate, and recover quickly.

Our solution

Our blueprint supports visual programming to upskill your team. By putting data science in more hands and combining quicker discovery and deployment with deep learning and state-of-the-art analytics, companies can shift from detection to prediction.

Cost of fraud

False positives require expensive manual analyses. ROI is negatively affected by misplaced payouts.

Our solution

Our blueprint makes faster, more accurate decisions. Leverage unstructured data and enable deep learning and neural networks to lower false positives.

Here’s a sample of one of our visualization approaches. It describes the relationship between the transaction amount and the number of fraudulent transactions in each bin (a given amount bracket) in a specific fraud detection dataset. Such visualizations help interpret the model and give the user a clearer understanding of the nature of the data they’re working with.
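A minimal sketch of that binning idea, assuming hypothetical column names `amount` and `isFraud` (borrowed from the dataset described later in this piece), not the blueprint's actual plotting code:

```python
# Illustrative sketch: bin transaction amounts into brackets and count the
# fraudulent transactions per bracket.
import pandas as pd

df = pd.DataFrame({
    "amount":  [10, 250, 900, 4000, 120, 7500, 60, 3200],
    "isFraud": [0,  0,   1,   1,    0,   1,    0,  0],
})

# Split amounts into brackets and count frauds in each bracket.
bins = [0, 100, 1000, 5000, 10000]
df["amount_bin"] = pd.cut(df["amount"], bins=bins)
fraud_per_bin = df.groupby("amount_bin", observed=True)["isFraud"].sum()
print(fraud_per_bin)
```

A bar chart of `fraud_per_bin` then gives the amount-versus-fraud-count view described above.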

Salient Features of our AI Blueprint

Our blueprint includes several groups of features that capture specific characteristics helpful in predicting fraud.

Conventional features

These are the standard attributes used to predict fraud, for example, orders, transactions, cards, location, and email.

Behavioral features

Our blueprints derive behavioral attributes from the customer session; these features convey customer actions, e.g., the velocity of orders, time spent on a page, and the length of time between adding an unknown card and completing an order. One intent in extracting these features is to catch subversive technology, for instance distinguishing a fraudster using a script to scrape a webpage from regular browsing activity.

Real-time features

These features are computed from up-to-date, real-world incidences of fraud. They are based on categorical data and give the real-time fraud rate by category, e.g., country, ASN, card digits, or email domain. We monitor real-time traffic to help merchants move into new markets (with no existing data) without unfavorable effects from the machine learning model, such as biases.

Individual customer features

These features capture a specific customer’s typical past behavior: typical spending, regular billing address, or home IP address.

Entity features

Entities include devices, addresses, locations, domains, and emails. One objective is to alert the company to a drop-off point for fraudulently obtained goods. An example of this feature is the number of orders dispatched to a specific address.

Features of Attri AI Blueprint

  • Onboard Metrics/KPIs:
    Connect to real-time and offline data to detect outliers at a massive scale with Attri
  • Detect Fraud:
    Understand what went wrong by identifying data pattern changes 
  • Fast-Track Problem-Solving:
    Use interactive root-cause analysis to gain insights into the forecast and what caused it

Benefits of Attri Fraud Detection Blueprint

  • Deploy and detect fraudulent sources in days, not months
  • Access our blueprint without the need for dedicated teams
  • Expedite alerts with a holistic view of customers through an omnichannel case manager
  • Easily explain risk decisions to regulators and teams with whitebox explanations
  • Accelerate investigations with detailed analysis
  • Score transactions in real time to stop fraud before it happens

Tech Implementation

Understanding the Data

In this example, we aim to detect whether an online transaction is fraudulent or not. The variable of interest (target variable) is isFraud. The data we’ll be using here is synthetically generated. We have over 6 million training samples and 10 input features (columns).
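A first sanity check on such a dataset is the target distribution. This is a minimal sketch assuming the data is loaded into a pandas DataFrame with an `isFraud` column; the tiny frame below is a stand-in for the real 6M-row dataset:

```python
# Inspect the class balance of the target variable.
import pandas as pd

df = pd.DataFrame({"isFraud": [0] * 998 + [1] * 2})  # stand-in for the real data

counts = df["isFraud"].value_counts()
fraud_rate = counts.get(1, 0) / len(df)
print(counts.to_dict(), f"fraud rate = {fraud_rate:.2%}")
```

On the real data this check surfaces the extreme imbalance addressed in the next section.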

Input Features

Setting up the Training

The data we have here is extremely imbalanced and needs to be refined before it is used to train the models. We have about 6 million non-fraudulent samples, as opposed to a mere 8,000 fraudulent samples. To tackle this problem, we use data resampling techniques. First, we strategically reduce the number of samples corresponding to the majority class (non-fraudulent transactions); then we use a variant of the Synthetic Minority Oversampling Technique (SMOTE) to generate samples for the minority class (fraudulent transactions). Furthermore, we drop certain input features (step, nameOrig, nameDest) to reduce the complexity of the model.
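The two-step rebalancing can be sketched as follows. This is a simplified, self-contained illustration: the minority-class step is a bare-bones, SMOTE-like interpolation without the k-nearest-neighbor selection; in practice one would reach for a library implementation such as imbalanced-learn's `RandomUnderSampler` and `SMOTE`.

```python
# Step 1: undersample the majority class; Step 2: synthesize minority samples.
import numpy as np

rng = np.random.default_rng(42)

X_major = rng.normal(0.0, 1.0, size=(1000, 2))   # non-fraudulent (majority)
X_minor = rng.normal(3.0, 0.5, size=(10, 2))     # fraudulent (minority)

# Step 1: randomly undersample the majority class down to 100 samples.
keep = rng.choice(len(X_major), size=100, replace=False)
X_major_down = X_major[keep]

# Step 2: synthesize new minority samples by interpolating between random
# minority pairs (the essence of SMOTE, minus the nearest-neighbor step).
n_new = 90
i = rng.integers(0, len(X_minor), size=n_new)
j = rng.integers(0, len(X_minor), size=n_new)
lam = rng.random((n_new, 1))
X_minor_up = np.vstack([X_minor, X_minor[i] + lam * (X_minor[j] - X_minor[i])])

print(X_major_down.shape, X_minor_up.shape)  # balanced: 100 vs 100
```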

Customization made easy

When it comes to model training, we train various models, from something as simple as logistic regression to something as sophisticated as a neural network, all while varying their hyperparameters, so you don't have to worry about whether you are using the right model for the job. We operate in a completely transparent manner, making sure there are no black boxes. We use open-source libraries and frameworks so that you can easily modify and optimize the pipeline for further customization should you need it. Thanks to this transparent approach, you can get your hands dirty with the hyperparameters whenever you feel the need to experiment. The performance of the model for each set of hyperparameters is saved, so you don’t need to take notes every time you tweak the model.
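A minimal scikit-learn sketch of that idea: sweep several model families and hyperparameter grids while keeping every configuration's score on record. The model choices and grids here are illustrative, not the blueprint's actual search space.

```python
# Sweep multiple model families and hyperparameters, recording all scores.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import GridSearchCV

# Toy imbalanced dataset standing in for the real transactions.
X, y = make_classification(n_samples=400, n_features=8, weights=[0.9, 0.1],
                           random_state=0)

candidates = {
    "logreg": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "mlp": (MLPClassifier(max_iter=500, random_state=0),
            {"hidden_layer_sizes": [(16,), (32,)]}),
}

results = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, scoring="recall", cv=3)
    search.fit(X, y)
    # search.cv_results_ retains every hyperparameter set's score,
    # so no experiment is lost between tweaks.
    results[name] = (search.best_score_, search.best_params_)
print(results)
```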

Model Selection and Evaluation Metrics

For this use case, it is ideal to give preference to recall: the fraction of actual positives that the model correctly predicts. For example, with a recall of 0.8, the model correctly flags 8 out of every 10 fraudulent transactions. Recall is the right metric when a false detection is acceptable (predicting a non-fraudulent transaction to be fraudulent) but a missed detection could be costly (failing to flag a fraudulent transaction).
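The recall example above, made concrete with scikit-learn: 10 actual frauds, 8 of them caught.

```python
# Recall = true positives / all actual positives.
from sklearn.metrics import recall_score

y_true = [1] * 10 + [0] * 10          # 10 frauds, 10 legitimate transactions
y_pred = [1] * 8 + [0] * 2 + [0] * 10  # model misses 2 of the 10 frauds
print(recall_score(y_true, y_pred))    # 0.8
```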

Note that the confusion matrix is normalized.

The Model’s perspective

It is important to understand what led the model to arrive at a certain prediction. Technically referred to as Model Explainability, it is an important phase in the MLOps cycle that is generally overlooked. 

Model Explainability plays a crucial role in building trust around the model. This plot has been generated by the open-source library SHAP. While the plot might be intimidating at first glance, it is pretty straightforward and informative. Consider amount: the lower the value (blue), the greater its impact toward a positive (fraudulent) prediction; the higher the value (red), the greater its impact toward the other class (non-fraudulent).
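The plot described above comes from SHAP; as a lighter-weight stand-in that needs only scikit-learn, permutation importance also ranks features by their effect on the model's predictions, though without SHAP's per-sample, signed attributions. The synthetic data and model here are purely illustrative.

```python
# Rank features by how much shuffling each one degrades model performance.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = (X[:, 0] > 0.3).astype(int)  # label driven entirely by feature 0

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
imp = permutation_importance(model, X, y, n_repeats=5, random_state=0)
print(imp.importances_mean)  # feature 0 should dominate
```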

The Last Mile

We now have the model ready, and the next big step is to take it live. Model monitoring and responsibility are often overlooked. Models in production come with their own set of challenges; some noteworthy ones are model drift and data drift. It is also important to make sure that the models are responsible, that is, their predictions shouldn’t be biased toward sensitive attributes such as race and gender. Our blueprints are further integrated with Censius, a platform for end-to-end AI observability.
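Data drift, mentioned above, can be watched with a simple statistic such as the Population Stability Index (PSI). This hand-rolled sketch is illustrative; observability platforms such as Censius automate this kind of monitoring.

```python
# Compare a training-time feature distribution against live traffic.
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between two samples of one feature."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected) + 1e-6
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual) + 1e-6
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(1)
train_amounts = rng.lognormal(3.0, 1.0, 5000)
live_same = rng.lognormal(3.0, 1.0, 5000)     # no drift
live_shifted = rng.lognormal(3.6, 1.0, 5000)  # drifted distribution

print(psi(train_amounts, live_same), psi(train_amounts, live_shifted))
```

A PSI above roughly 0.2 is a common rule-of-thumb trigger for investigating drift or retraining.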