BAF Fraud Modeling

Rob Wiederstein

February 23, 2026

Introduction

Bank Account Fraud Dataset

  • Synthetic online account applications
  • 1M rows (Base)
  • 8 months (0–7)
  • Base + 5 biased variants
  • Label: Fraud vs Legit
  • Fraud \(\approx 1\%\)

Typical Scenario

Fraudsters typically

  1. Impersonate a real person (identity theft) or

  2. Create a fake identity, then

  3. Max out the credit line or

  4. Receive illicit payments

Data Cleaning

  • Relabel outcome (1/0 → Fraud/Legit).
  • Recode -1 missing-value sentinels → NA.
  • Negative amounts (also missing) → NA.
  • Write cleaned data to Parquet.
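The four cleaning steps can be sketched in pandas as below. This is a minimal sketch, not the author's pipeline: the column names (`fraud_bool`, `intended_balcon_amount` and the list of -1 sentinel columns) are assumptions taken from the public BAF datasheet and should be checked against your copy. Columns are listed explicitly because -1 is a legitimate value in some BAF features (for example a credit risk score), so a blanket replace would be wrong.

```python
import pandas as pd

# ASSUMPTION: column names follow the public BAF datasheet; verify them.
# In these columns -1 encodes "missing", not a real measurement.
SENTINEL_COLS = [
    "prev_address_months_count",
    "current_address_months_count",
    "bank_months_count",
    "session_length_in_minutes",
    "device_distinct_emails_8w",
]
# ASSUMPTION: negative values in this amount column encode "missing".
AMOUNT_COL = "intended_balcon_amount"


def clean_baf(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of a raw BAF frame."""
    out = df.copy()
    # Relabel the 0/1 target as a readable categorical outcome.
    out["outcome"] = pd.Categorical(
        out["fraud_bool"].map({1: "Fraud", 0: "Legit"}),
        categories=["Fraud", "Legit"],
    )
    out = out.drop(columns="fraud_bool")
    # -1 sentinels -> NA, only in columns where -1 means missing.
    for col in SENTINEL_COLS:
        if col in out.columns:
            out[col] = out[col].mask(out[col] == -1)
    # Negative amounts -> NA.
    if AMOUNT_COL in out.columns:
        out[AMOUNT_COL] = out[AMOUNT_COL].mask(out[AMOUNT_COL] < 0)
    return out
```

Writing the result with `DataFrame.to_parquet` requires `pyarrow` or `fastparquet` to be installed.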

Explore

Variable Importance

Figure 1: Top 15 features driving the diagnostic model.

Feature Interaction

Figure 2: Interaction between Credit Risk Score and Address History.

Missingness Signal

Figure 3: Missingness rates by outcome.
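Missingness rates like those in Figure 3 can be computed as the share of NA values per feature within each outcome class. A minimal pandas sketch, assuming the NA recoding above has been applied and that the label sits in a column named `outcome` (a hypothetical name):

```python
import pandas as pd


def missingness_by_outcome(df: pd.DataFrame, outcome: str = "outcome") -> pd.DataFrame:
    """Fraction of NA values in each feature, split by outcome class.

    Rows are features; columns are the outcome levels (e.g. Fraud, Legit).
    ASSUMPTION: `outcome` is a hypothetical label column name.
    """
    features = df.drop(columns=outcome)
    # isna() gives booleans; the group mean of a boolean is the NA rate.
    return features.isna().groupby(df[outcome], observed=True).mean().T
```

A feature whose NA rate differs sharply between Fraud and Legit carries signal in its missingness alone.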

Numeric Correlation

Figure 4: Core numeric correlation matrix.

LightGBM

About

  • Originally released in 2016
  • Maintained by Microsoft
  • Over 18,000 stars on GitHub
  • King of Kaggle for tabular data
  • Original paper (NIPS 2017) cited over 23,000 times
  • Up to \(20\times\) faster training than conventional GBDT

Academic Support

For tabular supervised learning, gradient boosted decision trees—most notably XGBoost and LightGBM—are strong, low-latency baselines because they exploit hand-engineered behavioral features; LightGBM remains a standard reference point for card and e-commerce fraud tasks [1]

[W]e found that the LightGBM approach had the highest detection accuracy of fraudulent activity with 97% in the experiments conducted. An additional key objective of reducing false alerts was accomplished, as the number of false alarms went from 13,024 to 6,249 [2]

[W]e choose LightGBM as the base machine learning model due to its efficiency and widespread use in handling large-scale and structured datasets, particularly in financial domains such as credit card fraud detection.[3]

Unbalanced Classes

The Challenge



The scarce occurrences of rare events impair the detection task …

Bank Fraud Prevalence

Figure 5: Fraudulent versus legitimate applications by month.

Fraud Prevalence

Table 1: Fraud prevalence by application month (Base dataset).
Month  Fraud   Legit    Total    % Fraud
0      1,500   130,940  132,440  1.13
1      1,198   126,422  127,620  0.94
2      1,198   135,781  136,979  0.87
3      1,392   149,544  150,936  0.92
4      1,452   126,239  127,691  1.14
5      1,411   117,912  119,323  1.18
6      1,450   106,718  108,168  1.34
7      1,428   95,415   96,843   1.47
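The % Fraud column is simply Fraud / (Fraud + Legit) × 100. The short check below recomputes it from the Table 1 counts and also gives the overall rate across all eight months, which is what the "Fraud \(\approx 1\%\)" figure refers to.

```python
# Monthly (fraud, legit) counts transcribed from Table 1.
COUNTS = {
    0: (1_500, 130_940),
    1: (1_198, 126_422),
    2: (1_198, 135_781),
    3: (1_392, 149_544),
    4: (1_452, 126_239),
    5: (1_411, 117_912),
    6: (1_450, 106_718),
    7: (1_428, 95_415),
}


def prevalence_pct(fraud: int, legit: int) -> float:
    """Percent of applications labelled Fraud."""
    return 100 * fraud / (fraud + legit)


def overall_prevalence_pct(counts: dict) -> float:
    """Pooled prevalence across all months (not the mean of monthly rates)."""
    fraud = sum(f for f, _ in counts.values())
    legit = sum(l for _, l in counts.values())
    return prevalence_pct(fraud, legit)


for month, (f, l) in COUNTS.items():
    print(f"month {month}: total {f + l:,}  fraud {prevalence_pct(f, l):.2f}%")
print(f"overall: {overall_prevalence_pct(COUNTS):.2f}%")  # overall: 1.10%
```

The monthly totals sum to exactly 1,000,000 rows, matching the Base dataset, with 11,029 frauds overall. Prevalence also drifts upward in the later months (0.87% to 1.47%).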

Methods Tested

  • Standard: Baseline (no resampling).
  • Weighted: Cost-sensitive learning (Fraud errors penalized \(4\times\)).
  • Undersampling: Random removal of majority-class (Legit) rows.
  • SMOTE: Synthetic Minority Over-sampling Technique.
  • ADASYN: Adaptive Synthetic Sampling (more synthetic points near hard-to-learn minority examples).
  • Tomek Links: Removes majority-class points in cross-class nearest-neighbour pairs to clean the class boundary.
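Three of these strategies are simple enough to sketch in plain Python: case weights for the Weighted recipe, random undersampling, and a simplified SMOTE. The deck does not state the exact settings or library used, so the ratio, `k` and seeds below are hypothetical defaults; ADASYN and Tomek links are not shown. The SMOTE variant picks a random minority point, one of its `k` nearest minority neighbours, and places a new point at a random position on the segment between them.

```python
import math
import random


def sample_weights(labels, fraud_weight=4.0):
    """Cost-sensitive learning: each Fraud row counts fraud_weight times."""
    return [fraud_weight if y == "Fraud" else 1.0 for y in labels]


def random_undersample(rows, labels, ratio=1.0, seed=0):
    """Keep every Fraud row plus a random subset of Legit rows.

    ratio: hypothetical number of Legit rows kept per Fraud row.
    """
    rng = random.Random(seed)
    fraud = [i for i, y in enumerate(labels) if y == "Fraud"]
    legit = [i for i, y in enumerate(labels) if y != "Fraud"]
    n_keep = min(len(legit), int(ratio * len(fraud)))
    keep = sorted(fraud + rng.sample(legit, n_keep))
    return [rows[i] for i in keep], [labels[i] for i in keep]


def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE: n_new synthetic points, each interpolated between a
    random minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        lam = rng.random()  # position along the segment x -> nn, in [0, 1)
        synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, nn)))
    return synthetic
```

The brute-force neighbour search is quadratic in the number of minority rows, which hints at why the synthetic recipes cost more runtime than Standard.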

Strategy Showdown: Results

Table 2: Performance comparison across imbalance strategies using 3-month rolling windows.
Class Imbalance Strategy Showdown: paired t-test against the 'Standard' baseline
Recipe         Avg PR-AUC  Avg runtime  p-value vs Standard  Significant?
SMOTE          0.1635      3.91         0.9032               No (ns)
Standard       0.1631      2.30         1.0000               -
ADASYN         0.1627      3.75         0.8361               No (ns)
Weighted       0.1614      2.22         0.3971               No (ns)
Tomek          0.1497      2.54         0.0501               No (ns)
Undersampling  0.1403      0.94         0.0505               No (ns)
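The p-values in Table 2 come from a paired t-test: each recipe is scored on the same rolling windows as Standard, and the test is run on the per-window differences. A minimal sketch of the statistic, assuming one PR-AUC value per window per recipe (the deck does not list the per-window scores or the number of windows):

```python
import math
import statistics


def paired_t(baseline, candidate):
    """Paired t statistic for two recipes scored on the same windows.

    Returns (t, degrees_of_freedom). Positive t means candidate > baseline.
    """
    if len(baseline) != len(candidate) or len(baseline) < 2:
        raise ValueError("need two equal-length score lists with n >= 2")
    diffs = [c - b for b, c in zip(baseline, candidate)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of mean diff
    return statistics.mean(diffs) / se, n - 1
```

For the two-sided p-value, `scipy.stats.ttest_rel(candidate, baseline)` computes the same statistic and looks it up in the t distribution. With only a handful of windows, the test has little power, so "ns" should be read as "no detectable gain", not as "identical".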

Sampling Compared

Figure 6: PR-AUC performance versus computational training time.

Sampling Methods Discarded

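  • No statistically significant PR-AUC gain over Standard

  • Resource intensive: SMOTE and ADASYN take roughly 1.6 to 1.7 times the Standard runtime

  • Scalability concerns on the full 1M-row dataset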

Feature Creation

Final Results

The Confusion Matrix

Figure 7

Precision & Recall


\[\text{Recall} = \frac{TP}{TP + FN}\]

Of all actual frauds, how many did we catch?

\[\text{Precision} = \frac{TP}{TP + FP}\]

Of all flagged cases, how many were real fraud?
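The two formulas translate directly into code. The counts in the test are hypothetical; returning 0.0 when a denominator is zero is one common convention, not the only one.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple:
    """Return (precision, recall) from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of flags that are fraud
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of frauds caught
    return precision, recall
```

For example, catching 80 of 100 frauds while also flagging 320 legitimate applications gives recall 0.8 but precision only 0.2: with rare fraud, even a small false positive rate swamps the true positives.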

ROC vs Precision-Recall AUC

ROC AUC

  • Plots Recall (True Positive Rate) vs False Positive Rate
  • AUC = 0.5 is random; 1.0 is perfect
  • Optimistic under class imbalance
  • Inflated by the large TN pool

PR AUC

  • Plots Precision vs Recall
  • Focuses entirely on the minority class
  • Random baseline equals the fraud rate (\(\approx 1\%\)), not 0.5
  • Harder to game with a large Legit majority
  • Preferred metric for fraud detection
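A toy example makes the contrast concrete. Suppose (hypothetically) 2 frauds among 100 applications, ranked 3rd and 10th by model score. ROC AUC is the probability that a random fraud outscores a random legit, so it looks excellent; average precision, a common estimator of PR AUC, averages precision at each fraud's rank and stays low. A plain-Python sketch, assuming no tied scores:

```python
def roc_auc(labels, scores):
    """P(random positive scores above random negative); ties count 1/2.
    O(P*N) pairwise version, fine for small examples."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of precision@k over the ranks k where positives appear.
    ASSUMPTION: scores have no ties."""
    ranked = sorted(zip(scores, labels), reverse=True)
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    return total / hits


# Hypothetical: 100 applications, distinct scores 100..1, frauds at ranks 3 and 10.
scores = list(range(100, 0, -1))
labels = [1 if rank in (3, 10) else 0 for rank in range(1, 101)]
print(f"ROC AUC {roc_auc(labels, scores):.3f}")          # ROC AUC 0.949
print(f"AP      {average_precision(labels, scores):.3f}")  # AP      0.267
```

The same ranking scores 0.95 on ROC AUC but 0.27 on average precision, because 96 of the 98 legit applications still fall below the first fraud. scikit-learn provides equivalent `roc_auc_score` and `average_precision_score` functions.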

Final Model Evaluation

Figure 8: Confusion Matrix Heatmap (5% Decision Threshold)
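A 5% decision threshold presumably means an application is flagged as Fraud when its predicted fraud probability is at least 0.05, far below the usual 0.5, trading extra false positives for higher recall. A minimal sketch of how the four cells are tallied, assuming `>=` at the threshold and 1 = Fraud, 0 = Legit:

```python
def confusion_counts(labels, probs, threshold=0.05):
    """Tally TP/FP/FN/TN after flagging rows with prob >= threshold.

    ASSUMPTION: labels use 1 = Fraud, 0 = Legit; ties at the threshold
    are flagged.
    """
    tp = fp = fn = tn = 0
    for y, p in zip(labels, probs):
        flagged = p >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}
```

Raising `threshold` moves cases from TP/FP into FN/TN, so precision tends to rise and recall to fall; the threshold is a business choice about how many false alarms per caught fraud are acceptable.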

Diagnostic Metrics

Figure 9: ROC and Precision-Recall Curves for Out-of-Sample Data

References

[1] G. Aminian et al., “FraudTransformer: Time-Aware GPT for Transaction Fraud Detection,” arXiv, Oct. 2025, doi: 10.48550/arXiv.2509.23712.
[2] C. Iscan, O. Kumas, F. P. Akbulut, and A. Akbulut, “Wallet-Based Transaction Fraud Prevention Through LightGBM With the Focus on Minimizing False Alarms,” IEEE Access, vol. 11, pp. 131465–131474, 2023, doi: 10.1109/ACCESS.2023.3321666.
[3] X. Zhao, Y. Liu, and Q. Zhao, “Improved LightGBM for Extremely Imbalanced Data and Application to Credit Card Fraud Detection,” IEEE Access, vol. 12, pp. 159316–159335, 2024, doi: 10.1109/ACCESS.2024.3487212.