Package index
Data Ingestion & Lakehouse Setup
Functions for moving raw CSV data into the MinIO Lakehouse as partitioned Parquet.
- baflakehouse-package - baflakehouse: Lakehouse Workflow for the Bank Account Fraud Dataset
- convert_to_parquet() - Convert BAF CSV to partitioned Parquet in MinIO (S3)
- connect_baf() - Connect to BAF dataset on MinIO (Arrow or DuckDB)
- clean_baf_base() - Clean the BAF Base dataset and write to 03_primary
- engineer_features() - Engineer features for the BAF dataset
- generate_model_inputs() - Generate Resampled Model Inputs
- build_eda_recipe() - Build EDA Recipe
- build_baf_recipe() - Build Untrained BAF Recipe
- train_diag_model() - Train Diagnostic Model
- plot_var_imp() - Plot Variable Importance
- plot_hexbin_interaction() - Plot Hexbin Interaction
- plot_missingness() - Plot Missingness Signal
- plot_num_cor() - Plot Numeric Correlation Matrix
Model Selection & Tuning
Imbalance strategy tournament, hyperparameter tuning, and results formatting.
- run_imbalance_tournament() - Run Class Imbalance Tournament
- tune_lgbm() - Tune LightGBM Hyperparameters
- format_tournament_gt() - Format Tournament Results Table
- plot_efficiency() - Plot Effectiveness vs Efficiency
Final Evaluation & Production Deployment
Holdout evaluation on months 6-7 and MinIO model artifact serialization.
- evaluate_final_model() - Final Model Evaluation (Months 6 & 7)
- train_production_model() - Train and Serialize Production LightGBM Model
- plot_fraud_by_month() - Plot applications by month (Legit vs Fraud) on a log scale
- plot_conf_mat_heatmap() - Plot Confusion Matrix Heatmap
- compute_fraud_by_month() - Fraud prevalence by month (counts + percent)
- format_fraud_by_month_gt() - Format fraud-by-month table as a gt object
- save_report_figure() - Save a report figure artifact
- save_report_table() - Save a report table artifact
- render_slides() - Render Quarto revealjs slideshow after required assets exist