Engineer features for the BAF dataset
engineer_features.Rd
Reads the primary BAF dataset and engineers new features, such as
n_missing, which counts the missing values across key tenure and
financial columns. The calculation runs with Arrow compute, so the full
dataset does not have to be loaded into memory.
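As a rough illustration of how a row-wise missing-value count can be evaluated lazily, the sketch below uses the arrow and dplyr packages on a small local dataset. The column names, the toy data, and the temporary directory are assumptions made for the example; the columns engineer_features() actually inspects and its internal implementation are not documented here.

```r
library(arrow)
library(dplyr)

# Toy stand-in for the primary dataset; these column names are assumed
# for illustration and may differ from the real tenure/financial columns.
toy <- data.frame(
  month = c(0L, 0L, 1L),
  prev_address_months_count = c(NA, 12L, NA),
  bank_months_count = c(5L, NA, NA),
  intended_balcon_amount = c(10.5, NA, 3.2)
)

src <- file.path(tempdir(), "primary_toy")
write_dataset(toy, src, format = "parquet")

# open_dataset() is lazy: the mutate() below is translated into Arrow
# compute expressions and only evaluated when collect() is called.
features <- open_dataset(src) %>%
  mutate(
    n_missing =
      as.integer(is.na(prev_address_months_count)) +
      as.integer(is.na(bank_months_count)) +
      as.integer(is.na(intended_balcon_amount))
  ) %>%
  arrange(month, n_missing) %>%
  collect()
```

Summing one is.na() indicator per column keeps the whole expression inside Arrow, so no R-side row loop is needed.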
Usage
engineer_features(
in_prefix = "baf-fraud/03_primary/variant=Base",
out_prefix = "baf-fraud/04_feature/variant=Base",
bucket_name = "lake",
partitioning = "month",
existing_data_behavior = "delete_matching",
verbose = TRUE
)
Arguments
- in_prefix
Character. Input dataset prefix (e.g., "03_primary/variant=Base").
- out_prefix
Character. Output dataset prefix (e.g., "04_feature/variant=Base").
- bucket_name
Character. The S3/MinIO bucket name. Default "lake".
- partitioning
Character vector. Columns to partition by. Default "month".
- existing_data_behavior
Character. Behavior when data already exists at the output prefix. Default "delete_matching".
- verbose
Logical. Whether to print progress messages. Default TRUE.
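The partitioning and existing_data_behavior arguments use the same names and values as arrow::write_dataset(), so they are presumably forwarded to it; that is an assumption, not something this page states. The sketch below shows how those two options behave in arrow itself, writing to a local temporary directory (hypothetical, standing in for the S3/MinIO bucket) with made-up data.

```r
library(arrow)

# Toy feature table; values are invented for the example.
toy <- data.frame(month = c(0L, 0L, 1L), n_missing = c(1L, 2L, 2L))
out <- file.path(tempdir(), "feature_toy")

# Write the same data twice. With "delete_matching", each partition that is
# about to be written (month=0, month=1) is cleared first, so stale files
# from the earlier write do not linger in those partitions.
for (i in 1:2) {
  write_dataset(
    toy, out,
    format = "parquet",
    partitioning = "month",
    existing_data_behavior = "delete_matching"
  )
}

partition_dirs <- sort(list.files(out))  # hive-style directories, one per month
n_rows <- nrow(open_dataset(out))        # row count after the second write
```

Partitions that are not part of the current write are left untouched by "delete_matching", which makes it a reasonable fit for re-running a single month.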