Reads Base.csv from a MinIO/S3 bucket prefix (e.g., "01_raw") and writes a Hive-style partitioned Parquet dataset to another prefix (e.g., "02_intermediate"), partitioned by variant (e.g., variant=Base/part-*.parquet).

Usage

convert_to_parquet(from_prefix, to_prefix, bucket_name = "lake")

Arguments

from_prefix

Character. Prefix/key under the bucket containing CSVs (e.g. "01_raw").

to_prefix

Character. Prefix/key under the bucket where the Parquet dataset is written (e.g. "02_intermediate").

bucket_name

Character. Bucket name. Default "lake".

Value

A character string giving the destination dataset prefix (typically to_prefix).

Details

Connection settings are taken from environment variables:

  • BAF_ENDPOINT (e.g. "minio:9000" or "192.168.4.xx:9000")

  • BAF_KEY (MinIO access key)

  • BAF_SECRET (MinIO secret key)
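
Because all three variables must be set before the call, it can help to check them up front. The following is a minimal sketch in base R. The helper name read_baf_settings is hypothetical and not part of this package, and how convert_to_parquet() consumes these variables internally is not shown on this page.

```r
# Hypothetical helper (not part of the package): collect the three BAF_*
# settings and fail early if any of them is unset or empty.
read_baf_settings <- function() {
  vars <- c(endpoint = "BAF_ENDPOINT", key = "BAF_KEY", secret = "BAF_SECRET")

  # names = FALSE returns a plain character vector in the same order as vars
  values <- Sys.getenv(vars, unset = "", names = FALSE)

  missing_vars <- vars[values == ""]
  if (length(missing_vars) > 0) {
    stop(
      "Missing environment variable(s): ",
      paste(missing_vars, collapse = ", ")
    )
  }

  # Return a named list: endpoint, key, secret
  stats::setNames(as.list(values), names(vars))
}
```

Calling read_baf_settings() before convert_to_parquet() turns a misconfigured session into an immediate, readable error rather than a failure partway through the conversion.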

Examples

if (FALSE) { # \dontrun{
Sys.setenv(
  BAF_ENDPOINT = "minio:9000",
  BAF_KEY      = "YOUR_ACCESS_KEY",
  BAF_SECRET   = "YOUR_SECRET_KEY"
)
convert_to_parquet(from_prefix = "01_raw", to_prefix = "02_intermediate", bucket_name = "lake")
} # }
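
Once the conversion has run, the partitioned output can be read back to confirm the layout. This is a sketch only, assuming the arrow and dplyr packages are installed, the MinIO endpoint is served over plain HTTP, and the dataset lives at "lake/02_intermediate" as in the example above. None of these assumptions are confirmed by this page, and the code needs a live endpoint to run.

```r
library(arrow)

# Assumption: plain-HTTP MinIO endpoint; use scheme = "https" if TLS is enabled.
fs <- S3FileSystem$create(
  access_key        = Sys.getenv("BAF_KEY"),
  secret_key        = Sys.getenv("BAF_SECRET"),
  endpoint_override = Sys.getenv("BAF_ENDPOINT"),
  scheme            = "http"
)

# Hive-style directories (variant=Base/...) are picked up as a partition
# column named "variant" when the dataset is opened.
ds <- open_dataset(fs$path("lake/02_intermediate"), format = "parquet")

# Filtering on the partition column lets arrow skip the other partitions.
base_rows <- dplyr::collect(dplyr::filter(ds, variant == "Base"))
```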