Preprocessing and postprocessing of input immune repertoire files

Usage

make_default_preprocessing(format = c("airr", "10x"))

make_default_postprocessing()

make_exclude_columns(cols = imd_drop_cols("airr"))

make_productive_filter(col_name = c("productive"), truthy = TRUE)

make_barcode_prefix(prefix_col = "Prefix")

Arguments

format: For make_default_preprocessing(), a character string specifying the input data format. Currently supports "airr" (default) or "10x". This determines the default set of columns to exclude and the values considered "productive".
cols: For make_exclude_columns(), a character vector of column names to be removed from the dataset. Defaults to imd_drop_cols("airr"). If empty, the returned function will not remove any columns.
col_name: For make_productive_filter(), a character vector of potential column names that indicate sequence productivity (e.g., "productive"). The first matching column found in the dataset will be used.
truthy: For make_productive_filter(), a value or vector of values that signify a productive sequence in the col_name column. Can be a logical TRUE (default for "airr" format) or a character vector of strings (e.g., c("true", "TRUE", "True", "t", "T", "1") for "10x" format).
prefix_col: For make_barcode_prefix(), the name of the column in the dataset that contains the prefix string to be added to each cell barcode. Defaults to "Prefix". The barcode column itself is identified internally via imd_schema("barcode").
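
To make the interplay of col_name and truthy concrete, here is a minimal base R sketch of a productivity filter. It is not the package's implementation: sketch_productive_filter() and the toy data frames are hypothetical, and it assumes that matching is done with %in% against the first candidate column that is present.

```r
# Hypothetical stand-in for make_productive_filter(); base R only.
sketch_productive_filter <- function(col_name = c("productive"), truthy = TRUE) {
  function(dataset, ...) {
    # Use the first candidate column that actually exists in the data.
    hit <- col_name[col_name %in% names(dataset)]
    if (length(hit) == 0) {
      warning("No productivity column found; returning data unchanged.")
      return(dataset)
    }
    # Keep rows whose value is one of the accepted "productive" values.
    keep <- dataset[[hit[1]]] %in% truthy
    dataset[keep, , drop = FALSE]
  }
}

# Logical flag, as described for the "airr" format.
airr_like <- data.frame(cdr3 = c("CASS", "CAWS", "CSAR"),
                        productive = c(TRUE, FALSE, TRUE))
# String flag, as described for the "10x" format.
tenx_like <- data.frame(cdr3 = c("CASS", "CAWS", "CSAR"),
                        productive = c("True", "False", "true"))

airr_kept <- sketch_productive_filter()(airr_like)
tenx_kept <- sketch_productive_filter(
  truthy = c("true", "TRUE", "True", "t", "T", "1")
)(tenx_like)
```

Both calls keep the first and third rows. Passing several accepted strings in truthy lets one filter tolerate different spellings of "true" across exports.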

Value

Each make_* function returns a new function. The returned function takes a dataset as its first argument, accepts any additional arguments via ..., and performs one specific processing step. make_default_preprocessing() and make_default_postprocessing() return a named list of such functions.
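
The "function that returns a function" shape can be sketched in base R as follows. The name sketch_exclude_columns() and the toy data are hypothetical, and a plain data.frame stands in for the dataset; this only illustrates the pattern, not the package's code.

```r
# Hypothetical factory: configuration is captured once, the step is applied later.
sketch_exclude_columns <- function(cols = character(0)) {
  function(dataset, ...) {
    # Nothing requested: hand the data back untouched.
    if (length(cols) == 0) return(dataset)
    # Keep every column that is not listed in cols.
    dataset[, setdiff(names(dataset), cols), drop = FALSE]
  }
}

drop_step <- sketch_exclude_columns(cols = c("sequence", "junction"))

toy <- data.frame(sequence = "ACGT", junction = "TGT", v_call = "TRBV5-1")
trimmed <- drop_step(toy)
```

Here drop_step is an ordinary function of the dataset, so it can be stored in a list, passed to a reader, or called directly.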

Details

These "maker" functions generate common preprocessing and postprocessing steps tailored for immune repertoire data. Each make_* function returns a new function that can then be applied to a dataset.

These functions are designed to be flexible components in constructing custom data processing workflows.

The functions generated by these factories typically expect a dataset (e.g., a duckplyr table with annotations) as their first argument and may accept additional arguments via ..., although the predefined steps often leave them unused.

make_default_preprocessing() and make_default_postprocessing() assemble a list of such processing functions.
The individual make_exclude_columns(), make_productive_filter(), and make_barcode_prefix() functions create specific transformation steps.
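
As an illustration of how a named list of steps might be consumed, the sketch below applies each step to the dataset in order. The source does not say how the package runs the list, so sequential application is an assumption, and apply_steps(), the step names, and the toy data are all hypothetical.

```r
# Hypothetical steps standing in for the output of the make_* factories.
steps <- list(
  drop_sequence = function(dataset, ...) {
    dataset[, setdiff(names(dataset), "sequence"), drop = FALSE]
  },
  keep_productive = function(dataset, ...) {
    dataset[dataset$productive %in% TRUE, , drop = FALSE]
  }
)

# Assumed consumer: feed the result of each step into the next one.
apply_steps <- function(dataset, steps, ...) {
  for (step in steps) dataset <- step(dataset, ...)
  dataset
}

toy <- data.frame(
  sequence   = c("ACGT", "GGCC"),
  productive = c(TRUE, FALSE),
  v_call     = c("TRBV5-1", "TRBV7-2")
)
result <- apply_steps(toy, steps)
```

Because every step has the same signature, a custom workflow can be built by adding, removing, or reordering list entries.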

These steps are often used when reading data to standardize formats, filter unwanted records, or enrich information like cell barcodes. They are designed to gracefully handle cases where an operation is not applicable (e.g., a specified column is not found) by issuing a warning and returning the dataset unmodified.
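
The warn-and-return-unmodified behavior can be sketched with a barcode prefix step in base R. Everything here is an assumption made for illustration: sketch_barcode_prefix() is hypothetical, the barcode column is assumed to be named "barcode" (the package resolves it via imd_schema("barcode")), and the prefix is assumed to be pasted directly in front of the barcode.

```r
# Hypothetical stand-in for make_barcode_prefix(); base R only.
sketch_barcode_prefix <- function(prefix_col = "Prefix", barcode_col = "barcode") {
  function(dataset, ...) {
    # Not applicable: warn and return the input exactly as received.
    if (!all(c(prefix_col, barcode_col) %in% names(dataset))) {
      warning("Column '", prefix_col, "' or '", barcode_col,
              "' not found; returning data unchanged.")
      return(dataset)
    }
    # Assumed behavior: prepend the prefix to each barcode.
    dataset[[barcode_col]] <- paste0(dataset[[prefix_col]], dataset[[barcode_col]])
    dataset
  }
}

add_prefix <- sketch_barcode_prefix()

with_prefix <- data.frame(barcode = c("AAAC-1", "TTTG-1"),
                          Prefix  = c("s1_", "s1_"))
prefixed <- add_prefix(with_prefix)

# No Prefix column: the step warns and leaves the data as is.
no_prefix <- data.frame(barcode = c("AAAC-1", "TTTG-1"))
unchanged <- suppressWarnings(add_prefix(no_prefix))
```

Returning the input instead of failing lets the same list of steps run over files that lack optional columns.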

Functions