Preprocessing and postprocessing of input immune repertoire files
Source: R/io_repertoires_processing.R
preprocess_postprocess.Rd
Usage
make_default_preprocessing(format = c("airr", "10x"))
make_default_postprocessing()
make_exclude_columns(cols = imd_drop_cols("airr"))
make_productive_filter(col_name = c("productive"), truthy = TRUE)
make_barcode_prefix(prefix_col = "Prefix")
Arguments
- format
  For make_default_preprocessing(), a character string specifying the input data format. Currently supports "airr" (default) or "10x". This determines the default set of columns to exclude and the values considered "productive".
- cols
  For make_exclude_columns(), a character vector of column names to be removed from the dataset. Defaults to imd_drop_cols("airr"). If empty, the returned function will not remove any columns.
- col_name
  For make_productive_filter(), a character vector of potential column names that indicate sequence productivity (e.g., "productive"). The first matching column found in the dataset will be used.
- truthy
  For make_productive_filter(), a value or vector of values that signify a productive sequence in the col_name column. Can be a logical TRUE (default for "airr" format) or a character vector of strings (e.g., c("true", "TRUE", "True", "t", "T", "1") for "10x" format).
- prefix_col
  For make_barcode_prefix(), the name of the column in the dataset that contains the prefix string to be added to each cell barcode. Defaults to "Prefix". The barcode column itself is identified internally via imd_schema("barcode").
Value
Each make_* function returns a new function. This returned function takes a dataset as its first argument and ... for any additional arguments, and performs the specific processing step. make_default_preprocessing() and make_default_postprocessing() return a named list of such functions.
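The factory pattern described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: make_drop_columns_sketch() and the toy data frame are hypothetical names introduced here, and a plain data.frame stands in for the real dataset type.

```r
# Hypothetical stand-in for a make_* factory; NOT the package's implementation.
make_drop_columns_sketch <- function(cols = character(0)) {
  # The returned closure remembers `cols` and takes the dataset first,
  # with `...` accepted (and ignored) for any additional arguments.
  function(dataset, ...) {
    if (length(cols) == 0) {
      return(dataset)  # nothing to drop
    }
    dataset[, setdiff(names(dataset), cols), drop = FALSE]
  }
}

# Toy data (assumed column names, for illustration only).
toy <- data.frame(
  cdr3_aa = c("CASSL", "CASSQ"),
  productive = c(TRUE, FALSE),
  junk = c(1, 2),
  stringsAsFactors = FALSE
)

drop_junk <- make_drop_columns_sketch("junk")  # build the step once
result <- drop_junk(toy)                       # apply it to a dataset
print(names(result))
```

Building the step separately from applying it is what lets the same configured function be reused across several input files.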
Details
This collection of "maker" functions generates common preprocessing and postprocessing steps tailored for immune repertoire data. Each make_* function returns a new function that can then be applied to a dataset. These functions are designed to be flexible components in constructing custom data processing workflows.
The functions generated by these factories typically expect a dataset (e.g., a duckplyr with annotations) as their first argument and may accept additional arguments via ... (though these are often unused in the predefined steps).
make_default_preprocessing() and make_default_postprocessing() assemble a list of such processing functions. The individual make_exclude_columns(), make_productive_filter(), and make_barcode_prefix() functions create specific transformation steps.
These steps are often used when reading data to standardize formats, filter unwanted records, or enrich information like cell barcodes. They are designed to gracefully handle cases where an operation is not applicable (e.g., a specified column is not found) by issuing a warning and returning the dataset unmodified.
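The "warn and return unmodified" behaviour described above might look roughly like the following base-R sketch. It is an assumption-laden illustration rather than the package's code: make_productive_filter_sketch() is a hypothetical name, the warning text is invented, and a plain data.frame replaces the real dataset type.

```r
# Hypothetical sketch of a productive filter; NOT the package's implementation.
make_productive_filter_sketch <- function(col_name = c("productive"),
                                          truthy = TRUE) {
  function(dataset, ...) {
    # Candidate columns that exist in the dataset, in the order given.
    found <- col_name[col_name %in% names(dataset)]
    if (length(found) == 0) {
      # Not applicable: warn and hand the dataset back untouched.
      warning("No productivity column found; returning dataset unmodified.")
      return(dataset)
    }
    # Use the first matching column; keep rows whose value is "truthy".
    keep <- dataset[[found[1]]] %in% truthy
    dataset[keep, , drop = FALSE]
  }
}

toy <- data.frame(
  cdr3_aa = c("CASSL", "CASSQ", "CASSP"),
  productive = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

filter_fn <- make_productive_filter_sketch()
kept <- filter_fn(toy)

# Without the column, the step degrades to a no-op (warning suppressed here).
unchanged <- suppressWarnings(filter_fn(toy[, "cdr3_aa", drop = FALSE]))
```

Using %in% rather than == means missing values in the productivity column are treated as not productive instead of propagating NA into the row selection.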
Functions
- make_default_preprocessing(): Creates a default list of preprocessing functions suitable for "airr" or "10x" formatted data. This typically includes steps to exclude unnecessary columns and filter for productive sequences.
- make_default_postprocessing(): Creates a default list of postprocessing functions, such as adding a prefix to cell barcodes.
- make_exclude_columns(): Creates a function that, when applied to a dataset, removes a specified set of columns.
- make_productive_filter(): Creates a function that filters a dataset to retain only rows where sequences are marked as productive, based on a specified column and set of "truthy" values.
- make_barcode_prefix(): Creates a function that prepends a prefix (sourced from a specified column in the dataset) to the cell barcodes.
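As a rough illustration of how a named list of such steps could be applied in order, the base-R sketch below chains two steps with Reduce(). Several things here are assumptions and not taken from the package: the helper make_barcode_prefix_sketch(), the barcode column name "barcode" (the real one is resolved internally via imd_schema("barcode")), the example prefix values, and the use of Reduce() to run the list.

```r
# Hypothetical sketch of a barcode-prefix step; NOT the package's implementation.
# `barcode_col = "barcode"` is an assumed column name for illustration.
make_barcode_prefix_sketch <- function(prefix_col = "Prefix",
                                       barcode_col = "barcode") {
  function(dataset, ...) {
    if (!all(c(prefix_col, barcode_col) %in% names(dataset))) {
      warning("Prefix or barcode column not found; returning dataset unmodified.")
      return(dataset)
    }
    # Prepend the per-row prefix to each barcode.
    dataset[[barcode_col]] <- paste0(dataset[[prefix_col]], dataset[[barcode_col]])
    dataset
  }
}

# A named list of steps, similar in shape to what the default makers return.
steps <- list(
  keep_productive = function(dataset, ...) {
    dataset[dataset$productive %in% TRUE, , drop = FALSE]
  },
  prefix_barcodes = make_barcode_prefix_sketch()
)

cells <- data.frame(
  barcode = c("AAAC-1", "AAAG-1", "AAAT-1"),
  Prefix = c("S1_", "S1_", "S2_"),
  productive = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Apply each step to the output of the previous one.
processed <- Reduce(function(data, step) step(data), steps, cells)
print(processed$barcode)
```

Because every step has the same signature (dataset first, then ...), steps can be added, removed, or reordered in the list without changing the code that runs them.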