R/io_repertoires_processing.R
preprocess_postprocess.Rd
Preprocessing and postprocessing of input immune repertoire files
make_default_preprocessing(format = c("airr", "10x"))
make_default_postprocessing()
make_exclude_columns(cols = imd_drop_cols("airr"))
make_productive_filter(col_name = c("productive"), truthy = TRUE)
make_barcode_prefix(prefix_col = "Prefix")
format: For make_default_preprocessing(), a character string specifying the input data format. Currently supports "airr" (default) or "10x". This determines the default set of columns to exclude and the values considered "productive".
cols: For make_exclude_columns(), a character vector of column names to be removed from the dataset. Defaults to imd_drop_cols("airr"). If empty, the returned function will not remove any columns.
col_name: For make_productive_filter(), a character vector of candidate column names that indicate sequence productivity (e.g., "productive"). The first matching column found in the dataset will be used.
truthy: For make_productive_filter(), a value or vector of values that signify a productive sequence in the col_name column. Can be a logical TRUE (default for "airr" format) or a character vector of strings (e.g., c("true", "TRUE", "True", "t", "T", "1") for "10x" format).
prefix_col: For make_barcode_prefix(), the name of the column in the dataset that contains the prefix string to be added to each cell barcode. Defaults to "Prefix". The barcode column itself is identified internally via imd_schema("barcode").
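The interplay of col_name and truthy can be illustrated with a minimal base R sketch. It is not the package's implementation: sketch_productive_filter() is a hypothetical stand-in, and it assumes that "first matching column" means the first entry of col_name that is present in the dataset.

```r
# Hypothetical stand-in for a productive-filter factory (not the package's code).
sketch_productive_filter <- function(col_name = c("productive"), truthy = TRUE) {
  function(dataset, ...) {
    # Assumption: candidates are tried in the order given in col_name.
    found <- intersect(col_name, colnames(dataset))
    if (length(found) == 0) {
      warning("No productivity column found; returning the dataset unchanged.")
      return(dataset)
    }
    # Keep only rows whose value is one of the truthy values (NA is dropped).
    keep <- dataset[[found[1]]] %in% truthy
    dataset[keep, , drop = FALSE]
  }
}

# Toy data with 10x-style string flags.
toy <- data.frame(
  cdr3 = c("CASSL", "CASRG", "CATWD"),
  productive = c("true", "F", "T"),
  stringsAsFactors = FALSE
)

keep_productive <- sketch_productive_filter(
  col_name = c("is_productive", "productive"),
  truthy = c("true", "TRUE", "True", "t", "T", "1")
)
filtered <- keep_productive(toy)
```

Here "is_productive" is absent from the toy data, so the sketch falls back to "productive" and keeps the first and third rows.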
Each make_* function returns a new function. The returned function takes a dataset as its first argument, accepts any additional arguments via ..., and performs the specific processing step. make_default_preprocessing() and make_default_postprocessing() return a named list of such functions.
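The factory pattern and the "named list of steps" shape can be sketched in base R as follows. All names here (sketch_exclude_columns(), the list element names, the toy columns) are hypothetical, and chaining the steps with Reduce() is only one plausible way to apply such a list; how the package itself applies the steps is not described here.

```r
# Hypothetical factory: returns a step that drops the named columns.
sketch_exclude_columns <- function(cols = character(0)) {
  function(dataset, ...) {
    # With an empty cols vector, setdiff() keeps every column.
    dataset[, setdiff(colnames(dataset), cols), drop = FALSE]
  }
}

# A named list of steps, each with the signature function(dataset, ...).
steps <- list(
  exclude_columns = sketch_exclude_columns(c("raw_id")),
  keep_long = function(dataset, ...) {
    dataset[nchar(dataset$cdr3) > 4, , drop = FALSE]
  }
)

toy <- data.frame(
  cdr3 = c("CASS", "CASSLG"),
  raw_id = 1:2,
  stringsAsFactors = FALSE
)

# Apply the steps in order, feeding each result into the next step.
result <- Reduce(function(data, step) step(data), steps, toy)
```

Because every step shares the same signature, steps can be added, removed, or reordered without changing the code that applies them.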
This collection of "maker" functions generates common preprocessing and postprocessing steps tailored for immune repertoire data. Each make_* function returns a new function that can then be applied to a dataset. These functions are designed to be flexible components for constructing custom data processing workflows.
The functions generated by these factories typically expect a dataset (e.g., a duckplyr data frame with annotations) as their first argument and may accept additional arguments via ... (though these are often unused in the predefined steps).
make_default_preprocessing() and make_default_postprocessing() assemble a list of such processing functions. The individual make_exclude_columns(), make_productive_filter(), and make_barcode_prefix() functions create specific transformation steps.
These steps are often used when reading data to standardize formats, filter unwanted records, or enrich information like cell barcodes. They are designed to gracefully handle cases where an operation is not applicable (e.g., a specified column is not found) by issuing a warning and returning the dataset unmodified.
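The "warn and return the dataset unmodified" behaviour can be sketched with a barcode-prefix step. This is an illustration under assumptions, not the package's code: sketch_barcode_prefix() is hypothetical, and the barcode column is passed as a plain barcode_col argument here, whereas the package resolves it internally via imd_schema("barcode").

```r
# Hypothetical factory; barcode_col is an assumption made for this sketch.
sketch_barcode_prefix <- function(prefix_col = "Prefix", barcode_col = "barcode") {
  function(dataset, ...) {
    missing_cols <- setdiff(c(prefix_col, barcode_col), colnames(dataset))
    if (length(missing_cols) > 0) {
      # Not applicable: warn and hand the dataset back untouched.
      warning(
        "Skipping barcode prefix; missing column(s): ",
        paste(missing_cols, collapse = ", ")
      )
      return(dataset)
    }
    # Prepend the per-row prefix to the barcode.
    dataset[[barcode_col]] <- paste0(dataset[[prefix_col]], dataset[[barcode_col]])
    dataset
  }
}

add_prefix <- sketch_barcode_prefix()

with_prefix <- data.frame(
  barcode = c("AAAC-1", "TTTG-1"),
  Prefix = c("S1_", "S2_"),
  stringsAsFactors = FALSE
)
prefixed <- add_prefix(with_prefix)

# No Prefix column: the step warns (suppressed here) and changes nothing.
without_prefix <- data.frame(barcode = c("AAAC-1", "TTTG-1"), stringsAsFactors = FALSE)
unchanged <- suppressWarnings(add_prefix(without_prefix))
```

Returning the input instead of raising an error lets the same list of steps run over files that lack optional columns.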
make_default_preprocessing(): Creates a default list of preprocessing functions suitable for "airr" or "10x" formatted data. This typically includes steps to exclude unnecessary columns and filter for productive sequences.
make_default_postprocessing(): Creates a default list of postprocessing functions, such as adding a prefix to cell barcodes.
make_exclude_columns(): Creates a function that, when applied to a dataset, removes a specified set of columns.
make_productive_filter(): Creates a function that filters a dataset to retain only rows where sequences are marked as productive, based on a specified column and set of "truthy" values.
make_barcode_prefix(): Creates a function that prepends a prefix (sourced from a specified column in the dataset) to the cell barcodes.
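How a format-dependent default list might be assembled can be sketched as below. Everything in it is an assumption made for illustration: sketch_default_preprocessing(), the list element names, and the placeholder drop column are hypothetical (the real column defaults come from imd_drop_cols()), and the truthy values simply follow the argument descriptions on this page.

```r
# Hypothetical assembler of default preprocessing steps (not the package's code).
sketch_default_preprocessing <- function(format = c("airr", "10x")) {
  format <- match.arg(format)
  # Truthy values per format, following the truthy argument description.
  truthy <- if (format == "airr") TRUE else c("true", "TRUE", "True", "t", "T", "1")
  # Placeholder only; the package derives this set from imd_drop_cols().
  drop_cols <- c("sequence_alignment")
  list(
    exclude_columns = function(dataset, ...) {
      dataset[, setdiff(colnames(dataset), drop_cols), drop = FALSE]
    },
    filter_productive = function(dataset, ...) {
      if (!"productive" %in% colnames(dataset)) return(dataset)
      dataset[dataset[["productive"]] %in% truthy, , drop = FALSE]
    }
  )
}

toy <- data.frame(
  cdr3 = c("CASSL", "CASRG"),
  productive = c("True", "false"),
  sequence_alignment = c("x", "y"),
  stringsAsFactors = FALSE
)

steps <- sketch_default_preprocessing("10x")

# Run the steps one after another on the toy data.
cleaned <- toy
for (step in steps) cleaned <- step(cleaned)
```

Using match.arg() means that calling the sketch with no argument selects "airr", mirroring the documented default.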