Joins additional annotation data to the annotations slot of an ImmunData object.
This function adds extra information to your repertoire data by joining a data frame of annotations on one or more specified columns.
Usage
annotate_immundata(
  idata,
  annotations,
  by,
  keep_repertoires = TRUE,
  remove_limit = FALSE
)
annotate(idata, annotations, by, keep_repertoires = TRUE, remove_limit = FALSE)
annotate_receptors(
  idata,
  annotations,
  annot_col = imd_schema("receptor"),
  keep_repertoires = TRUE,
  remove_limit = FALSE
)
annotate_barcodes(
  idata,
  annotations,
  annot_col = "<rownames>",
  keep_repertoires = TRUE,
  remove_limit = FALSE
)
annotate_chains(
  idata,
  annotations,
  annot_col = imd_schema("chain"),
  keep_repertoires = TRUE,
  remove_limit = FALSE
)
Arguments
- idata
An ImmunData R6 object containing repertoire and annotation data.
- annotations
A data frame containing the annotations to be joined.
- by
A named character vector specifying the columns to join by. The names of the vector should be the column names in idata$annotations, and the values should be the corresponding column names in the annotations data frame.
- keep_repertoires
Logical. If TRUE (default) and the ImmunData object contains repertoire data (idata$schema_repertoire is not NULL), the repertoires will be re-aggregated after joining the annotations. Set to FALSE if you do not want to re-aggregate repertoires immediately.
- remove_limit
Logical. If FALSE (default), a warning will be issued if the annotations data frame has 100 or more columns, suggesting potential performance issues. Set to TRUE to disable this warning and allow joining of annotations with an arbitrary number of columns. Use with caution, as joining wide data frames can be memory-intensive and slow.
- annot_col
A character vector specifying the column with receptor, barcode or chain identifiers used to annotate the corresponding receptors, barcodes or chains in idata.
Details
The function performs a left join operation, keeping all rows from idata$annotations and adding matching columns from the annotations data frame.
If there are multiple matches in annotations for a row in idata$annotations, all combinations will be returned, potentially increasing the number of rows in the resulting annotations table.
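The row-multiplying effect of duplicate keys can be illustrated with a minimal base R sketch. It uses merge() as a stand-in for the join and plain data frames as stand-ins for idata$annotations and the annotations argument; the column names and values are hypothetical, and this is not the package's internal code.

```r
# Stand-in for idata$annotations: one row per receptor (hypothetical columns).
existing <- data.frame(
  sample = c("s1", "s2", "s3"),
  receptor_id = 1:3,
  stringsAsFactors = FALSE
)

# Hypothetical annotations: the key "s1" appears twice and "s3" is absent.
extra <- data.frame(
  sample_id = c("s1", "s1", "s2"),
  treatment = c("A", "B", "A"),
  stringsAsFactors = FALSE
)

# Left join: keep every row of `existing`, add matching columns from `extra`.
joined <- merge(existing, extra,
                by.x = "sample", by.y = "sample_id", all.x = TRUE)

# The duplicated key adds a row; the unmatched key is kept with NA.
rows_before <- nrow(existing)
rows_after <- nrow(joined)
unmatched <- sum(is.na(joined$treatment))

# A quick pre-join check for duplicated keys in the annotations.
has_duplicate_keys <- anyDuplicated(extra$sample_id) > 0
```

Checking the key columns for duplicates before joining, as in the last line, is one way to avoid an unintended increase in the number of rows.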
The function uses checkmate to validate the input types and structure.
A check is performed to ensure that the columns specified in by exist in both idata$annotations and the annotations data frame.
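The kind of check described above can be approximated in base R as follows. The helper check_by_columns() is hypothetical and written only for illustration; the package itself relies on checkmate, and its actual checks and messages may differ.

```r
# Hypothetical helper (not part of the package): verify that every name in
# `by` is a column of the existing annotations and every value in `by` is a
# column of the new annotations data frame.
check_by_columns <- function(existing_cols, annotation_cols, by) {
  stopifnot(
    is.character(by),
    length(by) > 0,
    !is.null(names(by)),
    all(nzchar(names(by)))
  )
  missing_left <- setdiff(names(by), existing_cols)
  missing_right <- setdiff(unname(by), annotation_cols)
  if (length(missing_left) > 0) {
    stop("Not found in the existing annotations: ",
         paste(missing_left, collapse = ", "), call. = FALSE)
  }
  if (length(missing_right) > 0) {
    stop("Not found in the `annotations` data frame: ",
         paste(missing_right, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# Names are columns on the existing side, values are columns on the new side.
by <- c("sample" = "sample_id")
ok <- check_by_columns(
  existing_cols = c("sample", "barcode"),
  annotation_cols = c("sample_id", "treatment"),
  by = by
)
```

Running a check like this on your own inputs before calling annotate() can make a mismatch between the two sides of by easier to spot.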
The annotations data frame is converted to a duckdb tibble internally for efficient joining, especially with large datasets.
Warning
By default (remove_limit = FALSE), joining an annotations data frame with 100 or more columns will trigger a warning. This is a safeguard to prevent accidental joining of very wide data (e.g., gene expression data) that could lead to performance degradation or crashes. If you understand the risks and intend to join a wide data frame, set remove_limit = TRUE.
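As a rough sketch of this safeguard, the hypothetical helper below warns when a data frame reaches the 100-column threshold stated above. The function name, the limit argument and the message text are assumptions made for illustration, not the package's implementation. The sketch also shows a common alternative to remove_limit = TRUE: selecting only the columns you need before joining.

```r
# Hypothetical helper (not part of the package): warn on wide annotations
# unless the caller opts out, mirroring the documented 100-column threshold.
warn_if_wide <- function(annotations, remove_limit = FALSE, limit = 100L) {
  if (!remove_limit && ncol(annotations) >= limit) {
    warning(
      sprintf("`annotations` has %d columns (limit: %d); the join may be slow.",
              ncol(annotations), limit),
      call. = FALSE
    )
  }
  invisible(annotations)
}

# Hypothetical wide table: 100 numeric columns (V1..V100) plus a barcode key.
wide <- as.data.frame(matrix(0, nrow = 2, ncol = 100))
wide$barcode <- c("AAAC", "AAAG")

# Keeping only the key and the columns of interest stays under the limit.
slim <- wide[, c("barcode", "V1", "V2")]
warn_if_wide(slim)

# Opting out explicitly silences the warning for the full table.
warn_if_wide(wide, remove_limit = TRUE)
```

Subsetting first keeps the joined annotations table narrow, which is usually preferable to disabling the limit when only a few columns are needed.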
Examples
if (FALSE) { # \dontrun{
# Assuming 'my_immun_data' is an ImmunData object and 'sample_info' is a data frame
# with a column 'sample_id' matching 'sample' in my_immun_data$annotations
# and additional columns like 'treatment' and 'disease_status'.
sample_info <- data.frame(
  sample_id = c("sample1", "sample2", "sample3", "sample4"),
  treatment = c("Treatment A", "Treatment B", "Treatment A", "Treatment C"),
  disease_status = c("Healthy", "Disease", "Healthy", "Disease"),
  stringsAsFactors = FALSE # Important to keep characters as characters
)
# Join sample information: 'sample' in my_immun_data$annotations is matched
# to 'sample_id' in sample_info
my_immun_data_annotated <- annotate(
  idata = my_immun_data,
  annotations = sample_info,
  by = c("sample" = "sample_id")
)
# Join data by multiple columns, e.g., 'sample' and 'barcode'
# Assuming 'cell_annotations' is a data frame with 'sample', 'sample_barcode'
# and 'cell_type' columns
my_immun_data_cell_annotated <- annotate(
  idata = my_immun_data,
  annotations = cell_annotations,
  by = c("sample" = "sample", "barcode" = "sample_barcode")
)
# Join a wide data frame, suppressing the column limit warning
# Assuming 'gene_expression' is a data frame with 'barcode' and many gene columns
my_immun_data_gene_expression <- annotate(
  idata = my_immun_data,
  annotations = gene_expression,
  by = c("barcode" = "barcode"),
  remove_limit = TRUE
)
} # }