Joins additional annotation data to the annotations slot of an ImmunData object.
This function adds extra information to your repertoire data by joining a data frame of annotations on one or more specified columns.
annotate_immundata(
idata,
annotations,
by,
keep_repertoires = TRUE,
remove_limit = FALSE
)
annotate(idata, annotations, by, keep_repertoires = TRUE, remove_limit = FALSE)
annotate_receptors(
idata,
annotations,
annot_col = imd_schema("receptor"),
keep_repertoires = TRUE,
remove_limit = FALSE
)
annotate_barcodes(
idata,
annotations,
annot_col = "<rownames>",
keep_repertoires = TRUE,
remove_limit = FALSE
)
annotate_chains(
idata,
annotations,
annot_col = imd_schema("chain"),
keep_repertoires = TRUE,
remove_limit = FALSE
)
idata: An ImmunData R6 object containing repertoire and annotation data.
annotations: A data frame containing the annotations to be joined.
by: A named character vector specifying the columns to join by. The names of the vector are column names in idata$annotations, and the values are the corresponding column names in the annotations data frame.
keep_repertoires: Logical. If TRUE (default) and the ImmunData object contains repertoire data (idata$schema_repertoire is not NULL), the repertoires are re-aggregated after the annotations are joined. Set to FALSE to skip immediate re-aggregation.
remove_limit: Logical. If FALSE (default), a warning is issued when the annotations data frame has 100 or more columns, flagging potential performance issues. Set to TRUE to disable this warning and allow annotations with any number of columns. Use with caution, as joining wide data frames can be memory-intensive and slow.
annot_col: A character vector naming the column that holds receptor, barcode, or chain identifiers, used to annotate the corresponding receptors, barcodes, or chains in idata.
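The names-versus-values convention of by, together with the column-existence check mentioned in the details, can be sketched in base R. check_join_columns() is a hypothetical helper, not part of the package (which validates inputs with checkmate); it only illustrates which side of the join each part of by refers to.

```r
# Hypothetical helper, not part of immundata: validates a named `by` vector
# as documented (names -> idata$annotations columns,
# values -> columns of the annotations data frame).
check_join_columns <- function(left, right, by) {
  stopifnot(is.character(by), !is.null(names(by)), all(names(by) != ""))
  missing_left <- setdiff(names(by), colnames(left))
  missing_right <- setdiff(unname(by), colnames(right))
  if (length(missing_left) > 0) {
    stop("Missing in idata$annotations: ", paste(missing_left, collapse = ", "))
  }
  if (length(missing_right) > 0) {
    stop("Missing in annotations: ", paste(missing_right, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy stand-ins for idata$annotations and an external metadata table
left <- data.frame(sample = c("s1", "s2"), barcode = c("AAA", "CCC"))
right <- data.frame(sample_id = c("s1", "s2"), treatment = c("A", "B"))

check_join_columns(left, right, c("sample" = "sample_id"))  # passes silently
```

Reversing the mapping, for example c("sample_id" = "sample"), fails because sample_id is not a column of the left table.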
A new ImmunData object with the annotations joined to the annotations slot.
The function performs a left join, keeping all rows from idata$annotations and adding matching columns from the annotations data frame.
If a row in idata$annotations has multiple matches in annotations, all combinations are returned, which can increase the number of rows in the resulting annotations table.
The function uses checkmate to validate the input types and structure.
A check ensures that the columns specified in by exist in both idata$annotations (the names of by) and the annotations data frame (the values of by).
The annotations data frame is converted to a duckdb tibble internally for efficient joining, especially with large datasets.
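The left-join semantics, including the row multiplication that occurs when a key matches more than once, can be reproduced on plain data frames with base R's merge(). This is an illustration on assumed toy data only; the package performs the join on a duckdb-backed table, not with merge().

```r
# Toy stand-in for idata$annotations: one row per receptor
annotations_tbl <- data.frame(
  receptor_id = c(1, 2, 3),
  sample = c("s1", "s1", "s2")
)

# Toy metadata: "s1" matches twice, "s2" has no match at all
extra <- data.frame(
  sample_id = c("s1", "s1", "s3"),
  batch = c("b1", "b2", "b3")
)

by <- c("sample" = "sample_id")

# Left join: every left row is kept; names(by) index the left table,
# the values of by index the right table.
joined <- merge(
  annotations_tbl, extra,
  by.x = names(by), by.y = unname(by),
  all.x = TRUE, sort = FALSE
)

nrow(joined)  # 5: receptors 1 and 2 each appear twice, receptor 3 once with batch = NA
```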
By default (remove_limit = FALSE), joining an annotations data frame with 100 or more columns triggers a warning. This safeguard prevents accidentally joining very wide data (e.g., gene expression data), which could degrade performance or cause crashes. If you understand the risks and intend to join a wide data frame, set remove_limit = TRUE.
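The documented remove_limit behaviour amounts to a simple column-count guard. The sketch below uses a hypothetical helper, warn_if_wide(), with an assumed warning message; the package's actual implementation and wording may differ.

```r
# Hypothetical guard mirroring the documented remove_limit behaviour;
# not the package's code, and the message text is an assumption.
warn_if_wide <- function(annotations, remove_limit = FALSE, max_cols = 100) {
  if (!remove_limit && ncol(annotations) >= max_cols) {
    warning(
      "annotations has ", ncol(annotations), " columns; joining wide data ",
      "can be slow and memory-intensive. Set remove_limit = TRUE to proceed silently."
    )
  }
  invisible(annotations)
}

# A 120-column table, e.g. a small gene-expression-like matrix
wide <- as.data.frame(matrix(0, nrow = 2, ncol = 120))

# Capture the warning message instead of letting it propagate
msg <- tryCatch(warn_if_wide(wide), warning = function(w) conditionMessage(w))
print(msg)

warn_if_wide(wide, remove_limit = TRUE)  # no warning
```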
if (FALSE) { # \dontrun{
# Assuming 'my_immun_data' is an ImmunData object and 'sample_info' is a data frame
# with a column 'sample_id' matching 'sample' in my_immun_data$annotations
# and additional columns like 'treatment' and 'disease_status'.
sample_info <- data.frame(
sample_id = c("sample1", "sample2", "sample3", "sample4"),
treatment = c("Treatment A", "Treatment B", "Treatment A", "Treatment C"),
disease_status = c("Healthy", "Disease", "Healthy", "Disease"),
stringsAsFactors = FALSE # Important to keep characters as characters
)
# Join sample information using the 'sample' column
my_immun_data_annotated <- annotate(
idata = my_immun_data,
annotations = sample_info,
by = c("sample" = "sample_id")
)
# Join data by multiple columns, e.g., 'sample' and 'barcode'
# Assuming 'cell_annotations' is a data frame with 'sample', 'sample_barcode'
# and 'cell_type' columns
my_immun_data_cell_annotated <- annotate(
idata = my_immun_data,
annotations = cell_annotations,
by = c("sample" = "sample", "barcode" = "sample_barcode")
)
# Join a wide dataframe, suppressing the column limit warning
# Assuming 'gene_expression' is a data frame with 'barcode' and many gene columns
my_immun_data_gene_expression <- annotate(
idata = my_immun_data,
annotations = gene_expression,
by = c("barcode" = "barcode"),
remove_limit = TRUE
)
} # }