Applies transformations to the $annotations table within an ImmunData object, similar to dplyr::mutate. It allows adding new columns or modifying existing non-schema columns using standard dplyr expressions. Additionally, it can add new columns based on sequence comparisons (exact match, regular expression matching, or distance calculation) against specified patterns.
Usage
mutate_immundata(idata, ..., seq_options = NULL)
# S3 method for class 'ImmunData'
mutate(.data, ..., seq_options = NULL)
Arguments
- idata, .data
An ImmunData object.
- ...
dplyr::mutate-style named expressions (e.g., new_col = existing_col * 2, category = ifelse(value > 10, "high", "low")). These are applied first. Important: You cannot use names for new or modified columns that conflict with the core ImmunData schema columns (retrieved via imd_schema()).
- seq_options
Optional named list specifying sequence-based annotation options. Use make_seq_options() for convenient creation. See the filter_immundata documentation (?filter_immundata) or the Details section here for the list structure (query_col, patterns, method, name_type). max_dist is ignored for mutation. If NULL (the default), no sequence-based columns are added.
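The naming restriction on ... can be pictured with a minimal base R sketch. It is not the package's implementation: schema_cols is a hard-coded stand-in for whatever imd_schema() returns, and check_mutate_names() is a hypothetical helper introduced only for illustration.

```r
# Hypothetical stand-in for the core schema column names; the real values
# come from imd_schema() and may differ.
schema_cols <- c("receptor_id", "cell_id")

# Hypothetical helper: capture the names given in `...` without evaluating
# the expressions, and stop if any of them targets a schema column.
check_mutate_names <- function(...) {
  target_names <- names(substitute(list(...)))[-1]
  conflicts <- intersect(target_names, schema_cols)
  if (length(conflicts) > 0) {
    stop(
      "Cannot create or modify schema columns: ",
      paste(conflicts, collapse = ", ")
    )
  }
  invisible(target_names)
}

# A non-schema name passes the check.
ok <- check_mutate_names(V_family = substr(V_gene, 1, 5))

# A schema name triggers an error; capture its message instead of aborting.
bad <- tryCatch(
  check_mutate_names(receptor_id = receptor_id + 1),
  error = function(e) conditionMessage(e)
)
```

Because the expressions are only inspected by name, the check can run before any column is computed, which matches the documented behaviour of failing early on schema conflicts.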
Value
A new ImmunData object with the $annotations table modified according to the provided expressions and seq_options. The $repertoires table (if present) is carried over unchanged from the input idata.
Details
The function operates in two main steps:
1. Standard Mutations (...): Applies the standard dplyr::mutate-style expressions provided in ... to the $annotations table. You can create new columns or modify existing ones, but you cannot modify columns defined in the core ImmunData schema (e.g., receptor_id, cell_id). An error will occur if you attempt to do so.
2. Sequence-based Annotations (seq_options): If seq_options is provided, the function calculates sequence similarities or distances and adds corresponding new columns to the $annotations table.
   - method = "exact": Adds boolean columns (TRUE/FALSE) indicating whether the query_col value exactly matches each pattern. Column names are generated using a prefix (e.g., sim_exact_) and the pattern or its index.
   - method = "regex": Uses annotate_tbl_regex to add columns indicating matches for each regular expression pattern against the query_col. The exact nature of the added columns depends on annotate_tbl_regex (e.g., boolean flags or captured groups).
   - method = "lev" or method = "hamm": Uses annotate_tbl_distance to calculate Levenshtein or Hamming distances between the query_col and each pattern, adding columns containing these numeric distances. max_dist is ignored in this context (internally treated as NA) as all distances are calculated and added, not used for filtering.

The naming of the new sequence-based columns depends on the name_type option within seq_options and internal helper functions like make_pattern_columns. Prefixes like sim_exact_, sim_regex_, dist_lev_, dist_hamm_ are typically used based on the schema.
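The kinds of columns described above can be illustrated with a small base R sketch on a toy data frame. This is only an approximation under stated assumptions: it does not call annotate_tbl_regex or annotate_tbl_distance, the column names merely imitate the documented prefixes, and returning NA for the Hamming distance of unequal-length strings is a choice made here, not confirmed package behaviour.

```r
# Toy stand-in for the $annotations table.
annotations <- data.frame(
  CDR3_aa = c("CARGLGLVFYGMDVW", "CARGLGLVFYGMDVF", "CASSLGTDTQYF"),
  stringsAsFactors = FALSE
)
patterns <- c("CARGLGLVFYGMDVW", "CASSLGTDTQYF")

# Hamming distance is only defined for equal-length strings;
# this sketch returns NA otherwise (an assumption).
hamming <- function(a, b) {
  if (nchar(a) != nchar(b)) {
    return(NA_integer_)
  }
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

for (i in seq_along(patterns)) {
  p <- patterns[i]
  # Exact match, named by pattern (as with name_type = "pattern").
  annotations[[paste0("sim_exact_", p)]] <- annotations$CDR3_aa == p
  # Levenshtein distance via utils::adist, named by index
  # (as with name_type = "index").
  annotations[[paste0("dist_lev_", i)]] <-
    as.integer(utils::adist(annotations$CDR3_aa, p)[, 1])
  # Hamming distance, also named by index.
  annotations[[paste0("dist_hamm_", i)]] <-
    vapply(annotations$CDR3_aa, hamming, integer(1), b = p, USE.NAMES = FALSE)
}

# Regex match as a boolean flag column.
annotations$sim_regex_1 <- grepl("^CAR", annotations$CDR3_aa)

print(annotations)
```

Every distance is stored for every row, which mirrors the statement that max_dist plays no role here: nothing is filtered out, the table only gains columns.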
The $repertoires table, if present in the input idata, is copied to the output object without modification. This function only affects the $annotations table.
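The carry-over behaviour can be sketched with a plain list standing in for an ImmunData object. Everything here is hypothetical: idata_toy is not a real ImmunData instance, and mutate_toy() is an illustrative helper, not part of the package.

```r
# Toy stand-in for an ImmunData object: a plain list with two tables.
idata_toy <- list(
  annotations = data.frame(
    V_gene = c("IGHV1-2", "IGHV3-23"),
    stringsAsFactors = FALSE
  ),
  repertoires = data.frame(repertoire_id = 1L, n_receptors = 2L)
)

# Hypothetical helper: apply `fun` to the annotations table only and
# return a new object; the repertoires table is copied as-is.
mutate_toy <- function(idata, fun) {
  out <- idata
  out$annotations <- fun(idata$annotations)
  out
}

idata_new <- mutate_toy(
  idata_toy,
  function(tbl) transform(tbl, V_family = substr(V_gene, 1, 5))
)

# The repertoires table is untouched; the input object is not modified.
unchanged <- identical(idata_new$repertoires, idata_toy$repertoires)
```

Returning a new object rather than modifying the input in place is consistent with the Value section, which describes the result as a new ImmunData object.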
See also
dplyr::mutate(), make_seq_options(), filter_immundata(), ImmunData, vignette("immundata-classes", package = "immunarch") (replace with actual package name if different)
Examples
# Basic setup (assuming idata_test is a valid ImmunData object)
# print(idata_test)
if (FALSE) { # \dontrun{
# Example 1: Add a simple derived column
idata_mut1 <- mutate(idata_test, V_family = substr(V_gene, 1, 5))
print(idata_mut1$annotations)
# Example 2: Add multiple columns and modify one (if 'custom_score' exists)
# Note: Avoid modifying core schema columns like 'V_gene' itself.
idata_mut2 <- mutate(idata_test,
  V_basic = gsub("-.*", "", V_gene),
  J_len = nchar(J_gene),
  custom_score = custom_score * 1.1
) # Fails if custom_score doesn't exist
print(idata_mut2$annotations)
# Example 3: Add boolean columns for exact CDR3 matches
cdr3_patterns <- c("CARGLGLVFYGMDVW", "CARDNRGAVAGVFGEAFYW")
seq_opts_exact <- make_seq_options(
  query_col = "CDR3_aa",
  patterns = cdr3_patterns,
  method = "exact",
  name_type = "pattern"
) # Name cols by pattern
idata_mut_exact <- mutate(idata_test, seq_options = seq_opts_exact)
# Look for new columns like 'sim_exact_CARGLGLVFYGMDVW'
print(idata_mut_exact$annotations)
# Example 4: Add Levenshtein distance columns for a CDR3 pattern
seq_opts_lev <- make_seq_options(
  query_col = "CDR3_aa",
  patterns = "CARGLGLVFYGMDVW",
  method = "lev",
  name_type = "index"
) # Name col like 'dist_lev_1'
idata_mut_lev <- mutate(idata_test, seq_options = seq_opts_lev)
# Look for new column 'dist_lev_1' (or similar based on schema)
print(idata_mut_lev$annotations)
# Example 5: Combine standard mutation and sequence annotation
seq_opts_regex <- make_seq_options(
  query_col = "V_gene",
  patterns = c(ighv1 = "^IGHV1-", ighv3 = "^IGHV3-"),
  method = "regex",
  name_type = "pattern"
)
idata_mut_combo <- mutate(idata_test,
  chain_upper = toupper(chain),
  seq_options = seq_opts_regex
)
# Look for 'chain_upper' and regex match columns (e.g., 'sim_regex_ighv1')
print(idata_mut_combo)
} # }