Groups the annotation table of an ImmunData object by user-specified
columns to define distinct repertoires (e.g., based on sample, donor,
time point). It then calculates summary statistics both per-repertoire and
per-receptor within each repertoire.
Calculated per repertoire:
- n_barcodes: Total number of unique cells/barcodes within the repertoire (sum of imd_chain_count, which amounts to counting unique cells if the input was single-cell, or summing total counts if the input was bulk).
- n_receptors: Number of unique receptors (imd_receptor_id) found within the repertoire.
Calculated per annotation row (receptor within repertoire context):
- imd_count: Total count of a specific receptor (imd_receptor_id) within the specific repertoire it belongs to in that row (sum of relevant imd_chain_count).
- imd_proportion: The proportion of the repertoire's total n_barcodes accounted for by that specific receptor (imd_count / n_barcodes).
- n_repertoires: The total number of distinct repertoires (across the entire dataset) in which this specific receptor (imd_receptor_id) appears.
These statistics are added to the annotation table, and a summary table is
stored in the $repertoires slot of the returned object.
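
As a rough illustration of these definitions, the following base R sketch computes the same statistics by hand on a small made-up annotation table. It is not the package's implementation: the data and the SampleID column are hypothetical, and only the names imd_receptor_id, imd_chain_count, and the output columns are taken from the description above.

```r
# Toy annotation table (hypothetical data), one row per chain record.
ann <- data.frame(
  SampleID        = c("S1", "S1", "S1", "S2", "S2"),
  imd_receptor_id = c(1L, 1L, 2L, 1L, 3L),
  imd_chain_count = c(1L, 1L, 1L, 4L, 6L)
)

# Per repertoire: n_barcodes is the sum of imd_chain_count,
# n_receptors is the number of distinct receptor IDs.
n_barcodes  <- tapply(ann$imd_chain_count, ann$SampleID, sum)
n_receptors <- tapply(ann$imd_receptor_id, ann$SampleID,
                      function(x) length(unique(x)))

# Per receptor within its repertoire: summed count and proportion.
ann$imd_count <- ave(ann$imd_chain_count, ann$SampleID, ann$imd_receptor_id,
                     FUN = sum)
ann$imd_proportion <- ann$imd_count / as.numeric(n_barcodes[ann$SampleID])

# Per receptor across the dataset: number of repertoires it appears in.
ann$n_repertoires <- ave(as.integer(factor(ann$SampleID)), ann$imd_receptor_id,
                         FUN = function(x) length(unique(x)))

print(ann)
```

In this toy table, receptor 1 occurs in both samples, so its n_repertoires is 2, while its imd_proportion differs between S1 and S2 because each repertoire has its own n_barcodes total.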
Arguments
- idata
  An ImmunData object, typically the output of read_repertoires() or read_immundata(). Must contain the $annotations table with columns specified in schema and internal columns like imd_receptor_id and imd_chain_count.
- schema
  Character vector. Column name(s) in idata$annotations that define a unique repertoire. For example, c("SampleID") or c("DonorID", "TimePoint"). Columns must exist in idata$annotations. Default: "repertoire_id" (assumes such a column exists).
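
To make the role of schema concrete, here is a minimal base R sketch of how a multi-column schema partitions annotation rows into repertoires. The table and its values are invented for illustration, and the order in which IDs are assigned here is an assumption; the package may number repertoires differently.

```r
# Hypothetical annotation rows described by two metadata columns.
ann <- data.frame(
  DonorID   = c("D1", "D1", "D1", "D2"),
  TimePoint = c("pre", "pre", "post", "pre")
)
schema <- c("DonorID", "TimePoint")

# Every column named in schema must exist in the annotation table.
stopifnot(all(schema %in% colnames(ann)))

# Each distinct combination of schema values is treated as one repertoire.
repertoires <- unique(ann[, schema, drop = FALSE])
repertoires$imd_repertoire_id <- seq_len(nrow(repertoires))

# Attach the repertoire ID back to every annotation row.
# Note that merge() may reorder rows.
ann_with_id <- merge(ann, repertoires, by = schema)

print(repertoires)
```

With schema = c("DonorID") alone, the same rows would collapse into two repertoires instead of three, which is why the choice of schema columns determines every downstream statistic.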
Value
A new ImmunData object. Its $annotations table includes the
added columns (imd_repertoire_id, imd_count, imd_proportion, n_repertoires).
Its $repertoires slot contains the summary table linking schema columns
to imd_repertoire_id, n_barcodes, and n_receptors.
Details
The function operates on the idata$annotations table:
- Validation: Checks idata and existence of schema columns. Removes any pre-existing repertoire summary columns to prevent duplication.
- Repertoire Definition: Groups annotations by the schema columns. Calculates total counts (n_barcodes) per group. Assigns a unique integer imd_repertoire_id to each distinct repertoire group. This forms the initial repertoires_table.
- Receptor Counts & Proportion: Calculates the sum of imd_chain_count for each receptor within each repertoire (imd_count). Calculates the proportion (imd_proportion) of each receptor within its repertoire.
- Repertoire & Receptor Stats: Counts unique receptors per repertoire (n_receptors, added to repertoires_table). Counts the number of distinct repertoires each unique receptor appears in (n_repertoires).
- Join Results: Joins the calculated imd_count, imd_proportion, and n_repertoires back to the annotation table based on repertoire columns and imd_receptor_id.
- Return New Object: Creates and returns a new ImmunData object containing the updated $annotations table (with the added statistics) and the $repertoires slot populated with the repertoires_table (containing schema columns, imd_repertoire_id, n_barcodes, n_receptors).
The original idata object remains unmodified. Internal column names are
typically managed by immundata:::imd_schema().
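
The validation step's removal of pre-existing summary columns is what makes re-aggregation with a different schema safe. A minimal sketch of that idea in base R, assuming the summary columns are the four listed under Value (the toy table and the variable names summary_cols and ann_clean are hypothetical):

```r
# Hypothetical annotation table that still carries stale summary columns
# from an earlier aggregation.
ann <- data.frame(
  SampleID        = c("S1", "S2"),
  imd_receptor_id = c(1L, 1L),
  imd_chain_count = c(2L, 5L),
  imd_count       = c(99L, 99L),  # stale value
  n_repertoires   = c(99L, 99L)   # stale value
)

# Columns that aggregation adds to the annotation table (see Value).
summary_cols <- c("imd_repertoire_id", "imd_count",
                  "imd_proportion", "n_repertoires")

# Drop any that are already present so a later join cannot duplicate them.
ann_clean <- ann[, setdiff(colnames(ann), summary_cols), drop = FALSE]

print(colnames(ann_clean))
```

Without this step, joining freshly computed statistics onto a table that already has columns of the same name would typically produce suffixed duplicates rather than replacing the old values.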
See also
read_repertoires() (which can call this function), ImmunData class.
Examples
if (FALSE) { # \dontrun{
# Assume 'idata_raw' is an ImmunData object loaded via read_repertoires
# but *without* providing 'repertoire_schema' initially.
# It has $annotations but $repertoires is likely NULL or empty.
# Assume idata_raw$annotations has columns "SampleID" and "TimePoint".
# Define repertoires based on SampleID and TimePoint
idata_aggregated <- agg_repertoires(idata_raw, schema = c("SampleID", "TimePoint"))
# Explore the results
print(idata_aggregated)
print(idata_aggregated$repertoires)
print(head(idata_aggregated$annotations)) # Note the new columns
} # }