Groups the annotation table of an ImmunData object by user-specified
columns to define distinct repertoires (e.g., based on sample, donor,
time point). It then calculates summary statistics both per-repertoire and
per-receptor within each repertoire.
Calculated per repertoire:
n_barcodes: Total number of unique cells/barcodes within the repertoire (the sum of imd_chain_count, which is the number of unique cells for single-cell input, or the total count for bulk input).
n_receptors: Number of unique receptors (imd_receptor_id) found within the repertoire.
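As a rough illustration of the two per-repertoire statistics, the sketch below computes them with base R on a small hand-made data frame. The toy table and its SampleID column are assumptions made for this example; only imd_receptor_id and imd_chain_count are names taken from the description above. It is not the package's internal implementation.

```r
# Toy stand-in for an annotation table (hypothetical data, not real output).
ann <- data.frame(
  SampleID        = c("S1", "S1", "S1", "S2", "S2"),
  imd_receptor_id = c(1L, 1L, 2L, 1L, 3L),
  imd_chain_count = c(1L, 1L, 1L, 4L, 2L),
  stringsAsFactors = FALSE
)

# n_barcodes: sum of imd_chain_count within each repertoire.
n_barcodes <- tapply(ann$imd_chain_count, ann$SampleID, sum)

# n_receptors: number of distinct imd_receptor_id values within each repertoire.
n_receptors <- tapply(ann$imd_receptor_id, ann$SampleID,
                      function(ids) length(unique(ids)))

# Assemble a small summary table, one row per repertoire.
repertoire_stats <- data.frame(
  SampleID    = names(n_barcodes),
  n_barcodes  = as.integer(n_barcodes),
  n_receptors = as.integer(n_receptors[names(n_barcodes)]),
  stringsAsFactors = FALSE
)
print(repertoire_stats)
```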
Calculated per annotation row (receptor within repertoire context):
imd_count: Total count of a specific receptor (imd_receptor_id) within the specific repertoire it belongs to in that row (sum of relevant imd_chain_count).
imd_proportion: The proportion of the repertoire's total n_barcodes accounted for by that specific receptor (imd_count / n_barcodes).
n_repertoires: The total number of distinct repertoires (across the entire dataset) in which this specific receptor (imd_receptor_id) appears.
These statistics are added to the annotation table, and a summary table is
stored in the $repertoires slot of the returned object.
agg_repertoires(idata, schema = "repertoire_id")
idata: An ImmunData object, typically the output of read_repertoires() or read_immundata(). Must contain the $annotations table with the columns specified in schema and internal columns such as imd_receptor_id and imd_chain_count.
schema: Character vector. Column name(s) in idata$annotations that define a unique repertoire. For example, c("SampleID") or c("DonorID", "TimePoint"). Columns must exist in idata$annotations. Default: "repertoire_id" (assumes such a column exists).
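To make the role of a multi-column schema concrete, here is a small base R sketch on a hypothetical data frame: every distinct combination of the schema columns corresponds to one repertoire. The DonorID and TimePoint values are invented for illustration.

```r
# Hypothetical annotation columns; DonorID and TimePoint mirror the example above.
ann <- data.frame(
  DonorID   = c("D1", "D1", "D1", "D2"),
  TimePoint = c("pre", "pre", "post", "pre"),
  stringsAsFactors = FALSE
)

schema <- c("DonorID", "TimePoint")

# Fail early if a schema column is missing from the table.
stopifnot(all(schema %in% names(ann)))

# Each distinct combination of the schema columns is one repertoire.
repertoires <- unique(ann[, schema, drop = FALSE])
rownames(repertoires) <- NULL
print(repertoires)
```

Here the four rows collapse into three repertoires: D1 at "pre", D1 at "post", and D2 at "pre".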
A new ImmunData object. Its $annotations table includes the added columns (imd_repertoire_id, imd_count, imd_proportion, n_repertoires). Its $repertoires slot contains the summary table linking the schema columns to imd_repertoire_id, n_barcodes, and n_receptors.
The function operates on the idata$annotations table:
Validation: Checks idata and the existence of the schema columns. Removes any pre-existing repertoire summary columns to prevent duplication.
Repertoire Definition: Groups annotations by the schema columns. Calculates total counts (n_barcodes) per group. Assigns a unique integer imd_repertoire_id to each distinct repertoire group. This forms the initial repertoires_table.
Receptor Counts & Proportion: Calculates the sum of imd_chain_count for each receptor within each repertoire (imd_count). Calculates the proportion (imd_proportion) of each receptor within its repertoire.
Repertoire & Receptor Stats: Counts unique receptors per repertoire (n_receptors, added to repertoires_table). Counts the number of distinct repertoires each unique receptor appears in (n_repertoires).
Join Results: Joins the calculated imd_count, imd_proportion, and n_repertoires back to the annotation table, matching on the repertoire columns and imd_receptor_id.
Return New Object: Creates and returns a new ImmunData object containing the updated $annotations table (with the added statistics) and the $repertoires slot populated with the repertoires_table (containing schema columns, imd_repertoire_id, n_barcodes, n_receptors).
The original idata object remains unmodified. Internal column names are
typically managed by immundata:::imd_schema().
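The steps above can be sketched end to end on a plain data frame. This is a minimal base R approximation written for illustration only: agg_sketch() and the toy table are hypothetical, it works on a data frame and returns a list rather than an ImmunData object, and it makes no claim about how immundata implements these steps internally.

```r
# Illustrative helper (hypothetical name); not part of the immundata package.
agg_sketch <- function(annotations, schema) {
  # Validation: input type and presence of the schema columns.
  stopifnot(is.data.frame(annotations), all(schema %in% names(annotations)))

  # Drop summary columns left over from an earlier run to avoid duplicates.
  stale <- c("imd_repertoire_id", "imd_count", "imd_proportion", "n_repertoires")
  ann <- annotations[, setdiff(names(annotations), stale), drop = FALSE]

  # Repertoire definition: one row per schema combination, with an integer id.
  reps <- unique(ann[, schema, drop = FALSE])
  rownames(reps) <- NULL
  reps$imd_repertoire_id <- seq_len(nrow(reps))
  ann <- merge(ann, reps, by = schema, sort = FALSE)

  # Per-repertoire totals; tapply() returns values ordered by id 1..n.
  reps$n_barcodes  <- as.vector(tapply(ann$imd_chain_count,
                                       ann$imd_repertoire_id, sum))
  reps$n_receptors <- as.vector(tapply(ann$imd_receptor_id,
                                       ann$imd_repertoire_id,
                                       function(ids) length(unique(ids))))

  # Receptor counts and proportions, one row per (repertoire, receptor) pair.
  counts <- aggregate(imd_chain_count ~ imd_repertoire_id + imd_receptor_id,
                      data = ann, FUN = sum)
  names(counts)[names(counts) == "imd_chain_count"] <- "imd_count"
  counts$imd_proportion <- counts$imd_count /
    reps$n_barcodes[match(counts$imd_repertoire_id, reps$imd_repertoire_id)]

  # Each receptor appears once per repertoire in 'counts', so a frequency
  # table of receptor ids gives the number of repertoires it occurs in.
  per_receptor <- table(counts$imd_receptor_id)
  counts$n_repertoires <- as.vector(per_receptor[as.character(counts$imd_receptor_id)])

  # Join the statistics back onto the annotation rows.
  ann <- merge(ann, counts, by = c("imd_repertoire_id", "imd_receptor_id"),
               sort = FALSE)

  list(annotations = ann, repertoires = reps)
}

# Usage on a toy table (hypothetical data).
ann <- data.frame(
  SampleID        = c("S1", "S1", "S1", "S2", "S2"),
  imd_receptor_id = c(1L, 1L, 2L, 1L, 3L),
  imd_chain_count = c(1L, 1L, 1L, 4L, 2L),
  stringsAsFactors = FALSE
)
res <- agg_sketch(ann, schema = "SampleID")
print(res$repertoires)
print(res$annotations)
```

Because the helper builds new data frames instead of modifying its argument, the input table keeps its original three columns, which mirrors the statement that the original object remains unmodified.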
read_repertoires() (which can call this function), ImmunData class.
if (FALSE) { # \dontrun{
# Assume 'idata_raw' is an ImmunData object loaded via read_repertoires
# but *without* providing 'repertoire_schema' initially.
# It has $annotations but $repertoires is likely NULL or empty.
# Assume idata_raw$annotations has columns "SampleID" and "TimePoint".
# Define repertoires based on SampleID and TimePoint
idata_aggregated <- agg_repertoires(idata_raw, schema = c("SampleID", "TimePoint"))
# Explore the results
print(idata_aggregated)
print(idata_aggregated$repertoires)
print(head(idata_aggregated$annotations)) # Note the new columns
} # }