Convert immunarch::repLoad() output to ImmunData in R
Use `from_immunarch()` to migrate repertoires loaded with immunarch v0.9 into an ImmunData dataset, the data format used by immunarch 0.10/1.0.
How it works:
- Takes an `immunarch::repLoad()` object (`imm`).
- Writes one TSV per repertoire (adding a `filename` column) to `temp_folder`.
- Imports those TSVs into ImmunData with `read_repertoires()`.
- Saves Parquet files under `output_folder` and returns an ImmunData object.
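The TSV-export step can be sketched in base R. This is not the actual `from_immunarch()` code: the helper `export_repertoires_tsv()` and the toy object are hypothetical, the sketch assumes `imm$data` is a named list of data frames (the shape returned by `immunarch::repLoad()`), and it fills `filename` with the sample name, which may differ from what the real function stores.

```r
# Hypothetical helper sketching the "one TSV per repertoire" step.
# Assumes `imm$data` is a named list of data frames, one per repertoire.
export_repertoires_tsv <- function(imm, temp_folder = tempfile("imm_tsv_")) {
  dir.create(temp_folder, recursive = TRUE, showWarnings = FALSE)
  paths <- character(0)
  for (sample_name in names(imm$data)) {
    df <- imm$data[[sample_name]]
    # Tag every row with its source repertoire (assumption: sample name)
    df$filename <- sample_name
    path <- file.path(temp_folder, paste0(sample_name, ".tsv"))
    utils::write.table(df, path, sep = "\t", quote = FALSE, row.names = FALSE)
    paths <- c(paths, path)
  }
  paths
}

# Toy object mimicking the shape of immunarch::repLoad() output
toy_imm <- list(
  data = list(
    S1 = data.frame(CDR3.aa = c("CASSL", "CASSQ"), V.name = c("TRBV5-1", "TRBV6-2")),
    S2 = data.frame(CDR3.aa = "CASRG", V.name = "TRBV7-9")
  ),
  meta = data.frame(Sample = c("S1", "S2"))
)

files <- export_repertoires_tsv(toy_imm)
utils::read.delim(files[1])
```

Each written file carries its own `filename` column, so rows from different repertoires stay distinguishable once they are read back together.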
Key arguments:
- `imm`: output of `immunarch::repLoad()`.
- `output_folder`: where the Parquet data will be stored (created automatically if it does not exist).
- `schema`: character vector of the columns that together define a unique receptor (default `c("CDR3.aa", "V.name")`; you can add `"J.name"`).
- `temp_folder`: where the intermediate TSVs are written (defaults to a temporary directory).
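To see what `schema` controls, the base-R sketch below counts distinct receptor keys in a toy repertoire. The data and the `count_receptors()` helper are hypothetical and only illustrate the idea of a composite key; they do not reproduce how ImmunData aggregates receptors internally.

```r
# Toy repertoire: the first two rows share CDR3.aa and V.name but differ in J.name
toy_rep <- data.frame(
  CDR3.aa = c("CASSLGQETQYF", "CASSLGQETQYF", "CASRPGTEAFF"),
  V.name  = c("TRBV5-1", "TRBV5-1", "TRBV7-9"),
  J.name  = c("TRBJ2-5", "TRBJ2-3", "TRBJ1-1")
)

# Hypothetical helper: number of distinct key combinations under a schema
count_receptors <- function(df, schema) {
  nrow(unique(df[, schema, drop = FALSE]))
}

count_receptors(toy_rep, c("CDR3.aa", "V.name"))            # 2
count_receptors(toy_rep, c("CDR3.aa", "V.name", "J.name"))  # 3
```

Under the default key the first two rows collapse into one receptor; adding `"J.name"` keeps them apart.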
```r
library(immunarch)

# 1) Load your immunarch object (reads all repertoires plus optional metadata)
immdata <- immunarch::repLoad("/path/to/your/files")

# 2) Convert to ImmunData (Parquet-backed), customizing the receptor key if needed
idata <- from_immunarch(
  imm = immdata,
  schema = c("CDR3.aa", "V.name"),
  output_folder = "/path/to/immundata_out"
)

# Inspect the resulting ImmunData object
idata
```
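Because `schema` names columns, every repertoire needs to contain them. Below is a minimal pre-flight check in base R, assuming the `$data` list shape returned by `immunarch::repLoad()`; `missing_schema_columns()` and `toy_imm` are hypothetical illustrations, not immunarch functions.

```r
# Hypothetical pre-flight check: which repertoires lack any schema column?
missing_schema_columns <- function(imm, schema) {
  missing <- lapply(imm$data, function(df) setdiff(schema, names(df)))
  # Keep only repertoires with at least one missing column
  Filter(length, missing)
}

# Toy object mimicking the `$data` element of immunarch::repLoad() output
toy_imm <- list(data = list(
  S1 = data.frame(CDR3.aa = "CASSL", V.name = "TRBV5-1", J.name = "TRBJ2-5"),
  S2 = data.frame(CDR3.aa = "CASRG", V.name = "TRBV7-9")
))

missing_schema_columns(toy_imm, c("CDR3.aa", "V.name"))            # empty: nothing missing
missing_schema_columns(toy_imm, c("CDR3.aa", "V.name", "J.name"))  # S2 lacks "J.name"
```

An empty result means every repertoire has all the key columns.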
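Optionally, rename the columns of each repertoire in `immdata$data` before passing `immdata` to `from_immunarch()`, to align them with the AIRR Community (AIRR-C) naming conventions. If you do, use the new names in `schema` as well (for example, `schema = c("cdr3_aa", "v_call")`), since the default keys refer to the original immunarch column names.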
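```r
# Requires the dplyr and rlang packages.
rename_to_airr <- function(df) {
  # immunarch column name -> AIRR-C column name
  map <- c(
    "CDR3.aa"    = "cdr3_aa",
    "CDR3.nt"    = "cdr3_nt",
    "V.name"     = "v_call",
    "D.name"     = "d_call",
    "J.name"     = "j_call",
    "Clones"     = "umi_count",
    "Read.count" = "duplicate_count",
    "Barcode"    = "cell_id",
    "barcode"    = "cell_id",
    "Chain"      = "locus",
    "Gene"       = "locus",
    "Productive" = "productive"
  )
  present_old <- intersect(names(df), names(map))
  if (!length(present_old)) return(df)
  new_names <- unname(map[present_old])
  # Several immunarch columns map to the same AIRR name (e.g. "Chain" and
  # "Gene" -> "locus"): rename only the first one found, and skip targets
  # that already exist so dplyr::rename() does not create duplicate names.
  keep <- !duplicated(new_names) & !(new_names %in% names(df))
  present_old <- present_old[keep]
  new_names <- new_names[keep]
  # Build new_name = old_name pairs and splice them into dplyr::rename()
  spec <- stats::setNames(rlang::syms(present_old), new_names)
  dplyr::rename(df, !!!spec)
}

# Apply to every repertoire in the immunarch object
immdata$data <- lapply(immdata$data, rename_to_airr)
```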
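If you would rather not depend on dplyr and rlang, the same renaming can be sketched in base R. `rename_to_airr_base()` is a hypothetical name; it reuses the same mapping and the same rules (first match wins, existing target columns are never overwritten).

```r
# Hypothetical base-R variant of rename_to_airr(): no dplyr/rlang needed.
rename_to_airr_base <- function(df, map = c(
  "CDR3.aa" = "cdr3_aa", "CDR3.nt" = "cdr3_nt",
  "V.name" = "v_call", "D.name" = "d_call", "J.name" = "j_call",
  "Clones" = "umi_count", "Read.count" = "duplicate_count",
  "Barcode" = "cell_id", "barcode" = "cell_id",
  "Chain" = "locus", "Gene" = "locus", "Productive" = "productive"
)) {
  old <- intersect(names(df), names(map))
  new <- unname(map[old])
  # First source column per target wins; never clobber an existing column
  keep <- !duplicated(new) & !(new %in% names(df))
  idx <- match(old[keep], names(df))
  names(df)[idx] <- new[keep]
  df
}

toy <- data.frame(CDR3.aa = "CASSL", V.name = "TRBV5-1",
                  Chain = "TRB", Gene = "TRB", Clones = 10)
names(rename_to_airr_base(toy))
# "cdr3_aa" "v_call" "locus" "Gene" "umi_count"
```

It can be applied the same way, with `lapply(immdata$data, rename_to_airr_base)`. Here `Gene` keeps its name because `Chain` already claimed `locus`.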