dandelion.external.immcantation.polars.scoper.identical_clones
- dandelion.external.immcantation.polars.scoper.identical_clones(vdj, method='nt', junction='junction', v_call='v_call', j_call='j_call', clone_key='clone_id', fields=None, cell_id='cell_id', locus='locus', only_heavy=True, split_light=True, first=False, cdr3=False, mod3=False, max_n=0, nproc=1, verbose=False, summarize_clones=True, remove_ambiguous=True, remove_extra=True)
Clonal assignment using sequence identity partitioning with Polars.
https://scoper.readthedocs.io/en/stable/topics/identicalClones/
This is a wrapper for one of scoper’s methods for clone clustering, using Polars internally for data manipulation.
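Identical-sequence partitioning can be sketched as follows. This is a hypothetical, simplified illustration, not dandelion's or scoper's implementation: it uses plain Python dictionaries instead of Polars, keeps only the first V and J gene call (roughly the first=True behaviour), compares nucleotide junctions exactly (with method="aa" the junctions would be translated first, which is omitted here), and ignores cell_id, locus, only_heavy and split_light. The helper names first_gene and assign_identical_clones are invented for this example.

```python
# Hypothetical sketch of identical-sequence clonal partitioning.
# Not the dandelion/scoper implementation; plain Python instead of Polars.


def first_gene(call: str) -> str:
    # Keep only the first of possibly ambiguous, comma-separated calls and
    # drop the allele suffix, e.g. "IGHV1-2*04,IGHV1-2*02" -> "IGHV1-2".
    return call.split(",")[0].split("*")[0].strip()


def assign_identical_clones(records: list[dict], junction: str = "junction",
                            v_call: str = "v_call", j_call: str = "j_call",
                            clone_key: str = "clone_id") -> list[dict]:
    """Give records the same clone ID iff V gene, J gene and junction match.

    Grouping on the full junction string already implies equal junction
    length, so identity partitioning needs no distance computation.
    """
    clone_ids: dict[tuple[str, str, str], int] = {}
    out = []
    for rec in records:
        key = (first_gene(rec[v_call]), first_gene(rec[j_call]),
               rec[junction].upper())
        if key not in clone_ids:
            # IDs are numbered in order of first appearance (an assumption).
            clone_ids[key] = len(clone_ids) + 1
        # Copy the record so the input is left unmodified.
        out.append({**rec, clone_key: str(clone_ids[key])})
    return out


seqs = [
    {"sequence_id": "s1", "v_call": "IGHV1-2*02", "j_call": "IGHJ4*02",
     "junction": "TGTGCGAGATGG"},
    {"sequence_id": "s2", "v_call": "IGHV1-2*04,IGHV1-2*02", "j_call": "IGHJ4*01",
     "junction": "TGTGCGAGATGG"},
    {"sequence_id": "s3", "v_call": "IGHV3-23*01", "j_call": "IGHJ4*02",
     "junction": "TGTGCGAGATGG"},
]
for r in assign_identical_clones(seqs):
    print(r["sequence_id"], r["clone_id"])
```

Here s1 and s2 share a clone because their first V and J gene calls and junctions match, while s3 gets its own clone because its V gene differs.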
- Parameters:
vdj (DandelionPolars) – a DandelionPolars object containing the AIRR data.
method (Literal[“nt”, “aa”], optional) – either “nt” for nucleotide-based clustering or “aa” for amino acid-based clustering.
junction (str, optional) – name of the column containing junction sequences.
v_call (str, optional) – name of the column containing the V-segment allele calls.
j_call (str, optional) – name of the column containing the J-segment allele calls.
clone_key (str, optional) – output column name containing the clonal cluster identifiers.
fields (list[str], optional) – list of additional column names to use for grouping.
cell_id (str | None, optional) – name of the column containing cell identifiers or barcodes.
locus (str, optional) – name of the column containing locus information.
only_heavy (bool, optional) – if True, uses only the IGH (BCR) or TRB/TRD (TCR) sequences for grouping.
split_light (bool, optional) – if True, splits clones by light chain.
first (bool, optional) – specifies how to handle multiple V(D)J assignments for initial grouping. If True, only the first gene call is used; if False, the union of ambiguous gene calls is used to group all sequences with any overlapping gene calls.
cdr3 (bool, optional) – if True, removes 3 nucleotides from both ends of the junction (converting it to the CDR3) prior to clustering.
mod3 (bool, optional) – if True, removes records with a junction length that is not divisible by 3.
max_n (int | None, optional) – the maximum number of degenerate characters to permit in the junction sequence before the record is excluded from clonal assignment.
nproc (int, optional) – number of cores to distribute the function over.
verbose (bool, optional) – if True, prints a summary of each step of the cloning process.
summarize_clones (bool, optional) – if True, performs a series of analyses to assess the clonal landscape.
remove_ambiguous (bool, optional) – if True, removes contigs with ambiguous V(D)J assignments.
remove_extra (bool, optional) – if True, removes extra contigs flagged by check_contigs.
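The cdr3, mod3 and max_n options act as filters on the junction before clustering. A hypothetical sketch of those three steps follows; the function name prepare_junctions, the choice of "N" as the degenerate character, and the order in which the checks run are assumptions for illustration, not taken from dandelion or scoper.

```python
# Hypothetical sketch of the junction pre-filters (cdr3, mod3, max_n).
# Not the dandelion/scoper implementation; check order is an assumption.


def prepare_junctions(records, junction="junction", cdr3=False, mod3=False,
                      max_n=0, degenerate="N"):
    """Filter and optionally trim junctions before clonal assignment.

    mod3:  drop records whose junction length is not a multiple of 3.
    max_n: drop records with more than max_n degenerate characters
           (assumed to be "N" here); None disables the check.
    cdr3:  trim 3 characters from each end of the junction.
    """
    kept = []
    for rec in records:
        seq = rec[junction].upper()
        if mod3 and len(seq) % 3 != 0:
            continue
        if max_n is not None and seq.count(degenerate) > max_n:
            continue
        if cdr3:
            # Junction minus the two flanking codons gives the CDR3.
            seq = seq[3:-3]
        # Copy the record so the input is left unmodified.
        kept.append({**rec, junction: seq})
    return kept


recs = [
    {"sequence_id": "a", "junction": "TGTGCGAGATGG"},  # length 12, no N
    {"sequence_id": "b", "junction": "TGTGCGAGATG"},   # length 11
    {"sequence_id": "c", "junction": "TGTGNGAGATGG"},  # one degenerate N
]
out = prepare_junctions(recs, cdr3=True, mod3=True, max_n=0)
print([(r["sequence_id"], r["junction"]) for r in out])
```

With these settings only record "a" survives: "b" fails the length-divisible-by-3 check and "c" exceeds max_n=0. Passing max_n=None would keep "c".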
- Returns:
DandelionPolars object with the clone_key column (clone_id by default) populated.
- Return type:
DandelionPolars