dandelion.external.immcantation.base.scoper.identical_clones
- dandelion.external.immcantation.base.scoper.identical_clones(vdj, method='nt', junction='junction', v_call='v_call', j_call='j_call', clone_key='clone_id', fields=None, cell_id='cell_id', locus='locus', only_heavy=True, split_light=True, first=False, cdr3=False, mod3=False, max_n=0, nproc=1, verbose=False, summarize_clones=True, remove_ambiguous=True, remove_extra=True)
Clonal assignment using sequence identity partitioning.
https://scoper.readthedocs.io/en/stable/topics/identicalClones/
This is a wrapper for one of scoper’s methods for clone clustering. From the original description: identicalClones provides a simple sequence identity based partitioning approach for inferring clonal relationships in high-throughput Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) data. This approach partitions B or T cell receptor sequences into clonal groups based on junction region sequence identity within partitions that share the same V gene, J gene, and junction length, allowing for ambiguous V or J gene annotations.
See also https://scoper.readthedocs.io/en/stable/vignettes/Scoper-Vignette/
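The partitioning idea described above can be illustrated with a minimal pure-Python sketch. It is not scoper's or dandelion's implementation: the helper names `first_gene` and `assign_identical_clones` and the example records are hypothetical, and the sketch assumes only the first gene call is used (comparable in spirit to first=True) and that alleles are collapsed to gene level.

```python
from collections import defaultdict


def first_gene(call):
    """Reduce an allele call such as 'IGHV1-2*01,IGHV1-3*01' to its first gene, 'IGHV1-2'."""
    return call.split(",")[0].split("*")[0].strip()


def assign_identical_clones(records):
    """Assign clone ids by V gene, J gene, junction length and exact junction identity.

    Hypothetical illustration only; `records` is a list of dicts with AIRR-style keys.
    """
    groups = defaultdict(list)
    for rec in records:
        junction = rec["junction"].upper()
        key = (
            first_gene(rec["v_call"]),
            first_gene(rec["j_call"]),
            len(junction),
            junction,
        )
        groups[key].append(rec["sequence_id"])

    # Number clones in order of first appearance.
    clone_ids = {}
    for n, members in enumerate(groups.values(), start=1):
        for sequence_id in members:
            clone_ids[sequence_id] = str(n)
    return clone_ids


# Made-up example records.
records = [
    {"sequence_id": "s1", "v_call": "IGHV1-2*01", "j_call": "IGHJ4*02", "junction": "TGTGCGAGAGATTACTGG"},
    {"sequence_id": "s2", "v_call": "IGHV1-2*02", "j_call": "IGHJ4*02", "junction": "TGTGCGAGAGATTACTGG"},
    {"sequence_id": "s3", "v_call": "IGHV1-2*01", "j_call": "IGHJ4*02", "junction": "TGTGCGAGAGATTATTGG"},
    {"sequence_id": "s4", "v_call": "IGHV3-23*01", "j_call": "IGHJ4*02", "junction": "TGTGCGAGAGATTACTGG"},
]

clones = assign_identical_clones(records)
print(clones)
```

In this toy input, s1 and s2 share a clone because they differ only at the allele level, s3 is separated by a single junction mismatch, and s4 is separated by its V gene.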
- Parameters:
vdj (Dandelion) – a dandelion object containing the airr data.
method (Literal[“nt”, “aa”], optional) – either “nt” for nucleotide-based clustering or “aa” for amino-acid-based clustering.
junction (str, optional) – character name of the column containing junction sequences. Also used to determine sequence length for grouping.
v_call (str, optional) – name of the column containing the V-segment allele calls.
j_call (str, optional) – name of the column containing the J-segment allele calls.
clone_key (str, optional) – output column name containing the clonal cluster identifiers.
fields (list[str], optional) – character vector of additional columns to use for grouping. Sequences with disjoint values in the specified fields will be classified as separate clones.
cell_id (str | None, optional) – name of the column containing cell identifiers or barcodes. If specified, grouping will be performed in single-cell mode with the behavior governed by the locus and only_heavy arguments. If set to None, bulk sequencing data is assumed.
locus (str, optional) – name of the column containing locus information.
only_heavy (bool, optional) – use only the IGH (BCR) or TRB/TRD (TCR) sequences for grouping.
split_light (bool, optional) – split clones by light chains.
first (bool, optional) – specifies how to handle multiple V(D)J assignments for initial grouping. If True only the first call of the gene assignments is used. If False the union of ambiguous gene assignments is used to group all sequences with any overlapping gene calls.
cdr3 (bool, optional) – if True removes 3 nucleotides from both ends of “junction” prior to clustering (converts IMGT junction to CDR3 region). If True this will also remove records with a junction length less than 7 nucleotides.
mod3 (bool, optional) – if True removes records with a junction length that is not divisible by 3 in nucleotide space.
max_n (int | None, optional) – maximum number of degenerate characters permitted in the junction sequence before the record is excluded from clonal assignment. Defaults to 0. Set to None to skip this filter.
nproc (int, optional) – number of cores to distribute the function over.
verbose (bool, optional) – if True, prints a summary of each step of the cloning process. If False (default), cloning runs silently.
summarize_clones (bool, optional) – if True, performs a series of analyses to assess the clonal landscape and returns a ScoperClones object. If False, a modified input db is returned. When grouping by fields, summarize_clones should be False.
remove_ambiguous (bool, optional) – if True removes contigs with ambiguous V(D)J assignments flagged by check_contigs.
remove_extra (bool, optional) – if True removes extra contigs flagged by check_contigs.
- Return type:
None
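The junction filters controlled by cdr3, mod3 and max_n can be sketched as a small standalone function. This is a hedged illustration rather than the code scoper runs: the helper name `prepare_junction` is hypothetical, the order of the three checks is an assumption, and "degenerate characters" is assumed to mean any character other than A, C, G or T.

```python
def prepare_junction(junction, cdr3=False, mod3=False, max_n=0):
    """Return the sequence to cluster on, or None if the record would be excluded.

    Hypothetical helper mirroring the documented parameter descriptions.
    """
    seq = junction.upper()

    # max_n: exclude records with too many degenerate characters (None disables the check).
    if max_n is not None:
        n_degenerate = sum(1 for base in seq if base not in "ACGT")
        if n_degenerate > max_n:
            return None

    # mod3: exclude junctions whose nucleotide length is not divisible by 3.
    if mod3 and len(seq) % 3 != 0:
        return None

    # cdr3: drop junctions shorter than 7 nt, then trim 3 nt from each end.
    if cdr3:
        if len(seq) < 7:
            return None
        seq = seq[3:-3]

    return seq


print(prepare_junction("TGTGCGAGATGG"))
print(prepare_junction("TGTGCGAGATGG", cdr3=True))
print(prepare_junction("TGTGCNAGATGG"))
print(prepare_junction("TGTGCNAGATGG", max_n=None))
```

With the defaults, a single N in the junction is enough to exclude a record, which is why max_n=None is needed to keep such sequences in this sketch.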