dandelion.external.immcantation.base.scoper.spectral_clones
- dandelion.external.immcantation.base.scoper.spectral_clones(vdj, method='novj', germline='germline_alignment', sequence='sequence_alignment', junction='junction', v_call='v_call', j_call='j_call', clone_id='clone_id', fields=None, cell_id='cell_id', locus='locus', only_heavy=True, split_light=True, first=False, cdr3=False, mod3=False, max_n=0, threshold=None, base_sim=0.95, iter_max=1000, nstart=1000, nproc=1, verbose=False, summarize_clones=True, remove_ambiguous=True, remove_extra=True)
Spectral clustering method for clonal partitioning.
https://scoper.readthedocs.io/en/stable/topics/spectralClones/
This is a wrapper for one of scoper’s methods to perform clone clustering. spectralClones provides an unsupervised spectral clustering approach to infer clonal relationships in high-throughput Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) data. This approach clusters B or T cell receptor sequences based on junction region sequence similarity and shared mutations within partitions that share the same V gene, J gene, and junction length, allowing for ambiguous V or J gene annotations. This is not a full implementation: additional arguments such as targeting_model and len_limit require access to objects that need to be created separately through other packages, e.g. shazam. As such, only the default behaviour is implemented, where both are set to None (NULL in R). To use this method with its full functionality, run it separately in R following scoper’s tutorial.
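The partitioning step described above (same V gene, J gene and junction length, with ambiguous calls allowed) can be sketched in plain Python. This is a simplified illustration, not scoper’s or alakazam’s code. It assumes comma-separated gene calls, strips allele suffixes such as *01 so that genes rather than alleles are compared, and links records transitively (union-find) when they share at least one V gene, at least one J gene, the junction length and the values of any extra grouping columns. first=True keeps only the first call, loosely mirroring the first and fields parameters documented below; partition and split_genes are hypothetical helpers.

```python
from collections import defaultdict


def split_genes(call: str, first: bool) -> set:
    """Turn 'IGHV1-2*02,IGHV1-3*01' into {'IGHV1-2', 'IGHV1-3'} (or only the first gene)."""
    genes = [c.strip().split("*")[0] for c in call.split(",") if c.strip()]
    return set(genes[:1] if first else genes)


def partition(records, first=False, fields=None):
    """Group record indices sharing a V gene, a J gene, junction length and `fields`."""
    fields = fields or []
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    keys = [
        (split_genes(r["v_call"], first), split_genes(r["j_call"], first),
         len(r["junction"]), tuple(r[f] for f in fields))
        for r in records
    ]
    for a in range(len(records)):
        for b in range(a + 1, len(records)):
            v_a, j_a, len_a, extra_a = keys[a]
            v_b, j_b, len_b, extra_b = keys[b]
            # Link when any V and any J call overlap and length/extra fields agree.
            if len_a == len_b and extra_a == extra_b and v_a & v_b and j_a & j_b:
                parent[find(a)] = find(b)

    groups = defaultdict(list)
    for i in range(len(records)):
        groups[find(i)].append(i)
    return sorted(groups.values())
```

With first=False, a record called IGHV1-2*04,IGHV1-3*01 bridges an IGHV1-2 record and an IGHV1-3 record into one partition; with first=True the bridge disappears.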
see also https://scoper.readthedocs.io/en/stable/vignettes/Scoper-Vignette/
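The general idea of spectral clustering on junction similarity within one partition can be sketched with NumPy. This is a generic textbook variant (Gaussian similarity on normalised Hamming distance, symmetric normalised Laplacian, row-normalised eigenvectors, then k-means with random restarts, after Ng, Jordan and Weiss), not scoper’s algorithm: scoper chooses the number of clusters itself and applies its own adaptive similarity together with the threshold and base_sim cut-offs. spectral_partition, kmeans, sigma and n_clusters are hypothetical names; iter_max and nstart only loosely mirror the parameters of the same name. Degenerate inputs (a single junction, or a junction with zero similarity to all others) are not handled.

```python
import numpy as np


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of mismatched positions between two equal-length junctions."""
    if len(a) != len(b):
        raise ValueError("junctions within a partition must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def kmeans(points, k, iter_max, nstart, rng):
    """Lloyd's k-means with `nstart` random restarts; keeps the lowest-inertia run."""
    best_labels, best_inertia = None, np.inf
    for _ in range(nstart):
        centres = points[rng.choice(len(points), size=k, replace=False)]
        for _ in range(iter_max):
            sq_dist = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
            labels = sq_dist.argmin(axis=1)
            new_centres = np.array([
                points[labels == c].mean(axis=0) if np.any(labels == c) else centres[c]
                for c in range(k)
            ])
            if np.allclose(new_centres, centres):
                break
            centres = new_centres
        # Inertia of the assignment that produced `labels`.
        inertia = sq_dist[np.arange(len(points)), labels].sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def spectral_partition(junctions, n_clusters, sigma=0.2, iter_max=100, nstart=10, seed=0):
    """Split one V/J/length partition into `n_clusters` groups."""
    dist = np.array([[hamming_fraction(a, b) for b in junctions] for a in junctions])
    # Gaussian similarity with a fixed bandwidth; no self-loops.
    weights = np.exp(-(dist ** 2) / (2 * sigma ** 2))
    np.fill_diagonal(weights, 0.0)
    inv_sqrt_deg = 1.0 / np.sqrt(weights.sum(axis=1))
    # Symmetric normalised Laplacian: I - D^-1/2 W D^-1/2.
    laplacian = np.eye(len(junctions)) - inv_sqrt_deg[:, None] * weights * inv_sqrt_deg[None, :]
    # eigh returns eigenvalues in ascending order; keep the k smallest eigenvectors.
    _, vecs = np.linalg.eigh(laplacian)
    embedding = vecs[:, :n_clusters]
    embedding = embedding / np.linalg.norm(embedding, axis=1, keepdims=True)
    return kmeans(embedding, n_clusters, iter_max, nstart, np.random.default_rng(seed))
```

Fixing n_clusters by hand keeps the sketch short; choosing it automatically (for example from the eigengap of the Laplacian) is one common alternative, and is closer in spirit to an unsupervised method.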
- Parameters:
vdj (Dandelion) – a dandelion object containing the airr data.
method (Literal[“novj”, “vj”], optional) – one of “novj” or “vj”. If method=“novj”, clonal relationships are inferred using an adaptive threshold that indicates the level of similarity among junction sequences in a local neighborhood. If method=“vj”, clonal relationships are inferred not only from junction region homology but also from the mutation profiles in the V and J segments. Mutation counts are determined by comparing the input sequences (in the column specified by sequence) to the effective germline sequence (IUPAC representation of sequences in the column specified by germline).
germline (str, optional) – character name of the column containing the germline or reference sequence.
sequence (str, optional) – character name of the column containing input sequences.
junction (str, optional) – character name of the column containing junction sequences. Also used to determine sequence length for grouping.
v_call (str, optional) – name of the column containing the V-segment allele calls.
j_call (str, optional) – name of the column containing the J-segment allele calls.
clone_id (str, optional) – output column name containing the clonal cluster identifiers.
fields (list[str], optional) – character vector of additional columns to use for grouping. Sequences with disjoint values in the specified fields will be classified as separate clones.
cell_id (str | None, optional) – name of the column containing cell identifiers or barcodes. If specified, grouping will be performed in single-cell mode with the behavior governed by the locus and only_heavy arguments. If set to None then the bulk sequencing data is assumed.
locus (str, optional) – name of the column containing locus information.
only_heavy (bool, optional) – use only the IGH (BCR) or TRB/TRD (TCR) sequences for grouping.
split_light (bool, optional) – split clones by light chains.
first (bool, optional) – specifies how to handle multiple V(D)J assignments for initial grouping. If True, only the first call of the gene assignments is used. If False, the union of ambiguous gene assignments is used to group all sequences with any overlapping gene calls.
cdr3 (bool, optional) – if True removes 3 nucleotides from both ends of “junction” prior to clustering (converts IMGT junction to CDR3 region). If True this will also remove records with a junction length less than 7 nucleotides.
mod3 (bool, optional) – if True removes records with a junction length that is not divisible by 3 in nucleotide space.
max_n (int | None, optional) – the maximum number of degenerate characters to permit in the junction sequence before excluding the record from clonal assignment. Defaults to 0. Set to None to disable this filter.
threshold (float | None, optional) – the supervising cut-off to enforce an upper-limit distance for clonal grouping. A numeric value in the interval (0, 1).
base_sim (float, optional) – required similarity cut-off for sequences at equal distances from each other.
iter_max (int, optional) – the maximum number of iterations allowed for the k-means clustering step.
nstart (int, optional) – the number of random sets chosen for k-means clustering initialization.
nproc (int, optional) – number of cores to distribute the function over.
verbose (bool, optional) – if True, prints a summary of each step of the cloning process. If False (default), cloning runs silently.
summarize_clones (bool, optional) – if True, performs a series of analyses to assess the clonal landscape and returns a ScoperClones object. If False, a modified input db is returned. When grouping by fields, summarize_clones should be False.
remove_ambiguous (bool, optional) – if True removes contigs with ambiguous V(D)J assignments flagged by check_contigs.
remove_extra (bool, optional) – if True removes extra contigs flagged by check_contigs.
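For method=“vj”, mutation counts come from comparing each sequence with an IUPAC-coded germline. The sketch below shows one way to read that: a position counts as mutated when the observed base is not among the bases allowed by the germline’s IUPAC code, and the mutations two sequences share are the intersection of their (position, base) sets. It is an illustration only; mutations and shared_mutations are hypothetical helpers, skipping gaps and ambiguous observed bases is an assumption, and scoper’s actual weighting of shared mutations is not reproduced here.

```python
# IUPAC nucleotide codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mutations(sequence: str, germline: str) -> set:
    """(position, observed base) pairs not allowed by the IUPAC germline."""
    if len(sequence) != len(germline):
        raise ValueError("sequence and germline must be aligned to the same length")
    found = set()
    for pos, (obs, ref) in enumerate(zip(sequence.upper(), germline.upper())):
        # Assumption: skip gaps, ambiguous observed bases and unknown germline symbols.
        if obs not in "ACGT" or ref not in IUPAC:
            continue
        if obs not in IUPAC[ref]:
            found.add((pos, obs))
    return found


def shared_mutations(seq_a: str, seq_b: str, germline: str) -> set:
    """Mutations two sequences carry in common relative to the same germline."""
    return mutations(seq_a, germline) & mutations(seq_b, germline)
```

An observed G under a germline R is not a mutation, because R stands for A or G.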
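The three junction filters controlled by cdr3, mod3 and max_n act as a per-record preprocessing step. A minimal sketch, assuming a record is dropped by returning None and that the “degenerate characters” are the IUPAC ambiguity codes (N, R, Y and the others); prepare_junction and DEGENERATE are hypothetical names, and scoper may apply the filters in a different order or count a different character set (here max_n is checked on the trimmed sequence).

```python
# Assumption: IUPAC ambiguity codes count as "degenerate characters".
DEGENERATE = set("NRYSWKMBDHV")


def prepare_junction(junction, cdr3=False, mod3=False, max_n=0):
    """Return the (possibly trimmed) junction, or None if the record would be excluded."""
    if mod3 and len(junction) % 3 != 0:
        return None  # length not divisible by 3 in nucleotide space
    if cdr3:
        if len(junction) < 7:
            return None  # too short to remove 3 nt from each end
        junction = junction[3:-3]  # IMGT junction -> CDR3
    if max_n is not None:
        n_degenerate = sum(base in DEGENERATE for base in junction.upper())
        if n_degenerate > max_n:
            return None
    return junction
```

Trimming 3 nucleotides from each end removes 6 in total, so applying mod3 before or after the cdr3 trim gives the same result.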
- Return type:
None