dandelion.external.immcantation.polars.scoper.spectral_clones
- dandelion.external.immcantation.polars.scoper.spectral_clones(vdj, method='novj', germline='germline_alignment', sequence='sequence_alignment', junction='junction', v_call='v_call', j_call='j_call', clone_id='clone_id', fields=None, cell_id='cell_id', locus='locus', only_heavy=True, split_light=True, first=False, cdr3=False, mod3=False, max_n=0, threshold=None, base_sim=0.95, iter_max=1000, nstart=1000, nproc=1, verbose=False, summarize_clones=True, remove_ambiguous=True, remove_extra=True)
Spectral clustering method for clonal partitioning with Polars.
https://scoper.readthedocs.io/en/stable/topics/spectralClones/
This is a wrapper for one of scoper’s clone clustering methods. It uses Polars internally for data manipulation.
- Parameters:
vdj (DandelionPolars) – a DandelionPolars object containing the AIRR data.
method (Literal[“novj”, “vj”], optional) – either “novj” or “vj”.
germline (str, optional) – name of the column containing the germline or reference sequence.
sequence (str, optional) – name of the column containing input sequences.
junction (str, optional) – name of the column containing junction sequences.
v_call (str, optional) – name of the column containing the V-segment allele calls.
j_call (str, optional) – name of the column containing the J-segment allele calls.
clone_id (str, optional) – output column name containing the clonal cluster identifiers.
fields (list[str], optional) – list of additional column names to use for grouping.
cell_id (str | None, optional) – name of the column containing cell identifiers or barcodes.
locus (str, optional) – name of the column containing locus information.
only_heavy (bool, optional) – use only the IGH (BCR) or TRB/TRD (TCR) sequences for grouping.
split_light (bool, optional) – split clones by light chains.
first (bool, optional) – specifies how to handle multiple V(D)J assignments for initial grouping.
cdr3 (bool, optional) – if True removes 3 nucleotides from both ends of “junction” prior to clustering.
mod3 (bool, optional) – if True removes records with a junction length that is not divisible by 3.
max_n (int | None, optional) – the maximum number of degenerate characters to permit in the junction sequence.
threshold (float | None, optional) – the supervising cut-off to enforce an upper-limit distance for clonal grouping.
base_sim (float, optional) – required similarity cut-off for sequences at equal distances from each other.
iter_max (int, optional) – the maximum number of iterations allowed for the k-means clustering step.
nstart (int, optional) – the number of random sets chosen for k-means clustering initialization.
nproc (int, optional) – number of cores to distribute the function over.
verbose (bool, optional) – if True prints out a summary of each step of the cloning process.
summarize_clones (bool, optional) – if True performs a series of analyses to assess the clonal landscape.
remove_ambiguous (bool, optional) – if True removes contigs with ambiguous V(D)J assignments.
remove_extra (bool, optional) – if True removes extra contigs flagged by check_contigs.
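The junction-related flags above (cdr3, mod3, max_n) can be easier to follow with a small illustration. The sketch below is plain Python and is not the scoper or dandelion implementation. The helper name preprocess_junctions is hypothetical, and three behaviours are assumptions rather than documented facts: the order in which the filters are applied, counting any character other than A, C, G or T as degenerate, and dropping junctions shorter than 7 nucleotides when cdr3 is True.

```python
def preprocess_junctions(junctions, cdr3=False, mod3=False, max_n=0):
    """Illustrative only: mimic the documented cdr3 / mod3 / max_n options.

    `preprocess_junctions` is a hypothetical helper, not part of dandelion
    or scoper. Filter order and the definition of "degenerate" are assumed.
    """
    kept = []
    for junction in junctions:
        seq = junction.upper()
        # mod3: drop records whose junction length is not divisible by 3.
        if mod3 and len(seq) % 3 != 0:
            continue
        # max_n: drop records with more than max_n degenerate characters.
        # Assumption: anything other than A/C/G/T counts; None disables the check.
        if max_n is not None:
            n_degenerate = sum(base not in "ACGT" for base in seq)
            if n_degenerate > max_n:
                continue
        # cdr3: remove 3 nucleotides from both ends of the junction.
        if cdr3:
            # Assumption: junctions too short to trim meaningfully are dropped.
            if len(seq) < 7:
                continue
            seq = seq[3:-3]
        kept.append(seq)
    return kept


junctions = ["TGTGCGAGAGATTACTGG", "TGTGCGAGNTGG", "TGTGCGAGATG"]
trimmed = preprocess_junctions(junctions, cdr3=True, mod3=True, max_n=0)
```

With these inputs, the second junction is dropped because it contains one degenerate character, the third because its length (11) is not divisible by 3, and the first is kept with 3 nucleotides trimmed from each end.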
- Returns:
DandelionPolars object with .clone_id column populated.
- Return type:
DandelionPolars
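Because the signature carries many keyword arguments, it can help to collect overrides in one place before calling the function. The sketch below is an illustration under stated assumptions: spectral_clones_kwargs is a hypothetical helper (not part of dandelion), the defaults are copied from the signature above, and the import path in the trailing comment is inferred from the fully qualified name on this page. How the vdj object is created is outside the scope of this page and is not shown.

```python
# Defaults copied from the documented signature of spectral_clones.
DEFAULTS = {
    "method": "novj",
    "germline": "germline_alignment",
    "sequence": "sequence_alignment",
    "junction": "junction",
    "v_call": "v_call",
    "j_call": "j_call",
    "clone_id": "clone_id",
    "fields": None,
    "cell_id": "cell_id",
    "locus": "locus",
    "only_heavy": True,
    "split_light": True,
    "first": False,
    "cdr3": False,
    "mod3": False,
    "max_n": 0,
    "threshold": None,
    "base_sim": 0.95,
    "iter_max": 1000,
    "nstart": 1000,
    "nproc": 1,
    "verbose": False,
    "summarize_clones": True,
    "remove_ambiguous": True,
    "remove_extra": True,
}


def spectral_clones_kwargs(**overrides):
    """Hypothetical helper: validate overrides against the documented names.

    Returns only the overrides, so the library's own defaults stay in charge
    of everything that was not set explicitly.
    """
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise TypeError(f"unknown spectral_clones argument(s): {unknown}")
    method = overrides.get("method", DEFAULTS["method"])
    if method not in ("novj", "vj"):
        raise ValueError(f"method must be 'novj' or 'vj', got {method!r}")
    return dict(overrides)


kwargs = spectral_clones_kwargs(method="vj", cdr3=True, nproc=4)

# Intended use, assuming dandelion is installed and `vdj` is an existing
# DandelionPolars object (import path inferred from the name on this page):
#
#   from dandelion.external.immcantation.polars.scoper import spectral_clones
#   vdj = spectral_clones(vdj, **kwargs)
```

Returning only the overrides, rather than the merged dictionary, keeps the call robust if the library's defaults change between versions.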