dandelion.external.immcantation.polars.scoper.spectral_clones

dandelion.external.immcantation.polars.scoper.spectral_clones(vdj, method='novj', germline='germline_alignment', sequence='sequence_alignment', junction='junction', v_call='v_call', j_call='j_call', clone_id='clone_id', fields=None, cell_id='cell_id', locus='locus', only_heavy=True, split_light=True, first=False, cdr3=False, mod3=False, max_n=0, threshold=None, base_sim=0.95, iter_max=1000, nstart=1000, nproc=1, verbose=False, summarize_clones=True, remove_ambiguous=True, remove_extra=True)

Spectral clustering method for clonal partitioning with Polars.

https://scoper.readthedocs.io/en/stable/topics/spectralClones/

This is a wrapper for one of scoper’s clone clustering methods. It uses Polars internally for data manipulation.

Parameters:
  • vdj (DandelionPolars) – a DandelionPolars object containing the AIRR data.

  • method (Literal[“novj”, “vj”], optional) – clustering method, either “novj” or “vj”.

  • germline (str, optional) – name of the column containing the germline or reference sequence.

  • sequence (str, optional) – name of the column containing input sequences.

  • junction (str, optional) – name of the column containing junction sequences.

  • v_call (str, optional) – name of the column containing the V-segment allele calls.

  • j_call (str, optional) – name of the column containing the J-segment allele calls.

  • clone_id (str, optional) – name of the output column containing the clonal cluster identifiers.

  • fields (list[str], optional) – list of additional columns to use for grouping.

  • cell_id (str | None, optional) – name of the column containing cell identifiers or barcodes.

  • locus (str, optional) – name of the column containing locus information.

  • only_heavy (bool, optional) – use only the IGH (BCR) or TRB/TRD (TCR) sequences for grouping.

  • split_light (bool, optional) – split clones by light chains.

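  • first (bool, optional) – specifies how to handle multiple V(D)J assignments for initial grouping. If True only the first gene call is used; if False the union of ambiguous gene calls is used, so that sequences with any overlapping call are grouped together.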

  • cdr3 (bool, optional) – if True removes 3 nucleotides from both ends of “junction” prior to clustering.

  • mod3 (bool, optional) – if True removes records with a junction length that is not divisible by 3.

  • max_n (int | None, optional) – the maximum number of degenerate characters to permit in the junction sequence.

  • threshold (float | None, optional) – the supervising cut-off that enforces an upper limit on the distance for clonal grouping.

  • base_sim (float, optional) – required similarity cut-off for sequences at equal distances from each other.

  • iter_max (int, optional) – the maximum number of iterations allowed for the k-means clustering step.

  • nstart (int, optional) – the number of random sets chosen for k-means clustering initialization.

  • nproc (int, optional) – number of cores to distribute the function over.

  • verbose (bool, optional) – if True prints a summary of each step of the cloning process.

  • summarize_clones (bool, optional) – if True performs a series of analyses to assess the clonal landscape.

  • remove_ambiguous (bool, optional) – if True removes contigs with ambiguous V(D)J assignments.

  • remove_extra (bool, optional) – if True removes extra contigs flagged by check_contigs.
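
The junction-level options above (cdr3, mod3, max_n) can be illustrated with a small plain-Python sketch. This is not scoper's or dandelion's implementation; `prefilter_junctions` is a hypothetical helper written only to mirror the parameter descriptions. It assumes that any character outside A, C, G, T counts as degenerate, that degenerate characters are counted on the untrimmed junction, and that `max_n=None` disables the check.

```python
# Illustrative only: mirrors the documented meaning of cdr3, mod3 and max_n.
# Assumption: every character outside A/C/G/T is treated as degenerate.
CANONICAL = set("ACGT")


def prefilter_junctions(junctions, cdr3=False, mod3=False, max_n=0):
    """Return the junctions that pass the filters, trimmed when cdr3 is True."""
    kept = []
    for junction in junctions:
        seq = junction.upper()
        # mod3: drop junctions whose length is not divisible by 3.
        if mod3 and len(seq) % 3 != 0:
            continue
        # max_n: drop junctions with more than max_n degenerate characters.
        # Assumption: None means no limit is applied.
        if max_n is not None:
            n_degenerate = sum(base not in CANONICAL for base in seq)
            if n_degenerate > max_n:
                continue
        # cdr3: remove 3 nucleotides from both ends of the junction.
        if cdr3:
            seq = seq[3:-3]
        kept.append(seq)
    return kept


example = [
    "TGTGCGAGAGGCTACTGG",  # 18 nt, no degenerate characters
    "TGTGCGAGNGGCTACTGG",  # 18 nt, one N
    "TGTGCGAGAGGCTACTG",   # 17 nt, not divisible by 3
]
strict = prefilter_junctions(example, cdr3=True, mod3=True, max_n=0)
```

With these inputs, only the first junction survives the strict settings, and it is returned with its first and last codons removed.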

Returns:

DandelionPolars object with .clone_id column populated.

Return type:

DandelionPolars
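
A minimal usage sketch follows, assuming dandelion and the R package scoper it wraps are installed and that `vdj` is an existing DandelionPolars object (how it is loaded is not covered here). `build_kwargs` and `run_spectral_clones` are hypothetical helpers, not part of dandelion; only the import path and keyword names come from the signature above, and the chosen values are illustrative.

```python
# Keyword names follow the documented signature; the values are illustrative.
SPECTRAL_KWARGS = {
    "method": "novj",
    "only_heavy": True,
    "split_light": True,
    "cdr3": False,
    "mod3": False,
    "max_n": 0,
    "nproc": 1,
    "verbose": False,
}


def build_kwargs(**overrides):
    """Merge overrides into the defaults and check the documented method values."""
    kwargs = {**SPECTRAL_KWARGS, **overrides}
    if kwargs["method"] not in ("novj", "vj"):
        raise ValueError("method must be 'novj' or 'vj'")
    return kwargs


def run_spectral_clones(vdj, **overrides):
    """Call spectral_clones on a DandelionPolars object (hypothetical wrapper)."""
    # Imported lazily so this module can be loaded without dandelion installed.
    from dandelion.external.immcantation.polars.scoper import spectral_clones

    return spectral_clones(vdj, **build_kwargs(**overrides))
```

A call such as `run_spectral_clones(vdj, method="vj", nproc=4)` would then be expected to return a DandelionPolars object with the clone_id column populated, as described under Returns.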