dandelion.external.immcantation.polars.scoper.hierarchical_clones

dandelion.external.immcantation.polars.scoper.hierarchical_clones(vdj, threshold, method='nt', linkage='single', normalize='len', junction='junction', v_call='v_call', j_call='j_call', clone_id='clone_id', fields=None, cell_id='cell_id', locus='locus', only_heavy=True, split_light=True, first=False, cdr3=False, mod3=False, max_n=0, nproc=1, verbose=False, summarize_clones=True, remove_ambiguous=True, remove_extra=True)

Hierarchical clustering approach to clonal assignment with Polars.

https://scoper.readthedocs.io/en/stable/topics/hierarchicalClones/

This is a wrapper for one of scoper’s clone clustering methods (hierarchicalClones); Polars is used internally for data manipulation.

Parameters:
  • vdj (DandelionPolars) – a DandelionPolars object containing the AIRR data.

  • threshold (float) – numeric scalar where the tree should be cut (the distance threshold for clonal grouping).

  • method (Literal[“nt”, “aa”], optional) – either “nt” for nucleotide-based clustering or “aa” for amino-acid-based clustering.

  • linkage (Literal[“single”, “average”, “complete”], optional) – linkage criterion for the hierarchical clustering: “single”, “average” or “complete”.

  • normalize (Literal[“len”, “none”], optional) – method of normalization: “len” divides the distance by the sequence length, “none” applies no normalization.

  • junction (str, optional) – name of the column containing junction sequences.

  • v_call (str, optional) – name of the column containing the V-segment allele calls.

  • j_call (str, optional) – name of the column containing the J-segment allele calls.

  • clone_id (str, optional) – output column name containing the clonal cluster identifiers.

  • fields (list[str], optional) – list of additional column names to use for grouping.

  • cell_id (str | None, optional) – name of the column containing cell identifiers or barcodes.

  • locus (str, optional) – name of the column containing locus information.

  • only_heavy (bool, optional) – use only the IGH (BCR) or TRB/TRD (TCR) sequences for grouping.

  • split_light (bool, optional) – split clones by light chains.

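  • first (bool, optional) – specifies how to handle multiple V(D)J assignments for initial grouping. If True, only the first call of each gene assignment is used; if False, the union of ambiguous gene assignments is used, so sequences with any overlapping gene calls are grouped together.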

  • cdr3 (bool, optional) – if True removes 3 nucleotides from both ends of “junction” prior to clustering.

  • mod3 (bool, optional) – if True removes records with a junction length that is not divisible by 3.

  • max_n (int | None, optional) – the maximum number of degenerate characters to permit in the junction sequence before the record is excluded from clonal assignment.

  • nproc (int, optional) – number of cores to distribute the function over.

  • verbose (bool, optional) – if True prints out a summary of each step of the cloning process.

  • summarize_clones (bool, optional) – if True performs a series of analyses to assess the clonal landscape.

  • remove_ambiguous (bool, optional) – if True removes contigs with ambiguous V(D)J assignments.

  • remove_extra (bool, optional) – if True removes extra contigs flagged by check_contigs.
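
To make the interplay of threshold, linkage, normalize, cdr3, mod3 and max_n more concrete, here is a minimal pure-Python sketch of the general idea. It is not the code that dandelion or scoper runs: the function names are hypothetical, only linkage=“single” is covered, Hamming distance is assumed as the metric, a “degenerate character” is simplified to anything other than A, C, G or T, and the order of the filtering steps is an assumption.

```python
from collections import defaultdict
from itertools import combinations


def prepare_junction(junction, cdr3=False, mod3=False, max_n=0):
    """Return the sequence to cluster, or None if the record would be dropped."""
    if mod3 and len(junction) % 3 != 0:
        return None  # junction length not divisible by 3
    n_degenerate = sum(ch not in "ACGT" for ch in junction.upper())
    if max_n is not None and n_degenerate > max_n:
        return None  # too many degenerate characters
    # cdr3=True trims 3 nucleotides from both ends of the junction.
    return junction[3:-3] if cdr3 else junction


def hamming(a, b, normalize="len"):
    """Hamming distance between equal-length strings, optionally divided by length."""
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a) if normalize == "len" and a else mismatches


def single_linkage_clones(records, threshold, normalize="len"):
    """Label (v_call, j_call, junction) records with integer clone ids.

    Records are partitioned by V call, J call and junction length. Within a
    partition, two records share a clone when a chain of pairwise distances
    <= threshold connects them, which is what cutting a single-linkage tree
    at `threshold` produces.
    """
    groups = defaultdict(list)
    for idx, (v, j, junction) in enumerate(records):
        groups[(v, j, len(junction))].append(idx)

    parent = list(range(len(records)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for members in groups.values():
        for a, b in combinations(members, 2):
            if hamming(records[a][2], records[b][2], normalize) <= threshold:
                parent[find(a)] = find(b)

    labels = {}
    return [labels.setdefault(find(i), len(labels) + 1) for i in range(len(records))]


records = [
    ("IGHV1-2", "IGHJ4", "TGTGCGAGAGATTACTGG"),
    ("IGHV1-2", "IGHJ4", "TGTGCGAGAGATTATTGG"),   # 1 mismatch in 18 nt
    ("IGHV1-2", "IGHJ4", "TGTGCAAAAGGGCCCTGG"),   # 6 or more mismatches
    ("IGHV3-23", "IGHJ4", "TGTGCGAGAGATTACTGG"),  # different V call
]
print(single_linkage_clones(records, threshold=0.1))  # [1, 1, 2, 3]
```

With normalize=“len”, the threshold is a fraction of mismatching positions (here 1/18 ≈ 0.056 falls under 0.1), whereas with “none” it would be a raw mismatch count.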

Returns:

DandelionPolars object with .clone_id column populated.

Return type:

DandelionPolars
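
A call might look like the sketch below. The import path and keyword names are taken from the signature above; the helper name assign_clones, the CLONE_KWARGS dictionary and the threshold value of 0.15 are illustrative assumptions rather than recommendations. The vdj argument must be an existing DandelionPolars object, and because the function wraps scoper, a working R installation with scoper is presumably required.

```python
# Hypothetical settings: the keyword names come from the documented signature,
# but the threshold value is illustrative only.
CLONE_KWARGS = {
    "threshold": 0.15,
    "method": "nt",
    "linkage": "single",
    "normalize": "len",
    "nproc": 1,
}


def assign_clones(vdj, **overrides):
    """Run hierarchical_clones on an existing DandelionPolars object."""
    # Imported lazily so this module loads even where dandelion is not installed.
    from dandelion.external.immcantation.polars.scoper import hierarchical_clones

    return hierarchical_clones(vdj, **{**CLONE_KWARGS, **overrides})
```

Any documented parameter can be overridden per call, for example assign_clones(vdj, linkage="average", verbose=True).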