dandelion.polars.tools.find_clones

dandelion.polars.tools.find_clones(vdj, identity=0.85, hard_cutoff=None, key=None, dist_func='hamming', same_vj=True, same_length=True, by_alleles=False, key_added=None, recalculate_length=True, store_distances=True, verbose=True)

Find clones based on the distance between the CDR3 junction sequences of the VDJ and VJ chains (Hamming distance by default).

Parameters:
  • vdj (DandelionPolars) – Dandelion object.

  • identity (dict[str, float] | float, optional) – Similarity parameter. Default is 0.85. The distance cutoff is calculated as threshold = floor(length * (1 - identity)). If dist_func is ‘identity’, the threshold is set to 0. If dist_func is ‘levenshtein’ or a substitution matrix, the threshold is calculated internally from the normalized length. If a single float is provided, it is used for all loci. If provided as a dictionary, use the following keys: ‘ig’, ‘tr-ab’, ‘tr-gd’.

  • hard_cutoff (int | float | None, optional) – Absolute distance cutoff. If supplied, identity is ignored. Only for use with specific distance functions such as levenshtein and substitution matrices. Default is None.

  • key (dict[str, str] | str | None, optional) –

    Column name used for clone clustering. None defaults to a dictionary where:

    {‘ig’: ‘junction_aa’, ‘tr-ab’: ‘junction’, ‘tr-gd’: ‘junction’}

    If provided as a string, this key will be used for all loci.

  • dist_func (Literal[“hamming”, “levenshtein”, “identity”] | Callable | str, optional) – Distance function to use. Can be ‘hamming’, ‘levenshtein’, ‘identity’, the name of a substitution matrix, or a custom callable. Default is ‘hamming’.

  • same_vj (bool, optional) – Whether to require the same V and J gene assignments for sequences to be in the same clone. Default is True.

  • same_length (bool, optional) – Whether to require the same junction length for sequences to be in the same clone. Default is True.

  • by_alleles (bool, optional) – Whether or not to collapse alleles to genes. Default is False.

  • key_added (str | None, optional) – If specified, this will be the column name for clones. None defaults to ‘clone_id’.

  • recalculate_length (bool, optional) – Whether to recalculate the junction length rather than rely on the parsed assignment (which is occasionally wrong). Default is True.

  • store_distances (bool, optional) – Whether to store the distance matrix as a sparse matrix in vdj.distances. Default is True.

  • verbose (bool, optional) – Whether to print progress. Default is True.
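The cutoff formula given for identity can be worked through by hand. The sketch below is plain Python and does not call dandelion; the helper names (hamming, distance_cutoff, within_cutoff) are hypothetical, and the use of "<=" when comparing a distance to the cutoff is an assumption about how the threshold is applied, not something the parameter description states.

```python
import math


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def distance_cutoff(length: int, identity: float = 0.85) -> int:
    """threshold = floor(length * (1 - identity)), as given for `identity`."""
    return math.floor(length * (1 - identity))


def within_cutoff(a: str, b: str, identity: float = 0.85) -> bool:
    """Assumption: two junctions are compatible when distance <= threshold."""
    return hamming(a, b) <= distance_cutoff(len(a), identity)


# Illustrative 12-residue junctions (made-up sequences, not real data).
seq_a = "CARDGYSSGWYF"
seq_b = "CARDGYSSGWFF"  # one mismatch relative to seq_a
seq_c = "CARDAYSSGWFF"  # two mismatches relative to seq_a

cutoff = distance_cutoff(len(seq_a))  # floor(12 * 0.15) = 1
print(cutoff, within_cutoff(seq_a, seq_b), within_cutoff(seq_a, seq_c))
```

With the default identity of 0.85, a 12-residue junction tolerates one mismatch and a 20-residue junction tolerates three, so shorter junctions are effectively held to a stricter standard.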

Returns:

Dandelion object with clone_id annotated in .data slot and .metadata initialized.

Return type:

DandelionPolars

Raises:

ValueError – If key is not found in Dandelion.data.
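A minimal usage sketch follows, assuming a DandelionPolars object has already been created elsewhere. The import path is inferred from the page title, the wrapper name annotate_clones is hypothetical, and the per-locus identity values are illustrative rather than recommended settings. The dictionary keys and the column names are the ones listed under identity and key.

```python
# Per-locus settings, using the dictionary keys documented for `identity` and `key`.
# The identity values here are illustrative assumptions, not recommendations.
identity_by_locus = {"ig": 0.85, "tr-ab": 1.0, "tr-gd": 1.0}
key_by_locus = {"ig": "junction_aa", "tr-ab": "junction", "tr-gd": "junction"}


def annotate_clones(vdj):
    """Hypothetical wrapper: call find_clones on an existing DandelionPolars object."""
    # Import inside the function so this sketch can be read and loaded
    # without dandelion installed; the path is inferred from the page title.
    from dandelion.polars.tools import find_clones

    return find_clones(
        vdj,
        identity=identity_by_locus,
        key=key_by_locus,
        dist_func="hamming",
        same_vj=True,
        same_length=True,
        key_added="clone_id",
        verbose=False,
    )
```

According to the Returns section, the result is a DandelionPolars object with the clone column annotated in its .data slot; if a column named in key is missing, the call is documented to raise ValueError.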