dandelion.base.tools.find_clones

dandelion.base.tools.find_clones(vdj, identity=0.85, key=None, by_alleles=False, key_added=None, recalculate_length=True, verbose=True, **kwargs)

Find clones based on the Hamming distance between CDR3 junctions of the VDJ chain and the VJ chain.
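To illustrate how an identity threshold such as 0.85 relates to Hamming distance, here is a minimal sketch. It is not dandelion's implementation: the helper names hamming_identity and same_clone are hypothetical, the example junctions are made up, and the sketch assumes that two junctions are only compared when they have the same length and that the threshold is inclusive.

```python
def hamming_identity(a: str, b: str) -> float:
    """Return the fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is only defined for equal-length sequences")
    if not a:
        return 1.0
    mismatches = sum(x != y for x, y in zip(a, b))
    return 1.0 - mismatches / len(a)


def same_clone(a: str, b: str, identity: float = 0.85) -> bool:
    """Hypothetical check: junctions of different lengths never match (assumption)."""
    return len(a) == len(b) and hamming_identity(a, b) >= identity


# Made-up 20-residue junction_aa strings, for illustration only.
ref = "CARDGYSSGWYFDYWGQGTL"
close = "CARDAYSTGWYFDYWGQGTL"  # 2 mismatches out of 20
far = "CARDAYSTGWYFEYWGQATL"    # 4 mismatches out of 20

print(same_clone(ref, close))  # 2/20 mismatches is within the 0.85 threshold
print(same_clone(ref, far))    # 4/20 mismatches falls below the 0.85 threshold
```

With a fixed identity, longer junctions tolerate more mismatched positions in absolute terms, which is why the threshold is expressed as a fraction rather than a count.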

Parameters:
  • vdj (Dandelion | pd.DataFrame) – Dandelion object, pandas DataFrame in changeo/airr format, or file path to changeo/airr file after clones have been determined.

  • identity (dict[str, float] | float, optional) – junction similarity parameter. Default is 0.85. If provided as a dictionary, use the following keys: ‘ig’, ‘tr-ab’, ‘tr-gd’.

  • key (dict[str, str] | str | None, optional) –

    column name for performing clone clustering. None defaults to a dictionary where:

    {‘ig’: ‘junction_aa’, ‘tr-ab’: ‘junction’, ‘tr-gd’: ‘junction’}

    If provided as a string, this key will be used for all loci.

  • by_alleles (bool, optional) – whether or not to collapse alleles to genes. Default is False.

  • key_added (str | None, optional) – if specified, this will be the column name for clones. None defaults to ‘clone_id’.

  • recalculate_length (bool, optional) – whether or not to re-calculate junction length rather than rely on the parsed assignment (which is occasionally wrong). Default is True.

  • verbose (bool, optional) – whether or not to print progress. Default is True.

  • **kwargs – Additional arguments to pass to Dandelion.update_metadata.
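Both identity and key accept either a single value or a dictionary keyed by locus. The sketch below shows one way such an argument could be expanded into a per-locus dictionary. It is an illustration only: the helper per_locus is hypothetical and is not part of dandelion, and filling in missing dictionary keys from the defaults is an assumption rather than documented behaviour.

```python
LOCI = ("ig", "tr-ab", "tr-gd")

# Default column names per locus, as listed for the key parameter above.
DEFAULT_KEY = {"ig": "junction_aa", "tr-ab": "junction", "tr-gd": "junction"}
# Default identity per locus, derived from the scalar default of 0.85.
DEFAULT_IDENTITY = {locus: 0.85 for locus in LOCI}


def per_locus(value, default):
    """Hypothetical helper: expand a scalar, dict, or None into a per-locus dict."""
    if value is None:
        return dict(default)
    if isinstance(value, dict):
        unknown = set(value) - set(LOCI)
        if unknown:
            raise ValueError(f"unknown locus keys: {sorted(unknown)}")
        merged = dict(default)
        merged.update(value)  # assumption: missing keys keep their defaults
        return merged
    # A single value is applied to every locus.
    return {locus: value for locus in LOCI}


print(per_locus(None, DEFAULT_KEY))
print(per_locus("junction", DEFAULT_KEY))
print(per_locus({"ig": 0.9}, DEFAULT_IDENTITY))
```

Under these assumptions, passing identity={'ig': 0.9} would tighten the threshold for immunoglobulin chains only, while key='junction' would cluster every locus on the nucleotide junction.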

Returns:

Dandelion object with clone_id annotated in .data slot and .metadata initialized.

Return type:

Dandelion

Raises:

ValueError – if key not found in Dandelion.data.