dandelion.base.tools.find_clones
- dandelion.base.tools.find_clones(vdj, identity=0.85, key=None, by_alleles=False, key_added=None, recalculate_length=True, verbose=True, **kwargs)
Find clones based on VDJ chain and VJ chain CDR3 junction Hamming distance.
- Parameters:
vdj (Dandelion | pd.DataFrame) – Dandelion object, pandas DataFrame in changeo/airr format, or file path to changeo/airr file after clones have been determined.
identity (dict[str, float] | float, optional) – junction similarity parameter. Default is 0.85. If provided as a dictionary, use the following keys: ‘ig’, ‘tr-ab’, ‘tr-gd’.
key (dict[str, str] | str | None, optional) – column name for performing clone clustering. None defaults to a dictionary where: {‘ig’: ‘junction_aa’, ‘tr-ab’: ‘junction’, ‘tr-gd’: ‘junction’}. If provided as a string, this key will be used for all loci.
by_alleles (bool, optional) – whether or not to collapse alleles to genes. Default is False.
key_added (str | None, optional) – If specified, this will be the column name for clones. None defaults to ‘clone_id’.
recalculate_length (bool, optional) – whether or not to re-calculate junction length, rather than rely on parsed assignment (which is occasionally wrong). Default is True.
verbose (bool, optional) – whether or not to print progress. Default is True.
**kwargs – Additional arguments to pass to Dandelion.update_metadata.
- Returns:
Dandelion object with clone_id annotated in .data slot and .metadata initialized.
- Return type:
Dandelion
- Raises:
ValueError – if key not found in Dandelion.data.
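
To make the identity parameter concrete, here is a minimal, self-contained sketch of grouping junction sequences by Hamming identity. It is not dandelion's implementation. The helper names (`resolve_identity`, `hamming_identity`, `group_junctions`), the single-linkage grouping, and the fallback to 0.85 for missing dictionary keys are all assumptions made for illustration. The sketch also ignores any V/J gene or chain grouping that the real function may perform.

```python
from typing import Dict, List, Union

# Locus-group keys taken from the parameter documentation above.
LOCI = ("ig", "tr-ab", "tr-gd")


def resolve_identity(
    identity: Union[float, Dict[str, float]], default: float = 0.85
) -> Dict[str, float]:
    """Expand a float or a (possibly partial) dict into one threshold per key.

    Hypothetical helper: falling back to `default` for missing keys is an
    assumption, not documented behaviour.
    """
    if isinstance(identity, dict):
        return {locus: identity.get(locus, default) for locus in LOCI}
    return {locus: float(identity) for locus in LOCI}


def hamming_identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    if not a:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def group_junctions(junctions: List[str], identity: float = 0.85) -> List[int]:
    """Label sequences so that equal-length pairs meeting the threshold
    end up in the same group (single linkage, assumed for this sketch)."""
    parent = list(range(len(junctions)))  # each sequence starts in its own group

    def find(i: int) -> int:
        # Follow parent links to the group root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(junctions)):
        for j in range(i + 1, len(junctions)):
            if len(junctions[i]) != len(junctions[j]):
                continue  # Hamming distance is undefined across lengths
            if hamming_identity(junctions[i], junctions[j]) >= identity:
                parent[find(j)] = find(i)

    # Relabel roots as consecutive integers in order of first appearance.
    remap: Dict[int, int] = {}
    return [remap.setdefault(find(i), len(remap)) for i in range(len(junctions))]


thresholds = resolve_identity({"ig": 0.9})
labels = group_junctions(
    ["CARDYGMDVW", "CARDYGLDVW", "CTRGGSFDYW", "CARDW"],
    identity=0.85,
)
```

In this toy input the first two junctions differ at one of ten positions (identity 0.9), so they share a label. The third matches at only four positions and the fourth has a different length, so each stays in its own group.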