dandelion.polars.tools.define_clones
- dandelion.polars.tools.define_clones(vdj, dist, action='set', model='ham', norm='len', doublets='drop', fileformat='airr', n_cpus=None, outFilePrefix=None, key_added=None, out_dir=None, additional_args=[])
Find clones using Change-O’s DefineClones.py.
Only callable for BCR data at the moment.
- Parameters:
vdj (DandelionPolars) – DandelionPolars object.
dist (float) – The distance threshold for clonal grouping.
action (Literal[“first”, “set”], optional) – Specifies how to handle multiple V(D)J assignments for initial grouping. Default is ‘set’. The “first” action will use only the first gene listed. The “set” action will use all gene assignments and construct a larger gene grouping composed of any sequences sharing an assignment or linked to another sequence by a common assignment (similar to single-linkage).
model (Literal[“ham”, “aa”, “hh_s1f”, “hh_s5f”, “mk_rs1nf”, “mk_rs5nf”, “hs1f_compat”, “m1n_compat”], optional) – Specifies which substitution model to use for calculating distance between sequences. Default is ‘ham’. The “ham” model is nucleotide Hamming distance and “aa” is amino acid Hamming distance. The “hh_s1f” and “hh_s5f” models are human-specific single-nucleotide and 5-mer content models, respectively, from Yaari et al., 2013. The “mk_rs1nf” and “mk_rs5nf” models are mouse-specific single-nucleotide and 5-mer content models, respectively, from Cui et al., 2016. The “m1n_compat” and “hs1f_compat” models are deprecated models provided for backwards compatibility with the “m1n” and “hs1f” models in Change-O v0.3.3 and SHazaM v0.1.4. Both 5-mer models should be considered experimental.
norm (Literal[“len”, “mut”, “none”], optional) – Specifies how to normalize distances. Default is ‘len’. ‘none’ (do not normalize), ‘len’ (normalize by length), or ‘mut’ (normalize by number of mutations between sequences).
doublets (Literal[“drop”, “count”], optional) – Option to control behaviour when dealing with heavy chain ‘doublets’. Default is ‘drop’. ‘drop’ will filter out the doublets, while ‘count’ will retain only the contig with the highest UMI count.
fileformat (Literal[“changeo”, “airr”], optional) – Format of V(D)J file/objects. Default is ‘airr’. Also accepts ‘changeo’.
n_cpus (int | None, optional) – Number of CPUs for parallelization. None (the default) uses 1 CPU, i.e. no parallelization.
outFilePrefix (str | None, optional) – If specified, the output file name will have this prefix. None defaults to ‘dandelion_define_clones’.
key_added (str | None, optional) – Name of the column added to hold the define_clones result.
out_dir (Path | str | None, optional) – If specified, the files will be written to this directory.
additional_args (list[str], optional) – Additional arguments to pass to DefineClones.py.
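The difference between the “first” and “set” values of action can be illustrated with a small, self-contained sketch. This is not the DefineClones.py implementation: group_by_calls is a hypothetical helper, the input is a plain dict of made-up V calls, and the real tool may group on more fields than the V call alone. The sketch only shows how linking sequences through any shared assignment (single-linkage) yields larger groups than using the first listed gene.

```python
from collections import defaultdict


def group_by_calls(v_calls, action="set"):
    """Group sequence ids by their (possibly ambiguous) V calls.

    v_calls maps a sequence id to a comma-separated string of gene calls.
    Hypothetical helper for illustration only.
    """
    if action == "first":
        # Use only the first gene listed for each sequence.
        groups = defaultdict(list)
        for seq_id, calls in v_calls.items():
            groups[calls.split(",")[0].strip()].append(seq_id)
        return sorted(sorted(g) for g in groups.values())
    if action != "set":
        raise ValueError("action must be 'first' or 'set'")

    # Union-find: sequences sharing any call end up in the same group.
    parent = {seq_id: seq_id for seq_id in v_calls}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    first_seen = {}  # gene call -> first sequence id that carried it
    for seq_id, calls in v_calls.items():
        for call in calls.split(","):
            call = call.strip()
            if call in first_seen:
                parent[find(seq_id)] = find(first_seen[call])
            else:
                first_seen[call] = seq_id

    groups = defaultdict(list)
    for seq_id in v_calls:
        groups[find(seq_id)].append(seq_id)
    return sorted(sorted(g) for g in groups.values())


# Made-up example calls; s1 is ambiguous between two genes.
v_calls = {
    "s1": "IGHV1-2,IGHV1-3",
    "s2": "IGHV1-3",
    "s3": "IGHV1-2",
    "s4": "IGHV4-34",
}
first_groups = group_by_calls(v_calls, action="first")
set_groups = group_by_calls(v_calls, action="set")
```

With “first”, s2 stays apart from s1 and s3 because only the first call of s1 is considered. With “set”, the ambiguous s1 links s2 and s3 into one group.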
- Returns:
DandelionPolars object with clone_id annotated in .data slot and .metadata initialized.
- Return type:
DandelionPolars
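As a rough guide to choosing dist, the sketch below shows how model='ham' combined with norm='len' or norm='none' could be read: a nucleotide Hamming distance, optionally divided by sequence length, compared against a threshold. The helper names, the example sequences and the 0.15 threshold are all made up for illustration, and treating "at or below the threshold" as a match is an assumption, not a statement about DefineClones.py. The ‘mut’ normalization is left out because its exact definition is not given here.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def normalized_distance(a, b, norm="len"):
    """Hamming distance, optionally normalized by sequence length."""
    d = hamming(a, b)
    if norm == "none":
        return float(d)
    if norm == "len":
        return d / len(a)
    raise ValueError("this sketch only covers norm='len' and norm='none'")


# Made-up 15-nt sequences of equal length.
seq_a = "TGTGCGAGAGATTAC"
seq_b = "TGTGCGAGAGGTTAC"  # 1 mismatch against seq_a
seq_c = "TGTGCAAGAGGTTTC"  # 3 mismatches against seq_a

dist = 0.15  # hypothetical threshold
d_ab = normalized_distance(seq_a, seq_b)  # 1 / 15
d_ac = normalized_distance(seq_a, seq_c)  # 3 / 15
same_clone_ab = d_ab <= dist
same_clone_ac = d_ac <= dist
```

Under these assumptions seq_a and seq_b fall within the threshold while seq_a and seq_c do not, which is why a length-normalized dist is typically a fraction between 0 and 1 rather than a raw mismatch count.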