dandelion.polars.tools.clone_diversity
- dandelion.polars.tools.clone_diversity(data, group_by, method='gini', use_network=True, network_metric='clone_network', clone_key=None, min_size=None, n_boot=200, n_cpus=-1, normalize=True, expanded_only=False, use_contracted=False, verbose=False, **kwargs)
Compute clonal diversity with bootstrapping.
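To make the idea concrete, here is a minimal standard-library sketch of bootstrapped diversity estimation. It is not dandelion's implementation: the helper names (`gini`, `shannon`, `chao1`, `bootstrap_diversity`), the subsampling without replacement, and the choice to normalise Shannon entropy by the log of the number of clones are all assumptions made for illustration. Only the names `min_size`, `n_boot` and `normalize` mirror the parameters in the signature above.

```python
import math
import random
from collections import Counter


def gini(sizes):
    """Gini index of clone sizes: 0 when all clones are equal in size."""
    xs = sorted(sizes)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n


def shannon(sizes, normalize=True):
    """Shannon entropy of clone frequencies (natural log)."""
    total = sum(sizes)
    h = -sum((s / total) * math.log(s / total) for s in sizes if s > 0)
    if normalize:
        # Assumed normalisation: divide by the maximum entropy, ln(number of clones).
        return h / math.log(len(sizes)) if len(sizes) > 1 else 0.0
    return h


def chao1(sizes):
    """Bias-corrected Chao1 richness from singleton and doubleton counts."""
    f1 = sum(1 for s in sizes if s == 1)
    f2 = sum(1 for s in sizes if s == 2)
    return len(sizes) + f1 * (f1 - 1) / (2 * (f2 + 1))


def bootstrap_diversity(clone_ids, metric, min_size, n_boot=200, seed=0):
    """Subsample min_size cells n_boot times and average the metric."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_boot):
        # Sampling a fixed number of cells keeps groups of different size comparable.
        sample = rng.sample(clone_ids, min_size)
        values.append(metric(list(Counter(sample).values())))
    return sum(values) / len(values), values


# Hypothetical group: two expanded clones plus 20 singleton clones (one label per cell).
cells = ["c1"] * 50 + ["c2"] * 30 + [f"s{i}" for i in range(20)]
mean_gini, raw = bootstrap_diversity(cells, gini, min_size=60)
```

Here `mean_gini` plays the role of the summarised estimate for one group and `raw` the role of the per-iteration bootstrap values; the actual function additionally handles grouping, network-based metrics and parallel execution.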
- Parameters:
data (DandelionPolars | AnnData) – DandelionPolars or AnnData object.
group_by (str) – Column name to calculate the diversity indices on, e.g. sample ID or patient.
method (Literal["gini", "chao1", "shannon"], optional) – Method for diversity estimation. One of 'gini', 'chao1' or 'shannon'. Default is 'gini'.
use_network (bool, optional) – Whether or not to use network-based Gini index calculation. Default is True.
network_metric (Literal["clone_network", "clone_degree", "clone_centrality"], optional) – Metric to use for calculating Gini indices of clones if use_network is True. One of 'clone_network', 'clone_degree' or 'clone_centrality'. Default is 'clone_network'.
clone_key (str | None, optional) – Column name specifying the clone_id column in metadata.
min_size (int | None, optional) – Minimum number of cells to keep for the diversity calculation. If None, defaults to the size of the smallest sample. Beware that leaving this as None may lead to very small sample sizes and unreliable estimates.
n_boot (int, optional) – Number of times to perform resampling. Default is 200.
n_cpus (int, optional) – Number of CPUs to use for parallel processing. Default is -1 (use all available cores).
normalize (bool, optional) – Whether or not to return normalized Shannon Entropy according to https://math.stackexchange.com/a/945172. Default is True.
expanded_only (bool, optional) – Whether or not to calculate Gini indices using expanded clones only. Default is False, i.e. use all cells/clones.
use_contracted (bool, optional) – Whether or not to perform the Gini calculation after contraction of the clone network. Only applies to calculation of the clone size Gini index. Default is False, which tries to preserve the single-cell properties of the network.
verbose (bool, optional) – Whether to print progress. Default is False.
**kwargs – Additional keyword arguments passed to ddl.tl.generate_network if using network-based Gini.
- Returns:
A tuple of a pandas DataFrame holding the summarised diversity estimates and a dict holding the raw bootstrap results.
- Return type:
tuple[DataFrame, dict[list[float]]]
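A usage sketch based only on the signature and return type documented above. The wrapper `summarise_diversity`, the column names `sample_id` and `clone_id`, and the value `min_size=100` are hypothetical and should be adapted to your own data; `vdj` is assumed to be a DandelionPolars or AnnData object that already carries clone assignments.

```python
def summarise_diversity(vdj, group_by="sample_id", clone_key="clone_id"):
    """Return (summary, bootstrap_results) for normalised Shannon entropy per group."""
    # Imported lazily so the sketch can be read without dandelion installed.
    from dandelion.polars.tools import clone_diversity

    summary, bootstrap_results = clone_diversity(
        vdj,
        group_by=group_by,    # hypothetical column name
        method="shannon",
        clone_key=clone_key,  # hypothetical column name
        min_size=100,         # assumed value; set explicitly rather than relying on None
        n_boot=200,
        normalize=True,
    )
    return summary, bootstrap_results
```

Setting `min_size` explicitly avoids the fallback to the smallest sample size, which the parameter description warns can give unreliable estimates.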