dandelion.base.tools.clone_rarefaction
- dandelion.base.tools.clone_rarefaction(data, group_by, clone_key=None, palette=None, figsize=(5, 3), chain_status_include=['Single pair', 'Orphan VDJ', 'Orphan VDJ-exception', 'Orphan VJ', 'Orphan VJ-exception', 'Extra pair', 'Extra pair-exception'], plot=False, plateau_fraction=0.95, step=1)
Compute sample-based rarefaction curves with asymptotic extrapolation and optional plotting.
This function calculates rarefaction curves per group, fits an asymptotic model (Michaelis–Menten) to estimate the expected plateau of clone richness, and extrapolates each curve until the predicted value reaches a specified fraction of the asymptote. It supports both tabular output and ggplot-style visualization.
- Parameters:
data (AnnData or Dandelion) – Object containing V(D)J metadata. Clone IDs must be stored in .obs (AnnData) or .metadata (Dandelion).
group_by (str) – Column in metadata specifying the grouping variable (e.g., sample, donor, condition).
clone_key (str, optional) – Column containing clone identifiers. Defaults to “clone_id” if not provided.
palette (list of str, optional) – List of colors to use for plotting. If None, the function tries to use data.uns[f"{group_by}_colors"] when available.
figsize (tuple of float, optional) – Width and height of the plot (in inches). Defaults to (5, 3).
chain_status_include (list of str, optional) – List of chain-status categories to retain; cells with any other chain status are excluded. Defaults to the single-pair, orphan and extra-pair categories listed in the signature, including their "-exception" variants.
plot (bool, optional) – If True, returns a ggplot object. If False (the default), returns a tidy DataFrame with observed and extrapolated rarefaction values.
plateau_fraction (float, optional) – Fraction of the estimated asymptote at which extrapolation stops. Defaults to 0.95, which stops extrapolation once the curve reaches 95% of the fitted asymptotic clone richness.
step (int, optional) – Increment for generating extrapolated sampling depths. Defaults to 1. Smaller values produce smoother curves but increase computation time.
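How plateau_fraction and step could interact can be sketched in plain Python, assuming the Michaelis–Menten form y = a * x / (b + x) mentioned above, with a fitted asymptote a and half-saturation constant b. The helpers extrapolation_depths and plateau_depth are hypothetical and not part of dandelion. Setting a * x / (b + x) = plateau_fraction * a gives a closed-form stopping depth x = b * f / (1 - f), so with the default f = 0.95 extrapolation runs to roughly 19 * b cells.

```python
def extrapolation_depths(n_observed, a, b, plateau_fraction=0.95, step=1):
    """Hypothetical helper (not part of dandelion): cell counts beyond the
    observed depth, spaced `step` apart, until the Michaelis-Menten curve
    a * x / (b + x) reaches plateau_fraction * a."""
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive")
    if not 0 < plateau_fraction < 1:
        raise ValueError("plateau_fraction must lie strictly between 0 and 1")
    if step < 1:
        raise ValueError("step must be at least 1")
    target = plateau_fraction * a
    depths = []
    x = n_observed
    # Keep stepping until the predicted richness reaches the plateau threshold.
    while a * x / (b + x) < target:
        x += step
        depths.append(x)
    return depths


def plateau_depth(b, plateau_fraction=0.95):
    """Closed-form depth at which a * x / (b + x) equals plateau_fraction * a."""
    return b * plateau_fraction / (1 - plateau_fraction)


# Example: asymptote a = 100 clones, half-saturation b = 50 cells.
print(plateau_depth(50))  # approximately 950 cells
print(extrapolation_depths(500, 100, 50, 0.95, 100))  # [600, 700, 800, 900, 1000]
```

A larger step reaches the threshold in fewer points but can overshoot it by up to step - 1 cells, which is the smoothness/cost trade-off described for the step parameter.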
- Returns:
- If plot=False:
- A tidy DataFrame with the following columns:
cells: number of sampled cells
yhat: predicted clone richness
group: group label
type: “observed” or “extrapolated”
plateau: plateau threshold for that group
- If plot=True:
A ggplot object showing observed and extrapolated rarefaction curves with solid and dashed line types, respectively.
- Return type:
DataFrame|ggplot
Notes
Rarefaction for observed values is computed using rarefun adapted from the vegan R package.
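The rarefun code itself is not reproduced here. As an assumption, the sketch below uses Hurlbert's expected-richness formula, which vegan's rarefy() is based on, with cells in the role of individuals and clones in the role of species: for N cells, N_i cells in clone i and a subsample of n cells, E[S_n] = sum over i of (1 - C(N - N_i, n) / C(N, n)). The names expected_richness and rarefaction_curve are hypothetical.

```python
from collections import Counter
from math import comb


def expected_richness(clone_ids, n):
    """Expected number of distinct clones among n cells drawn without
    replacement from clone_ids (Hurlbert's rarefaction formula)."""
    counts = Counter(clone_ids).values()
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the number of cells")
    denom = comb(total, n)
    # Each clone contributes the probability that at least one of its cells
    # is drawn; comb() returns 0 when n exceeds total - c.
    return sum(1 - comb(total - c, n) / denom for c in counts)


def rarefaction_curve(clone_ids, step=1):
    """(depth, expected richness) pairs from 1 cell up to all cells."""
    total = len(clone_ids)
    depths = list(range(1, total + 1, step))
    if depths and depths[-1] != total:
        depths.append(total)  # always include the full observed depth
    return [(n, expected_richness(clone_ids, n)) for n in depths]


cells = ["A", "A", "B", "C"]  # toy clone IDs, one entry per cell
# Depths 1-4 give expected richness 1, 11/6, 2.5 and 3.
print(rarefaction_curve(cells))
```

Exact binomial coefficients keep the sketch simple; for large repertoires a log-gamma formulation would be faster, at the cost of floating-point approximation.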
Asymptotic extrapolation is performed using a Michaelis–Menten saturation curve:
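y = a * x / (b + x)
where x is the number of sampled cells, y is the expected clone richness, a is the asymptotic clone richness, and b is the number of cells at which y reaches a / 2.
If the nonlinear fit fails, the function falls back to extending the observed rarefaction curve without asymptotic modeling.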
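A Michaelis–Menten fit of y = a * x / (b + x) can be sketched without external libraries. This is a stand-in, not dandelion's code, and the optimiser dandelion actually uses is not stated here. The sketch relies on the fact that for a fixed b the best a is an ordinary linear least-squares solution, so it profiles a out, searches b on a log-spaced grid, and refines b by golden-section search. Treating an optimum at the edge of the grid (no visible saturation) as a failed fit is a hypothetical criterion; on failure the caller would fall back to the observed curve. fit_michaelis_menten and _profile_sse are hypothetical names.

```python
import math


def _profile_sse(x, y, b):
    """For fixed b, the best a is a linear least-squares solution; return (sse, a)."""
    g = [xi / (b + xi) for xi in x]
    a = sum(gi * yi for gi, yi in zip(g, y)) / sum(gi * gi for gi in g)
    sse = sum((yi - a * gi) ** 2 for gi, yi in zip(g, y))
    return sse, a


def fit_michaelis_menten(x, y):
    """Least-squares fit of y = a * x / (b + x).

    Returns (a, b), or None when the fit is treated as failed (too few points,
    no interior optimum for b, or a non-positive asymptote)."""
    if len(x) != len(y) or len(x) < 3 or min(x) <= 0:
        return None
    # Coarse log-spaced search over b, from 1/1000 to 1000 times the largest depth.
    grid = [max(x) * 10 ** (k / 20) for k in range(-60, 61)]
    scores = [_profile_sse(x, y, b)[0] for b in grid]
    i = scores.index(min(scores))
    if i == 0 or i == len(grid) - 1:
        return None  # optimum on the grid edge: no clear saturation (assumption)
    # Refine b by golden-section search on log(b) between the grid neighbours.
    lo, hi = math.log(grid[i - 1]), math.log(grid[i + 1])
    inv_phi = (math.sqrt(5) - 1) / 2
    for _ in range(80):
        m1 = hi - inv_phi * (hi - lo)
        m2 = lo + inv_phi * (hi - lo)
        if _profile_sse(x, y, math.exp(m1))[0] < _profile_sse(x, y, math.exp(m2))[0]:
            hi = m2
        else:
            lo = m1
    b = math.exp((lo + hi) / 2)
    a = _profile_sse(x, y, b)[1]
    if not (math.isfinite(a) and a > 0):
        return None
    return a, b


# Noise-free curve generated with a = 100 clones and b = 50 cells.
depths = list(range(1, 501))
richness = [100 * d / (50 + d) for d in depths]
print(fit_michaelis_menten(depths, richness))  # close to (100.0, 50.0)
```

Profiling out the linear parameter reduces the problem to a one-dimensional search, which is easy to make robust; a general-purpose optimiser would instead fit a and b jointly from a starting guess.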
The function automatically filters out unused categories in the clone column and removes “No_contig” clones if present.
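The pre-filtering described in this note, together with chain_status_include and group_by, can be sketched over plain row dictionaries standing in for .obs or .metadata. The column name chain_status, the helper clones_per_group, the placeholder status "Other" and the row layout are assumptions for illustration. Unused categorical levels have no counterpart here because only clone IDs that actually occur are collected; in a pandas categorical column, zero-count levels could otherwise be counted as clones.

```python
from collections import defaultdict

DEFAULT_CHAIN_STATUS = (
    "Single pair", "Orphan VDJ", "Orphan VDJ-exception", "Orphan VJ",
    "Orphan VJ-exception", "Extra pair", "Extra pair-exception",
)


def clones_per_group(rows, group_by, clone_key="clone_id",
                     chain_status_include=DEFAULT_CHAIN_STATUS,
                     status_key="chain_status"):
    """Hypothetical helper: per-group lists of clone IDs (one entry per cell)."""
    keep = set(chain_status_include)
    groups = defaultdict(list)
    for row in rows:
        if row.get(status_key) not in keep:
            continue  # chain status not in the retained categories
        clone = row.get(clone_key)
        if clone is None or clone == "No_contig":
            continue  # no usable clone assignment (skipping None is an assumption)
        groups[row[group_by]].append(clone)
    return dict(groups)


rows = [
    {"sample": "s1", "clone_id": "c1", "chain_status": "Single pair"},
    {"sample": "s1", "clone_id": "No_contig", "chain_status": "Single pair"},
    {"sample": "s2", "clone_id": "c2", "chain_status": "Other"},
    {"sample": "s2", "clone_id": "c3", "chain_status": "Orphan VJ"},
]
print(clones_per_group(rows, "sample"))  # {'s1': ['c1'], 's2': ['c3']}
```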