dandelion.polars.tools.vdj_pseudobulk
- dandelion.polars.tools.vdj_pseudobulk(adata, vdj=None, pbs=None, obs_to_bulk=None, obs_to_take=None, normalise=True, renormalise=False, min_count=1, mode='abT', extract_cols=['v_call_abT_VDJ_main', 'j_call_abT_VDJ_main', 'v_call_abT_VJ_main', 'j_call_abT_VJ_main'])
Build a pseudobulked V(D)J feature space. Exactly one of pbs or obs_to_bulk must be specified when calling.
- Parameters:
adata (AnnData) – Cell-level AnnData object, preferably processed with ddl.tl.setup_vdj_pseudobulk() first
vdj (DandelionPolars | None, optional) – Dandelion object containing VDJ data. Only needed if columns are not already in adata.obs
pbs (np.ndarray | sp.sparse.csr_matrix | None, optional) – Optional binary matrix with cells as rows and pseudobulk groups as columns
obs_to_bulk (list[str] | str | None, optional) – Optional obs column(s) to group pseudobulks into; if multiple are provided, they will be combined
obs_to_take (list[str] | str | None, optional) – Optional obs column(s) for which the most common value within each pseudobulk is identified.
normalise (bool, optional) – If True, will scale the counts of each V(D)J gene group to 1 for each pseudobulk.
renormalise (bool, optional) – If True, will re-scale the counts of each V(D)J gene group to 1 for each pseudobulk with any “missing” calls removed. Only relevant when normalise is True and setup_vdj_pseudobulk() was run with remove_missing set to False.
min_count (int, optional) – Pseudobulks with fewer than this many non-“missing” calls in a V(D)J gene group will have their non-“missing” calls set to 0 for that group. Only relevant when normalise is True.
mode (Literal[“B”, “abT”, “gdT”] | None, optional) – Optional mode for extracting the V(D)J genes. If set to None, it will use e.g. v_call_VDJ instead of v_call_abT_VDJ. This argument is ignored if extract_cols is provided.
extract_cols (list[str] | None, optional) – Column names where VDJ/VJ information is stored; these are used instead of the standard columns.
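To make the relationship between obs_to_bulk, pbs and obs_to_take concrete, here is a minimal pure-Python sketch. It is not the library's implementation: the helper names (build_pbs, most_common_per_group), the example columns (sample, cell_type, phase) and the "_" separator used to combine columns are all assumptions made for illustration.

```python
from collections import Counter


def build_pbs(obs_rows, obs_to_bulk):
    """Return (group labels, binary matrix): cells as rows, pseudobulks as columns."""
    # Combine the chosen obs columns into one label per cell.
    # The "_" separator is an assumption, not necessarily what dandelion uses.
    labels = ["_".join(str(row[col]) for col in obs_to_bulk) for row in obs_rows]
    groups = sorted(set(labels))
    # One row per cell, with a single 1 in the column of its pseudobulk.
    matrix = [[1 if label == g else 0 for g in groups] for label in labels]
    return groups, matrix


def most_common_per_group(obs_rows, matrix, groups, column):
    """For each pseudobulk, pick the most frequent value of `column` among its cells."""
    result = {}
    for j, group in enumerate(groups):
        values = [row[column] for row, member in zip(obs_rows, matrix) if member[j]]
        result[group] = Counter(values).most_common(1)[0][0]
    return result


# Hypothetical per-cell metadata standing in for adata.obs.
cells = [
    {"sample": "s1", "cell_type": "CD4", "phase": "G1"},
    {"sample": "s1", "cell_type": "CD4", "phase": "G1"},
    {"sample": "s1", "cell_type": "CD8", "phase": "S"},
    {"sample": "s2", "cell_type": "CD4", "phase": "S"},
    {"sample": "s1", "cell_type": "CD4", "phase": "S"},
]

groups, pbs = build_pbs(cells, ["sample", "cell_type"])
phase = most_common_per_group(cells, pbs, groups, "phase")
```

Passing a matrix shaped like pbs directly corresponds to the pbs argument, while passing the column names corresponds to obs_to_bulk; the phase lookup mirrors what obs_to_take is described as doing.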
- Returns:
pb_adata, in which each observation is a pseudobulk:
VDJ usage frequency/counts stored in pb_adata.X
VDJ genes stored in pb_adata.var
pseudobulk metadata stored in pb_adata.obs
pseudobulk assignment (binary matrix with input cells as columns) stored in pb_adata.obsm['pbs']
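The usage frequencies in pb_adata.X depend on normalise, renormalise and min_count. The sketch below shows one plausible reading of those parameters for a single gene group in a single pseudobulk. The function name normalise_group, the "missing" key and the order of operations are assumptions, so treat it as an illustration of the documented behaviour rather than the library's code.

```python
def normalise_group(counts, min_count=1, renormalise=False, missing_key="missing"):
    """Scale one gene group's counts for one pseudobulk so that they sum to 1."""
    # Number of calls that are actual genes rather than the "missing" placeholder.
    observed = sum(n for gene, n in counts.items() if gene != missing_key)
    if observed < min_count:
        # Too few real calls: zero out the non-missing entries for this group.
        counts = {g: (n if g == missing_key else 0) for g, n in counts.items()}
    if renormalise:
        # Drop "missing" so the remaining genes are re-scaled among themselves.
        counts = {g: n for g, n in counts.items() if g != missing_key}
    total = sum(counts.values())
    if total == 0:
        return {g: 0.0 for g in counts}
    return {g: n / total for g, n in counts.items()}


# Hypothetical V gene counts for one pseudobulk.
v_counts = {"TRBV1": 3, "TRBV2": 1, "missing": 4}

plain = normalise_group(v_counts)
without_missing = normalise_group(v_counts, renormalise=True)
too_sparse = normalise_group(v_counts, min_count=5)
```

With these inputs, plain keeps "missing" in the denominator, without_missing re-scales over the two real genes only, and too_sparse zeroes both real genes because only 4 non-missing calls are present.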
- Return type:
AnnData
- Raises:
ValueError – if neither pbs nor obs_to_bulk is specified, or if both are specified.
ValueError – if required VDJ columns are not in adata.obs and vdj is not provided.
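The first ValueError condition can be checked before calling the function. The following is a small sketch of such a guard together with a thin wrapper. The names check_grouping_args and run_pseudobulk are hypothetical, and the import path is taken from the qualified name at the top of this page.

```python
def check_grouping_args(pbs=None, obs_to_bulk=None):
    """Raise ValueError unless exactly one of pbs / obs_to_bulk is given."""
    # Both None or both set: the two comparisons agree, which is the error case.
    if (pbs is None) == (obs_to_bulk is None):
        raise ValueError("Specify exactly one of `pbs` or `obs_to_bulk`.")


def run_pseudobulk(adata, **kwargs):
    """Validate the grouping arguments, then delegate to vdj_pseudobulk."""
    check_grouping_args(kwargs.get("pbs"), kwargs.get("obs_to_bulk"))
    # Imported lazily so this sketch can be loaded without dandelion installed.
    from dandelion.polars.tools import vdj_pseudobulk

    return vdj_pseudobulk(adata, **kwargs)
```

A call such as run_pseudobulk(adata, obs_to_bulk=["sample", "cell_type"]) would then return pb_adata as described under Returns, assuming adata.obs contains those (hypothetical) columns and the required VDJ columns.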