dandelion.polars.tools.vdj_pseudobulk

dandelion.polars.tools.vdj_pseudobulk(adata, vdj=None, pbs=None, obs_to_bulk=None, obs_to_take=None, normalise=True, renormalise=False, min_count=1, mode='abT', extract_cols=['v_call_abT_VDJ_main', 'j_call_abT_VDJ_main', 'v_call_abT_VJ_main', 'j_call_abT_VJ_main'])

Build a pseudobulk V(D)J feature space. Exactly one of pbs or obs_to_bulk must be specified when calling.

Parameters:
  • adata (AnnData) – Cell-level AnnData object, preferably after running ddl.tl.setup_vdj_pseudobulk()

  • vdj (DandelionPolars | None, optional) – Dandelion object containing VDJ data. Only needed if columns are not already in adata.obs

  • pbs (np.ndarray | sp.sparse.csr_matrix | None, optional) – Optional binary matrix with cells as rows and pseudobulk groups as columns

  • obs_to_bulk (list[str] | str | None, optional) – Optional obs column(s) to group pseudobulks into; if multiple are provided, they will be combined

  • obs_to_take (list[str] | str | None, optional) – Optional obs column(s) for which the most common value within each pseudobulk is identified.

  • normalise (bool, optional) – If True, will scale the counts of each V(D)J gene group to 1 for each pseudobulk.

  • renormalise (bool, optional) – If True, will re-scale the counts of each V(D)J gene group to 1 for each pseudobulk with any “missing” calls removed. Only relevant when normalise is True and setup_vdj_pseudobulk() was run with remove_missing set to False.

  • min_count (int, optional) – Pseudobulks with fewer than this many non-“missing” calls in a V(D)J gene group will have their non-“missing” calls set to 0 for that group. Only relevant when normalise is True.

  • mode (Literal["B", "abT", "gdT"] | None, optional) – Optional mode for extracting the V(D)J genes. If set to None, it will use e.g. v_call_VDJ instead of v_call_abT_VDJ. If extract_cols is provided, this argument is ignored.

  • extract_cols (list[str] | None, optional) – Column names where VDJ/VJ information is stored so that this will be used instead of the standard columns.
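The grouping parameters above (pbs, obs_to_bulk, obs_to_take) can be illustrated with a small pure-Python sketch. It is not dandelion's implementation: the helper names build_pbs and most_common_per_group, the "_" separator used to combine columns, and the toy obs records are all assumptions made for illustration. It shows how several obs columns could be combined into one label per cell, how that label yields a binary cells-by-pseudobulks matrix, and how the most common value of another column could be taken per pseudobulk.

```python
from collections import Counter


def build_pbs(obs, obs_to_bulk):
    """Return (group_names, pbs), where pbs[i][j] is 1 if cell i belongs to group j.

    Hypothetical helper: `obs` is a list of per-cell dicts standing in for adata.obs.
    """
    keys = [obs_to_bulk] if isinstance(obs_to_bulk, str) else list(obs_to_bulk)
    # Combine the selected columns into a single label per cell.
    # The "_" separator is an assumption, not necessarily what the library uses.
    labels = ["_".join(str(row[k]) for k in keys) for row in obs]
    groups = sorted(set(labels))
    index = {group: j for j, group in enumerate(groups)}
    pbs = [[0] * len(groups) for _ in labels]
    for i, label in enumerate(labels):
        pbs[i][index[label]] = 1
    return groups, pbs


def most_common_per_group(obs, pbs, column):
    """For each pseudobulk (column of pbs), return the most frequent value of `column`."""
    n_groups = len(pbs[0]) if pbs else 0
    result = []
    for j in range(n_groups):
        values = [row[column] for row, member in zip(obs, pbs) if member[j]]
        # An empty pseudobulk has no most common value.
        result.append(Counter(values).most_common(1)[0][0] if values else None)
    return result


# Toy per-cell metadata (hypothetical column names and values).
obs = [
    {"sample": "s1", "cluster": "A", "celltype": "CD4"},
    {"sample": "s1", "cluster": "A", "celltype": "CD4"},
    {"sample": "s1", "cluster": "B", "celltype": "CD8"},
    {"sample": "s2", "cluster": "A", "celltype": "CD8"},
]
groups, pbs = build_pbs(obs, ["sample", "cluster"])
celltypes = most_common_per_group(obs, pbs, "celltype")
```

Each row of pbs in this sketch contains exactly one 1, because every cell falls into exactly one combined label; a user-supplied pbs matrix need not have that property.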

Returns:

pb_adata, whereby each observation is a pseudobulk:

  • VDJ usage frequency/counts stored in pb_adata.X

  • VDJ genes stored in pb_adata.var

  • pseudobulk metadata stored in pb_adata.obs

  • pseudobulk assignment (binary matrix with input cells as columns) stored in pb_adata.obsm['pbs']

Return type:

AnnData
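To make the frequency/count values in pb_adata.X more concrete, here is a minimal sketch of one reading of the normalise, renormalise and min_count descriptions, applied to a single gene group within a single pseudobulk. It is an interpretation of the parameter text, not the library's code: the function name group_frequencies, the placeholder label "missing", and the choice to omit the missing entry when a group falls below min_count are all assumptions.

```python
from collections import Counter

# Hypothetical placeholder for whatever label marks an absent V(D)J call.
MISSING = "missing"


def group_frequencies(calls, normalise=True, renormalise=False, min_count=1):
    """Summarise one gene group's calls within a single pseudobulk."""
    counts = Counter(calls)
    # Number of calls that are real genes rather than the missing placeholder.
    n_valid = sum(n for gene, n in counts.items() if gene != MISSING)
    if not normalise:
        # Raw counts; min_count and renormalise are not applied.
        return dict(counts)
    if n_valid < min_count:
        # Too few real calls: zero out the non-missing genes for this group.
        return {gene: 0.0 for gene in counts if gene != MISSING}
    if renormalise:
        # Drop missing calls so the remaining genes sum to 1.
        return {gene: n / n_valid for gene, n in counts.items() if gene != MISSING}
    # Scale all calls in the group (missing included) so they sum to 1.
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


calls = ["TRBV1", "TRBV1", "TRBV2", MISSING]
plain = group_frequencies(calls)
renormed = group_frequencies(calls, renormalise=True)
too_few = group_frequencies(calls, min_count=5)
```

In this sketch the difference between plain and renormed is only the denominator: the first divides by all four calls, the second by the three non-missing ones.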

Raises:
  • ValueError – if neither pbs nor obs_to_bulk is specified, or if both are specified.

  • ValueError – if required VDJ columns are not in adata.obs and vdj is not provided.
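The two error conditions listed above can be sketched as simple argument checks. The helper names check_grouping_args and check_vdj_columns and the error messages are hypothetical; only the conditions themselves come from the documentation.

```python
def check_grouping_args(pbs=None, obs_to_bulk=None):
    """Raise ValueError unless exactly one of pbs / obs_to_bulk is given."""
    if (pbs is None) == (obs_to_bulk is None):
        raise ValueError("Specify exactly one of pbs or obs_to_bulk.")


def check_vdj_columns(obs_columns, required_cols, vdj=None):
    """Raise ValueError if required columns are absent and no vdj object is given."""
    missing = [col for col in required_cols if col not in obs_columns]
    if missing and vdj is None:
        raise ValueError(f"Columns {missing} not found in obs and vdj was not provided.")
    return missing


def raises_value_error(func, *args, **kwargs):
    """Small helper for demonstration: True if calling func raises ValueError."""
    try:
        func(*args, **kwargs)
    except ValueError:
        return False if False else True
    return False
```

For example, raises_value_error(check_grouping_args) is True because neither argument is supplied, while passing only obs_to_bulk="sample" passes the check.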