Interoperability with scirpy

It is now possible to convert the file formats between dandelion>=0.1.1 and scirpy>=0.6.2 [Sturm2020] to enhance the collaboration between the analysis toolkits.

Import dandelion module

[1]:
import os

import dandelion as ddl

# new function to setup tutorial data
from dandelion.tutorial import setup_dandelion_tutorial_tcr

setup_dandelion_tutorial_tcr()

# change to tutorial directory
os.chdir("dandelion_tutorial/")

ddl.logging.print_versions()
Downloading filtered_feature_bc_matrix.h5 → dandelion_tutorial/sc5p_v2_hs_PBMC_10k_t/filtered_feature_bc_matrix.h5
Downloading airr_rearrangement.tsv → dandelion_tutorial/sc5p_v2_hs_PBMC_10k_t/airr_rearrangement.tsv
Downloading filtered_contig_annotations.csv → dandelion_tutorial/sc5p_v2_hs_PBMC_10k_t/filtered_contig_annotations.csv
Downloading filtered_contig.fasta → dandelion_tutorial/sc5p_v2_hs_PBMC_10k_t/filtered_contig.fasta
Downloading filtered_feature_bc_matrix.h5 → dandelion_tutorial/sc5p_v1p1_hs_melanoma_10k_t/filtered_feature_bc_matrix.h5
Downloading airr_rearrangement.tsv → dandelion_tutorial/sc5p_v1p1_hs_melanoma_10k_t/airr_rearrangement.tsv
Downloading filtered_contig_annotations.csv → dandelion_tutorial/sc5p_v1p1_hs_melanoma_10k_t/filtered_contig_annotations.csv
Downloading filtered_contig.fasta → dandelion_tutorial/sc5p_v1p1_hs_melanoma_10k_t/filtered_contig.fasta
dandelion==1.0.0a1.dev36 pandas==2.3.3 numpy==2.3.5 matplotlib==3.10.6 networkx==3.6.1 scipy==1.15.2
[2]:
import scirpy as ir
import scanpy as sc

ir.__version__
[2]:
'0.22.5.dev29+g61dba21b4'
[3]:
# read in the airr_rearrangement.tsv file
file_location = "sc5p_v2_hs_PBMC_10k_t/airr_rearrangement.tsv"

# read in gene expression data
adata = sc.read_10x_h5("sc5p_v2_hs_PBMC_10k_t/filtered_feature_bc_matrix.h5")
adata.var_names_make_unique()
adata
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.12/site-packages/anndata/_core/anndata.py:1798: UserWarning: Variable names are not unique. To make them unique, call `.var_names_make_unique`.
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.12/site-packages/anndata/_core/anndata.py:1798: UserWarning: Variable names are not unique. To make them unique, call `.var_names_make_unique`.
[3]:
AnnData object with n_obs × n_vars = 10553 × 36601
    var: 'gene_ids', 'feature_types', 'genome', 'pattern', 'read', 'sequence'
[4]:
vdj = ddl.read_10x_airr(file_location)
vdj
[4]:
Lazy Dandelion object with n_obs = 5351 and n_contigs = 10860
    data: cell_id, sequence_id, sequence, sequence_aa, productive, rev_comp, v_call, v_cigar, d_call, d_cigar, j_call, j_cigar, c_call, c_cigar, sequence_alignment, germline_alignment, junction, junction_aa, junction_length, junction_aa_length, v_sequence_start, v_sequence_end, d_sequence_start, d_sequence_end, j_sequence_start, j_sequence_end, c_sequence_start, c_sequence_end, consensus_count, umi_count, is_cell, locus, rearrangement_status
    metadata: cell_id, productive_VDJ, productive_VJ, v_call_VDJ, v_call_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, c_call_VDJ, c_call_VJ, junction_VDJ, junction_VJ, junction_aa_VDJ, junction_aa_VJ, locus_VDJ, locus_VJ, umi_count_VDJ, umi_count_VJ, productive_VDJ_main, productive_VJ_main, v_call_VDJ_main, v_call_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, c_call_VDJ_main, c_call_VJ_main, junction_VDJ_main, junction_VJ_main, junction_aa_VDJ_main, junction_aa_VJ_main, locus_VDJ_main, locus_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, locus_status, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ

The test file contains a blank clone_id column so we run find_clones to populate it first.

[5]:
ddl.tl.find_clones(vdj)
Using PyTorch backend with Apple Metal GPU
Finding clones based on abT cell VDJ chains using junction: 100%|██████████| 2099/2099 [00:01<00:00, 1059.86it/s]
Finding clones based on abT cell VJ chains using junction: 100%|██████████| 3588/3588 [00:02<00:00, 1787.70it/s]
Building distance matrix (batched): 100%|██████████| 57/57 [00:00<00:00, 4018.07it/s]
[5]:
Lazy Dandelion object with n_obs = 5351 and n_contigs = 10860
    data: cell_id, sequence_id, sequence, sequence_aa, productive, rev_comp, v_call, v_cigar, d_call, d_cigar, j_call, j_cigar, c_call, c_cigar, sequence_alignment, germline_alignment, junction, junction_aa, junction_length, junction_aa_length, v_sequence_start, v_sequence_end, d_sequence_start, d_sequence_end, j_sequence_start, j_sequence_end, c_sequence_start, c_sequence_end, consensus_count, umi_count, is_cell, locus, rearrangement_status, clone_id
    metadata: cell_id, clone_id, clone_id_rank, productive_VDJ, productive_VJ, v_call_VDJ, v_call_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, c_call_VDJ, c_call_VJ, junction_VDJ, junction_VJ, junction_aa_VDJ, junction_aa_VJ, locus_VDJ, locus_VJ, umi_count_VDJ, umi_count_VJ, productive_VDJ_main, productive_VJ_main, v_call_VDJ_main, v_call_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, c_call_VDJ_main, c_call_VJ_main, junction_VDJ_main, junction_VJ_main, junction_aa_VDJ_main, junction_aa_VJ_main, locus_VDJ_main, locus_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, locus_status, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ
    distances: distance matrix of shape (5351, 5351)
[6]:
# we also generate the network so that this can be passed to scirpy
ddl.tl.generate_network(vdj, key="sequence_alignment")
vdj
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
[6]:
Lazy Dandelion object with n_obs = 5351 and n_contigs = 10860
    data: cell_id, sequence_id, sequence, sequence_aa, productive, rev_comp, v_call, v_cigar, d_call, d_cigar, j_call, j_cigar, c_call, c_cigar, sequence_alignment, germline_alignment, junction, junction_aa, junction_length, junction_aa_length, v_sequence_start, v_sequence_end, d_sequence_start, d_sequence_end, j_sequence_start, j_sequence_end, c_sequence_start, c_sequence_end, consensus_count, umi_count, is_cell, locus, rearrangement_status, clone_id
    metadata: cell_id, clone_id, clone_id_rank, productive_VDJ, productive_VJ, v_call_VDJ, v_call_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, c_call_VDJ, c_call_VJ, junction_VDJ, junction_VJ, junction_aa_VDJ, junction_aa_VJ, locus_VDJ, locus_VJ, umi_count_VDJ, umi_count_VJ, productive_VDJ_main, productive_VJ_main, v_call_VDJ_main, v_call_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, c_call_VDJ_main, c_call_VJ_main, junction_VDJ_main, junction_VJ_main, junction_aa_VDJ_main, junction_aa_VJ_main, locus_VDJ_main, locus_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, locus_status, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ
    layout: layout for 5351 vertices, layout for 195 vertices
    graph: networkx graph of 5351 vertices, networkx graph of 195 vertices
    distances: distance matrix of shape (5351, 5351)

ddl.to_scirpy : Converting dandelion to scirpy

The Dandelion object can be converted to an object compatible with scirpy using the ddl.to_scirpy function. This is even when there is no original gene expression data available.

[7]:
# conversion to default scirpy MuData object
irdata = ddl.tl.to_scirpy(vdj)
irdata
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.12/site-packages/anndata/utils.py:362: ExperimentalFeatureWarning: Support for Awkward Arrays is currently experimental. Behavior may change in the future. Please report any issues you may encounter!
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.12/site-packages/mudata/_core/mudata.py:1598: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.12/site-packages/mudata/_core/mudata.py:1461: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.
[7]:
MuData object with n_obs × n_vars = 5351 × 0
  1 modality
    airr:   5351 x 0
      obsm: 'airr'

Conversion to AnnData in scirpy format is also available

[8]:
converted_adata = ddl.tl.to_scirpy(vdj, to_mudata=False)
converted_adata
[8]:
AnnData object with n_obs × n_vars = 5351 × 0
    obsm: 'airr'

If you have gene expression data, the parameter gex_adata supports the gene expression data in AnnData format.

Please note that this will slice to the same cell_id that are present in the same in the AIRR data.

[9]:
irdata = ddl.tl.to_scirpy(vdj, to_mudata=False, gex_adata=adata)
irdata
[9]:
AnnData object with n_obs × n_vars = 5333 × 36601
    var: 'gene_ids', 'feature_types', 'genome', 'pattern', 'read', 'sequence'
    uns: 'clone_id'
    obsm: 'airr'
[10]:
mudata = ddl.tl.to_scirpy(vdj, to_mudata=True, gex_adata=adata)
mudata
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.12/site-packages/mudata/_core/mudata.py:1598: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.12/site-packages/mudata/_core/mudata.py:1461: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.
[10]:
MuData object with n_obs × n_vars = 10553 × 36601
  2 modalities
    gex:    10553 x 36601
      var:  'gene_ids', 'feature_types', 'genome', 'pattern', 'read', 'sequence'
    airr:   5333 x 0
      uns:  'clone_id'
      obsm: 'airr'

Use scirpy’s get functions to retrieve the relevant airr info (https://scirpy.scverse.org/en/latest/generated/scirpy.get.airr.html)

[11]:
ir.get.airr(irdata, "clone_id")
WARNING: No chain indices found under adata.obsm['chain_indices']. Running scirpy.pp.index_chains with default parameters.
[11]:
VJ_1_clone_id VDJ_1_clone_id VJ_2_clone_id VDJ_2_clone_id
AAACCTGAGCGATAGC-1 abT_VDJ_412_5_1_VJ_675_2_1 abT_VDJ_412_5_1_VJ_675_2_1 None None
AAACCTGAGTCACGCC-1 abT_VDJ_393_5_1_VJ_1526_1_2 abT_VDJ_393_5_1_VJ_1526_1_2 None None
AAACCTGCACGTCAGC-1 abT_VDJ_393_2_4_VJ_44_2_1 abT_VDJ_393_2_4_VJ_44_2_1 None None
AAACCTGGTCAATACC-1 abT_VDJ_87_3_1_VJ_653_2_1 abT_VDJ_87_3_1_VJ_653_2_1 None None
AAACCTGGTTCGGCAC-1 abT_VDJ_173_4_1_VJ_1404_2_1|abT_VDJ_173_4_1_VJ... abT_VDJ_173_4_1_VJ_1404_2_1|abT_VDJ_173_4_1_VJ... abT_VDJ_173_4_1_VJ_1404_2_1|abT_VDJ_173_4_1_VJ... None
... ... ... ... ...
TTTGTCATCGCCGTGA-1 abT_VDJ_83_1_1_VJ_1366_1_1 abT_VDJ_83_1_1_VJ_1366_1_1 None None
TTTGTCATCGTCTGAA-1 abT_VDJ_479_2_1_VJ_1498_2_1 abT_VDJ_479_2_1_VJ_1498_2_1 None None
TTTGTCATCTACCAGA-1 abT_VDJ_425_3_1_VJ_1516_2_1 abT_VDJ_425_3_1_VJ_1516_2_1 None None
TTTGTCATCTCTGAGA-1 abT_VDJ_432_5_1_VJ_959_1_1|abT_VDJ_45_2_1_VJ_9... abT_VDJ_432_5_1_VJ_959_1_1|abT_VDJ_45_2_1_VJ_9... None abT_VDJ_432_5_1_VJ_959_1_1|abT_VDJ_45_2_1_VJ_9...
TTTGTCATCTTCGAGA-1 abT_VDJ_164_1_2_VJ_137_4_1 abT_VDJ_164_1_2_VJ_137_4_1 None None

5333 rows × 4 columns

[12]:
ir.get.airr(mudata, "clone_id")
WARNING: No chain indices found under adata.obsm['chain_indices']. Running scirpy.pp.index_chains with default parameters.
[12]:
VJ_1_clone_id VDJ_1_clone_id VJ_2_clone_id VDJ_2_clone_id
cell_id
AAACCTGAGCGATAGC-1 abT_VDJ_412_5_1_VJ_675_2_1 abT_VDJ_412_5_1_VJ_675_2_1 None None
AAACCTGAGTCACGCC-1 abT_VDJ_393_5_1_VJ_1526_1_2 abT_VDJ_393_5_1_VJ_1526_1_2 None None
AAACCTGCACGTCAGC-1 abT_VDJ_393_2_4_VJ_44_2_1 abT_VDJ_393_2_4_VJ_44_2_1 None None
AAACCTGGTCAATACC-1 abT_VDJ_87_3_1_VJ_653_2_1 abT_VDJ_87_3_1_VJ_653_2_1 None None
AAACCTGGTTCGGCAC-1 abT_VDJ_173_4_1_VJ_1404_2_1|abT_VDJ_173_4_1_VJ... abT_VDJ_173_4_1_VJ_1404_2_1|abT_VDJ_173_4_1_VJ... abT_VDJ_173_4_1_VJ_1404_2_1|abT_VDJ_173_4_1_VJ... None
... ... ... ... ...
TTTGTCATCGCCGTGA-1 abT_VDJ_83_1_1_VJ_1366_1_1 abT_VDJ_83_1_1_VJ_1366_1_1 None None
TTTGTCATCGTCTGAA-1 abT_VDJ_479_2_1_VJ_1498_2_1 abT_VDJ_479_2_1_VJ_1498_2_1 None None
TTTGTCATCTACCAGA-1 abT_VDJ_425_3_1_VJ_1516_2_1 abT_VDJ_425_3_1_VJ_1516_2_1 None None
TTTGTCATCTCTGAGA-1 abT_VDJ_432_5_1_VJ_959_1_1|abT_VDJ_45_2_1_VJ_9... abT_VDJ_432_5_1_VJ_959_1_1|abT_VDJ_45_2_1_VJ_9... None abT_VDJ_432_5_1_VJ_959_1_1|abT_VDJ_45_2_1_VJ_9...
TTTGTCATCTTCGAGA-1 abT_VDJ_164_1_2_VJ_137_4_1 abT_VDJ_164_1_2_VJ_137_4_1 None None

5333 rows × 4 columns

Or you can add transfer = True, which will perform dandelion’s tl.transfer.

[13]:
irdatax = ddl.tl.to_scirpy(vdj, transfer=True)
irdatax
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.12/site-packages/mudata/_core/mudata.py:1598: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.12/site-packages/mudata/_core/mudata.py:1461: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.
[13]:
MuData object with n_obs × n_vars = 5351 × 0
  1 modality
    airr:   5351 x 0
      obs:  'clone_id', 'clone_id_rank', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'v_call_VJ', 'd_call_VDJ', 'j_call_VDJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'locus_VDJ', 'locus_VJ', 'umi_count_VDJ', 'umi_count_VJ', 'productive_VDJ_main', 'productive_VJ_main', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'junction_VDJ_main', 'junction_VJ_main', 'junction_aa_VDJ_main', 'junction_aa_VJ_main', 'locus_VDJ_main', 'locus_VJ_main', 'umi_count_VDJ_main', 'umi_count_VJ_main', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
      uns:  'dandelion', 'neighbors', 'clone_id'
      obsm: 'airr', 'X_vdj'
      obsp: 'connectivities', 'distances'
[14]:
irdatax = ddl.tl.to_scirpy(vdj, transfer=True, to_mudata=False)
irdatax
[14]:
AnnData object with n_obs × n_vars = 5351 × 0
    obs: 'clone_id', 'clone_id_rank', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'v_call_VJ', 'd_call_VDJ', 'j_call_VDJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'locus_VDJ', 'locus_VJ', 'umi_count_VDJ', 'umi_count_VJ', 'productive_VDJ_main', 'productive_VJ_main', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'junction_VDJ_main', 'junction_VJ_main', 'junction_aa_VDJ_main', 'junction_aa_VJ_main', 'locus_VDJ_main', 'locus_VJ_main', 'umi_count_VDJ_main', 'umi_count_VJ_main', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
    uns: 'dandelion', 'neighbors', 'clone_id'
    obsm: 'airr', 'X_vdj'
    obsp: 'connectivities', 'distances'

ddl.from_scirpy : Converting scirpy to dandelion

Converting MuData back to Dandelion

[15]:
vdjx = ddl.tl.from_scirpy(mudata)
vdjx
/Users/uqztuong/Documents/GitHub/dandelion/src/dandelion/polars/core/_core.py:2487: PerformanceWarning: Determining the column names of a LazyFrame requires resolving its schema, which is a potentially expensive operation. Use `LazyFrame.collect_schema().names()` to get the column names without this warning.
/Users/uqztuong/Documents/GitHub/dandelion/src/dandelion/polars/core/_core.py:2488: PerformanceWarning: Determining the column names of a LazyFrame requires resolving its schema, which is a potentially expensive operation. Use `LazyFrame.collect_schema().names()` to get the column names without this warning.
[15]:
Lazy Dandelion object with n_obs = 5333 and n_contigs = 10836
    data: c_call, c_cigar, c_sequence_end, c_sequence_start, clone_id, consensus_count, d_call, d_cigar, d_sequence_end, d_sequence_start, germline_alignment, is_cell, j_call, j_cigar, j_sequence_end, j_sequence_start, junction, junction_aa, junction_aa_length, junction_length, locus, productive, rearrangement_status, rev_comp, sequence, sequence_aa, sequence_alignment, sequence_id, umi_count, v_call, v_cigar, v_sequence_end, v_sequence_start, cell_id, clone_id_rank, c_call_VDJ, c_call_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, v_call_VDJ, v_call_VJ, umi_count_VDJ, umi_count_VJ, c_call_VDJ_main, c_call_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, v_call_VDJ_main, v_call_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ
    metadata: cell_id, clone_id, clone_id_rank, c_call_VDJ, c_call_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, junction_VDJ, junction_VJ, junction_aa_VDJ, junction_aa_VJ, locus_VDJ, locus_VJ, v_call_VDJ, v_call_VJ, umi_count_VDJ, umi_count_VJ, c_call_VDJ_main, c_call_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, junction_VDJ_main, junction_VJ_main, junction_aa_VDJ_main, junction_aa_VJ_main, locus_VDJ_main, locus_VJ_main, v_call_VDJ_main, v_call_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, locus_status, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ
    graph: networkx graph of 5333 vertices

Converting AnnData back to Dandelion

[16]:
vdjx = ddl.tl.from_scirpy(irdata)
vdjx
/Users/uqztuong/Documents/GitHub/dandelion/src/dandelion/polars/core/_core.py:2487: PerformanceWarning: Determining the column names of a LazyFrame requires resolving its schema, which is a potentially expensive operation. Use `LazyFrame.collect_schema().names()` to get the column names without this warning.
/Users/uqztuong/Documents/GitHub/dandelion/src/dandelion/polars/core/_core.py:2488: PerformanceWarning: Determining the column names of a LazyFrame requires resolving its schema, which is a potentially expensive operation. Use `LazyFrame.collect_schema().names()` to get the column names without this warning.
[16]:
Lazy Dandelion object with n_obs = 5333 and n_contigs = 10836
    data: c_call, c_cigar, c_sequence_end, c_sequence_start, clone_id, consensus_count, d_call, d_cigar, d_sequence_end, d_sequence_start, germline_alignment, is_cell, j_call, j_cigar, j_sequence_end, j_sequence_start, junction, junction_aa, junction_aa_length, junction_length, locus, productive, rearrangement_status, rev_comp, sequence, sequence_aa, sequence_alignment, sequence_id, umi_count, v_call, v_cigar, v_sequence_end, v_sequence_start, cell_id, clone_id_rank, c_call_VDJ, c_call_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, v_call_VDJ, v_call_VJ, umi_count_VDJ, umi_count_VJ, c_call_VDJ_main, c_call_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, v_call_VDJ_main, v_call_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ
    metadata: cell_id, clone_id, clone_id_rank, c_call_VDJ, c_call_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, junction_VDJ, junction_VJ, junction_aa_VDJ, junction_aa_VJ, locus_VDJ, locus_VJ, v_call_VDJ, v_call_VJ, umi_count_VDJ, umi_count_VJ, c_call_VDJ_main, c_call_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, junction_VDJ_main, junction_VJ_main, junction_aa_VDJ_main, junction_aa_VJ_main, locus_VDJ_main, locus_VJ_main, v_call_VDJ_main, v_call_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, locus_status, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ
    graph: networkx graph of 10553 vertices
[17]:
vdjx.metadata
[17]:
naive plan: (run LazyFrame.explain(optimized=True) to see the optimized plan)

Parquet SCAN [/var/folders/_r/j_8_fj3x28n2th3ch0ckn9c40000gt/T/tmpy2dxbwiw.parquet]

PROJECT */37 COLUMNS

ESTIMATED ROWS: 5333

This time, find clones with scirpy’s method.

[18]:
ir.tl.chain_qc(irdata)
ir.pp.ir_dist(irdata)
ir.tl.define_clonotypes(irdata, receptor_arms="all", dual_ir="primary_only")
irdata
[18]:
AnnData object with n_obs × n_vars = 5333 × 36601
    obs: 'receptor_type', 'receptor_subtype', 'chain_pairing', 'clone_id', 'clone_id_size'
    var: 'gene_ids', 'feature_types', 'genome', 'pattern', 'read', 'sequence'
    uns: 'clone_id', 'chain_indices', 'ir_dist_nt_identity'
    obsm: 'airr', 'chain_indices'

Visualising with scirpy’s plotting tools

You can now also plot dandelion networks using scirpy’s functions.

[19]:
irdata = ddl.tl.to_scirpy(vdj, to_mudata=True, gex_adata=adata)
irdata
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.12/site-packages/mudata/_core/mudata.py:1598: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.12/site-packages/mudata/_core/mudata.py:1461: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.
[19]:
MuData object with n_obs × n_vars = 10553 × 36601
  2 modalities
    gex:    10553 x 36601
      var:  'gene_ids', 'feature_types', 'genome', 'pattern', 'read', 'sequence'
    airr:   5333 x 0
      uns:  'clone_id'
      obsm: 'airr'
[20]:
ddl.tl.transfer(irdata, vdj, overwrite=True)
irdata
[20]:
MuData object with n_obs × n_vars = 10553 × 36601
  2 modalities
    gex:    10553 x 36601
      var:  'gene_ids', 'feature_types', 'genome', 'pattern', 'read', 'sequence'
    airr:   5333 x 0
      obs:  'clone_id', 'clone_id_rank', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'v_call_VJ', 'd_call_VDJ', 'j_call_VDJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'locus_VDJ', 'locus_VJ', 'umi_count_VDJ', 'umi_count_VJ', 'productive_VDJ_main', 'productive_VJ_main', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'junction_VDJ_main', 'junction_VJ_main', 'junction_aa_VDJ_main', 'junction_aa_VJ_main', 'locus_VDJ_main', 'locus_VJ_main', 'umi_count_VDJ_main', 'umi_count_VJ_main', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
      uns:  'clone_id', 'dandelion', 'neighbors'
      obsm: 'airr', 'X_vdj'
      obsp: 'connectivities', 'distances'
[21]:
ddl.tl.transfer(
    irdata, vdj, overwrite=True
)  # overwrite scirpy's clone_id definition
[22]:
ir.tl.clonotype_network(irdata, min_cells=2)
ir.pl.clonotype_network(
    irdata, color="clone_id", panel_size=(7, 7), show_labels=False
)  # hiding labels because they clutter the plot
/Users/uqztuong/Documents/GitHub/scirpy/src/scirpy/pl/_clonotypes.py:179: FutureWarning: Use obsm (e.g. `k in adata.obsm` or `adata.obsm.keys() | {'u'}`) instead of AnnData.obsm_keys, AnnData.obsm_keys is deprecated and will be removed in the future.
[22]:
<Axes: >
../../_images/notebooks_polars_1c_dandelion_scirpy_polars_35_2.png

We can also use a shorter clone_id name ordered by size, called clone_id_rank.

[23]:
ddl.tl.transfer(irdata, vdj, clone_key="clone_id_rank")
ir.tl.clonotype_network(irdata, clonotype_key="clone_id_rank", min_cells=2)
ir.pl.clonotype_network(irdata, color="clone_id_rank", panel_size=(7, 7))
/Users/uqztuong/Documents/GitHub/scirpy/src/scirpy/pl/_clonotypes.py:179: FutureWarning: Use obsm (e.g. `k in adata.obsm` or `adata.obsm.keys() | {'u'}`) instead of AnnData.obsm_keys, AnnData.obsm_keys is deprecated and will be removed in the future.
[23]:
<Axes: >
../../_images/notebooks_polars_1c_dandelion_scirpy_polars_37_2.png

you can also collapse the networks to a single node and plot by size

[24]:
ddl.tl.transfer(irdata, vdj, clone_key="clone_id_rank", collapse_nodes=True)
ir.tl.clonotype_network(irdata, clonotype_key="clone_id_rank", min_cells=2)
ir.pl.clonotype_network(irdata, color="clone_id_rank", panel_size=(7, 7))
/Users/uqztuong/Documents/GitHub/scirpy/src/scirpy/pl/_clonotypes.py:179: FutureWarning: Use obsm (e.g. `k in adata.obsm` or `adata.obsm.keys() | {'u'}`) instead of AnnData.obsm_keys, AnnData.obsm_keys is deprecated and will be removed in the future.
[24]:
<Axes: >
../../_images/notebooks_polars_1c_dandelion_scirpy_polars_39_2.png

Using ddl.pl.clone_network on MuData with scirpy data

Now that we have created a MuData object with scirpy data, we can also use dandelion’s plotting function to visualise the clone networks. The way to access the column in the obs of the MuData object is to prefix the observation key with the modality name, for example gex:{column name}. If the columns are in the VDJ modality, you can use vdj:{column name}. Or if the column is in the shared obs i.e. MuData.obs, you can use just the column name.

[25]:
# create random 3 categorical variable in gex obs for testing
import numpy as np
import pandas as pd

# create random 3 categorical variable in gex obs for testing
irdata.mod["gex"].obs["new_category"] = pd.Categorical(
    np.random.choice(["A", "B", "C"], size=irdata.mod["gex"].n_obs)
)
# also create in the shared obs
irdata.obs["shared_category"] = pd.Categorical(
    np.random.choice(["D", "E", "F"], size=irdata.n_obs)
)
irdata
[25]:
MuData object with n_obs × n_vars = 10553 × 36601
  obs:      'shared_category'
  2 modalities
    gex:    10553 x 36601
      obs:  'new_category'
      var:  'gene_ids', 'feature_types', 'genome', 'pattern', 'read', 'sequence'
    airr:   5333 x 0
      obs:  'clone_id', 'clone_id_rank', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'v_call_VJ', 'd_call_VDJ', 'j_call_VDJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'locus_VDJ', 'locus_VJ', 'umi_count_VDJ', 'umi_count_VJ', 'productive_VDJ_main', 'productive_VJ_main', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'junction_VDJ_main', 'junction_VJ_main', 'junction_aa_VDJ_main', 'junction_aa_VJ_main', 'locus_VDJ_main', 'locus_VJ_main', 'umi_count_VDJ_main', 'umi_count_VJ_main', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
      uns:  'clone_id', 'dandelion', 'neighbors', 'gex_neighbors', 'clonotype_network', 'clone_id_colors', 'clone_id_rank', 'clone_id_rank_colors'
      obsm: 'airr', 'X_vdj', 'X_clonotype_network'
      obsp: 'connectivities', 'distances'
[26]:
sc.set_figure_params(figsize=(6, 6))
ddl.pl.clone_network(irdata, color="gex:new_category")
ddl.pl.clone_network(irdata, color="shared_category")
../../_images/notebooks_polars_1c_dandelion_scirpy_polars_42_0.png
../../_images/notebooks_polars_1c_dandelion_scirpy_polars_42_1.png