Interoperability with scirpy

Files can now be converted between the dandelion (>=0.1.1) and scirpy (>=0.6.2) [Sturm2020] formats, making it easier to use the two analysis toolkits together.

Download the T-cell airr_rearrangement.tsv file for the sc5p_v2_hs_PBMC_10k dataset from 10x Genomics:

# bash
wget -P sc5p_v2_hs_PBMC_10k https://cf.10xgenomics.com/samples/cell-vdj/4.0.0/sc5p_v2_hs_PBMC_10k/sc5p_v2_hs_PBMC_10k_t_airr_rearrangement.tsv

The matching gene expression data can be downloaded from the same dataset:

# bash
wget -P sc5p_v2_hs_PBMC_10k https://cf.10xgenomics.com/samples/cell-vdj/4.0.0/sc5p_v2_hs_PBMC_10k/sc5p_v2_hs_PBMC_10k_filtered_feature_bc_matrix.h5
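If you prefer to fetch the data from Python, the sketch below is a stdlib-only alternative to the wget commands. It assumes you run it from the tutorial's working directory (~/Downloads/dandelion_tutorial/) and saves both files into a sc5p_v2_hs_PBMC_10k/ subfolder, because that is where the code below reads them from. The file names, including the T-cell file sc5p_v2_hs_PBMC_10k_t_airr_rearrangement.tsv that is read later, come from this tutorial's URLs and paths. The helper names planned_downloads and download_all are hypothetical.

```python
import os
import urllib.request

# Base URL and sample name taken from the 10x Genomics links in this tutorial.
BASE_URL = "https://cf.10xgenomics.com/samples/cell-vdj/4.0.0/sc5p_v2_hs_PBMC_10k"
SAMPLE = "sc5p_v2_hs_PBMC_10k"
# T-cell AIRR table and filtered gene expression matrix.
SUFFIXES = ("_t_airr_rearrangement.tsv", "_filtered_feature_bc_matrix.h5")


def planned_downloads(dest_dir=SAMPLE):
    """Return (url, local_path) pairs; files go into dest_dir so that the
    relative paths used later in this tutorial resolve."""
    pairs = []
    for suffix in SUFFIXES:
        filename = SAMPLE + suffix
        pairs.append((f"{BASE_URL}/{filename}", os.path.join(dest_dir, filename)))
    return pairs


def download_all(dest_dir=SAMPLE):
    """Download each file, skipping any that already exist locally."""
    os.makedirs(dest_dir, exist_ok=True)
    for url, path in planned_downloads(dest_dir):
        if not os.path.exists(path):
            urllib.request.urlretrieve(url, path)
```

Calling download_all() performs the downloads; planned_downloads() only lists the URL and destination pairs, which is handy for checking paths before fetching anything.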

Import the dandelion module:

[2]:
import os

import dandelion as ddl


# change to the working directory that holds the downloaded data

os.chdir(os.path.expanduser("~/Downloads/dandelion_tutorial/"))

ddl.logging.print_versions()
dandelion==0.5.5.dev16 pandas==2.2.3 numpy==2.1.3 matplotlib==3.10.1 networkx==3.4.2 scipy==1.15.2
[3]:
import scirpy as ir
import scanpy as sc


ir.__version__
[3]:
'0.22.1.dev1+gb32fd24'

dandelion

[4]:
# read in the airr_rearrangement.tsv file
file_location = (
    "sc5p_v2_hs_PBMC_10k/sc5p_v2_hs_PBMC_10k_t_airr_rearrangement.tsv"
)

# read in gene expression data
adata = sc.read_10x_h5(
    "sc5p_v2_hs_PBMC_10k/sc5p_v2_hs_PBMC_10k_filtered_feature_bc_matrix.h5"
)
adata.var_names_make_unique()

vdj = ddl.read_10x_airr(file_location)
vdj
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.11/site-packages/anndata/_core/anndata.py:1758: UserWarning: Variable names are not unique. To make them unique, call `.var_names_make_unique`.
[4]:
Dandelion class object with n_obs = 5351 and n_contigs = 10860
    data: 'cell_id', 'sequence_id', 'sequence', 'sequence_aa', 'productive', 'rev_comp', 'v_call', 'v_cigar', 'd_call', 'd_cigar', 'j_call', 'j_cigar', 'c_call', 'c_cigar', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'junction_length', 'junction_aa_length', 'v_sequence_start', 'v_sequence_end', 'd_sequence_start', 'd_sequence_end', 'j_sequence_start', 'j_sequence_end', 'c_sequence_start', 'c_sequence_end', 'consensus_count', 'umi_count', 'is_cell', 'locus', 'rearrangement_status'
    metadata: 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_abT_VDJ', 'd_call_abT_VDJ', 'j_call_abT_VDJ', 'v_call_abT_VJ', 'j_call_abT_VJ', 'c_call_abT_VDJ', 'c_call_abT_VJ', 'productive_abT_VDJ', 'productive_abT_VJ', 'umi_count_abT_VDJ', 'umi_count_abT_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_abT_VDJ_main', 'd_call_abT_VDJ_main', 'j_call_abT_VDJ_main', 'v_call_abT_VJ_main', 'j_call_abT_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'

The 10x AIRR file contains an empty clone_id column, so we first run find_clones to populate it.

[5]:
ddl.tl.find_clones(vdj)
Finding clones based on abT cell VDJ chains : 100%|██████████| 512/512 [00:00<00:00, 1157.38it/s]
Finding clones based on abT cell VJ chains : 100%|██████████| 1574/1574 [00:00<00:00, 10558.22it/s]
Refining clone assignment based on VJ chain pairing : 100%|██████████| 5351/5351 [00:00<00:00, 230665.17it/s]
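To confirm for yourself that the raw file's clone_id column was empty before find_clones filled it in, you can scan the TSV directly. This is a minimal stdlib sketch; clone_id_is_blank is a hypothetical helper, and it treats a missing clone_id column the same as an empty one.

```python
import csv


def clone_id_is_blank(lines):
    """Return True if a tab-separated AIRR table has no clone_id column,
    or if every row's clone_id is empty.

    `lines` can be an open file handle or any iterable of text lines.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    if reader.fieldnames is None or "clone_id" not in reader.fieldnames:
        return True
    # Short rows give None for missing fields, so treat None as empty too.
    return all(not (row["clone_id"] or "").strip() for row in reader)
```

For example, with open(file_location, newline="") as fh: clone_id_is_blank(fh) should return True for the downloaded file if its clone_id column is indeed blank.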

ddl.to_scirpy : Converting dandelion to scirpy

[6]:
mudata = ddl.to_scirpy(vdj)
mudata
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.11/site-packages/anndata/utils.py:349: ExperimentalFeatureWarning: Support for Awkward Arrays is currently experimental. Behavior may change in the future. Please report any issues you may encounter!
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.11/site-packages/mudata/_core/mudata.py:1531: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.11/site-packages/mudata/_core/mudata.py:1429: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.
[6]:
MuData object with n_obs × n_vars = 5351 × 0
  1 modality
    airr:   5351 x 0
      obsm: 'airr'

Conversion to an AnnData object in scirpy format is also available, by setting to_mudata=False:

[7]:
irdata = ddl.to_scirpy(vdj, to_mudata=False)
irdata
[7]:
AnnData object with n_obs × n_vars = 5351 × 0
    obsm: 'airr'

If you have gene expression data, pass it as an AnnData object through the gex_adata parameter.

Note that with to_mudata=False, the result is sliced to the cell barcodes present in both the AIRR and the gene expression data (5333 cells here), whereas with to_mudata=True each modality keeps all of its own cells.

[8]:
irdata = ddl.to_scirpy(vdj, to_mudata=False, gex_adata=adata)
irdata
[8]:
AnnData object with n_obs × n_vars = 5333 × 36601
    var: 'gene_ids', 'feature_types', 'genome', 'pattern', 'read', 'sequence'
    obsm: 'airr'
[9]:
mudata = ddl.to_scirpy(vdj, to_mudata=True, gex_adata=adata)
mudata
[9]:
MuData object with n_obs × n_vars = 10571 × 36601
  2 modalities
    gex:    10553 x 36601
      var:  'gene_ids', 'feature_types', 'genome', 'pattern', 'read', 'sequence'
    airr:   5351 x 0
      obsm: 'airr'
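The cell counts above are consistent with simple set arithmetic on barcodes: the AnnData output keeps the 5333 barcodes shared by both inputs, while the MuData's n_obs matches the union of the two modalities, 10553 + 5351 - 5333 = 10571. The sketch below reproduces that bookkeeping on toy barcodes; barcode_overlap is a hypothetical helper, not part of dandelion or scirpy.

```python
def barcode_overlap(gex_barcodes, airr_barcodes):
    """Summarise how two collections of cell barcodes overlap."""
    gex, airr = set(gex_barcodes), set(airr_barcodes)
    shared = gex & airr
    return {
        "shared": len(shared),         # barcodes present in both inputs
        "gex_only": len(gex - airr),   # expression-only cells
        "airr_only": len(airr - gex),  # AIRR-only cells
        "union": len(gex | airr),      # all distinct barcodes
    }
```

With real data you could pass adata.obs_names and vdj.metadata.index, assuming the metadata is indexed by cell barcode as the vdj.metadata output further down suggests.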

Use scirpy's get functions to retrieve the relevant AIRR fields (see https://scirpy.scverse.org/en/latest/generated/scirpy.get.airr.html):

[10]:
ir.get.airr(irdata, "clone_id")
WARNING: No chain indices found under adata.obsm['chain_indices']. Running scirpy.pp.index_chains with default parameters.
[10]:
VJ_1_clone_id VDJ_1_clone_id VJ_2_clone_id VDJ_2_clone_id
AAACCTGAGCGATAGC-1 abT_VDJ_119_5_2_VJ_306_2_3 abT_VDJ_119_5_2_VJ_306_2_3 None None
AAACCTGAGTCACGCC-1 abT_VDJ_225_5_1_VJ_1416_1_1 abT_VDJ_225_5_1_VJ_1416_1_1 None None
AAACCTGCACGTCAGC-1 abT_VDJ_225_2_1_VJ_211_2_8 abT_VDJ_225_2_1_VJ_211_2_8 None None
AAACCTGGTCAATACC-1 abT_VDJ_282_4_3_VJ_1376_2_1 abT_VDJ_282_4_3_VJ_1376_2_1 None None
AAACCTGGTTCGGCAC-1 abT_VDJ_214_4_8_VJ_1536_2_1|abT_VDJ_214_4_8_VJ... abT_VDJ_214_4_8_VJ_1536_2_1|abT_VDJ_214_4_8_VJ... abT_VDJ_214_4_8_VJ_1536_2_1|abT_VDJ_214_4_8_VJ... None
... ... ... ... ...
TTTGTCATCGCCGTGA-1 abT_VDJ_382_1_1_VJ_723_1_1 abT_VDJ_382_1_1_VJ_723_1_1 None None
TTTGTCATCGTCTGAA-1 abT_VDJ_338_3_1_VJ_592_2_1 abT_VDJ_338_3_1_VJ_592_2_1 None None
TTTGTCATCTACCAGA-1 abT_VDJ_502_3_1_VJ_621_3_1 abT_VDJ_502_3_1_VJ_621_3_1 None None
TTTGTCATCTCTGAGA-1 abT_VDJ_372_5_1_VJ_1420_1_1|abT_VDJ_58_3_1_VJ_... abT_VDJ_372_5_1_VJ_1420_1_1|abT_VDJ_58_3_1_VJ_... None abT_VDJ_372_5_1_VJ_1420_1_1|abT_VDJ_58_3_1_VJ_...
TTTGTCATCTTCGAGA-1 abT_VDJ_359_1_5_VJ_1561_4_1 abT_VDJ_359_1_5_VJ_1561_4_1 None None

5333 rows × 4 columns

[11]:
ir.get.airr(mudata, "clone_id")
WARNING: No chain indices found under adata.obsm['chain_indices']. Running scirpy.pp.index_chains with default parameters.
[11]:
VJ_1_clone_id VDJ_1_clone_id VJ_2_clone_id VDJ_2_clone_id
cell_id
AAACCTGAGCGATAGC-1 abT_VDJ_119_5_2_VJ_306_2_3 abT_VDJ_119_5_2_VJ_306_2_3 None None
AAACCTGAGTCACGCC-1 abT_VDJ_225_5_1_VJ_1416_1_1 abT_VDJ_225_5_1_VJ_1416_1_1 None None
AAACCTGCACGTCAGC-1 abT_VDJ_225_2_1_VJ_211_2_8 abT_VDJ_225_2_1_VJ_211_2_8 None None
AAACCTGGTCAATACC-1 abT_VDJ_282_4_3_VJ_1376_2_1 abT_VDJ_282_4_3_VJ_1376_2_1 None None
AAACCTGGTTCGGCAC-1 abT_VDJ_214_4_8_VJ_1536_2_1|abT_VDJ_214_4_8_VJ... abT_VDJ_214_4_8_VJ_1536_2_1|abT_VDJ_214_4_8_VJ... abT_VDJ_214_4_8_VJ_1536_2_1|abT_VDJ_214_4_8_VJ... None
... ... ... ... ...
TTTGTCATCGCCGTGA-1 abT_VDJ_382_1_1_VJ_723_1_1 abT_VDJ_382_1_1_VJ_723_1_1 None None
TTTGTCATCGTCTGAA-1 abT_VDJ_338_3_1_VJ_592_2_1 abT_VDJ_338_3_1_VJ_592_2_1 None None
TTTGTCATCTACCAGA-1 abT_VDJ_502_3_1_VJ_621_3_1 abT_VDJ_502_3_1_VJ_621_3_1 None None
TTTGTCATCTCTGAGA-1 abT_VDJ_372_5_1_VJ_1420_1_1|abT_VDJ_58_3_1_VJ_... abT_VDJ_372_5_1_VJ_1420_1_1|abT_VDJ_58_3_1_VJ_... None abT_VDJ_372_5_1_VJ_1420_1_1|abT_VDJ_58_3_1_VJ_...
TTTGTCATCTTCGAGA-1 abT_VDJ_359_1_5_VJ_1561_4_1 abT_VDJ_359_1_5_VJ_1561_4_1 None None

5351 rows × 4 columns
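Some entries above hold several clone IDs joined by '|' (for example cell AAACCTGGTTCGGCAC-1). In the dandelion metadata shown later, that cell has a chain_status of 'Extra pair', so the joined IDs appear to belong to cells with extra chains. A minimal sketch for splitting such fields and counting multi-clone cells; split_multi and count_multi_clone_cells are hypothetical helpers and assume '|' is the only separator.

```python
def split_multi(value, sep="|"):
    """Split a '|'-joined field into its parts; None or '' gives []."""
    if value is None or value == "":
        return []
    return str(value).split(sep)


def count_multi_clone_cells(clone_ids):
    """Count cells whose clone_id field lists more than one clone."""
    return sum(1 for value in clone_ids if len(split_multi(value)) > 1)
```

For example, count_multi_clone_cells(ir.get.airr(irdata, "clone_id")["VJ_1_clone_id"]) iterates over one column of the table returned by scirpy.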

Alternatively, set transfer=True to also run dandelion's tl.transfer, which copies the Dandelion metadata into the .obs of the AIRR data.

[12]:
irdatax = ddl.to_scirpy(vdj, transfer=True)
irdatax
[12]:
MuData object with n_obs × n_vars = 5351 × 0
  1 modality
    airr:   5351 x 0
      obs:  'clone_id', 'clone_id_by_size', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_abT_VDJ', 'd_call_abT_VDJ', 'j_call_abT_VDJ', 'v_call_abT_VJ', 'j_call_abT_VJ', 'c_call_abT_VDJ', 'c_call_abT_VJ', 'productive_abT_VDJ', 'productive_abT_VJ', 'umi_count_abT_VDJ', 'umi_count_abT_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_abT_VDJ_main', 'd_call_abT_VDJ_main', 'j_call_abT_VDJ_main', 'v_call_abT_VJ_main', 'j_call_abT_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
      obsm: 'airr'
[13]:
irdatax = ddl.to_scirpy(vdj, transfer=True, to_mudata=False)
irdatax
[13]:
AnnData object with n_obs × n_vars = 5351 × 0
    obs: 'clone_id', 'clone_id_by_size', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_abT_VDJ', 'd_call_abT_VDJ', 'j_call_abT_VDJ', 'v_call_abT_VJ', 'j_call_abT_VJ', 'c_call_abT_VDJ', 'c_call_abT_VJ', 'productive_abT_VDJ', 'productive_abT_VJ', 'umi_count_abT_VDJ', 'umi_count_abT_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_abT_VDJ_main', 'd_call_abT_VDJ_main', 'j_call_abT_VDJ_main', 'v_call_abT_VJ_main', 'j_call_abT_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
    obsm: 'airr'
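Comparing the obs listing above with vdj.metadata shows that the transfer adds clone_id, clone_id_by_size and the other metadata columns to .obs. Conceptually, that metadata step behaves like a left join on cell barcode. The sketch below illustrates only this idea with plain dicts; it is not dandelion's implementation (tl.transfer also carries over the clone network, as used further down), and transfer_columns is a hypothetical helper.

```python
def transfer_columns(obs_index, metadata, columns):
    """Illustrative left join keyed on cell barcode.

    obs_index: barcodes in the order of the target .obs
    metadata:  {barcode: {column: value}}
    Cells absent from metadata receive None for every requested column.
    """
    return {
        col: [metadata.get(barcode, {}).get(col) for barcode in obs_index]
        for col in columns
    }
```

Keeping the target's barcode order is the key design point: every row of .obs gets a value, even when the metadata has nothing for that cell.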

ddl.from_scirpy : Converting scirpy to dandelion

Converting MuData back to Dandelion

[14]:
vdjx = ddl.from_scirpy(mudata)
vdjx
[14]:
Dandelion class object with n_obs = 5351 and n_contigs = 10860
    data: 'c_call', 'c_cigar', 'c_sequence_end', 'c_sequence_start', 'clone_id', 'consensus_count', 'd_call', 'd_cigar', 'd_sequence_end', 'd_sequence_start', 'germline_alignment', 'is_cell', 'j_call', 'j_cigar', 'j_sequence_end', 'j_sequence_start', 'junction', 'junction_aa', 'junction_aa_length', 'junction_length', 'locus', 'productive', 'rearrangement_status', 'rev_comp', 'sequence', 'sequence_aa', 'sequence_alignment', 'sequence_id', 'umi_count', 'v_call', 'v_cigar', 'v_sequence_end', 'v_sequence_start', 'cell_id'
    metadata: 'clone_id', 'clone_id_by_size', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_abT_VDJ', 'd_call_abT_VDJ', 'j_call_abT_VDJ', 'v_call_abT_VJ', 'j_call_abT_VJ', 'c_call_abT_VDJ', 'c_call_abT_VJ', 'productive_abT_VDJ', 'productive_abT_VJ', 'umi_count_abT_VDJ', 'umi_count_abT_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_abT_VDJ_main', 'd_call_abT_VDJ_main', 'j_call_abT_VDJ_main', 'v_call_abT_VJ_main', 'j_call_abT_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'

Converting AnnData back to Dandelion

[15]:
vdjx = ddl.from_scirpy(irdata)
vdjx
[15]:
Dandelion class object with n_obs = 5333 and n_contigs = 10836
    data: 'c_call', 'c_cigar', 'c_sequence_end', 'c_sequence_start', 'clone_id', 'consensus_count', 'd_call', 'd_cigar', 'd_sequence_end', 'd_sequence_start', 'germline_alignment', 'is_cell', 'j_call', 'j_cigar', 'j_sequence_end', 'j_sequence_start', 'junction', 'junction_aa', 'junction_aa_length', 'junction_length', 'locus', 'productive', 'rearrangement_status', 'rev_comp', 'sequence', 'sequence_aa', 'sequence_alignment', 'sequence_id', 'umi_count', 'v_call', 'v_cigar', 'v_sequence_end', 'v_sequence_start', 'cell_id'
    metadata: 'clone_id', 'clone_id_by_size', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_abT_VDJ', 'd_call_abT_VDJ', 'j_call_abT_VDJ', 'v_call_abT_VJ', 'j_call_abT_VJ', 'c_call_abT_VDJ', 'c_call_abT_VJ', 'productive_abT_VDJ', 'productive_abT_VJ', 'umi_count_abT_VDJ', 'umi_count_abT_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_abT_VDJ_main', 'd_call_abT_VDJ_main', 'j_call_abT_VDJ_main', 'v_call_abT_VJ_main', 'j_call_abT_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
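Converting irdata back returns 5333 cells and 10836 contigs rather than 5351 and 10860, because irdata was built with gex_adata and holds only barcodes shared with the expression data: the 18 dropped cells take their 24 contigs with them. A minimal sketch of that contig-level filtering on plain records; filter_contigs is a hypothetical helper.

```python
def filter_contigs(contigs, keep_barcodes):
    """Keep only contig records whose cell_id is in keep_barcodes."""
    keep = set(keep_barcodes)
    return [contig for contig in contigs if contig["cell_id"] in keep]
```

Assuming vdj.data is a pandas DataFrame with a cell_id column (as its listing suggests), the idiomatic equivalent is vdj.data[vdj.data["cell_id"].isin(irdata.obs_names)].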
[16]:
vdjx.metadata
[16]:
clone_id clone_id_by_size locus_VDJ locus_VJ productive_VDJ productive_VJ v_call_VDJ d_call_VDJ j_call_VDJ v_call_VJ ... d_call_abT_VDJ_main j_call_abT_VDJ_main v_call_abT_VJ_main j_call_abT_VJ_main isotype isotype_status locus_status chain_status rearrangement_status_VDJ rearrangement_status_VJ
AAACCTGAGCGATAGC-1 abT_VDJ_119_5_2_VJ_306_2_3 422 TRB TRA True True TRBV6-5 None TRBJ2-3 TRAV23/DV6 ... None TRBJ2-3 TRAV23/DV6 TRAJ22 None None TRB + TRA Single pair standard standard
AAACCTGAGTCACGCC-1 abT_VDJ_225_5_1_VJ_1416_1_1 5053 TRB TRA True True TRBV6-2 None TRBJ2-6 TRAV8-6 ... None TRBJ2-6 TRAV8-6 TRAJ8 None None TRB + TRA Single pair standard standard
AAACCTGCACGTCAGC-1 abT_VDJ_225_2_1_VJ_211_2_8 4081 TRB TRA True True TRBV6-2 None TRBJ2-6 TRAV1-2 ... None TRBJ2-6 TRAV1-2 TRAJ33 None None TRB + TRA Single pair standard standard
AAACCTGGTCAATACC-1 abT_VDJ_282_4_3_VJ_1376_2_1 4080 TRB TRA True True TRBV12-4 None TRBJ2-7 TRAV22 ... None TRBJ2-7 TRAV22 TRAJ4 None None TRB + TRA Single pair standard standard
AAACCTGGTTCGGCAC-1 abT_VDJ_214_4_8_VJ_1113_2_1|abT_VDJ_214_4_8_VJ... 4078|4079 TRB TRA|TRA True True|True TRBV20-1 None TRBJ1-1 TRAV8-3|TRAV8-2 ... None TRBJ1-1 TRAV8-3 TRAJ21 None Multi TRB + Extra VJ Extra pair standard standard
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
TTTGTCATCGCCGTGA-1 abT_VDJ_382_1_1_VJ_723_1_1 2061 TRB TRA True True TRBV12-4 None TRBJ2-3 TRAV8-2 ... None TRBJ2-3 TRAV8-2 TRAJ13 None None TRB + TRA Single pair standard standard
TTTGTCATCGTCTGAA-1 abT_VDJ_338_3_1_VJ_592_2_1 2060 TRB TRA True True TRBV7-8 None TRBJ1-3 TRAV8-6 ... None TRBJ1-3 TRAV8-6 TRAJ27 None None TRB + TRA Single pair standard standard
TTTGTCATCTACCAGA-1 abT_VDJ_502_3_1_VJ_621_3_1 2059 TRB TRA True True TRBV6-6 TRBD1 TRBJ2-3 TRAV8-6 ... TRBD1 TRBJ2-3 TRAV8-6 TRAJ48 None None TRB + TRA Single pair standard standard
TTTGTCATCTCTGAGA-1 abT_VDJ_372_5_1_VJ_1420_1_1|abT_VDJ_58_3_1_VJ_... 2057|2058 TRB|TRB TRA True|True True TRBV7-2|TRBV11-2 None TRBJ1-2|TRBJ1-4 TRAV3 ... None TRBJ1-2 TRAV3 TRAJ22 None|None Multi Extra VDJ + TRA Extra pair standard standard
TTTGTCATCTTCGAGA-1 abT_VDJ_359_1_5_VJ_1561_4_1 6080 TRB TRA True True TRBV2 None TRBJ1-5 TRAV12-1 ... None TRBJ1-5 TRAV12-1 TRAJ9 None None TRB + TRA Single pair standard standard

5333 rows × 46 columns

This time, define clonotypes with scirpy's own method:

[17]:
ir.tl.chain_qc(irdata)
ir.pp.ir_dist(irdata)
ir.tl.define_clonotypes(irdata, receptor_arms="all", dual_ir="primary_only")
irdata
[17]:
AnnData object with n_obs × n_vars = 5333 × 36601
    obs: 'receptor_type', 'receptor_subtype', 'chain_pairing', 'clone_id', 'clone_id_size'
    var: 'gene_ids', 'feature_types', 'genome', 'pattern', 'read', 'sequence'
    uns: 'chain_indices', 'ir_dist_nt_identity', 'clone_id'
    obsm: 'airr', 'chain_indices'
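For intuition only, the sketch below approximates the grouping requested above: ir_dist with its default nucleotide identity metric (the uns key 'ir_dist_nt_identity' points to it) followed by define_clonotypes with receptor_arms="all" and dual_ir="primary_only". Roughly, cells share a clonotype when both their primary VJ and primary VDJ junction sequences are identical. This is a simplification, not scirpy's implementation: simple_clonotypes is hypothetical, and cells missing either chain are simply left unassigned here rather than following scirpy's rules.

```python
from collections import defaultdict


def simple_clonotypes(cells):
    """Group cells whose primary VJ and primary VDJ junctions both match exactly.

    cells: {barcode: (vj_1_junction, vdj_1_junction)}
    Returns {barcode: label}, with None for cells lacking either chain.
    """
    groups = defaultdict(list)
    for barcode, (vj, vdj) in cells.items():
        if vj and vdj:
            groups[(vj, vdj)].append(barcode)
    labels = {barcode: None for barcode in cells}
    # Sort keys so that labels are deterministic across runs.
    for clone_number, key in enumerate(sorted(groups)):
        for barcode in groups[key]:
            labels[barcode] = str(clone_number)
    return labels
```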

Visualising with scirpy’s plotting tools

You can now also plot dandelion networks using scirpy’s functions.

[18]:
ddl.tl.generate_network(vdj, key="junction")
Setting up data: 10860it [00:00, 22983.30it/s]
Calculating distances : 100%|██████████| 6097/6097 [00:00<00:00, 14028.15it/s]
Aggregating distances : 100%|██████████| 4/4 [00:00<00:00, 19.92it/s]
Sorting into clusters : 100%|██████████| 6097/6097 [00:04<00:00, 1235.43it/s]
Calculating minimum spanning tree : 100%|██████████| 69/69 [00:00<00:00, 1144.94it/s]
Generating edge list : 100%|██████████| 69/69 [00:00<00:00, 3913.60it/s]
Computing overlap : 100%|██████████| 6097/6097 [00:04<00:00, 1220.27it/s]
Adjust overlap : 100%|██████████| 684/684 [00:00<00:00, 3583.29it/s]
Linking edges : 100%|██████████| 5245/5245 [00:00<00:00, 52144.47it/s]
Computing network layout
Computing expanded network layout
[19]:
irdata.obs["scirpy_clone_id"] = irdata.obs["clone_id"]  # stash it
ddl.tl.transfer(
    irdata, vdj, overwrite=True
)  # overwrite scirpy's clone_id definition
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.11/site-packages/pandas/core/arraylike.py:399: RuntimeWarning: overflow encountered in exp
[20]:
ir.tl.clonotype_network(irdata, min_cells=2)
ir.pl.clonotype_network(irdata, color="clone_id", panel_size=(7, 7))
[20]:
<Axes: >
[Figure: scirpy clonotype network coloured by dandelion clone_id]

To use shorter clone_id names (ordered by clone size), transfer clone_id_by_size instead:

[21]:
ddl.tl.transfer(irdata, vdj, clone_key="clone_id_by_size")
ir.tl.clonotype_network(irdata, clonotype_key="clone_id_by_size", min_cells=2)
ir.pl.clonotype_network(irdata, color="clone_id_by_size", panel_size=(7, 7))
[21]:
<Axes: >
[Figure: scirpy clonotype network coloured by clone_id_by_size]
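clone_id_by_size replaces the long dandelion IDs with short numeric labels ordered by clone size. A minimal sketch of that kind of relabelling, assuming label 1 goes to the largest clone and ties are broken alphabetically by clone_id; dandelion's own numbering and tie-breaking may differ, and clone_ids_by_size is a hypothetical helper.

```python
from collections import Counter


def clone_ids_by_size(cell_clone_ids):
    """Map each clone_id to an integer label, 1 = largest clone.

    cell_clone_ids: one clone_id per cell.
    Ties in size are broken by clone_id so the result is deterministic.
    """
    sizes = Counter(cell_clone_ids)
    ordered = sorted(sizes, key=lambda clone: (-sizes[clone], clone))
    return {clone: rank for rank, clone in enumerate(ordered, start=1)}
```

Cells carrying several '|'-joined clone IDs would need to be split before counting, otherwise each joined string is treated as a separate clone.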

You can also collapse each network into a single node and plot by size:

[22]:
ddl.tl.transfer(irdata, vdj, clone_key="clone_id_by_size", collapse_nodes=True)
ir.tl.clonotype_network(irdata, clonotype_key="clone_id_by_size", min_cells=2)
ir.pl.clonotype_network(irdata, color="scirpy_clone_id", panel_size=(7, 7))
[22]:
<Axes: >
[Figure: collapsed clonotype network coloured by scirpy_clone_id]