Interoperability with scirpy
Data can now be converted between dandelion>=0.1.1 and scirpy>=0.6.2 [Sturm2020], which makes it easier to combine the two analysis toolkits.
We will download the airr_rearrangement.tsv file from here:
# bash
wget https://cf.10xgenomics.com/samples/cell-vdj/4.0.0/sc5p_v2_hs_PBMC_10k/sc5p_v2_hs_PBMC_10k_t_airr_rearrangement.tsv
Gene expression data can also be obtained here:
# bash
wget https://cf.10xgenomics.com/samples/cell-vdj/4.0.0/sc5p_v2_hs_PBMC_10k/sc5p_v2_hs_PBMC_10k_filtered_feature_bc_matrix.h5
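Since separate AIRR files exist for B-cell and T-cell contigs, it can be worth confirming which receptor loci a downloaded file contains before loading it. The sketch below is a minimal, standard-library illustration that tallies loci from the first three characters of each v_call. The helper name count_loci, the contig names and the two-row sample table are made up for this example, and inferring the locus from the gene-name prefix is an assumption rather than anything dandelion does at this step.

```python
import csv
import io
from collections import Counter


def count_loci(handle):
    """Tally receptor loci in an AIRR rearrangement TSV.

    Assumption: the locus is read from the first three characters of
    v_call (e.g. "TRB" from "TRBV6-5"); rows with an empty v_call are skipped.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    loci = Counter()
    for row in reader:
        v_call = (row.get("v_call") or "").strip()
        if v_call:
            loci[v_call[:3]] += 1
    return loci


# Hypothetical two-contig table, used only to make the example runnable.
sample = io.StringIO(
    "cell_id\tsequence_id\tv_call\n"
    "AAACCTGAGCGATAGC-1\tcontig_1\tTRBV6-5\n"
    "AAACCTGAGCGATAGC-1\tcontig_2\tTRAV23/DV6\n"
)
loci = count_loci(sample)
print(dict(loci))
```

For a real file, the same helper could be called on an open file handle, e.g. with open(path, newline="") as fh: count_loci(fh).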
Import dandelion module
[2]:
# import sys
# sys.path.append("C://Users//Amos Choo//Desktop//dandelion")
import os
import dandelion as ddl
# change directory to somewhere more workable
os.chdir(os.path.expanduser("~/Downloads/dandelion_tutorial/"))
ddl.logging.print_versions()
dandelion==0.5.5.dev16 pandas==2.2.3 numpy==2.1.3 matplotlib==3.10.1 networkx==3.4.2 scipy==1.15.2
[3]:
import scirpy as ir
import scanpy as sc
ir.__version__
[3]:
'0.22.1.dev1+gb32fd24'
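The minimum versions quoted at the top (dandelion>=0.1.1, scirpy>=0.6.2) have to be compared numerically, because a plain string comparison would rank '0.22.1' below '0.6.2'. The following is a minimal sketch of such a check using only the standard library. The helper names release_tuple and meets_minimum are made up for this example, and the comparison deliberately ignores development and local suffixes such as .dev1+gb32fd24, which is a simplification.

```python
import re


def release_tuple(version):
    """Return the leading numeric release segment, e.g. '0.22.1.dev1' -> (0, 22, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric release segment in {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed, minimum):
    """Compare release segments only; dev/local suffixes are ignored (simplification)."""
    return release_tuple(installed) >= release_tuple(minimum)


# Version strings as printed in the cells above.
scirpy_ok = meets_minimum("0.22.1.dev1+gb32fd24", "0.6.2")
dandelion_ok = meets_minimum("0.5.5.dev16", "0.1.1")
print(scirpy_ok, dandelion_ok)

# A naive string comparison gets this wrong because "2" sorts before "6".
naive = "0.22.1" >= "0.6.2"
print(naive)
```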
dandelion
[4]:
# read in the airr_rearrangement.tsv file
file_location = (
"sc5p_v2_hs_PBMC_10k/sc5p_v2_hs_PBMC_10k_t_airr_rearrangement.tsv"
)
# read in gene expression data
adata = sc.read_10x_h5(
"sc5p_v2_hs_PBMC_10k/sc5p_v2_hs_PBMC_10k_filtered_feature_bc_matrix.h5"
)
adata.var_names_make_unique()
vdj = ddl.read_10x_airr(file_location)
vdj
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.11/site-packages/anndata/_core/anndata.py:1758: UserWarning: Variable names are not unique. To make them unique, call `.var_names_make_unique`.
[4]:
Dandelion class object with n_obs = 5351 and n_contigs = 10860
data: 'cell_id', 'sequence_id', 'sequence', 'sequence_aa', 'productive', 'rev_comp', 'v_call', 'v_cigar', 'd_call', 'd_cigar', 'j_call', 'j_cigar', 'c_call', 'c_cigar', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'junction_length', 'junction_aa_length', 'v_sequence_start', 'v_sequence_end', 'd_sequence_start', 'd_sequence_end', 'j_sequence_start', 'j_sequence_end', 'c_sequence_start', 'c_sequence_end', 'consensus_count', 'umi_count', 'is_cell', 'locus', 'rearrangement_status'
metadata: 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_abT_VDJ', 'd_call_abT_VDJ', 'j_call_abT_VDJ', 'v_call_abT_VJ', 'j_call_abT_VJ', 'c_call_abT_VDJ', 'c_call_abT_VJ', 'productive_abT_VDJ', 'productive_abT_VJ', 'umi_count_abT_VDJ', 'umi_count_abT_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_abT_VDJ_main', 'd_call_abT_VDJ_main', 'j_call_abT_VDJ_main', 'v_call_abT_VJ_main', 'j_call_abT_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
The test file contains a blank clone_id column so we run find_clones to populate it first.
[5]:
ddl.tl.find_clones(vdj)
Finding clones based on abT cell VDJ chains : 100%|██████████| 512/512 [00:00<00:00, 1157.38it/s]
Finding clones based on abT cell VJ chains : 100%|██████████| 1574/1574 [00:00<00:00, 10558.22it/s]
Refining clone assignment based on VJ chain pairing : 100%|██████████| 5351/5351 [00:00<00:00, 230665.17it/s]
ddl.to_scirpy : Converting dandelion to scirpy
[6]:
irdata = ddl.to_scirpy(vdj)
irdata
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.11/site-packages/anndata/utils.py:349: ExperimentalFeatureWarning: Support for Awkward Arrays is currently experimental. Behavior may change in the future. Please report any issues you may encounter!
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.11/site-packages/mudata/_core/mudata.py:1531: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.11/site-packages/mudata/_core/mudata.py:1429: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.
[6]:
MuData object with n_obs × n_vars = 5351 × 0
1 modality
airr: 5351 x 0
obsm: 'airr'
Conversion to AnnData in scirpy format is also available:
[7]:
# to_mudata=False returns an AnnData, so avoid naming the result `mudata`
irdata = ddl.to_scirpy(vdj, to_mudata=False)
irdata
[7]:
AnnData object with n_obs × n_vars = 5351 × 0
obsm: 'airr'
If you have gene expression data, pass it in AnnData format through the gex_adata parameter.
Please note that, when an AnnData is returned, only the cell_ids that are present in both the gene expression data and the AIRR data are kept.
[8]:
irdata = ddl.to_scirpy(vdj, to_mudata=False, gex_adata=adata)
irdata
[8]:
AnnData object with n_obs × n_vars = 5333 × 36601
var: 'gene_ids', 'feature_types', 'genome', 'pattern', 'read', 'sequence'
obsm: 'airr'
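The cell counts reported in this tutorial appear consistent with simple set arithmetic on the barcodes: 5333 would be the cells shared by the gene expression and AIRR data, while the MuData built with to_mudata=True further down reports 10571 observations from 10553 gene expression cells and 5351 AIRR cells. The sketch below illustrates intersection versus union with toy barcodes (the cell names are made up) and then checks the reported numbers by inclusion-exclusion. It is an interpretation of the printed outputs, not a description of how dandelion computes them.

```python
# Toy barcodes, made up for illustration.
gex_cells = {"cell_a", "cell_b", "cell_c", "cell_d"}
airr_cells = {"cell_c", "cell_d", "cell_e"}

shared = gex_cells & airr_cells    # cells present in both data sets
combined = gex_cells | airr_cells  # cells present in at least one data set

# Inclusion-exclusion with the counts printed in this tutorial.
n_gex, n_airr, n_shared = 10553, 5351, 5333
n_union = n_gex + n_airr - n_shared

print(sorted(shared), len(combined), n_union)
```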
[9]:
mudata = ddl.to_scirpy(vdj, to_mudata=True, gex_adata=adata)
mudata
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.11/site-packages/mudata/_core/mudata.py:1531: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.11/site-packages/mudata/_core/mudata.py:1429: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.
[9]:
MuData object with n_obs × n_vars = 10571 × 36601
2 modalities
gex: 10553 x 36601
var: 'gene_ids', 'feature_types', 'genome', 'pattern', 'read', 'sequence'
airr: 5351 x 0
obsm: 'airr'
Use scirpy’s get functions to retrieve the relevant airr info (https://scirpy.scverse.org/en/latest/generated/scirpy.get.airr.html):
[10]:
ir.get.airr(irdata, "clone_id")
WARNING: No chain indices found under adata.obsm['chain_indices']. Running scirpy.pp.index_chains with default parameters.
[10]:
| | VJ_1_clone_id | VDJ_1_clone_id | VJ_2_clone_id | VDJ_2_clone_id |
|---|---|---|---|---|
| AAACCTGAGCGATAGC-1 | abT_VDJ_119_5_2_VJ_306_2_3 | abT_VDJ_119_5_2_VJ_306_2_3 | None | None |
| AAACCTGAGTCACGCC-1 | abT_VDJ_225_5_1_VJ_1416_1_1 | abT_VDJ_225_5_1_VJ_1416_1_1 | None | None |
| AAACCTGCACGTCAGC-1 | abT_VDJ_225_2_1_VJ_211_2_8 | abT_VDJ_225_2_1_VJ_211_2_8 | None | None |
| AAACCTGGTCAATACC-1 | abT_VDJ_282_4_3_VJ_1376_2_1 | abT_VDJ_282_4_3_VJ_1376_2_1 | None | None |
| AAACCTGGTTCGGCAC-1 | abT_VDJ_214_4_8_VJ_1536_2_1|abT_VDJ_214_4_8_VJ... | abT_VDJ_214_4_8_VJ_1536_2_1|abT_VDJ_214_4_8_VJ... | abT_VDJ_214_4_8_VJ_1536_2_1|abT_VDJ_214_4_8_VJ... | None |
| ... | ... | ... | ... | ... |
| TTTGTCATCGCCGTGA-1 | abT_VDJ_382_1_1_VJ_723_1_1 | abT_VDJ_382_1_1_VJ_723_1_1 | None | None |
| TTTGTCATCGTCTGAA-1 | abT_VDJ_338_3_1_VJ_592_2_1 | abT_VDJ_338_3_1_VJ_592_2_1 | None | None |
| TTTGTCATCTACCAGA-1 | abT_VDJ_502_3_1_VJ_621_3_1 | abT_VDJ_502_3_1_VJ_621_3_1 | None | None |
| TTTGTCATCTCTGAGA-1 | abT_VDJ_372_5_1_VJ_1420_1_1|abT_VDJ_58_3_1_VJ_... | abT_VDJ_372_5_1_VJ_1420_1_1|abT_VDJ_58_3_1_VJ_... | None | abT_VDJ_372_5_1_VJ_1420_1_1|abT_VDJ_58_3_1_VJ_... |
| TTTGTCATCTTCGAGA-1 | abT_VDJ_359_1_5_VJ_1561_4_1 | abT_VDJ_359_1_5_VJ_1561_4_1 | None | None |
5333 rows × 4 columns
[11]:
ir.get.airr(mudata, "clone_id")
WARNING: No chain indices found under adata.obsm['chain_indices']. Running scirpy.pp.index_chains with default parameters.
[11]:
| cell_id | VJ_1_clone_id | VDJ_1_clone_id | VJ_2_clone_id | VDJ_2_clone_id |
|---|---|---|---|---|
| AAACCTGAGCGATAGC-1 | abT_VDJ_119_5_2_VJ_306_2_3 | abT_VDJ_119_5_2_VJ_306_2_3 | None | None |
| AAACCTGAGTCACGCC-1 | abT_VDJ_225_5_1_VJ_1416_1_1 | abT_VDJ_225_5_1_VJ_1416_1_1 | None | None |
| AAACCTGCACGTCAGC-1 | abT_VDJ_225_2_1_VJ_211_2_8 | abT_VDJ_225_2_1_VJ_211_2_8 | None | None |
| AAACCTGGTCAATACC-1 | abT_VDJ_282_4_3_VJ_1376_2_1 | abT_VDJ_282_4_3_VJ_1376_2_1 | None | None |
| AAACCTGGTTCGGCAC-1 | abT_VDJ_214_4_8_VJ_1536_2_1|abT_VDJ_214_4_8_VJ... | abT_VDJ_214_4_8_VJ_1536_2_1|abT_VDJ_214_4_8_VJ... | abT_VDJ_214_4_8_VJ_1536_2_1|abT_VDJ_214_4_8_VJ... | None |
| ... | ... | ... | ... | ... |
| TTTGTCATCGCCGTGA-1 | abT_VDJ_382_1_1_VJ_723_1_1 | abT_VDJ_382_1_1_VJ_723_1_1 | None | None |
| TTTGTCATCGTCTGAA-1 | abT_VDJ_338_3_1_VJ_592_2_1 | abT_VDJ_338_3_1_VJ_592_2_1 | None | None |
| TTTGTCATCTACCAGA-1 | abT_VDJ_502_3_1_VJ_621_3_1 | abT_VDJ_502_3_1_VJ_621_3_1 | None | None |
| TTTGTCATCTCTGAGA-1 | abT_VDJ_372_5_1_VJ_1420_1_1|abT_VDJ_58_3_1_VJ_... | abT_VDJ_372_5_1_VJ_1420_1_1|abT_VDJ_58_3_1_VJ_... | None | abT_VDJ_372_5_1_VJ_1420_1_1|abT_VDJ_58_3_1_VJ_... |
| TTTGTCATCTTCGAGA-1 | abT_VDJ_359_1_5_VJ_1561_4_1 | abT_VDJ_359_1_5_VJ_1561_4_1 | None | None |
5351 rows × 4 columns
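Some cells in the tables above carry several clone IDs joined by "|". When counting clone membership it may help to split these strings first. The following is a minimal sketch with the standard library: the single clone IDs are copied from rows shown above, whereas the cell names cell_multi and cell_missing, the multi-clone combination, and the helper split_clone_ids are hypothetical.

```python
from collections import Counter

# cell_id -> clone_id string. The combination for "cell_multi" is hypothetical.
clone_ids = {
    "AAACCTGAGCGATAGC-1": "abT_VDJ_119_5_2_VJ_306_2_3",
    "AAACCTGAGTCACGCC-1": "abT_VDJ_225_5_1_VJ_1416_1_1",
    "cell_multi": "abT_VDJ_119_5_2_VJ_306_2_3|abT_VDJ_225_2_1_VJ_211_2_8",
    "cell_missing": None,
}


def split_clone_ids(value):
    """Split a '|'-joined clone_id string; missing values give an empty list."""
    if not value:
        return []
    return value.split("|")


# Count how many cells are assigned to each individual clone.
membership = Counter()
for value in clone_ids.values():
    membership.update(split_clone_ids(value))

print(membership.most_common())
```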
Alternatively, set transfer=True, which also runs dandelion’s tl.transfer so that the metadata columns are copied into .obs.
[12]:
irdatax = ddl.to_scirpy(vdj, transfer=True)
irdatax
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.11/site-packages/mudata/_core/mudata.py:1531: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.11/site-packages/mudata/_core/mudata.py:1429: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.
[12]:
MuData object with n_obs × n_vars = 5351 × 0
1 modality
airr: 5351 x 0
obs: 'clone_id', 'clone_id_by_size', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_abT_VDJ', 'd_call_abT_VDJ', 'j_call_abT_VDJ', 'v_call_abT_VJ', 'j_call_abT_VJ', 'c_call_abT_VDJ', 'c_call_abT_VJ', 'productive_abT_VDJ', 'productive_abT_VJ', 'umi_count_abT_VDJ', 'umi_count_abT_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_abT_VDJ_main', 'd_call_abT_VDJ_main', 'j_call_abT_VDJ_main', 'v_call_abT_VJ_main', 'j_call_abT_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
obsm: 'airr'
[13]:
irdatax = ddl.to_scirpy(vdj, transfer=True, to_mudata=False)
irdatax
[13]:
AnnData object with n_obs × n_vars = 5351 × 0
obs: 'clone_id', 'clone_id_by_size', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_abT_VDJ', 'd_call_abT_VDJ', 'j_call_abT_VDJ', 'v_call_abT_VJ', 'j_call_abT_VJ', 'c_call_abT_VDJ', 'c_call_abT_VJ', 'productive_abT_VDJ', 'productive_abT_VJ', 'umi_count_abT_VDJ', 'umi_count_abT_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_abT_VDJ_main', 'd_call_abT_VDJ_main', 'j_call_abT_VDJ_main', 'v_call_abT_VJ_main', 'j_call_abT_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
obsm: 'airr'
ddl.from_scirpy : Converting scirpy to dandelion
Converting MuData back to Dandelion
[14]:
vdjx = ddl.from_scirpy(mudata)
vdjx
[14]:
Dandelion class object with n_obs = 5351 and n_contigs = 10860
data: 'c_call', 'c_cigar', 'c_sequence_end', 'c_sequence_start', 'clone_id', 'consensus_count', 'd_call', 'd_cigar', 'd_sequence_end', 'd_sequence_start', 'germline_alignment', 'is_cell', 'j_call', 'j_cigar', 'j_sequence_end', 'j_sequence_start', 'junction', 'junction_aa', 'junction_aa_length', 'junction_length', 'locus', 'productive', 'rearrangement_status', 'rev_comp', 'sequence', 'sequence_aa', 'sequence_alignment', 'sequence_id', 'umi_count', 'v_call', 'v_cigar', 'v_sequence_end', 'v_sequence_start', 'cell_id'
metadata: 'clone_id', 'clone_id_by_size', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_abT_VDJ', 'd_call_abT_VDJ', 'j_call_abT_VDJ', 'v_call_abT_VJ', 'j_call_abT_VJ', 'c_call_abT_VDJ', 'c_call_abT_VJ', 'productive_abT_VDJ', 'productive_abT_VJ', 'umi_count_abT_VDJ', 'umi_count_abT_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_abT_VDJ_main', 'd_call_abT_VDJ_main', 'j_call_abT_VDJ_main', 'v_call_abT_VJ_main', 'j_call_abT_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
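A quick way to sanity-check the round trip is to compare the data column names printed for the original object with those printed after the conversion. The sketch below does this with plain Python sets, using the names copied from the two outputs in this tutorial. The only difference should be clone_id, which is expected because the first output was printed before find_clones was run; the column order differs but the content of the sets does not.

```python
# Column names copied from the `data:` line of the original Dandelion object.
before = """
cell_id sequence_id sequence sequence_aa productive rev_comp v_call v_cigar
d_call d_cigar j_call j_cigar c_call c_cigar sequence_alignment
germline_alignment junction junction_aa junction_length junction_aa_length
v_sequence_start v_sequence_end d_sequence_start d_sequence_end
j_sequence_start j_sequence_end c_sequence_start c_sequence_end
consensus_count umi_count is_cell locus rearrangement_status
""".split()

# Column names copied from the `data:` line after ddl.from_scirpy.
after = """
c_call c_cigar c_sequence_end c_sequence_start clone_id consensus_count
d_call d_cigar d_sequence_end d_sequence_start germline_alignment is_cell
j_call j_cigar j_sequence_end j_sequence_start junction junction_aa
junction_aa_length junction_length locus productive rearrangement_status
rev_comp sequence sequence_aa sequence_alignment sequence_id umi_count
v_call v_cigar v_sequence_end v_sequence_start cell_id
""".split()

added = set(after) - set(before)    # columns gained during the workflow
missing = set(before) - set(after)  # columns lost in the round trip

print(sorted(added), sorted(missing))
```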
Converting AnnData back to Dandelion
[15]:
vdjx = ddl.from_scirpy(irdata)
vdjx
[15]:
Dandelion class object with n_obs = 5333 and n_contigs = 10836
data: 'c_call', 'c_cigar', 'c_sequence_end', 'c_sequence_start', 'clone_id', 'consensus_count', 'd_call', 'd_cigar', 'd_sequence_end', 'd_sequence_start', 'germline_alignment', 'is_cell', 'j_call', 'j_cigar', 'j_sequence_end', 'j_sequence_start', 'junction', 'junction_aa', 'junction_aa_length', 'junction_length', 'locus', 'productive', 'rearrangement_status', 'rev_comp', 'sequence', 'sequence_aa', 'sequence_alignment', 'sequence_id', 'umi_count', 'v_call', 'v_cigar', 'v_sequence_end', 'v_sequence_start', 'cell_id'
metadata: 'clone_id', 'clone_id_by_size', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_abT_VDJ', 'd_call_abT_VDJ', 'j_call_abT_VDJ', 'v_call_abT_VJ', 'j_call_abT_VJ', 'c_call_abT_VDJ', 'c_call_abT_VJ', 'productive_abT_VDJ', 'productive_abT_VJ', 'umi_count_abT_VDJ', 'umi_count_abT_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_abT_VDJ_main', 'd_call_abT_VDJ_main', 'j_call_abT_VDJ_main', 'v_call_abT_VJ_main', 'j_call_abT_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
[16]:
vdjx.metadata
[16]:
| | clone_id | clone_id_by_size | locus_VDJ | locus_VJ | productive_VDJ | productive_VJ | v_call_VDJ | d_call_VDJ | j_call_VDJ | v_call_VJ | ... | d_call_abT_VDJ_main | j_call_abT_VDJ_main | v_call_abT_VJ_main | j_call_abT_VJ_main | isotype | isotype_status | locus_status | chain_status | rearrangement_status_VDJ | rearrangement_status_VJ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AAACCTGAGCGATAGC-1 | abT_VDJ_119_5_2_VJ_306_2_3 | 422 | TRB | TRA | True | True | TRBV6-5 | None | TRBJ2-3 | TRAV23/DV6 | ... | None | TRBJ2-3 | TRAV23/DV6 | TRAJ22 | None | None | TRB + TRA | Single pair | standard | standard |
| AAACCTGAGTCACGCC-1 | abT_VDJ_225_5_1_VJ_1416_1_1 | 5053 | TRB | TRA | True | True | TRBV6-2 | None | TRBJ2-6 | TRAV8-6 | ... | None | TRBJ2-6 | TRAV8-6 | TRAJ8 | None | None | TRB + TRA | Single pair | standard | standard |
| AAACCTGCACGTCAGC-1 | abT_VDJ_225_2_1_VJ_211_2_8 | 4081 | TRB | TRA | True | True | TRBV6-2 | None | TRBJ2-6 | TRAV1-2 | ... | None | TRBJ2-6 | TRAV1-2 | TRAJ33 | None | None | TRB + TRA | Single pair | standard | standard |
| AAACCTGGTCAATACC-1 | abT_VDJ_282_4_3_VJ_1376_2_1 | 4080 | TRB | TRA | True | True | TRBV12-4 | None | TRBJ2-7 | TRAV22 | ... | None | TRBJ2-7 | TRAV22 | TRAJ4 | None | None | TRB + TRA | Single pair | standard | standard |
| AAACCTGGTTCGGCAC-1 | abT_VDJ_214_4_8_VJ_1113_2_1|abT_VDJ_214_4_8_VJ... | 4078|4079 | TRB | TRA|TRA | True | True|True | TRBV20-1 | None | TRBJ1-1 | TRAV8-3|TRAV8-2 | ... | None | TRBJ1-1 | TRAV8-3 | TRAJ21 | None | Multi | TRB + Extra VJ | Extra pair | standard | standard |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| TTTGTCATCGCCGTGA-1 | abT_VDJ_382_1_1_VJ_723_1_1 | 2061 | TRB | TRA | True | True | TRBV12-4 | None | TRBJ2-3 | TRAV8-2 | ... | None | TRBJ2-3 | TRAV8-2 | TRAJ13 | None | None | TRB + TRA | Single pair | standard | standard |
| TTTGTCATCGTCTGAA-1 | abT_VDJ_338_3_1_VJ_592_2_1 | 2060 | TRB | TRA | True | True | TRBV7-8 | None | TRBJ1-3 | TRAV8-6 | ... | None | TRBJ1-3 | TRAV8-6 | TRAJ27 | None | None | TRB + TRA | Single pair | standard | standard |
| TTTGTCATCTACCAGA-1 | abT_VDJ_502_3_1_VJ_621_3_1 | 2059 | TRB | TRA | True | True | TRBV6-6 | TRBD1 | TRBJ2-3 | TRAV8-6 | ... | TRBD1 | TRBJ2-3 | TRAV8-6 | TRAJ48 | None | None | TRB + TRA | Single pair | standard | standard |
| TTTGTCATCTCTGAGA-1 | abT_VDJ_372_5_1_VJ_1420_1_1|abT_VDJ_58_3_1_VJ_... | 2057|2058 | TRB|TRB | TRA | True|True | True | TRBV7-2|TRBV11-2 | None | TRBJ1-2|TRBJ1-4 | TRAV3 | ... | None | TRBJ1-2 | TRAV3 | TRAJ22 | None|None | Multi | Extra VDJ + TRA | Extra pair | standard | standard |
| TTTGTCATCTTCGAGA-1 | abT_VDJ_359_1_5_VJ_1561_4_1 | 6080 | TRB | TRA | True | True | TRBV2 | None | TRBJ1-5 | TRAV12-1 | ... | None | TRBJ1-5 | TRAV12-1 | TRAJ9 | None | None | TRB + TRA | Single pair | standard | standard |
5333 rows × 46 columns
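The clone_id_by_size column in the table above holds a compact label for each clone. One way to think about such a label is as a rank of the clones by the number of cells they contain. The sketch below illustrates that idea with the standard library. It assumes that rank 1 goes to the largest clone and that ties are broken alphabetically; both are assumptions made for this example, and dandelion's actual numbering may differ. The cell assignments are made up, although the clone IDs themselves are taken from the table.

```python
from collections import Counter

# One clone_id per cell; the assignments are hypothetical.
cell_clones = [
    "abT_VDJ_119_5_2_VJ_306_2_3",
    "abT_VDJ_119_5_2_VJ_306_2_3",
    "abT_VDJ_119_5_2_VJ_306_2_3",
    "abT_VDJ_225_5_1_VJ_1416_1_1",
    "abT_VDJ_225_5_1_VJ_1416_1_1",
    "abT_VDJ_382_1_1_VJ_723_1_1",
]

sizes = Counter(cell_clones)

# Largest clone first; ties broken alphabetically (assumption for this sketch).
ranked = sorted(sizes, key=lambda clone: (-sizes[clone], clone))
by_size = {clone: str(rank) for rank, clone in enumerate(ranked, start=1)}

for clone in ranked:
    print(by_size[clone], clone, sizes[clone])
```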
This time, find clones with scirpy’s method.
[17]:
ir.tl.chain_qc(irdata)
ir.pp.ir_dist(irdata)
ir.tl.define_clonotypes(irdata, receptor_arms="all", dual_ir="primary_only")
irdata
[17]:
AnnData object with n_obs × n_vars = 5333 × 36601
obs: 'receptor_type', 'receptor_subtype', 'chain_pairing', 'clone_id', 'clone_id_size'
var: 'gene_ids', 'feature_types', 'genome', 'pattern', 'read', 'sequence'
uns: 'chain_indices', 'ir_dist_nt_identity', 'clone_id'
obsm: 'airr', 'chain_indices'
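To give an intuition for the settings used above: the ir_dist_nt_identity key in .uns suggests that distances were computed on identical nucleotide sequences, receptor_arms="all" asks for both the VJ and the VDJ arm to match, and dual_ir="primary_only" restricts the comparison to the primary chain of each arm. The sketch below is a deliberately simplified illustration of that grouping rule, not scirpy's implementation. It ignores missing chains and any grouping by receptor type, and the cell names and junction sequences are made up.

```python
from collections import defaultdict

# Hypothetical cells with made-up primary junction nucleotide sequences.
cells = {
    "cell_1": {"VJ_1": "TGTGCAGCA", "VDJ_1": "TGTGCCAGC"},
    "cell_2": {"VJ_1": "TGTGCAGCA", "VDJ_1": "TGTGCCAGC"},  # identical to cell_1
    "cell_3": {"VJ_1": "TGTGCAGCA", "VDJ_1": "TGCGCCAGT"},  # VDJ arm differs
}

# Both arms must be identical, so the pair of primary sequences is the key.
groups = defaultdict(list)
for cell_id, chains in cells.items():
    key = (chains["VJ_1"], chains["VDJ_1"])
    groups[key].append(cell_id)

# Assign a label to each group in order of first appearance.
clonotypes = {}
for index, members in enumerate(groups.values()):
    for cell_id in members:
        clonotypes[cell_id] = str(index)

print(clonotypes)
```

Here cell_1 and cell_2 end up in the same group, while cell_3 is separated because its VDJ sequence differs even though its VJ sequence matches.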
Visualising with scirpy’s plotting tools
You can now also plot dandelion networks using scirpy’s functions.
[18]:
ddl.tl.generate_network(vdj, key="junction")
Setting up data: 10860it [00:00, 22983.30it/s]
Calculating distances : 100%|██████████| 6097/6097 [00:00<00:00, 14028.15it/s]
Aggregating distances : 100%|██████████| 4/4 [00:00<00:00, 19.92it/s]
Sorting into clusters : 100%|██████████| 6097/6097 [00:04<00:00, 1235.43it/s]
Calculating minimum spanning tree : 100%|██████████| 69/69 [00:00<00:00, 1144.94it/s]
Generating edge list : 100%|██████████| 69/69 [00:00<00:00, 3913.60it/s]
Computing overlap : 100%|██████████| 6097/6097 [00:04<00:00, 1220.27it/s]
Adjust overlap : 100%|██████████| 684/684 [00:00<00:00, 3583.29it/s]
Linking edges : 100%|██████████| 5245/5245 [00:00<00:00, 52144.47it/s]
Computing network layout
Computing expanded network layout
[19]:
irdata.obs["scirpy_clone_id"] = irdata.obs["clone_id"] # stash it
ddl.tl.transfer(
irdata, vdj, overwrite=True
) # overwrite scirpy's clone_id definition
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.11/site-packages/pandas/core/arraylike.py:399: RuntimeWarning: overflow encountered in exp
[20]:
ir.tl.clonotype_network(irdata, min_cells=2)
ir.pl.clonotype_network(irdata, color="clone_id", panel_size=(7, 7))
[20]:
<Axes: >
To swap to a shorter clone_id name (ordered by size):
[21]:
ddl.tl.transfer(irdata, vdj, clone_key="clone_id_by_size")
ir.tl.clonotype_network(irdata, clonotype_key="clone_id_by_size", min_cells=2)
ir.pl.clonotype_network(irdata, color="clone_id_by_size", panel_size=(7, 7))
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.11/site-packages/pandas/core/arraylike.py:399: RuntimeWarning: overflow encountered in exp
[21]:
<Axes: >
You can also collapse the networks to a single node and plot by size:
[22]:
ddl.tl.transfer(irdata, vdj, clone_key="clone_id_by_size", collapse_nodes=True)
ir.tl.clonotype_network(irdata, clonotype_key="clone_id_by_size", min_cells=2)
ir.pl.clonotype_network(irdata, color="scirpy_clone_id", panel_size=(7, 7))
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.11/site-packages/pandas/core/arraylike.py:399: RuntimeWarning: overflow encountered in exp
[22]:
<Axes: >