Interoperability with scirpy
It is now possible to convert the file formats between dandelion>=0.1.1 and scirpy>=0.6.2 [Sturm2020] to enhance the collaboration between the analysis toolkits.
Import dandelion module
[1]:
import os
import dandelion as ddl
# new function to setup tutorial data
from dandelion.tutorial import setup_dandelion_tutorial_tcr
setup_dandelion_tutorial_tcr()
# change to tutorial directory
os.chdir("dandelion_tutorial/")
ddl.logging.print_versions()
Downloading filtered_feature_bc_matrix.h5 → dandelion_tutorial/sc5p_v2_hs_PBMC_10k_t/filtered_feature_bc_matrix.h5
Downloading airr_rearrangement.tsv → dandelion_tutorial/sc5p_v2_hs_PBMC_10k_t/airr_rearrangement.tsv
Downloading filtered_contig_annotations.csv → dandelion_tutorial/sc5p_v2_hs_PBMC_10k_t/filtered_contig_annotations.csv
Downloading filtered_contig.fasta → dandelion_tutorial/sc5p_v2_hs_PBMC_10k_t/filtered_contig.fasta
Downloading filtered_feature_bc_matrix.h5 → dandelion_tutorial/sc5p_v1p1_hs_melanoma_10k_t/filtered_feature_bc_matrix.h5
Downloading airr_rearrangement.tsv → dandelion_tutorial/sc5p_v1p1_hs_melanoma_10k_t/airr_rearrangement.tsv
Downloading filtered_contig_annotations.csv → dandelion_tutorial/sc5p_v1p1_hs_melanoma_10k_t/filtered_contig_annotations.csv
Downloading filtered_contig.fasta → dandelion_tutorial/sc5p_v1p1_hs_melanoma_10k_t/filtered_contig.fasta
dandelion==1.0.0a1.dev36 pandas==2.3.3 numpy==2.3.5 matplotlib==3.10.6 networkx==3.6.1 scipy==1.15.2
[2]:
import scirpy as ir
import scanpy as sc
ir.__version__
[2]:
'0.22.5.dev29+g61dba21b4'
[3]:
# read in the airr_rearrangement.tsv file
file_location = "sc5p_v2_hs_PBMC_10k_t/airr_rearrangement.tsv"
# read in gene expression data
adata = sc.read_10x_h5("sc5p_v2_hs_PBMC_10k_t/filtered_feature_bc_matrix.h5")
adata.var_names_make_unique()
adata
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.12/site-packages/anndata/_core/anndata.py:1798: UserWarning: Variable names are not unique. To make them unique, call `.var_names_make_unique`.
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.12/site-packages/anndata/_core/anndata.py:1798: UserWarning: Variable names are not unique. To make them unique, call `.var_names_make_unique`.
[3]:
AnnData object with n_obs × n_vars = 10553 × 36601
var: 'gene_ids', 'feature_types', 'genome', 'pattern', 'read', 'sequence'
[4]:
vdj = ddl.read_10x_airr(file_location)
vdj
[4]:
Lazy Dandelion object with n_obs = 5351 and n_contigs = 10860
data: cell_id, sequence_id, sequence, sequence_aa, productive, rev_comp, v_call, v_cigar, d_call, d_cigar, j_call, j_cigar, c_call, c_cigar, sequence_alignment, germline_alignment, junction, junction_aa, junction_length, junction_aa_length, v_sequence_start, v_sequence_end, d_sequence_start, d_sequence_end, j_sequence_start, j_sequence_end, c_sequence_start, c_sequence_end, consensus_count, umi_count, is_cell, locus, rearrangement_status
metadata: cell_id, productive_VDJ, productive_VJ, v_call_VDJ, v_call_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, c_call_VDJ, c_call_VJ, junction_VDJ, junction_VJ, junction_aa_VDJ, junction_aa_VJ, locus_VDJ, locus_VJ, umi_count_VDJ, umi_count_VJ, productive_VDJ_main, productive_VJ_main, v_call_VDJ_main, v_call_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, c_call_VDJ_main, c_call_VJ_main, junction_VDJ_main, junction_VJ_main, junction_aa_VDJ_main, junction_aa_VJ_main, locus_VDJ_main, locus_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, locus_status, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ
The test file contains a blank clone_id column so we run find_clones to populate it first.
[5]:
ddl.tl.find_clones(vdj)
Using PyTorch backend with Apple Metal GPU
Finding clones based on abT cell VDJ chains using junction: 100%|██████████| 2099/2099 [00:01<00:00, 1059.86it/s]
Finding clones based on abT cell VJ chains using junction: 100%|██████████| 3588/3588 [00:02<00:00, 1787.70it/s]
Building distance matrix (batched): 100%|██████████| 57/57 [00:00<00:00, 4018.07it/s]
[5]:
Lazy Dandelion object with n_obs = 5351 and n_contigs = 10860
data: cell_id, sequence_id, sequence, sequence_aa, productive, rev_comp, v_call, v_cigar, d_call, d_cigar, j_call, j_cigar, c_call, c_cigar, sequence_alignment, germline_alignment, junction, junction_aa, junction_length, junction_aa_length, v_sequence_start, v_sequence_end, d_sequence_start, d_sequence_end, j_sequence_start, j_sequence_end, c_sequence_start, c_sequence_end, consensus_count, umi_count, is_cell, locus, rearrangement_status, clone_id
metadata: cell_id, clone_id, clone_id_rank, productive_VDJ, productive_VJ, v_call_VDJ, v_call_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, c_call_VDJ, c_call_VJ, junction_VDJ, junction_VJ, junction_aa_VDJ, junction_aa_VJ, locus_VDJ, locus_VJ, umi_count_VDJ, umi_count_VJ, productive_VDJ_main, productive_VJ_main, v_call_VDJ_main, v_call_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, c_call_VDJ_main, c_call_VJ_main, junction_VDJ_main, junction_VJ_main, junction_aa_VDJ_main, junction_aa_VJ_main, locus_VDJ_main, locus_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, locus_status, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ
distances: distance matrix of shape (5351, 5351)
[6]:
# we also generate the network so that this can be passed to scirpy
ddl.tl.generate_network(vdj, key="sequence_alignment")
vdj
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
[6]:
Lazy Dandelion object with n_obs = 5351 and n_contigs = 10860
data: cell_id, sequence_id, sequence, sequence_aa, productive, rev_comp, v_call, v_cigar, d_call, d_cigar, j_call, j_cigar, c_call, c_cigar, sequence_alignment, germline_alignment, junction, junction_aa, junction_length, junction_aa_length, v_sequence_start, v_sequence_end, d_sequence_start, d_sequence_end, j_sequence_start, j_sequence_end, c_sequence_start, c_sequence_end, consensus_count, umi_count, is_cell, locus, rearrangement_status, clone_id
metadata: cell_id, clone_id, clone_id_rank, productive_VDJ, productive_VJ, v_call_VDJ, v_call_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, c_call_VDJ, c_call_VJ, junction_VDJ, junction_VJ, junction_aa_VDJ, junction_aa_VJ, locus_VDJ, locus_VJ, umi_count_VDJ, umi_count_VJ, productive_VDJ_main, productive_VJ_main, v_call_VDJ_main, v_call_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, c_call_VDJ_main, c_call_VJ_main, junction_VDJ_main, junction_VJ_main, junction_aa_VDJ_main, junction_aa_VJ_main, locus_VDJ_main, locus_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, locus_status, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ
layout: layout for 5351 vertices, layout for 195 vertices
graph: networkx graph of 5351 vertices, networkx graph of 195 vertices
distances: distance matrix of shape (5351, 5351)
ddl.to_scirpy : Converting dandelion to scirpy
The Dandelion object can be converted to an object compatible with scirpy using the ddl.to_scirpy function. This is even when there is no original gene expression data available.
[7]:
# conversion to default scirpy MuData object
irdata = ddl.tl.to_scirpy(vdj)
irdata
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.12/site-packages/anndata/utils.py:362: ExperimentalFeatureWarning: Support for Awkward Arrays is currently experimental. Behavior may change in the future. Please report any issues you may encounter!
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.12/site-packages/mudata/_core/mudata.py:1598: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.12/site-packages/mudata/_core/mudata.py:1461: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.
[7]:
MuData object with n_obs × n_vars = 5351 × 0
1 modality
airr: 5351 x 0
obsm: 'airr'Conversion to AnnData in scirpy format is also available
[8]:
converted_adata = ddl.tl.to_scirpy(vdj, to_mudata=False)
converted_adata
[8]:
AnnData object with n_obs × n_vars = 5351 × 0
obsm: 'airr'
If you have gene expression data, the parameter gex_adata supports the gene expression data in AnnData format.
Please note that this will slice to the same cell_id that are present in the same in the AIRR data.
[9]:
irdata = ddl.tl.to_scirpy(vdj, to_mudata=False, gex_adata=adata)
irdata
[9]:
AnnData object with n_obs × n_vars = 5333 × 36601
var: 'gene_ids', 'feature_types', 'genome', 'pattern', 'read', 'sequence'
uns: 'clone_id'
obsm: 'airr'
[10]:
mudata = ddl.tl.to_scirpy(vdj, to_mudata=True, gex_adata=adata)
mudata
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.12/site-packages/mudata/_core/mudata.py:1598: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.12/site-packages/mudata/_core/mudata.py:1461: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.
[10]:
MuData object with n_obs × n_vars = 10553 × 36601
2 modalities
gex: 10553 x 36601
var: 'gene_ids', 'feature_types', 'genome', 'pattern', 'read', 'sequence'
airr: 5333 x 0
uns: 'clone_id'
obsm: 'airr'Use scirpy’s get functions to retrieve the relevant airr info (https://scirpy.scverse.org/en/latest/generated/scirpy.get.airr.html)
[11]:
ir.get.airr(irdata, "clone_id")
WARNING: No chain indices found under adata.obsm['chain_indices']. Running scirpy.pp.index_chains with default parameters.
[11]:
| VJ_1_clone_id | VDJ_1_clone_id | VJ_2_clone_id | VDJ_2_clone_id | |
|---|---|---|---|---|
| AAACCTGAGCGATAGC-1 | abT_VDJ_412_5_1_VJ_675_2_1 | abT_VDJ_412_5_1_VJ_675_2_1 | None | None |
| AAACCTGAGTCACGCC-1 | abT_VDJ_393_5_1_VJ_1526_1_2 | abT_VDJ_393_5_1_VJ_1526_1_2 | None | None |
| AAACCTGCACGTCAGC-1 | abT_VDJ_393_2_4_VJ_44_2_1 | abT_VDJ_393_2_4_VJ_44_2_1 | None | None |
| AAACCTGGTCAATACC-1 | abT_VDJ_87_3_1_VJ_653_2_1 | abT_VDJ_87_3_1_VJ_653_2_1 | None | None |
| AAACCTGGTTCGGCAC-1 | abT_VDJ_173_4_1_VJ_1404_2_1|abT_VDJ_173_4_1_VJ... | abT_VDJ_173_4_1_VJ_1404_2_1|abT_VDJ_173_4_1_VJ... | abT_VDJ_173_4_1_VJ_1404_2_1|abT_VDJ_173_4_1_VJ... | None |
| ... | ... | ... | ... | ... |
| TTTGTCATCGCCGTGA-1 | abT_VDJ_83_1_1_VJ_1366_1_1 | abT_VDJ_83_1_1_VJ_1366_1_1 | None | None |
| TTTGTCATCGTCTGAA-1 | abT_VDJ_479_2_1_VJ_1498_2_1 | abT_VDJ_479_2_1_VJ_1498_2_1 | None | None |
| TTTGTCATCTACCAGA-1 | abT_VDJ_425_3_1_VJ_1516_2_1 | abT_VDJ_425_3_1_VJ_1516_2_1 | None | None |
| TTTGTCATCTCTGAGA-1 | abT_VDJ_432_5_1_VJ_959_1_1|abT_VDJ_45_2_1_VJ_9... | abT_VDJ_432_5_1_VJ_959_1_1|abT_VDJ_45_2_1_VJ_9... | None | abT_VDJ_432_5_1_VJ_959_1_1|abT_VDJ_45_2_1_VJ_9... |
| TTTGTCATCTTCGAGA-1 | abT_VDJ_164_1_2_VJ_137_4_1 | abT_VDJ_164_1_2_VJ_137_4_1 | None | None |
5333 rows × 4 columns
[12]:
ir.get.airr(mudata, "clone_id")
WARNING: No chain indices found under adata.obsm['chain_indices']. Running scirpy.pp.index_chains with default parameters.
[12]:
| VJ_1_clone_id | VDJ_1_clone_id | VJ_2_clone_id | VDJ_2_clone_id | |
|---|---|---|---|---|
| cell_id | ||||
| AAACCTGAGCGATAGC-1 | abT_VDJ_412_5_1_VJ_675_2_1 | abT_VDJ_412_5_1_VJ_675_2_1 | None | None |
| AAACCTGAGTCACGCC-1 | abT_VDJ_393_5_1_VJ_1526_1_2 | abT_VDJ_393_5_1_VJ_1526_1_2 | None | None |
| AAACCTGCACGTCAGC-1 | abT_VDJ_393_2_4_VJ_44_2_1 | abT_VDJ_393_2_4_VJ_44_2_1 | None | None |
| AAACCTGGTCAATACC-1 | abT_VDJ_87_3_1_VJ_653_2_1 | abT_VDJ_87_3_1_VJ_653_2_1 | None | None |
| AAACCTGGTTCGGCAC-1 | abT_VDJ_173_4_1_VJ_1404_2_1|abT_VDJ_173_4_1_VJ... | abT_VDJ_173_4_1_VJ_1404_2_1|abT_VDJ_173_4_1_VJ... | abT_VDJ_173_4_1_VJ_1404_2_1|abT_VDJ_173_4_1_VJ... | None |
| ... | ... | ... | ... | ... |
| TTTGTCATCGCCGTGA-1 | abT_VDJ_83_1_1_VJ_1366_1_1 | abT_VDJ_83_1_1_VJ_1366_1_1 | None | None |
| TTTGTCATCGTCTGAA-1 | abT_VDJ_479_2_1_VJ_1498_2_1 | abT_VDJ_479_2_1_VJ_1498_2_1 | None | None |
| TTTGTCATCTACCAGA-1 | abT_VDJ_425_3_1_VJ_1516_2_1 | abT_VDJ_425_3_1_VJ_1516_2_1 | None | None |
| TTTGTCATCTCTGAGA-1 | abT_VDJ_432_5_1_VJ_959_1_1|abT_VDJ_45_2_1_VJ_9... | abT_VDJ_432_5_1_VJ_959_1_1|abT_VDJ_45_2_1_VJ_9... | None | abT_VDJ_432_5_1_VJ_959_1_1|abT_VDJ_45_2_1_VJ_9... |
| TTTGTCATCTTCGAGA-1 | abT_VDJ_164_1_2_VJ_137_4_1 | abT_VDJ_164_1_2_VJ_137_4_1 | None | None |
5333 rows × 4 columns
Or you can add transfer = True, which will perform dandelion’s tl.transfer.
[13]:
irdatax = ddl.tl.to_scirpy(vdj, transfer=True)
irdatax
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.12/site-packages/mudata/_core/mudata.py:1598: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.12/site-packages/mudata/_core/mudata.py:1461: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.
[13]:
MuData object with n_obs × n_vars = 5351 × 0
1 modality
airr: 5351 x 0
obs: 'clone_id', 'clone_id_rank', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'v_call_VJ', 'd_call_VDJ', 'j_call_VDJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'locus_VDJ', 'locus_VJ', 'umi_count_VDJ', 'umi_count_VJ', 'productive_VDJ_main', 'productive_VJ_main', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'junction_VDJ_main', 'junction_VJ_main', 'junction_aa_VDJ_main', 'junction_aa_VJ_main', 'locus_VDJ_main', 'locus_VJ_main', 'umi_count_VDJ_main', 'umi_count_VJ_main', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
uns: 'dandelion', 'neighbors', 'clone_id'
obsm: 'airr', 'X_vdj'
obsp: 'connectivities', 'distances'[14]:
irdatax = ddl.tl.to_scirpy(vdj, transfer=True, to_mudata=False)
irdatax
[14]:
AnnData object with n_obs × n_vars = 5351 × 0
obs: 'clone_id', 'clone_id_rank', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'v_call_VJ', 'd_call_VDJ', 'j_call_VDJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'locus_VDJ', 'locus_VJ', 'umi_count_VDJ', 'umi_count_VJ', 'productive_VDJ_main', 'productive_VJ_main', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'junction_VDJ_main', 'junction_VJ_main', 'junction_aa_VDJ_main', 'junction_aa_VJ_main', 'locus_VDJ_main', 'locus_VJ_main', 'umi_count_VDJ_main', 'umi_count_VJ_main', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
uns: 'dandelion', 'neighbors', 'clone_id'
obsm: 'airr', 'X_vdj'
obsp: 'connectivities', 'distances'
ddl.from_scirpy : Converting scirpy to dandelion
Converting MuData back to Dandelion
[15]:
vdjx = ddl.tl.from_scirpy(mudata)
vdjx
/Users/uqztuong/Documents/GitHub/dandelion/src/dandelion/polars/core/_core.py:2487: PerformanceWarning: Determining the column names of a LazyFrame requires resolving its schema, which is a potentially expensive operation. Use `LazyFrame.collect_schema().names()` to get the column names without this warning.
/Users/uqztuong/Documents/GitHub/dandelion/src/dandelion/polars/core/_core.py:2488: PerformanceWarning: Determining the column names of a LazyFrame requires resolving its schema, which is a potentially expensive operation. Use `LazyFrame.collect_schema().names()` to get the column names without this warning.
[15]:
Lazy Dandelion object with n_obs = 5333 and n_contigs = 10836
data: c_call, c_cigar, c_sequence_end, c_sequence_start, clone_id, consensus_count, d_call, d_cigar, d_sequence_end, d_sequence_start, germline_alignment, is_cell, j_call, j_cigar, j_sequence_end, j_sequence_start, junction, junction_aa, junction_aa_length, junction_length, locus, productive, rearrangement_status, rev_comp, sequence, sequence_aa, sequence_alignment, sequence_id, umi_count, v_call, v_cigar, v_sequence_end, v_sequence_start, cell_id, clone_id_rank, c_call_VDJ, c_call_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, v_call_VDJ, v_call_VJ, umi_count_VDJ, umi_count_VJ, c_call_VDJ_main, c_call_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, v_call_VDJ_main, v_call_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ
metadata: cell_id, clone_id, clone_id_rank, c_call_VDJ, c_call_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, junction_VDJ, junction_VJ, junction_aa_VDJ, junction_aa_VJ, locus_VDJ, locus_VJ, v_call_VDJ, v_call_VJ, umi_count_VDJ, umi_count_VJ, c_call_VDJ_main, c_call_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, junction_VDJ_main, junction_VJ_main, junction_aa_VDJ_main, junction_aa_VJ_main, locus_VDJ_main, locus_VJ_main, v_call_VDJ_main, v_call_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, locus_status, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ
graph: networkx graph of 5333 vertices
Converting AnnData back to Dandelion
[16]:
vdjx = ddl.tl.from_scirpy(irdata)
vdjx
/Users/uqztuong/Documents/GitHub/dandelion/src/dandelion/polars/core/_core.py:2487: PerformanceWarning: Determining the column names of a LazyFrame requires resolving its schema, which is a potentially expensive operation. Use `LazyFrame.collect_schema().names()` to get the column names without this warning.
/Users/uqztuong/Documents/GitHub/dandelion/src/dandelion/polars/core/_core.py:2488: PerformanceWarning: Determining the column names of a LazyFrame requires resolving its schema, which is a potentially expensive operation. Use `LazyFrame.collect_schema().names()` to get the column names without this warning.
[16]:
Lazy Dandelion object with n_obs = 5333 and n_contigs = 10836
data: c_call, c_cigar, c_sequence_end, c_sequence_start, clone_id, consensus_count, d_call, d_cigar, d_sequence_end, d_sequence_start, germline_alignment, is_cell, j_call, j_cigar, j_sequence_end, j_sequence_start, junction, junction_aa, junction_aa_length, junction_length, locus, productive, rearrangement_status, rev_comp, sequence, sequence_aa, sequence_alignment, sequence_id, umi_count, v_call, v_cigar, v_sequence_end, v_sequence_start, cell_id, clone_id_rank, c_call_VDJ, c_call_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, v_call_VDJ, v_call_VJ, umi_count_VDJ, umi_count_VJ, c_call_VDJ_main, c_call_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, v_call_VDJ_main, v_call_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ
metadata: cell_id, clone_id, clone_id_rank, c_call_VDJ, c_call_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, junction_VDJ, junction_VJ, junction_aa_VDJ, junction_aa_VJ, locus_VDJ, locus_VJ, v_call_VDJ, v_call_VJ, umi_count_VDJ, umi_count_VJ, c_call_VDJ_main, c_call_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, junction_VDJ_main, junction_VJ_main, junction_aa_VDJ_main, junction_aa_VJ_main, locus_VDJ_main, locus_VJ_main, v_call_VDJ_main, v_call_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, locus_status, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ
graph: networkx graph of 10553 vertices
[17]:
vdjx.metadata
[17]:
This time, find clones with scirpy’s method.
[18]:
ir.tl.chain_qc(irdata)
ir.pp.ir_dist(irdata)
ir.tl.define_clonotypes(irdata, receptor_arms="all", dual_ir="primary_only")
irdata
[18]:
AnnData object with n_obs × n_vars = 5333 × 36601
obs: 'receptor_type', 'receptor_subtype', 'chain_pairing', 'clone_id', 'clone_id_size'
var: 'gene_ids', 'feature_types', 'genome', 'pattern', 'read', 'sequence'
uns: 'clone_id', 'chain_indices', 'ir_dist_nt_identity'
obsm: 'airr', 'chain_indices'
Visualising with scirpy’s plotting tools
You can now also plot dandelion networks using scirpy’s functions.
[19]:
irdata = ddl.tl.to_scirpy(vdj, to_mudata=True, gex_adata=adata)
irdata
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.12/site-packages/mudata/_core/mudata.py:1598: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.12/site-packages/mudata/_core/mudata.py:1461: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.
[19]:
MuData object with n_obs × n_vars = 10553 × 36601
2 modalities
gex: 10553 x 36601
var: 'gene_ids', 'feature_types', 'genome', 'pattern', 'read', 'sequence'
airr: 5333 x 0
uns: 'clone_id'
obsm: 'airr'[20]:
ddl.tl.transfer(irdata, vdj, overwrite=True)
irdata
[20]:
MuData object with n_obs × n_vars = 10553 × 36601
2 modalities
gex: 10553 x 36601
var: 'gene_ids', 'feature_types', 'genome', 'pattern', 'read', 'sequence'
airr: 5333 x 0
obs: 'clone_id', 'clone_id_rank', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'v_call_VJ', 'd_call_VDJ', 'j_call_VDJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'locus_VDJ', 'locus_VJ', 'umi_count_VDJ', 'umi_count_VJ', 'productive_VDJ_main', 'productive_VJ_main', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'junction_VDJ_main', 'junction_VJ_main', 'junction_aa_VDJ_main', 'junction_aa_VJ_main', 'locus_VDJ_main', 'locus_VJ_main', 'umi_count_VDJ_main', 'umi_count_VJ_main', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
uns: 'clone_id', 'dandelion', 'neighbors'
obsm: 'airr', 'X_vdj'
obsp: 'connectivities', 'distances'[21]:
ddl.tl.transfer(
irdata, vdj, overwrite=True
) # overwrite scirpy's clone_id definition
[22]:
ir.tl.clonotype_network(irdata, min_cells=2)
ir.pl.clonotype_network(
irdata, color="clone_id", panel_size=(7, 7), show_labels=False
) # hiding labels because they clutter the plot
/Users/uqztuong/Documents/GitHub/scirpy/src/scirpy/pl/_clonotypes.py:179: FutureWarning: Use obsm (e.g. `k in adata.obsm` or `adata.obsm.keys() | {'u'}`) instead of AnnData.obsm_keys, AnnData.obsm_keys is deprecated and will be removed in the future.
[22]:
<Axes: >
We can also use a shorter clone_id name ordered by size, called clone_id_rank.
[23]:
ddl.tl.transfer(irdata, vdj, clone_key="clone_id_rank")
ir.tl.clonotype_network(irdata, clonotype_key="clone_id_rank", min_cells=2)
ir.pl.clonotype_network(irdata, color="clone_id_rank", panel_size=(7, 7))
/Users/uqztuong/Documents/GitHub/scirpy/src/scirpy/pl/_clonotypes.py:179: FutureWarning: Use obsm (e.g. `k in adata.obsm` or `adata.obsm.keys() | {'u'}`) instead of AnnData.obsm_keys, AnnData.obsm_keys is deprecated and will be removed in the future.
[23]:
<Axes: >
you can also collapse the networks to a single node and plot by size
[24]:
ddl.tl.transfer(irdata, vdj, clone_key="clone_id_rank", collapse_nodes=True)
ir.tl.clonotype_network(irdata, clonotype_key="clone_id_rank", min_cells=2)
ir.pl.clonotype_network(irdata, color="clone_id_rank", panel_size=(7, 7))
/Users/uqztuong/Documents/GitHub/scirpy/src/scirpy/pl/_clonotypes.py:179: FutureWarning: Use obsm (e.g. `k in adata.obsm` or `adata.obsm.keys() | {'u'}`) instead of AnnData.obsm_keys, AnnData.obsm_keys is deprecated and will be removed in the future.
[24]:
<Axes: >
Using ddl.pl.clone_network on MuData with scirpy data
Now that we have created a MuData object with scirpy data, we can also use dandelion’s plotting function to visualise the clone networks. The way to access the column in the obs of the MuData object is to prefix the observation key with the modality name, for example gex:{column name}. If the columns are in the VDJ modality, you can use vdj:{column name}. Or if the column is in the shared obs i.e. MuData.obs, you can use just the column name.
[25]:
# create random 3 categorical variable in gex obs for testing
import numpy as np
import pandas as pd
# create random 3 categorical variable in gex obs for testing
irdata.mod["gex"].obs["new_category"] = pd.Categorical(
np.random.choice(["A", "B", "C"], size=irdata.mod["gex"].n_obs)
)
# also create in the shared obs
irdata.obs["shared_category"] = pd.Categorical(
np.random.choice(["D", "E", "F"], size=irdata.n_obs)
)
irdata
[25]:
MuData object with n_obs × n_vars = 10553 × 36601
obs: 'shared_category'
2 modalities
gex: 10553 x 36601
obs: 'new_category'
var: 'gene_ids', 'feature_types', 'genome', 'pattern', 'read', 'sequence'
airr: 5333 x 0
obs: 'clone_id', 'clone_id_rank', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'v_call_VJ', 'd_call_VDJ', 'j_call_VDJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'locus_VDJ', 'locus_VJ', 'umi_count_VDJ', 'umi_count_VJ', 'productive_VDJ_main', 'productive_VJ_main', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'junction_VDJ_main', 'junction_VJ_main', 'junction_aa_VDJ_main', 'junction_aa_VJ_main', 'locus_VDJ_main', 'locus_VJ_main', 'umi_count_VDJ_main', 'umi_count_VJ_main', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
uns: 'clone_id', 'dandelion', 'neighbors', 'gex_neighbors', 'clonotype_network', 'clone_id_colors', 'clone_id_rank', 'clone_id_rank_colors'
obsm: 'airr', 'X_vdj', 'X_clonotype_network'
obsp: 'connectivities', 'distances'[26]:
sc.set_figure_params(figsize=(6, 6))
ddl.pl.clone_network(irdata, color="gex:new_category")
ddl.pl.clone_network(irdata, color="shared_category")