{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Calculating diversity and mutation\n", "\n", "## Calculating mutational load\n", "To calculate mutational load, functions from `shazam` [[Gupta2015]](https://academic.oup.com/bioinformatics/article/31/20/3356/195677), part of the `immcantation` suite, can be accessed via `rpy2` and applied to the `dandelion` class object.\n", "\n", "This can be run immediately after `pp.reassign_alleles` during the reannotation pre-processing stage, because the required germline columns should already be present in the genotyped `.tsv` file. I recommend running it after TIgGER [[Gadala-Maria2015]](https://www.pnas.org/content/112/8/E862), once the v_calls have been corrected. Otherwise, if reannotation was skipped, you can run it now as follows:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Import modules" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "scrolled": true }, "outputs": [], "source": [ "import os\n", "import dandelion as ddl\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import seaborn as sns\n", "import scanpy as sc\n", "\n", "sc.settings.verbosity = 3\n", "\n", "ddl.set_backend(\"base\")\n", "\n", "# change to tutorials directory\n", "os.chdir(\"dandelion_tutorial\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Read in the previously saved files" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 25057 × 1308\n", " obs: 'sample_id', 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'gmm_pct_count_clusters_keep', 'scrublet_score', 'is_doublet', 'filter_rna', 'has_contig', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 
'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ', 'leiden', 'clone_id', 'clone_id_rank'\n", " var: 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'\n", " uns: 'chain_status_colors', 'clone_id', 'dandelion', 'gex_neighbors', 'hvg', 'isotype_status_colors', 'leiden', 'leiden_colors', 'locus_status_colors', 'log1p', 'neighbors', 'pca', 'sample_id_colors', 'umap'\n", " obsm: 'X_pca', 'X_umap', 'X_vdj'\n", " varm: 'PCs'\n", " obsp: 'connectivities', 'distances'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata = sc.read_h5ad(\"adata.h5ad\")\n", "adata" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
| sequence_id (index) | sequence_id | sequence | rev_comp | productive | v_call | d_call | j_call | sequence_alignment | germline_alignment | junction | ... | j_call_multimappers | j_call_multiplicity | j_call_sequence_start_multimappers | j_call_sequence_end_multimappers | j_call_support_multimappers | mu_count | ambiguous | extra | rearrangement_status | clone_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_contig_1 | sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_contig_1 | TGGGGAGGAGTCAGTCCCAACCAGGACACGGCCTGGACATGAGGGT... | F | T | IGKV1-33*01,IGKV1D-33*01 |  | IGKJ4*01 | GACATCCAGATGACCCAGTCTCCATCCTCCCTGTCTGCATCTGTGG... | GACATCCAGATGACCCAGTCTCCATCCTCCCTGTCTGCATCTGTAG... | TGTCAACAATATGACGAACTTCCCGTCACTTTC | ... | ["IGKJ4*01"] | 1 | [385] | [412] | [3.56e-09] | 27 | F | F | standard | B_VJ_95_2_2 |
| sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_2 | sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_2 | ATCACATAACAACCACATTCCTCCTCTAAAGAAGCCCCTGGGAGCA... | F | T | IGHV1-69*01,IGHV1-69D*01 | IGHD3-22*01 | IGHJ3*02 | CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG... | CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG... | TGTGCGACTACGTATTACTATGATAGTAGTGGTTATTACCAGAATG... | ... | ["IGHJ3*02"] | 1 | [445] | [494] | [4.5799999999999995e-23] | 0 | F | F | standard | B_VDJ_21_3_1_VJ_49_2_1 |
| sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_1 | sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_1 | AGGAGTCAGACCCTGTCAGGACACAGCATAGACATGAGGGTCCCCG... | F | T | IGKV1-8*01 |  | IGKJ1*01 | GCCATCCGGATGACCCAGTCTCCATCCTCATTCTCTGCATCTACAG... | GCCATCCGGATGACCCAGTCTCCATCCTCATTCTCTGCATCTACAG... | TGTCAACAGTATTATAGTTACCCTCGGACGTTC | ... | ["IGKJ1*01"] | 1 | [380] | [415] | [2.7e-15] | 0 | F | F | standard | B_VDJ_21_3_1_VJ_49_2_1 |
| sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_contig_1 | sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_contig_1 | ACTGTGGGGGTAAGAGGTTGTGTCCACCATGGCCTGGACTCCTCTC... | F | T | IGLV5-45*02 |  | IGLJ3*02 | CAGGCTGTGCTGACTCAGCCGTCTTCC...CTCTCTGCATCTCCTG... | CAGGCTGTGCTGACTCAGCCGTCTTCC...CTCTCTGCATCTCCTG... | TGTATGATTTGGCACAGCAGCGCTTGGGTGGTC | ... | ["IGLJ3*01"] | 1 | [402] | [431] | [6.84e-12] | 8 | F | F | standard | B_VDJ_73_1_2_VJ_184_1_1 |
| sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_contig_2 | sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_contig_2 | GGGAGCATCACCCAGCAACCACATCTGTCCTCTAGAGAATCCCCTG... | F | T | IGHV1-2*02 |  | IGHJ3*02 | CAGGTGCAACTGGTGCAGTCTGGGGGT...GAGGTAAAGAAGCCTG... | CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG... | TGTGCGAGAGAGATAGAGGGGGACGGTGTTTTTGAAATCTGG | ... | ["IGHJ3*02"] | 1 | [433] | [479] | [4.48e-18] | 22 | F | F | standard | B_VDJ_73_1_2_VJ_184_1_1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| sc5p_v2_hs_PBMC_1k_b_TTCCCAGAGTACATGA_contig_1 | sc5p_v2_hs_PBMC_1k_b_TTCCCAGAGTACATGA_contig_1 | AGGAATCAGACCCAGTCAGGACACAGCATGGACATGAGAGTCCTCG... | F | T | IGKV1-16*01 |  | IGKJ2*01,IGKJ2*02 | GACATCCAGATGACCCAGTCTCCATCCTCACTGTCTGCATCTGTGG... | GACATCCAGATGACCCAGTCTCCATCCTCACTGTCTGCATCTGTAG... | TGCCAACAATACATTACTGACCCGTTCACTTTT | ... | ["IGKJ2*02"] | 1 | [378] | [412] | [4.56e-08] | 20 | F | F | standard | B_VDJ_135_10_13_VJ_191_1_4 |
| sc5p_v2_hs_PBMC_1k_b_TTGAACGCAGGCTGAA_contig_1 | sc5p_v2_hs_PBMC_1k_b_TTGAACGCAGGCTGAA_contig_1 | AGGAGTCAGACCCTGTCAGGACACAGCATAGACATGAGGGTCCCCG... | F | T | IGKV1-8*01 |  | IGKJ1*01 | GCCATCCGGATGACCCAGTCTCCATCCTCATTCTCTGCATCTACAG... | GCCATCCGGATGACCCAGTCTCCATCCTCATTCTCTGCATCTACAG... | TGTCAACAGTATTATAGTTACCCGTGGACGTTC | ... | ["IGKJ1*01"] | 1 | [378] | [415] | [2.09e-16] | 0 | F | F | standard | B_VDJ_127_11_1_VJ_49_2_1 |
| sc5p_v2_hs_PBMC_1k_b_TTGAACGCAGGCTGAA_contig_2 | sc5p_v2_hs_PBMC_1k_b_TTGAACGCAGGCTGAA_contig_2 | CGAGCCCAGCACTGGAAGTCGCCGGTGTTTCCATTCGGTGATCATC... | F | T | IGHV3-30-3*01 | IGHD3-9*01 | IGHJ4*02 | CAGGTGCAGCTGGTGGAGTCTGGGGGA...GGCGTGGTCCAGCCTG... | CAGGTGCAGCTGGTGGAGTCTGGGGGA...GGCGTGGTCCAGCCTG... | TGTGCGAGAGATGAGTTAGATATTTTGACTGGTTACAATATCCCAA... | ... | ["IGHJ4*02"] | 1 | [469] | [509] | [2.2e-16] | 0 | F | F | standard | B_VDJ_127_11_1_VJ_49_2_1 |
| sc5p_v2_hs_PBMC_1k_b_TTGCCGTAGAATGTGT_contig_1 | sc5p_v2_hs_PBMC_1k_b_TTGCCGTAGAATGTGT_contig_1 | GAGCTACAACAGGCAGGCAGGGGCAGCAAGATGGTGTTGCAGACCC... | F | T | IGKV4-1*01 |  | IGKJ2*01 | GACATCGTGATGACCCAGTCTCCAGACTCCCTGGCTGTGTCTCTGG... | GACATCGTGATGACCCAGTCTCCAGACTCCCTGGCTGTGTCTCTGG... | TGTCAGCAATATTATAGTACTCCGTACACTTTT | ... | ["IGKJ2*01"] | 1 | [393] | [430] | [2.15e-16] | 0 | F | F | standard | B_VDJ_80_1_1_VJ_17_2_1 |
| sc5p_v2_hs_PBMC_1k_b_TTGCCGTAGAATGTGT_contig_2 | sc5p_v2_hs_PBMC_1k_b_TTGCCGTAGAATGTGT_contig_2 | TGGGGAGTGACTCCTGTGCCCCACCATGGACACACTTTGCTCCACG... | F | T | IGHV2-5*02 |  | IGHJ6*02 | CAGATCACCTTGAAGGAGTCTGGTCCT...ACGCTGGTGAAACCCA... | CAGATCACCTTGAAGGAGTCTGGTCCT...ACGCTGGTGAAACCCA... | TGTGCACACAGCGACTACTATGAGGGGCGCGGTATGGACGTCTGG | ... | ["IGHJ6*02"] | 1 | [400] | [446] | [1.94e-21] | 0 | F | F | standard | B_VDJ_80_1_1_VJ_17_2_1 |

2575 rows × 124 columns

| sequence_id | v_call_genotyped | germline_alignment_d_mask |
|---|---|---|
| sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_contig_1 | IGKV1-33*01,IGKV1D-33*01 | GACATCCAGATGACCCAGTCTCCATCCTCCCTGTCTGCATCTGTAG... |
| sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_2 | IGHV1-69*01,IGHV1-69D*01 | CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG... |
| sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_1 | IGKV1-8*01 | GCCATCCGGATGACCCAGTCTCCATCCTCATTCTCTGCATCTACAG... |
| sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_contig_1 | IGLV5-45*02 | CAGGCTGTGCTGACTCAGCCGTCTTCC...CTCTCTGCATCTCCTG... |
| sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_contig_2 | IGHV1-2*02 | CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG... |
| ... | ... | ... |
| sc5p_v2_hs_PBMC_1k_b_TTCCCAGAGTACATGA_contig_2 | IGHV3-23*01,IGHV3-23D*01 | GAGGTGCAGCTGTTGGAGTCTGGGGGA...GGCTTGGTACAGCCTG... |
| sc5p_v2_hs_PBMC_1k_b_TTGAACGCAGGCTGAA_contig_1 | IGKV1-8*01 | GCCATCCGGATGACCCAGTCTCCATCCTCATTCTCTGCATCTACAG... |
| sc5p_v2_hs_PBMC_1k_b_TTGAACGCAGGCTGAA_contig_2 | IGHV3-30-3*01 | CAGGTGCAGCTGGTGGAGTCTGGGGGA...GGCGTGGTCCAGCCTG... |
| sc5p_v2_hs_PBMC_1k_b_TTGCCGTAGAATGTGT_contig_1 | IGKV4-1*01 | GACATCGTGATGACCCAGTCTCCAGACTCCCTGGCTGTGTCTCTGG... |
| sc5p_v2_hs_PBMC_1k_b_TTGCCGTAGAATGTGT_contig_2 | IGHV2-5*02 | CAGATCACCTTGAAGGAGTCTGGTCCT...ACGCTGGTGAAACCCA... |

2575 rows × 2 columns

|  | cells | yhat | group | type | plateau |
|---|---|---|---|---|---|
| 0 | 1 | 0.986055 | vdj_nextgem_hs_pbmc3_b | observed | 14202.406243 |
| 1 | 2 | 1.971979 | vdj_nextgem_hs_pbmc3_b | observed | 14202.406243 |
| 2 | 3 | 2.957774 | vdj_nextgem_hs_pbmc3_b | observed | 14202.406243 |
| 3 | 4 | 3.943439 | vdj_nextgem_hs_pbmc3_b | observed | 14202.406243 |
| 4 | 5 | 4.928973 | vdj_nextgem_hs_pbmc3_b | observed | 14202.406243 |
| ... | ... | ... | ... | ... | ... |
| 46671 | 10955 | 10205.490307 | vdj_v1_hs_pbmc3_b | extrapolated | 141683.928775 |
| 46672 | 10956 | 10206.358137 | vdj_v1_hs_pbmc3_b | extrapolated | 141683.928775 |
| 46673 | 10957 | 10207.225957 | vdj_v1_hs_pbmc3_b | extrapolated | 141683.928775 |
| 46674 | 10958 | 10208.093766 | vdj_v1_hs_pbmc3_b | extrapolated | 141683.928775 |
| 46675 | 10959 | 10208.961564 | vdj_v1_hs_pbmc3_b | extrapolated | 141683.928775 |

46676 rows × 5 columns
\n", "