{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Visualizing V(D)J data\n", "\n", "## Integration with `scanpy`\n", "Now that we have both 1) pre-processed V(D)J data in a `Dandelion` object and 2) a matching `AnnData` object, we can start finding clones and *'integrate'* the results. All the V(D)J (AIRR) analysis files can be saved in *.tsv* format so that they can be used in other tools like *immcantation*, *immunarch*, *vdjtools*, etc.\n", "\n", "The results can also be ported into the `AnnData` object for access to more plotting functions provided through `scanpy` [[Wolf2018]](https://doi.org/10.1186/s13059-017-1382-0)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "scrolled": true }, "outputs": [], "source": [ "import os\n", "import dandelion as ddl\n", "import scanpy as sc\n", "\n", "sc.settings.verbosity = 3\n", "\n", "ddl.set_backend(\"base\")\n", "\n", "# change to tutorials directory\n", "os.chdir(\"dandelion_tutorial\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Read in the previously saved files\n", "\n", "I will work with the same example as in the previous section, since the `AnnData` object has already been saved and the V(D)J table filtered." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 25057 × 1308\n", " obs: 'sample_id', 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'gmm_pct_count_clusters_keep', 'scrublet_score', 'is_doublet', 'filter_rna', 'has_contig', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ', 'leiden'\n", " var: 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'\n", " uns: 'chain_status_colors', 'hvg', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'sample_id_colors', 'umap'\n", " obsm: 'X_pca', 'X_umap'\n", " varm: 'PCs'\n", " obsp: 'connectivities', 'distances'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata = sc.read_h5ad(\"adata.h5ad\")\n", "adata" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "Dandelion class object with n_obs = 2334 and n_contigs = 5557\n", " data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 
'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 'j_sequence_alignment_aa', 'complete_vdj', 'j_call_multimappers', 'j_call_multiplicity', 'j_call_sequence_start_multimappers', 'j_call_sequence_end_multimappers', 'j_call_support_multimappers', 'mu_count', 'ambiguous', 
'extra', 'rearrangement_status', 'clone_id'\n", " metadata: 'clone_id', 'clone_id_rank', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'\n", " layout: layout for 2334 vertices, layout for 146 vertices\n", " graph: networkx graph of 2334 vertices, networkx graph of 146 vertices \n", " distances: distance matrix of shape (2334, 2334)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vdj = ddl.read_h5ddl(\"dandelion_results_simplified.h5ddl\")\n", "vdj" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `ddl.tl.transfer`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can sync the V(D)J data from `Dandelion` object to the matching `AnnData` object using `ddl.tl.transfer` function." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Transferring network\n", " finished: updated `.obs` with `.metadata`\n", "wrote active layout to `.obsm['X_vdj']`; stashed all views in `.uns['dandelion']` ('X_vdj_all', 'X_vdj_expanded')\n", "wrote `.obsp['connectivities']` & `['distances']` from graph[0]\n", "stashed GEX matrices in `.uns['dandelion']` ('gex_connectivities', 'gex_distances')\n", "stashed VDJ matrices in `.uns['dandelion']` under 'vdj_connectivities_*' keys\n", "added `.uns['clone_id']` clone-level mapping (0:00:00)\n" ] }, { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 25057 × 1308\n", " obs: 'sample_id', 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'gmm_pct_count_clusters_keep', 'scrublet_score', 'is_doublet', 'filter_rna', 'has_contig', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ', 'leiden', 'clone_id', 'clone_id_rank'\n", " var: 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'\n", " uns: 'chain_status_colors', 'hvg', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'sample_id_colors', 'umap', 'dandelion', 'gex_neighbors', 'clone_id'\n", " obsm: 
'X_pca', 'X_umap', 'X_vdj'\n", " varm: 'PCs'\n", " obsp: 'connectivities', 'distances'" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ddl.tl.transfer(adata, vdj)\n", "adata" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
| | clone_id_size | clone_id_size_prop | clone_id_size_category |
|---|---|---|---|
| sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC | 1 | 0.000428 | Small |
| sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG | 1 | 0.000428 | Small |
| sc5p_v2_hs_PBMC_10k_b_AAACCTGTCTTGAGAC | 1 | 0.000428 | Small |
| sc5p_v2_hs_PBMC_10k_b_AAACGGGAGCGACGTA | 1 | 0.000428 | Small |
| sc5p_v2_hs_PBMC_10k_b_AAACGGGCACTGTTAG | 1 | 0.000428 | Small |
| ... | ... | ... | ... |
| vdj_v1_hs_pbmc3_b_TTTCCTCAGCGCTTAT | 1 | 0.000428 | Small |
| vdj_v1_hs_pbmc3_b_TTTCCTCAGGGAAACA | 1 | 0.000428 | Small |
| vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC | 1 | 0.000428 | Small |
| vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG | 1 | 0.000428 | Small |
| vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG | 1 | 0.000428 | Small |

2334 rows × 3 columns
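
The values in the table above look consistent with `clone_id_size_prop` being the clone size divided by the total number of cells (1 / 2334 ≈ 0.000428). The following is a minimal pandas sketch of that calculation, not the `dandelion` implementation: the toy `obs` table, the clone labels, and the category cut-offs in `bins` are all hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical per-cell clone assignments (made-up labels, not tutorial data).
obs = pd.DataFrame(
    {"clone_id": ["c1", "c2", "c2", "c3", "c3", "c3", "c3", "c4"]},
    index=[f"cell_{i}" for i in range(8)],
)

# Size of the clone that each cell belongs to.
clone_counts = obs["clone_id"].value_counts()
obs["clone_id_size"] = obs["clone_id"].map(clone_counts)

# Proportion of all cells in the table that share the cell's clone.
obs["clone_id_size_prop"] = obs["clone_id_size"] / len(obs)

# Illustrative cut-offs only (assumption): the thresholds behind the
# 'Small'/'Medium'/'Large' labels in the tutorial output are not shown here.
bins = [0.0, 0.15, 0.3, 1.0]
labels = ["Small", "Medium", "Large"]
obs["clone_id_size_category"] = pd.cut(
    obs["clone_id_size_prop"], bins=bins, labels=labels
)

print(obs)
```

With real data, the same three columns could be derived from the `clone_id` column in `adata.obs`, restricted to cells that carry a clone assignment.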

| clone_id_size_category | Hyperexpanded | Large | Medium | Small |
|---|---|---|---|---|
| isotype_status | | | | |
| | 100.0 | 0.000000 | 0.000000 | 0.000000 |
| IgA | 0.0 | 3.448276 | 96.551724 | 0.000000 |
| IgD | 0.0 | 100.000000 | 0.000000 | 0.000000 |
| IgG | 0.0 | 9.137056 | 90.862944 | 0.000000 |
| IgM | 0.0 | 0.000000 | 5.888828 | 94.111172 |
| Multi | 0.0 | 9.937888 | 90.062112 | 0.000000 |
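
Each row of the table above sums to 100, which is what a row-normalized cross-tabulation of `isotype_status` against `clone_id_size_category` would produce. As a hedged sketch of how such a table could be built with plain pandas (the toy values below are hypothetical, and this is not necessarily how the tutorial generated its output):

```python
import pandas as pd

# Hypothetical per-cell annotations (made-up values, not tutorial data).
obs = pd.DataFrame(
    {
        "isotype_status": ["IgM", "IgM", "IgM", "IgM", "IgG", "IgG", "IgA", "IgA"],
        "clone_id_size_category": [
            "Small", "Small", "Small", "Medium",
            "Medium", "Large", "Medium", "Medium",
        ],
    }
)

# Count cells per (isotype, size category) pair, then normalize each row
# so that the categories of a given isotype sum to 100 percent.
pct = (
    pd.crosstab(
        obs["isotype_status"],
        obs["clone_id_size_category"],
        normalize="index",
    )
    * 100
)

print(pct)
```

Passing `normalize="columns"` instead would express each size category as percentages across isotypes, which answers a different question, so it is worth checking which direction a given table was normalized in before interpreting it.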