{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Reading 10X Cell Ranger output directly\n", "\n", "If for whatever reason you've decided to skip the reannotation/preprocessing, you can read the files directly from the Cell Ranger output folder with `Dandelion`'s `ddl.read_10x_vdj`, which accepts the `*_contig_annotations.csv` or `all_contig_annotations.json` file(s) as input. When reading the `.csv` file, if the `.fasta` file and/or `.json` file(s) are in the same folder, `ddl.read_10x_vdj` will try to extract additional information not found in the `.csv` file, e.g. contig sequences.\n", "\n", "From Cell Ranger V4 onwards, there is also an `airr_rearrangement.tsv` file that can be used directly with `Dandelion`. Doing so skips the reannotation steps, but that is entirely up to you.\n", "\n", "We will download the `airr_rearrangement.tsv` file, together with the filtered contig annotation `.csv` and `.fasta` files, from here:\n", "```bash\n", "# bash\n", "wget https://cf.10xgenomics.com/samples/cell-vdj/4.0.0/sc5p_v2_hs_PBMC_10k/sc5p_v2_hs_PBMC_10k_b_filtered_contig_annotations.csv\n", "wget https://cf.10xgenomics.com/samples/cell-vdj/4.0.0/sc5p_v2_hs_PBMC_10k/sc5p_v2_hs_PBMC_10k_b_filtered_contig.fasta\n", "# wget https://cf.10xgenomics.com/samples/cell-vdj/4.0.0/sc5p_v2_hs_PBMC_10k/sc5p_v2_hs_PBMC_10k_b_all_contig_annotations.json\n", "wget https://cf.10xgenomics.com/samples/cell-vdj/4.0.0/sc5p_v2_hs_PBMC_10k/sc5p_v2_hs_PBMC_10k_b_airr_rearrangement.tsv\n", "```\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Import dandelion module" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "scrolled": true }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.11/site-packages/anndata/utils.py:429: FutureWarning: Importing read_csv from `anndata` is deprecated. 
Import anndata.io.read_csv instead.\n", "/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.11/site-packages/anndata/utils.py:429: FutureWarning: Importing read_excel from `anndata` is deprecated. Import anndata.io.read_excel instead.\n", "/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.11/site-packages/anndata/utils.py:429: FutureWarning: Importing read_hdf from `anndata` is deprecated. Import anndata.io.read_hdf instead.\n", "/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.11/site-packages/anndata/utils.py:429: FutureWarning: Importing read_loom from `anndata` is deprecated. Import anndata.io.read_loom instead.\n", "/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.11/site-packages/anndata/utils.py:429: FutureWarning: Importing read_mtx from `anndata` is deprecated. Import anndata.io.read_mtx instead.\n", "/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.11/site-packages/anndata/utils.py:429: FutureWarning: Importing read_text from `anndata` is deprecated. Import anndata.io.read_text instead.\n", "/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.11/site-packages/anndata/utils.py:429: FutureWarning: Importing read_umi_tools from `anndata` is deprecated. Import anndata.io.read_umi_tools instead.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "dandelion==0.5.5.dev16 pandas==2.2.3 numpy==2.1.3 matplotlib==3.10.1 networkx==3.4.2 scipy==1.15.2\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.11/site-packages/nxviz/__init__.py:33: UserWarning: \n", "nxviz has a new API! Version 0.7.4 onwards, the old class-based API is being\n", "deprecated in favour of a new API focused on advancing a grammar of network\n", "graphics. 
If your plotting code depends on the old API, please consider\n", "pinning nxviz at version 0.7.4, as the new API will break your old code.\n", "\n", "To check out the new API, please head over to the docs at\n", "https://ericmjl.github.io/nxviz/ to learn more. We hope you enjoy using it!\n", "\n", "(This deprecation message will go away in version 1.0.)\n", "\n" ] } ], "source": [ "import os\n", "import dandelion as ddl\n", "\n", "# change directory to somewhere more workable\n", "os.chdir(os.path.expanduser(\"~/Downloads/dandelion_tutorial/\"))\n", "ddl.logging.print_versions()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With `ddl.read_10x_vdj`:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Dandelion class object with n_obs = 994 and n_contigs = 2601\n", " data: 'cell_id', 'is_cell_10x', 'sequence_id', 'high_confidence_10x', 'sequence_length_10x', 'locus', 'v_call', 'd_call', 'j_call', 'c_call', 'complete_vdj', 'productive', 'junction_aa', 'junction', 'consensus_count', 'umi_count', 'clone_id', 'raw_consensus_id_10x', 'sequence', 'rearrangement_status'\n", " metadata: 'clone_id', 'clone_id_by_size', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'" ] }, "execution_count": 2, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ "folder_location = \"sc5p_v2_hs_PBMC_10k\"\n", "# or file_location = 'sc5p_v2_hs_PBMC_10k/'\n", "vdj = ddl.read_10x_vdj(\n", " folder_location, filename_prefix=\"sc5p_v2_hs_PBMC_10k_b_filtered\"\n", ")\n", "vdj" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With `ddl.read_10x_airr`:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "Dandelion class object with n_obs = 994 and n_contigs = 2093\n", " data: 'cell_id', 'sequence_id', 'sequence', 'sequence_aa', 'productive', 'rev_comp', 'v_call', 'v_cigar', 'd_call', 'd_cigar', 'j_call', 'j_cigar', 'c_call', 'c_cigar', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'junction_length', 'junction_aa_length', 'v_sequence_start', 'v_sequence_end', 'd_sequence_start', 'd_sequence_end', 'j_sequence_start', 'j_sequence_end', 'c_sequence_start', 'c_sequence_end', 'consensus_count', 'umi_count', 'is_cell', 'locus', 'rearrangement_status'\n", " metadata: 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read in the airr_rearrangement.tsv file\n", "file_location = (\n", " 
\"sc5p_v2_hs_PBMC_10k/sc5p_v2_hs_PBMC_10k_b_airr_rearrangement.tsv\"\n", ")\n", "vdj = ddl.read_10x_airr(file_location)\n", "vdj" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you are using non-10x data, e.g. Parse Biosciences Evercode or BD Rhapsody, you can use `ddl.read_parse_airr` and `ddl.read_bd_airr` respectively. If you are using other sources of single-cell AIRR data that provide standard AIRR formatted files, e.g. SeekGene Biosciences, or just a standard AIRR file, you can use `ddl.read_airr` directly." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will continue with the rest of the filtering part of the analysis to show how it slots smoothly into the rest of the workflow." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Import modules for use with scanpy" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "scanpy==1.10.3 anndata==0.11.3 umap==0.5.7 numpy==2.1.3 scipy==1.15.2 pandas==2.2.3 scikit-learn==1.6.1 statsmodels==0.14.4 igraph==0.11.8 pynndescent==0.5.13\n" ] } ], "source": [ "import pandas as pd\n", "import numpy as np\n", "import scanpy as sc\n", "import warnings\n", "import functools\n", "import seaborn as sns\n", "import scipy.stats\n", "import anndata\n", "\n", "warnings.filterwarnings(\"ignore\")\n", "sc.logging.print_header()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Import the transcriptome data" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 10553 × 36601\n", " obs: 'sample_id'\n", " var: 'gene_ids', 'feature_types', 'genome', 'pattern', 'read', 'sequence'" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata = sc.read_10x_h5(\n", " \"sc5p_v2_hs_PBMC_10k/filtered_feature_bc_matrix.h5\", gex_only=True\n", ")\n", "adata.obs[\"sample_id\"] = 
\"sc5p_v2_hs_PBMC_10k\"\n", "adata.var_names_make_unique()\n", "adata" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run QC on the transcriptome data." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "scrolled": true }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.\n" ] }, { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 10553 × 36601\n", " obs: 'sample_id', 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'scrublet_score', 'is_doublet', 'filter_rna'\n", " var: 'gene_ids', 'feature_types', 'genome', 'pattern', 'read', 'sequence'" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ddl.pp.recipe_scanpy_qc(adata)\n", "adata" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run the filtering of the BCR data. Note that I'm using the `Dandelion` object as input rather than the pandas dataframe (both types of input will work; in fact, a file path to the `.tsv` will work too)." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Preparing data: 2093it [00:00, 17656.48it/s]\n", "Scanning for poor quality/ambiguous contigs: 100%|██████████| 994/994 [00:00<00:00, 1262.89it/s]\n" ] } ], "source": [ "# The function will return both objects.\n", "vdj, adata = ddl.pp.check_contigs(vdj, adata)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check the output V(D)J table\n", "\n", "The V(D)J table is stored in the `.data` slot of the returned `Dandelion` class object; if a file was provided for `filter_bcr` above, a new file will be created in the same folder with the `filtered` prefix. Note that this V(D)J table is indexed by contig (`sequence_id`)." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Dandelion class object with n_obs = 984 and n_contigs = 2028\n", " data: 'cell_id', 'sequence_id', 'sequence', 'sequence_aa', 'productive', 'rev_comp', 'v_call', 'v_cigar', 'd_call', 'd_cigar', 'j_call', 'j_cigar', 'c_call', 'c_cigar', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'junction_length', 'junction_aa_length', 'v_sequence_start', 'v_sequence_end', 'd_sequence_start', 'd_sequence_end', 'j_sequence_start', 'j_sequence_end', 'c_sequence_start', 'c_sequence_end', 'consensus_count', 'umi_count', 'is_cell', 'locus', 'rearrangement_status', 'ambiguous', 'extra'\n", " metadata: 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vdj" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check the AnnData object as well\n", "\n", "And the `AnnData` object is indexed based on cells." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 10553 × 36601\n", " obs: 'sample_id', 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'scrublet_score', 'is_doublet', 'filter_rna', 'has_contig', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'\n", " var: 'gene_ids', 'feature_types', 'genome', 'pattern', 'read', 'sequence'" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The number of cells that actually have a matching BCR can be tabulated." 
] }, { "cell_type": "code", "execution_count": 10, "metadata": { "scrolled": true }, "outputs": [ { "data": { "application/vnd.microsoft.datawrangler.viewer.v0+json": { "columns": [ { "name": "has_contig", "rawType": "object", "type": "string" }, { "name": "Extra pair", "rawType": "int64", "type": "integer" }, { "name": "No_contig", "rawType": "int64", "type": "integer" }, { "name": "Orphan VDJ", "rawType": "int64", "type": "integer" }, { "name": "Orphan VJ", "rawType": "int64", "type": "integer" }, { "name": "Single pair", "rawType": "int64", "type": "integer" } ], "conversionMethod": "pd.DataFrame", "ref": "c5a85728-1a12-414c-b4c0-db0118ddd581", "rows": [ [ "No_contig", "0", "9569", "0", "0", "0" ], [ "True", "79", "0", "5", "16", "884" ] ], "shape": { "columns": 5, "rows": 2 } }, "text/html": [ "
| chain_status | Extra pair | No_contig | Orphan VDJ | Orphan VJ | Single pair |
|---|---|---|---|---|---|
| has_contig | | | | | |
| No_contig | 0 | 9569 | 0 | 0 | 0 |
| True | 79 | 0 | 5 | 16 | 884 |
| | cell_id | sequence_id | sequence | sequence_aa | productive | rev_comp | v_call | v_cigar | d_call | d_cigar | ... | d_sequence_end | j_sequence_start | j_sequence_end | c_sequence_start | c_sequence_end | consensus_count | umi_count | is_cell | locus | rearrangement_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| sequence_id | | | | | | | | | | | | | | | | | | | | | |
| run1_AAACCTGTCATATCGG-1_contig_1 | run1_AAACCTGTCATATCGG-1 | run1_AAACCTGTCATATCGG-1_contig_1 | TGGGGAGGAGTCAGTCCCAACCAGGACACGGCCTGGACATGAGGGT... | MRVPAQLLGLLLLWLSGARCDIQMTQSPSSLSASVGDRVTITCQAT... | T | F | IGKV1-8 | 38S314M204S | NaN | NaN | ... | NaN | 384 | 420 | 421 | 556 | 9139 | 68 | T | IGK | standard |
| run1_AAACCTGTCCGTTGTC-1_contig_2 | run1_AAACCTGTCCGTTGTC-1 | run1_AAACCTGTCCGTTGTC-1_contig_2 | ATCACATAACAACCACATTCCTCCTCTAAAGAAGCCCCTGGGAGCA... | MDWTWRFLFVVAAATGVQSQVQLVQSGAEVKKPGSSVKVSCKASGG... | T | F | IGHV1-69D | 58S353M154S | IGHD3-22 | 411S31M123S | ... | 442.0 | 445 | 494 | 495 | 565 | 4161 | 51 | T | IGH | standard |
| run1_AAACCTGTCCGTTGTC-1_contig_1 | run1_AAACCTGTCCGTTGTC-1 | run1_AAACCTGTCCGTTGTC-1_contig_1 | AGGAGTCAGACCCTGTCAGGACACAGCATAGACATGAGGGTCCCCG... | MRVPAQLLGLLLLWLPGARCAIRMTQSPSSFSASTGDRVTITCRAS... | T | F | IGKV1-8 | 33S345M173S | NaN | NaN | ... | NaN | 378 | 415 | 416 | 551 | 5679 | 43 | T | IGK | standard |
| run1_AAACCTGTCGAGAACG-1_contig_1 | run1_AAACCTGTCGAGAACG-1 | run1_AAACCTGTCGAGAACG-1_contig_1 | ACTGTGGGGGTAAGAGGTTGTGTCCACCATGGCCTGGACTCCTCTC... | MAWTPLLLLFLSHCTGSLSQAVLTQPSSLSASPGASGRLTCTLRSD... | T | F | IGLV5-45 | 28S369M245S | NaN | NaN | ... | NaN | 394 | 431 | 432 | 642 | 13160 | 90 | T | IGL | standard |
| run1_AAACCTGTCGAGAACG-1_contig_2 | run1_AAACCTGTCGAGAACG-1 | run1_AAACCTGTCGAGAACG-1_contig_2 | GGGAGCATCACCCAGCAACCACATCTGTCCTCTAGAGAATCCCCTG... | MDWTWRILFLVAAATGAHSQVQLVQSGGEVKKPGASVKVSCKASGY... | T | F | IGHV1-2 | 64S353M133S | NaN | NaN | ... | NaN | 430 | 479 | 480 | 550 | 5080 | 47 | T | IGH | standard |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| run2_TTTGGTTTCAGAGCTT-1_contig_2 | run2_TTTGGTTTCAGAGCTT-1 | run2_TTTGGTTTCAGAGCTT-1_contig_2 | GGGAGAGCCCTGGGGAGGAACTGCTCAGTTAGGACCCAGAGGGAAC... | MEAPAQLLFLLLLWLPDTTGEIVLTQSPATLSLSPGERATLSCRAS... | T | F | IGKV3-11 | 47S345M170S | NaN | NaN | ... | NaN | 389 | 426 | 427 | 562 | 11867 | 73 | T | IGK | standard |
| run2_TTTGGTTTCAGTGTTG-1_contig_1 | run2_TTTGGTTTCAGTGTTG-1 | run2_TTTGGTTTCAGTGTTG-1_contig_1 | GGGGTCACAAGAGGCAGCGCTCTCGGGACGTCTCCACCATGGCCTG... | MAWALLLLTLLTQDTGSWAQSALTQPASVSGSPGQSITISCTGTSS... | T | F | IGLV2-23 | 38S340M262S | NaN | NaN | ... | NaN | 392 | 429 | 430 | 640 | 6497 | 58 | T | IGL | standard |
| run2_TTTGGTTTCAGTGTTG-1_contig_2 | run2_TTTGGTTTCAGTGTTG-1 | run2_TTTGGTTTCAGTGTTG-1_contig_2 | ATATTTCGTATCTGGGGAGTGACTCCTGTGCCCCACCATGGACACA... | MDTLCSTLLLLTIPSWVLSQITLKESGPTLVKPTQTLTLTCTFSGF... | T | F | IGHV2-5 | 37S358M122S | NaN | NaN | ... | NaN | 399 | 446 | 447 | 517 | 3530 | 33 | T | IGH | standard |
| run2_TTTGGTTTCGGTGTCG-1_contig_2 | run2_TTTGGTTTCGGTGTCG-1 | run2_TTTGGTTTCGGTGTCG-1_contig_2 | GGGAGAGCCCTGGGGAGGAACTGCTCAGTTAGGACCCAGAGGGAAC... | MEAPAQLLFLLLLWLPDTTGEIVLTQSPATLSLSPGERATLSCRAS... | T | F | IGKV3-11 | 47S345M176S | NaN | NaN | ... | NaN | 396 | 432 | 433 | 568 | 3058 | 22 | T | IGK | standard |
| run2_TTTGGTTTCGGTGTCG-1_contig_1 | run2_TTTGGTTTCGGTGTCG-1 | run2_TTTGGTTTCGGTGTCG-1_contig_1 | GAGAGAGGAGCCTTAGCCCTGGATTCCAAGGCCTATCCACTTGGTG... | MELGLRWVFLVAILEGVQCEVQLVESGGGLVKPGGSLRLSCAASGF... | T | F | IGHV3-21 | 73S353M145S | NaN | NaN | ... | NaN | 448 | 500 | 501 | 571 | 1026 | 12 | T | IGH | standard |
4186 rows × 33 columns
| | locus_VDJ | locus_VJ | productive_VDJ | productive_VJ | v_call_VDJ | d_call_VDJ | j_call_VDJ | v_call_VJ | j_call_VJ | c_call_VDJ | ... | d_call_B_VDJ_main | j_call_B_VDJ_main | v_call_B_VJ_main | j_call_B_VJ_main | isotype | isotype_status | locus_status | chain_status | rearrangement_status_VDJ | rearrangement_status_VJ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| run1_AAACCTGTCATATCGG-1 | None | IGK | None | T | None | None | None | IGKV1-8 | IGKJ4 | None | ... | None | None | IGKV1-8 | IGKJ4 | | | Orphan IGK | Orphan VJ | None | standard |
| run1_AAACCTGTCCGTTGTC-1 | IGH | IGK | T | T | IGHV1-69D | IGHD3-22 | IGHJ3 | IGKV1-8 | IGKJ1 | IGHM | ... | IGHD3-22 | IGHJ3 | IGKV1-8 | IGKJ1 | IgM | IgM | IGH + IGK | Single pair | standard | standard |
| run1_AAACCTGTCGAGAACG-1 | IGH | IGL | T | T | IGHV1-2 | None | IGHJ3 | IGLV5-45 | IGLJ3 | IGHM | ... | None | IGHJ3 | IGLV5-45 | IGLJ3 | IgM | IgM | IGH + IGL | Single pair | standard | standard |
| run1_AAACCTGTCTTGAGAC-1 | IGH | IGK | T | T | IGHV5-51 | None | IGHJ3 | IGKV1D-8 | IGKJ2 | IGHM | ... | None | IGHJ3 | IGKV1D-8 | IGKJ2 | IgM | IgM | IGH + IGK | Single pair | standard | standard |
| run1_AAACGGGAGCGACGTA-1 | IGH | IGL | T | T | IGHV4-59 | None | IGHJ3 | IGLV3-19 | IGLJ2 | IGHM | ... | None | IGHJ3 | IGLV3-19 | IGLJ2 | IgM | IgM | IGH + IGL | Single pair | standard | standard |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| run2_ACGTCAAAGTTTCCTT-1 | IGH | None | T | None | IGHV3-21 | None | IGHJ4 | None | None | IGHM | ... | None | IGHJ4 | None | None | IgM | IgM | Orphan IGH | Orphan VDJ | standard | None |
| run2_CACTCCACAGATGGCA-1 | IGH | None | T | None | IGHV5-51 | None | IGHJ5 | None | None | IGHM | ... | None | IGHJ5 | None | None | IgM | IgM | Orphan IGH | Orphan VDJ | standard | None |
| run2_CGGTTAAGTTTCGCTC-1 | IGH | None | T | None | IGHV1-69D | None | IGHJ4 | None | None | IGHM | ... | None | IGHJ4 | None | None | IgM | IgM | Orphan IGH | Orphan VDJ | standard | None |
| run2_GTATCTTTCGAGAGCA-1 | IGH | None | T | None | IGHV3-23 | IGHD3-3 | IGHJ4 | None | None | IGHD | ... | IGHD3-3 | IGHJ4 | None | None | IgD | IgD | Orphan IGH | Orphan VDJ | standard | None |
| run2_TGACTTTGTTATCGGT-1 | IGH | None | T | None | IGHV1-69D | None | IGHJ3 | None | None | IGHM | ... | None | IGHJ3 | None | None | IgM | IgM | Orphan IGH | Orphan VDJ | standard | None |
1988 rows × 44 columns
\n", "