Reading 10X Cell Ranger output directly
If for whatever reason you’ve decided to skip the reannotation/preprocessing, you can read the files directly from the Cell Ranger output folder with Dandelion’s ddl.read_10x_vdj, which accepts the *_contig_annotations.csv or all_contig_annotations.json file(s) as input. If reading with the .csv file, and the .fasta file and/or .json file(s) are in the same folder, ddl.read_10x_vdj will try to extract additional information not found in the .csv file e.g.
contig sequences.
From Cell Ranger V4 onwards, there is also an airr_rearrangement.tsv file that can be used directly with Dandelion. However, doing so will miss out on the reannotation steps but that is entirely up to you.
Import dandelion module
[1]:
import os
import dandelion as ddl
ddl.set_backend("base")
# new function to setup tutorial data
from dandelion.tutorial import setup_dandelion_tutorial_bcr
setup_dandelion_tutorial_bcr()
# change directory to somewhere more workable
os.chdir("dandelion_tutorial")
# ddl.logging.print_versions()
folder_location = "sc5p_v2_hs_PBMC_10k_b"
# or file_location = 'sc5p_v2_hs_PBMC_10k/'
vdj = ddl.read_10x_vdj(folder_location, filename_prefix="filtered")
vdj
[1]:
Dandelion class object with n_obs = 994 and n_contigs = 2601
data: 'cell_id', 'is_cell_10x', 'sequence_id', 'high_confidence_10x', 'sequence_length_10x', 'locus', 'v_call', 'd_call', 'j_call', 'c_call', 'complete_vdj', 'productive', 'junction_aa', 'junction', 'consensus_count', 'umi_count', 'clone_id', 'raw_consensus_id_10x', 'sequence', 'rearrangement_status'
metadata: 'clone_id', 'clone_id_rank', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
With ddl.read_10x_airr:
[2]:
# read in the airr_rearrangement.tsv file
file_location = "sc5p_v2_hs_PBMC_10k_b/airr_rearrangement.tsv"
vdj = ddl.read_10x_airr(file_location)
vdj
[2]:
Dandelion class object with n_obs = 994 and n_contigs = 2093
data: 'cell_id', 'sequence_id', 'sequence', 'sequence_aa', 'productive', 'rev_comp', 'v_call', 'v_cigar', 'd_call', 'd_cigar', 'j_call', 'j_cigar', 'c_call', 'c_cigar', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'junction_length', 'junction_aa_length', 'v_sequence_start', 'v_sequence_end', 'd_sequence_start', 'd_sequence_end', 'j_sequence_start', 'j_sequence_end', 'c_sequence_start', 'c_sequence_end', 'consensus_count', 'umi_count', 'is_cell', 'locus', 'rearrangement_status'
metadata: 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
If you are using non-10x data e.g. Parse Bioscience Evercode, BD Rhapsody, you can use ddl.read_parse_airr and ddl.read_bd_airr respectively. If you are using other sources of single-cell AIRR data that provides standard AIRR formatted files e.g. SeekGene Biosciences, or just a standard AIRR file, you can use ddl.read_airr directly.
We will continue with the rest of the filtering part of the analysis to show how it slots smoothly with the rest of the workflow.
Import modules for use with scanpy
[3]:
import pandas as pd
import numpy as np
import scanpy as sc
import warnings
import functools
import seaborn as sns
import scipy.stats
import anndata
warnings.filterwarnings("ignore")
sc.logging.print_header()
[3]:
Import the transcriptome data
[4]:
adata = sc.read_10x_h5(
"sc5p_v2_hs_PBMC_10k_b/filtered_feature_bc_matrix.h5", gex_only=True
)
adata.obs["sample_id"] = "sc5p_v2_hs_PBMC_10k_b"
adata.var_names_make_unique()
adata
[4]:
AnnData object with n_obs × n_vars = 10553 × 36601
obs: 'sample_id'
var: 'gene_ids', 'feature_types', 'genome', 'pattern', 'read', 'sequence'
Run QC on the transcriptome data.
[5]:
ddl.pp.recipe_scanpy_qc(adata)
adata
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
[5]:
AnnData object with n_obs × n_vars = 10553 × 36601
obs: 'sample_id', 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'scrublet_score', 'is_doublet', 'filter_rna'
var: 'gene_ids', 'feature_types', 'genome', 'pattern', 'read', 'sequence'
Run the filtering of bcr data. Note that I’m using the Dandelion object as input rather than the pandas dataframe (yes both types of input will works. In fact, a file path to the .tsv will work too).
[6]:
# The function will return both objects.
vdj, adata = ddl.pp.check_contigs(vdj, adata)
Preparing data: 2093it [00:00, 18173.44it/s]
Scanning for poor quality/ambiguous contigs: 100%|██████████| 994/994 [00:01<00:00, 581.79it/s]
Check the output V(D)J table
The vdj table is returned as a Dandelion class object in the .data slot; if a file was provided for filter_bcr above, a new file will be created in the same folder with the filtered prefix. Note that this V(D)J table is indexed based on contigs (sequence_id).
[7]:
vdj
[7]:
Dandelion class object with n_obs = 984 and n_contigs = 2028
data: 'cell_id', 'sequence_id', 'sequence', 'sequence_aa', 'productive', 'rev_comp', 'v_call', 'v_cigar', 'd_call', 'd_cigar', 'j_call', 'j_cigar', 'c_call', 'c_cigar', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'junction_length', 'junction_aa_length', 'v_sequence_start', 'v_sequence_end', 'd_sequence_start', 'd_sequence_end', 'j_sequence_start', 'j_sequence_end', 'c_sequence_start', 'c_sequence_end', 'consensus_count', 'umi_count', 'is_cell', 'locus', 'rearrangement_status', 'ambiguous', 'extra'
metadata: 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
Check the AnnData object as well
And the AnnData object is indexed based on cells.
[8]:
adata
[8]:
AnnData object with n_obs × n_vars = 10553 × 36601
obs: 'sample_id', 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'scrublet_score', 'is_doublet', 'filter_rna', 'has_contig', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
var: 'gene_ids', 'feature_types', 'genome', 'pattern', 'read', 'sequence'
The number of cells that actually has a matching BCR can be tabluated.
[9]:
pd.crosstab(adata.obs["has_contig"], adata.obs["chain_status"])
[9]:
| chain_status | Extra pair | Orphan VDJ | Orphan VJ | Single pair |
|---|---|---|---|---|
| has_contig | ||||
| True | 79 | 5 | 16 | 884 |
Now actually filter the AnnData object and run through a standard workflow starting by filtering genes and normalizing the data
Because the ‘filtered’ AnnData object was returned as a filtered but otherwise unprocessed object, we still need to normalize and run through the usual process here. The following is just a standard scanpy workflow.
[10]:
adata = adata[
adata.obs["filter_rna"] == "False"
] # from ddl.pp.recipe_scanpy_qc
# filter genes
sc.pp.filter_genes(adata, min_cells=3)
# Normalize the counts
sc.pp.normalize_total(adata, target_sum=1e4)
# Logarithmize the data
sc.pp.log1p(adata)
# Stash the normalised counts
adata.raw = adata
Identify highly-variable genes
[11]:
sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)
sc.pl.highly_variable_genes(adata)
Filter the genes to only those marked as highly-variable
[12]:
adata = adata[:, adata.var.highly_variable]
Regress out effects of total counts per cell and the percentage of mitochondrial genes expressed. Scale the data to unit variance.
[13]:
sc.pp.regress_out(adata, ["total_counts", "pct_counts_mt"])
sc.pp.scale(adata, max_value=10)
Run PCA
[14]:
sc.tl.pca(adata, svd_solver="arpack")
sc.pl.pca_variance_ratio(adata, log=True, n_pcs=50)
Computing the neighborhood graph, umap and clusters
[15]:
# Computing the neighborhood graph
sc.pp.neighbors(adata)
# Embedding the neighborhood graph
sc.tl.umap(adata)
# Clustering the neighborhood graph
sc.tl.leiden(adata)
Visualizing the clusters and whether or not there’s a corresponding BCR
[16]:
sc.pl.umap(adata, color=["leiden", "chain_status"])
Visualizing some B cell genes
[17]:
sc.pl.umap(adata, color=["IGHM", "JCHAIN"])
Save AnnData
We can save this AnnData object for now.
[18]:
adata.write("adata2.h5ad", compression="gzip")
Save dandelion
To save the vdj object, we have two options - either save the .data and .metadata slots with pandas’ functions:
[19]:
vdj.data.to_csv("filtered_vdj_table2.tsv", sep="\t")
Or save as h5ddl format, which will save the entire Dandelion object and all its slots. This is the recommended way to save the Dandelion object as it will preserve all the information and allow you to load it back in with all the same functionality. .zipddl is not available for the base backend.
[20]:
vdj.write_h5ddl("dandelion_results2.h5ddl")
Concatenating multiple bcr objects
It is quite common that one might be trying to analyse data from multiple samples. In that case, dandelion has a concat function to merge the data.
We will simulate a second object but reading in the same file.
[21]:
vdj1 = ddl.read_10x_airr(file_location)
vdj2 = ddl.read_10x_airr(file_location)
Before you merge the objects, make sure that the “cell_id” and “sequence_id” are distinct so that you can distinguish them later
[22]:
vdj1.add_cell_prefix("run1_")
vdj2.add_cell_prefix("run2_")
[23]:
vdj_merged = ddl.tl.concat([vdj1, vdj2])
vdj_merged
[23]:
Dandelion class object with n_obs = 1988 and n_contigs = 4186
data: 'cell_id', 'sequence_id', 'sequence', 'sequence_aa', 'productive', 'rev_comp', 'v_call', 'v_cigar', 'd_call', 'd_cigar', 'j_call', 'j_cigar', 'c_call', 'c_cigar', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'junction_length', 'junction_aa_length', 'v_sequence_start', 'v_sequence_end', 'd_sequence_start', 'd_sequence_end', 'j_sequence_start', 'j_sequence_end', 'c_sequence_start', 'c_sequence_end', 'consensus_count', 'umi_count', 'is_cell', 'locus', 'rearrangement_status'
metadata: 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
[24]:
vdj_merged.data
[24]:
| cell_id | sequence_id | sequence | sequence_aa | productive | rev_comp | v_call | v_cigar | d_call | d_cigar | ... | d_sequence_end | j_sequence_start | j_sequence_end | c_sequence_start | c_sequence_end | consensus_count | umi_count | is_cell | locus | rearrangement_status | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| sequence_id | |||||||||||||||||||||
| run1_AAACCTGTCATATCGG-1_contig_1 | run1_AAACCTGTCATATCGG-1 | run1_AAACCTGTCATATCGG-1_contig_1 | TGGGGAGGAGTCAGTCCCAACCAGGACACGGCCTGGACATGAGGGT... | MRVPAQLLGLLLLWLSGARCDIQMTQSPSSLSASVGDRVTITCQAT... | T | F | IGKV1-8 | 38S314M204S | NaN | NaN | ... | NaN | 384 | 420 | 421 | 556 | 9139 | 68 | T | IGK | standard |
| run1_AAACCTGTCCGTTGTC-1_contig_2 | run1_AAACCTGTCCGTTGTC-1 | run1_AAACCTGTCCGTTGTC-1_contig_2 | ATCACATAACAACCACATTCCTCCTCTAAAGAAGCCCCTGGGAGCA... | MDWTWRFLFVVAAATGVQSQVQLVQSGAEVKKPGSSVKVSCKASGG... | T | F | IGHV1-69D | 58S353M154S | IGHD3-22 | 411S31M123S | ... | 442.0 | 445 | 494 | 495 | 565 | 4161 | 51 | T | IGH | standard |
| run1_AAACCTGTCCGTTGTC-1_contig_1 | run1_AAACCTGTCCGTTGTC-1 | run1_AAACCTGTCCGTTGTC-1_contig_1 | AGGAGTCAGACCCTGTCAGGACACAGCATAGACATGAGGGTCCCCG... | MRVPAQLLGLLLLWLPGARCAIRMTQSPSSFSASTGDRVTITCRAS... | T | F | IGKV1-8 | 33S345M173S | NaN | NaN | ... | NaN | 378 | 415 | 416 | 551 | 5679 | 43 | T | IGK | standard |
| run1_AAACCTGTCGAGAACG-1_contig_1 | run1_AAACCTGTCGAGAACG-1 | run1_AAACCTGTCGAGAACG-1_contig_1 | ACTGTGGGGGTAAGAGGTTGTGTCCACCATGGCCTGGACTCCTCTC... | MAWTPLLLLFLSHCTGSLSQAVLTQPSSLSASPGASGRLTCTLRSD... | T | F | IGLV5-45 | 28S369M245S | NaN | NaN | ... | NaN | 394 | 431 | 432 | 642 | 13160 | 90 | T | IGL | standard |
| run1_AAACCTGTCGAGAACG-1_contig_2 | run1_AAACCTGTCGAGAACG-1 | run1_AAACCTGTCGAGAACG-1_contig_2 | GGGAGCATCACCCAGCAACCACATCTGTCCTCTAGAGAATCCCCTG... | MDWTWRILFLVAAATGAHSQVQLVQSGGEVKKPGASVKVSCKASGY... | T | F | IGHV1-2 | 64S353M133S | NaN | NaN | ... | NaN | 430 | 479 | 480 | 550 | 5080 | 47 | T | IGH | standard |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| run2_TTTGGTTTCAGAGCTT-1_contig_2 | run2_TTTGGTTTCAGAGCTT-1 | run2_TTTGGTTTCAGAGCTT-1_contig_2 | GGGAGAGCCCTGGGGAGGAACTGCTCAGTTAGGACCCAGAGGGAAC... | MEAPAQLLFLLLLWLPDTTGEIVLTQSPATLSLSPGERATLSCRAS... | T | F | IGKV3-11 | 47S345M170S | NaN | NaN | ... | NaN | 389 | 426 | 427 | 562 | 11867 | 73 | T | IGK | standard |
| run2_TTTGGTTTCAGTGTTG-1_contig_1 | run2_TTTGGTTTCAGTGTTG-1 | run2_TTTGGTTTCAGTGTTG-1_contig_1 | GGGGTCACAAGAGGCAGCGCTCTCGGGACGTCTCCACCATGGCCTG... | MAWALLLLTLLTQDTGSWAQSALTQPASVSGSPGQSITISCTGTSS... | T | F | IGLV2-23 | 38S340M262S | NaN | NaN | ... | NaN | 392 | 429 | 430 | 640 | 6497 | 58 | T | IGL | standard |
| run2_TTTGGTTTCAGTGTTG-1_contig_2 | run2_TTTGGTTTCAGTGTTG-1 | run2_TTTGGTTTCAGTGTTG-1_contig_2 | ATATTTCGTATCTGGGGAGTGACTCCTGTGCCCCACCATGGACACA... | MDTLCSTLLLLTIPSWVLSQITLKESGPTLVKPTQTLTLTCTFSGF... | T | F | IGHV2-5 | 37S358M122S | NaN | NaN | ... | NaN | 399 | 446 | 447 | 517 | 3530 | 33 | T | IGH | standard |
| run2_TTTGGTTTCGGTGTCG-1_contig_2 | run2_TTTGGTTTCGGTGTCG-1 | run2_TTTGGTTTCGGTGTCG-1_contig_2 | GGGAGAGCCCTGGGGAGGAACTGCTCAGTTAGGACCCAGAGGGAAC... | MEAPAQLLFLLLLWLPDTTGEIVLTQSPATLSLSPGERATLSCRAS... | T | F | IGKV3-11 | 47S345M176S | NaN | NaN | ... | NaN | 396 | 432 | 433 | 568 | 3058 | 22 | T | IGK | standard |
| run2_TTTGGTTTCGGTGTCG-1_contig_1 | run2_TTTGGTTTCGGTGTCG-1 | run2_TTTGGTTTCGGTGTCG-1_contig_1 | GAGAGAGGAGCCTTAGCCCTGGATTCCAAGGCCTATCCACTTGGTG... | MELGLRWVFLVAILEGVQCEVQLVESGGGLVKPGGSLRLSCAASGF... | T | F | IGHV3-21 | 73S353M145S | NaN | NaN | ... | NaN | 448 | 500 | 501 | 571 | 1026 | 12 | T | IGH | standard |
4186 rows × 33 columns
[25]:
vdj_merged.metadata
[25]:
| locus_VDJ | locus_VJ | productive_VDJ | productive_VJ | v_call_VDJ | d_call_VDJ | j_call_VDJ | v_call_VJ | j_call_VJ | c_call_VDJ | ... | d_call_B_VDJ_main | j_call_B_VDJ_main | v_call_B_VJ_main | j_call_B_VJ_main | isotype | isotype_status | locus_status | chain_status | rearrangement_status_VDJ | rearrangement_status_VJ | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| run1_AAACCTGTCATATCGG-1 | IGK | T | IGKV1-8 | IGKJ4 | ... | None | None | IGKV1-8 | IGKJ4 | Orphan IGK | Orphan VJ | None | standard | ||||||||
| run1_AAACCTGTCCGTTGTC-1 | IGH | IGK | T | T | IGHV1-69D | IGHD3-22 | IGHJ3 | IGKV1-8 | IGKJ1 | IGHM | ... | IGHD3-22 | IGHJ3 | IGKV1-8 | IGKJ1 | IgM | IgM | IGH + IGK | Single pair | standard | standard |
| run1_AAACCTGTCGAGAACG-1 | IGH | IGL | T | T | IGHV1-2 | IGHJ3 | IGLV5-45 | IGLJ3 | IGHM | ... | None | IGHJ3 | IGLV5-45 | IGLJ3 | IgM | IgM | IGH + IGL | Single pair | standard | standard | |
| run1_AAACCTGTCTTGAGAC-1 | IGH | IGK | T | T | IGHV5-51 | IGHJ3 | IGKV1D-8 | IGKJ2 | IGHM | ... | None | IGHJ3 | IGKV1D-8 | IGKJ2 | IgM | IgM | IGH + IGK | Single pair | standard | standard | |
| run1_AAACGGGAGCGACGTA-1 | IGH | IGL | T | T | IGHV4-59 | IGHJ3 | IGLV3-19 | IGLJ2 | IGHM | ... | None | IGHJ3 | IGLV3-19 | IGLJ2 | IgM | IgM | IGH + IGL | Single pair | standard | standard | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| run2_ACGTCAAAGTTTCCTT-1 | IGH | T | IGHV3-21 | IGHJ4 | IGHM | ... | None | IGHJ4 | None | None | IgM | IgM | Orphan IGH | Orphan VDJ | standard | None | |||||
| run2_CACTCCACAGATGGCA-1 | IGH | T | IGHV5-51 | IGHJ5 | IGHM | ... | None | IGHJ5 | None | None | IgM | IgM | Orphan IGH | Orphan VDJ | standard | None | |||||
| run2_CGGTTAAGTTTCGCTC-1 | IGH | T | IGHV1-69D | IGHJ4 | IGHM | ... | None | IGHJ4 | None | None | IgM | IgM | Orphan IGH | Orphan VDJ | standard | None | |||||
| run2_GTATCTTTCGAGAGCA-1 | IGH | T | IGHV3-23 | IGHD3-3 | IGHJ4 | IGHD | ... | IGHD3-3 | IGHJ4 | None | None | IgD | IgD | Orphan IGH | Orphan VDJ | standard | None | ||||
| run2_TGACTTTGTTATCGGT-1 | IGH | T | IGHV1-69D | IGHJ3 | IGHM | ... | None | IGHJ3 | None | None | IgM | IgM | Orphan IGH | Orphan VDJ | standard | None |
1988 rows × 44 columns
[ ]: