Reading 10X Cell Ranger output directly

If for whatever reason you’ve decided to skip the reannotation/preprocessing, you can read the files directly from the Cell Ranger output folder with Dandelion’s ddl.read_10x_vdj, which accepts the *_contig_annotations.csv or all_contig_annotations.json file(s) as input. If you read the .csv file and the matching .fasta and/or .json file(s) are in the same folder, ddl.read_10x_vdj will try to extract additional information not found in the .csv file, e.g. contig sequences.
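As a rough illustration of the companion-file lookup described above, the sketch below reports which Cell Ranger files would be available for a given folder and filename prefix. `find_cellranger_files` is a hypothetical helper, not Dandelion's implementation, and the file-name patterns are assumed from the files used in this tutorial.

```python
from pathlib import Path


def find_cellranger_files(folder, prefix):
    """Hypothetical helper: report which Cell Ranger V(D)J files are present.

    Looks for <prefix>_contig_annotations.csv and <prefix>_contig.fasta, plus
    any *all_contig_annotations.json in the same folder (naming assumed from
    the files downloaded in this tutorial).
    """
    folder = Path(folder)
    found = {}
    csv_path = folder / f"{prefix}_contig_annotations.csv"
    fasta_path = folder / f"{prefix}_contig.fasta"
    if csv_path.exists():
        found["annotations_csv"] = csv_path
    if fasta_path.exists():
        found["contig_fasta"] = fasta_path
    # The JSON file carries the "all" prefix rather than "filtered".
    json_paths = sorted(folder.glob("*all_contig_annotations.json"))
    if json_paths:
        found["contig_json"] = json_paths[0]
    return found


# Usage with the folder and prefix from this tutorial:
# find_cellranger_files("sc5p_v2_hs_PBMC_10k", "sc5p_v2_hs_PBMC_10k_b_filtered")
```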

From Cell Ranger v4 onwards, there is also an airr_rearrangement.tsv file that can be used directly with Dandelion. Doing so skips the reannotation steps, but whether that trade-off is acceptable is entirely up to you.
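The airr_rearrangement.tsv file is a tab-separated table with one row per contig and AIRR-standard column names such as `cell_id`, `sequence_id` and `locus`. A minimal standard-library sketch, assuming only those three columns, shows why a Dandelion object reports fewer cells (`n_obs`) than contigs (`n_contigs`): contigs are grouped by `cell_id`. This is an illustration only, not how `ddl.read_10x_airr` is implemented.

```python
import csv
import io
from collections import defaultdict


def contigs_per_cell(handle):
    """Group AIRR rearrangement rows (one per contig) by cell_id."""
    reader = csv.DictReader(handle, delimiter="\t")
    cells = defaultdict(list)
    for row in reader:
        cells[row["cell_id"]].append((row["sequence_id"], row["locus"]))
    return dict(cells)


# Tiny in-memory table standing in for the real file.
example = io.StringIO(
    "cell_id\tsequence_id\tlocus\n"
    "AAAC-1\tAAAC-1_contig_1\tIGH\n"
    "AAAC-1\tAAAC-1_contig_2\tIGK\n"
    "TTTG-1\tTTTG-1_contig_1\tIGL\n"
)
cells = contigs_per_cell(example)
n_obs = len(cells)                               # number of cells
n_contigs = sum(len(v) for v in cells.values())  # number of contigs
print(n_obs, n_contigs)  # 2 3
```

With the real file, `open(file_location, newline="")` could be passed in place of the `StringIO` object.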

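We will download the filtered contig annotations (.csv), the filtered contig sequences (.fasta) and the airr_rearrangement.tsv file (the all_contig_annotations.json download is optional and commented out):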

# bash
wget https://cf.10xgenomics.com/samples/cell-vdj/4.0.0/sc5p_v2_hs_PBMC_10k/sc5p_v2_hs_PBMC_10k_b_filtered_contig_annotations.csv
wget https://cf.10xgenomics.com/samples/cell-vdj/4.0.0/sc5p_v2_hs_PBMC_10k/sc5p_v2_hs_PBMC_10k_b_filtered_contig.fasta
# wget https://cf.10xgenomics.com/samples/cell-vdj/4.0.0/sc5p_v2_hs_PBMC_10k/sc5p_v2_hs_PBMC_10k_b_all_contig_annotations.json
wget https://cf.10xgenomics.com/samples/cell-vdj/4.0.0/sc5p_v2_hs_PBMC_10k/sc5p_v2_hs_PBMC_10k_b_airr_rearrangement.tsv
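The later cells read everything from a `sc5p_v2_hs_PBMC_10k/` subfolder of the working directory, so the downloaded files need to end up there (the `filtered_feature_bc_matrix.h5` read further down is not among these URLs and is assumed to be present already). If you prefer to stay in Python, a sketch using `urllib.request.urlretrieve` could look like this; `download_to` is a hypothetical helper, and `dry_run=True` only creates the folder and returns the destination paths without downloading anything.

```python
from pathlib import Path
from urllib.request import urlretrieve

BASE = "https://cf.10xgenomics.com/samples/cell-vdj/4.0.0/sc5p_v2_hs_PBMC_10k/"
FILES = [
    "sc5p_v2_hs_PBMC_10k_b_filtered_contig_annotations.csv",
    "sc5p_v2_hs_PBMC_10k_b_filtered_contig.fasta",
    "sc5p_v2_hs_PBMC_10k_b_airr_rearrangement.tsv",
]


def download_to(dest_dir, filenames, dry_run=False):
    """Hypothetical helper: fetch each file from BASE into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    targets = []
    for name in filenames:
        target = dest / name
        # Skip the network entirely in dry-run mode or if the file already exists.
        if not dry_run and not target.exists():
            urlretrieve(BASE + name, target)
        targets.append(target)
    return targets


# Run from ~/Downloads/dandelion_tutorial/:
# download_to("sc5p_v2_hs_PBMC_10k", FILES)
```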

Import dandelion module

[1]:

import os
import dandelion as ddl

# change directory to somewhere more workable
os.chdir(os.path.expanduser("~/Downloads/dandelion_tutorial/"))
ddl.logging.print_versions()

dandelion==0.3.4.dev30 pandas==2.0.1 numpy==1.24.3 matplotlib==3.7.1 networkx==3.1 scipy==1.11.2

With ddl.read_10x_vdj:

[2]:

folder_location = "sc5p_v2_hs_PBMC_10k"
# or file_location = 'sc5p_v2_hs_PBMC_10k/'
vdj = ddl.read_10x_vdj(
    folder_location, filename_prefix="sc5p_v2_hs_PBMC_10k_b_filtered"
)
vdj

[2]:

Dandelion class object with n_obs = 994 and n_contigs = 2601
    data: 'cell_id', 'is_cell_10x', 'sequence_id', 'high_confidence_10x', 'sequence_length_10x', 'locus', 'v_call', 'd_call', 'j_call', 'c_call', 'complete_vdj', 'productive', 'junction_aa', 'junction', 'consensus_count', 'duplicate_count', 'clone_id', 'raw_consensus_id_10x', 'sequence', 'rearrangement_status'
    metadata: 'clone_id', 'clone_id_by_size', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'duplicate_count_B_VDJ', 'duplicate_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'

With ddl.read_10x_airr:

[3]:

# read in the airr_rearrangement.tsv file
file_location = (
    "sc5p_v2_hs_PBMC_10k/sc5p_v2_hs_PBMC_10k_b_airr_rearrangement.tsv"
)
vdj = ddl.read_10x_airr(file_location)
vdj

[3]:

Dandelion class object with n_obs = 994 and n_contigs = 2093
    data: 'cell_id', 'sequence_id', 'sequence', 'sequence_aa', 'productive', 'rev_comp', 'v_call', 'v_cigar', 'd_call', 'd_cigar', 'j_call', 'j_cigar', 'c_call', 'c_cigar', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'junction_length', 'junction_aa_length', 'v_sequence_start', 'v_sequence_end', 'd_sequence_start', 'd_sequence_end', 'j_sequence_start', 'j_sequence_end', 'c_sequence_start', 'c_sequence_end', 'consensus_count', 'duplicate_count', 'is_cell', 'locus', 'rearrangement_status'
    metadata: 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'duplicate_count_B_VDJ', 'duplicate_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'

We will continue with the filtering part of the analysis to show how this output slots into the rest of the workflow.

Import modules for use with scanpy

[4]:

import pandas as pd
import numpy as np
import scanpy as sc
import warnings
import functools
import seaborn as sns
import scipy.stats
import anndata

warnings.filterwarnings("ignore")
sc.logging.print_header()

scanpy==1.9.3 anndata==0.9.1 umap==0.5.3 numpy==1.24.3 scipy==1.11.2 pandas==2.0.1 scikit-learn==1.3.0 statsmodels==0.14.0 python-igraph==0.10.6 pynndescent==0.5.10

Import the transcriptome data

[5]:

adata = sc.read_10x_h5(
    "sc5p_v2_hs_PBMC_10k/filtered_feature_bc_matrix.h5", gex_only=True
)
adata.obs["sample_id"] = "sc5p_v2_hs_PBMC_10k"
adata.var_names_make_unique()
adata

[5]:

AnnData object with n_obs × n_vars = 10553 × 36601
    obs: 'sample_id'
    var: 'gene_ids', 'feature_types', 'genome'

Run QC on the transcriptome data.

[6]:

ddl.pp.recipe_scanpy_qc(adata)
adata

[6]:

AnnData object with n_obs × n_vars = 10553 × 36601
    obs: 'sample_id', 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'scrublet_score', 'is_doublet', 'filter_rna'
    var: 'gene_ids', 'feature_types', 'genome'

Run the filtering of the BCR data. Note that I’m using the Dandelion object as input rather than the pandas dataframe (both types of input will work; in fact, a file path to the .tsv will work too).

[7]:

# The function will return both objects.
vdj, adata = ddl.pp.check_contigs(vdj, adata)

Preparing data: 2093it [00:00, 16146.33it/s]
Scanning for poor quality/ambiguous contigs: 100%|██████████| 994/994 [00:01<00:00, 977.40it/s]

Check the output V(D)J table

The V(D)J table is returned as a Dandelion class object, with the contig table in the .data slot; if a file path was provided to the filtering function above, a new file will be created in the same folder with the filtered prefix. Note that this V(D)J table is indexed by contig (sequence_id).

[8]:

vdj

[8]:

Dandelion class object with n_obs = 984 and n_contigs = 2093
    data: 'cell_id', 'sequence_id', 'sequence', 'sequence_aa', 'productive', 'rev_comp', 'v_call', 'v_cigar', 'd_call', 'd_cigar', 'j_call', 'j_cigar', 'c_call', 'c_cigar', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'junction_length', 'junction_aa_length', 'v_sequence_start', 'v_sequence_end', 'd_sequence_start', 'd_sequence_end', 'j_sequence_start', 'j_sequence_end', 'c_sequence_start', 'c_sequence_end', 'consensus_count', 'duplicate_count', 'is_cell', 'locus', 'rearrangement_status', 'ambiguous'
    metadata: 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'duplicate_count_B_VDJ', 'duplicate_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'

Check the AnnData object as well

The AnnData object, in contrast, is indexed by cell.

[9]:

adata

[9]:

AnnData object with n_obs × n_vars = 10553 × 36601
    obs: 'sample_id', 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'scrublet_score', 'is_doublet', 'filter_rna', 'has_contig', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'duplicate_count_B_VDJ', 'duplicate_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
    var: 'gene_ids', 'feature_types', 'genome'
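To make the contig-level versus cell-level distinction concrete, here is a simplified sketch that collapses productive contigs into one label per cell. The rules are an assumption for illustration only (one heavy IGH chain plus one light IGK/IGL chain gives `Single pair`, additional chains give `Extra pair`, a single chain type gives an orphan label); Dandelion's own `chain_status` logic covers more cases, such as `Extra pair-exception`.

```python
from collections import defaultdict

VDJ_LOCI = {"IGH"}         # heavy chain (VDJ)
VJ_LOCI = {"IGK", "IGL"}   # light chains (VJ)


def simple_chain_status(contigs):
    """contigs: iterable of (cell_id, locus, productive) tuples, one per contig.
    Returns {cell_id: status} using simplified, hypothetical rules."""
    counts = defaultdict(lambda: {"VDJ": 0, "VJ": 0})
    for cell_id, locus, productive in contigs:
        if not productive:
            continue  # only productive contigs are counted in this sketch
        if locus in VDJ_LOCI:
            counts[cell_id]["VDJ"] += 1
        elif locus in VJ_LOCI:
            counts[cell_id]["VJ"] += 1
    status = {}
    for cell_id, c in counts.items():
        if c["VDJ"] and c["VJ"]:
            paired_once = c["VDJ"] == 1 and c["VJ"] == 1
            status[cell_id] = "Single pair" if paired_once else "Extra pair"
        elif c["VDJ"]:
            status[cell_id] = "Orphan VDJ"
        else:
            status[cell_id] = "Orphan VJ"
    return status


example = [
    ("c1", "IGH", True), ("c1", "IGK", True),                       # one pair
    ("c2", "IGK", True),                                            # light chain only
    ("c3", "IGH", True), ("c3", "IGK", True), ("c3", "IGL", True),  # extra chain
    ("c4", "IGH", False),                                           # non-productive only
]
status = simple_chain_status(example)
```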

The number of cells that actually have a matching BCR can be tabulated.

[10]:

pd.crosstab(adata.obs["has_contig"], adata.obs["chain_status"])

[10]:

chain_status  Extra pair  Extra pair-exception  No_contig  Orphan VDJ  Orphan VJ  Single pair
has_contig
No_contig              0                     0       9569           0          0            0
True                  57                     2          0           5         39          881
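`pd.crosstab` simply counts how often each combination of values in the two columns occurs. A standard-library equivalent over two parallel lists, shown on toy values only to make clear what is being counted, is:

```python
from collections import Counter


def crosstab_counts(rows, cols):
    """Count (row_label, col_label) pairs, like the cells of pd.crosstab."""
    return Counter(zip(rows, cols))


# Toy values, not the tutorial data.
has_contig = ["True", "True", "No_contig", "True"]
chain_status = ["Single pair", "Orphan VJ", "No_contig", "Single pair"]
table = crosstab_counts(has_contig, chain_status)
print(table[("True", "Single pair")])  # 2
```

Summing the `True` row of the real table gives 57 + 2 + 5 + 39 + 881 = 984 cells with contigs, matching `n_obs = 984` of the checked Dandelion object; adding the 9569 `No_contig` cells gives the 10553 cells in the AnnData object.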

Now actually filter the AnnData object and run through a standard workflow, starting by removing the cells flagged in filter_rna, then filtering genes and normalizing the data.

Because the AnnData object returned above has only been annotated (cells failing QC are flagged in filter_rna but not yet removed) and is otherwise unprocessed, we still need to filter, normalize and run through the usual steps here. The following is just a standard scanpy workflow.

[11]:

adata = adata[
    adata.obs["filter_rna"] == "False"
].copy()  # filter_rna comes from ddl.pp.recipe_scanpy_qc; copy so later steps don't modify a view
# filter genes
sc.pp.filter_genes(adata, min_cells=3)
# Normalize the counts
sc.pp.normalize_total(adata, target_sum=1e4)
# Logarithmize the data
sc.pp.log1p(adata)
# Stash the normalised counts
adata.raw = adata
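For intuition, `sc.pp.normalize_total(adata, target_sum=1e4)` rescales each cell so that its counts sum to 10,000, and `sc.pp.log1p` then applies the natural log(1 + x). The pure-Python sketch below reproduces that arithmetic on a small dense matrix (cells as rows); it is a simplification that ignores sparse matrices and scanpy's other options.

```python
import math


def normalize_log1p(matrix, target_sum=1e4):
    """Scale each row (cell) to sum to target_sum, then apply natural log1p.
    Cells with zero total counts are left as zeros."""
    out = []
    for row in matrix:
        total = sum(row)
        scale = target_sum / total if total > 0 else 0.0
        out.append([math.log1p(x * scale) for x in row])
    return out


counts = [[1, 3], [5, 5]]  # two toy cells, two genes
normalized = normalize_log1p(counts)
```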

Identify highly-variable genes

[12]:

sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)
sc.pl.highly_variable_genes(adata)

[Figure: highly variable genes plot from sc.pl.highly_variable_genes]

Filter the genes to only those marked as highly-variable

[13]:

adata = adata[:, adata.var.highly_variable].copy()

Regress out the effects of total counts per cell and the percentage of mitochondrial gene counts, then scale each gene to unit variance, clipping values above 10.

[14]:

sc.pp.regress_out(adata, ["total_counts", "pct_counts_mt"])
sc.pp.scale(adata, max_value=10)
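Per gene, `sc.pp.regress_out` fits a linear model of expression on the listed covariates and keeps the residuals, and `sc.pp.scale` then centres each gene, divides by its standard deviation and clips at `max_value`. The sketch below is a simplification under stated assumptions: a single covariate fitted by ordinary least squares (scanpy fits all listed covariates jointly) and the sample standard deviation for scaling.

```python
import statistics


def regress_out_one(y, x):
    """Residuals of y after an OLS fit y ~ intercept + slope * x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def scale(values, max_value=10):
    """Z-score with the sample standard deviation, then clip values above max_value."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [min((v - mean) / sd, max_value) for v in values]


gene = [2.0, 4.0, 7.0]          # toy expression of one gene in three cells
total_counts = [1.0, 2.0, 3.0]  # toy covariate
resid = regress_out_one(gene, total_counts)
scaled = scale(resid)
```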

Run PCA

[15]:

sc.tl.pca(adata, svd_solver="arpack")
sc.pl.pca_variance_ratio(adata, log=True, n_pcs=50)

[Figure: PCA variance ratio of the first 50 PCs, log scale]

Computing the neighborhood graph, UMAP embedding and clusters

[16]:

# Computing the neighborhood graph
sc.pp.neighbors(adata)
# Embedding the neighborhood graph
sc.tl.umap(adata)
# Clustering the neighborhood graph
sc.tl.leiden(adata)

Visualizing the clusters and whether or not there’s a corresponding BCR

[17]:

sc.pl.umap(adata, color=["leiden", "chain_status"])

[Figure: UMAP coloured by leiden cluster and chain_status]

Visualizing some B cell genes

[18]:

sc.pl.umap(adata, color=["IGHM", "JCHAIN"])

[Figure: UMAP coloured by IGHM and JCHAIN expression]

Save AnnData

We can save this AnnData object for now.

[19]:

adata.write("adata2.h5ad", compression="gzip")

Save dandelion

To save the vdj object, we have two options: either save the .data and .metadata slots with pandas’ functions, or write the whole object to a single .h5ddl file with write_h5ddl:

[20]:

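vdj.data.to_csv("filtered_vdj_table2.tsv", sep="\t")  # contig-level table
vdj.metadata.to_csv("filtered_vdj_metadata2.tsv", sep="\t")  # cell-level metadata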

[21]:

vdj.write_h5ddl("dandelion_results2.h5ddl", complib="bzip2")

Concatenating multiple Dandelion objects

It is common to analyse data from multiple samples. In that case, Dandelion provides a concat function (ddl.concat) to merge the objects.

We will simulate a second object by reading in the same file again.

[22]:

vdj1 = ddl.read_10x_airr(file_location)
vdj2 = ddl.read_10x_airr(file_location)

Before you merge the objects, make sure that the cell_id values are distinct across objects so that you can tell the samples apart later. Duplicate sequence_id values are handled by ddl.concat, as noted in the code comments.

[23]:

# note: if you reannotated the data with the full processing pipeline, i.e. ran ddl.pp.format_fasta, this is already taken care of.
vdj1.data["cell_id"] = ["run1" + "_" + x for x in vdj1.data["cell_id"]]
vdj2.data["cell_id"] = ["run2" + "_" + x for x in vdj2.data["cell_id"]]

# you probably don't need to modify sequence_id: ddl.concat checks for uniqueness and appends -0, -1, etc. when it encounters non-unique sequence_ids
# vdj1.data.sequence_id = ["run1" + '_' + x for x in vdj1.data.sequence_id]
# vdj2.data.sequence_id = ["run2" + '_' + x for x in vdj2.data.sequence_id]
# you can also reinitialise just in case so that the info is propagated properly (unnecessary here)
# vdj1 = ddl.Dandelion(vdj1.data)
# vdj2 = ddl.Dandelion(vdj2.data)
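The comments in the cell above describe two ways of keeping IDs distinct: prefixing cell_id with a run label, and letting ddl.concat append a suffix to repeated sequence_ids (the merged table that follows shows `-0` and `-1` suffixes by object position). A hypothetical standard-library sketch of both steps, working on plain lists of dicts rather than Dandelion objects and not reflecting Dandelion's actual code, is:

```python
def prefix_cell_ids(rows, label):
    """Return copies of contig rows with cell_id prefixed, e.g. 'run1_<barcode>'."""
    return [{**row, "cell_id": f"{label}_{row['cell_id']}"} for row in rows]


def concat_rows(tables):
    """Concatenate row lists; if any sequence_id occurs in more than one table,
    suffix every sequence_id with '-<table index>' (hypothetical rule)."""
    seen = set()
    clash = False
    for rows in tables:
        ids = {row["sequence_id"] for row in rows}
        if ids & seen:
            clash = True
        seen |= ids
    merged = []
    for i, rows in enumerate(tables):
        for row in rows:
            new = dict(row)
            if clash:
                new["sequence_id"] = f"{row['sequence_id']}-{i}"
            merged.append(new)
    return merged


run = [{"cell_id": "AAAC-1", "sequence_id": "AAAC-1_contig_1"}]
merged = concat_rows([prefix_cell_ids(run, "run1"), prefix_cell_ids(run, "run2")])
```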

[24]:

vdj_merged = ddl.concat([vdj1, vdj2])
vdj_merged

[24]:

Dandelion class object with n_obs = 1988 and n_contigs = 4186
    data: 'cell_id', 'sequence_id', 'sequence', 'sequence_aa', 'productive', 'rev_comp', 'v_call', 'v_cigar', 'd_call', 'd_cigar', 'j_call', 'j_cigar', 'c_call', 'c_cigar', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'junction_length', 'junction_aa_length', 'v_sequence_start', 'v_sequence_end', 'd_sequence_start', 'd_sequence_end', 'j_sequence_start', 'j_sequence_end', 'c_sequence_start', 'c_sequence_end', 'consensus_count', 'duplicate_count', 'is_cell', 'locus', 'rearrangement_status'
    metadata: 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'duplicate_count_B_VDJ', 'duplicate_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'

[25]:

vdj_merged.data

[25]:

	cell_id	sequence_id	sequence	sequence_aa	productive	rev_comp	v_call	v_cigar	d_call	d_cigar	...	d_sequence_end	j_sequence_start	j_sequence_end	c_sequence_start	c_sequence_end	consensus_count	duplicate_count	is_cell	locus	rearrangement_status
sequence_id
AAACCTGTCATATCGG-1_contig_1-0	run1_AAACCTGTCATATCGG-1	AAACCTGTCATATCGG-1_contig_1-0	TGGGGAGGAGTCAGTCCCAACCAGGACACGGCCTGGACATGAGGGT...	MRVPAQLLGLLLLWLSGARCDIQMTQSPSSLSASVGDRVTITCQAT...	T	F	IGKV1-8	38S314M204S	NaN	NaN	...	NaN	384	420	421	556	9139	68	T	IGK	standard
AAACCTGTCCGTTGTC-1_contig_2-0	run1_AAACCTGTCCGTTGTC-1	AAACCTGTCCGTTGTC-1_contig_2-0	ATCACATAACAACCACATTCCTCCTCTAAAGAAGCCCCTGGGAGCA...	MDWTWRFLFVVAAATGVQSQVQLVQSGAEVKKPGSSVKVSCKASGG...	T	F	IGHV1-69D	58S353M154S	IGHD3-22	411S31M123S	...	442.0	445	494	495	565	4161	51	T	IGH	standard
AAACCTGTCCGTTGTC-1_contig_1-0	run1_AAACCTGTCCGTTGTC-1	AAACCTGTCCGTTGTC-1_contig_1-0	AGGAGTCAGACCCTGTCAGGACACAGCATAGACATGAGGGTCCCCG...	MRVPAQLLGLLLLWLPGARCAIRMTQSPSSFSASTGDRVTITCRAS...	T	F	IGKV1-8	33S345M173S	NaN	NaN	...	NaN	378	415	416	551	5679	43	T	IGK	standard
AAACCTGTCGAGAACG-1_contig_1-0	run1_AAACCTGTCGAGAACG-1	AAACCTGTCGAGAACG-1_contig_1-0	ACTGTGGGGGTAAGAGGTTGTGTCCACCATGGCCTGGACTCCTCTC...	MAWTPLLLLFLSHCTGSLSQAVLTQPSSLSASPGASGRLTCTLRSD...	T	F	IGLV5-45	28S369M245S	NaN	NaN	...	NaN	394	431	432	642	13160	90	T	IGL	standard
AAACCTGTCGAGAACG-1_contig_2-0	run1_AAACCTGTCGAGAACG-1	AAACCTGTCGAGAACG-1_contig_2-0	GGGAGCATCACCCAGCAACCACATCTGTCCTCTAGAGAATCCCCTG...	MDWTWRILFLVAAATGAHSQVQLVQSGGEVKKPGASVKVSCKASGY...	T	F	IGHV1-2	64S353M133S	NaN	NaN	...	NaN	430	479	480	550	5080	47	T	IGH	standard
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
TTTGGTTTCAGAGCTT-1_contig_2-1	run2_TTTGGTTTCAGAGCTT-1	TTTGGTTTCAGAGCTT-1_contig_2-1	GGGAGAGCCCTGGGGAGGAACTGCTCAGTTAGGACCCAGAGGGAAC...	MEAPAQLLFLLLLWLPDTTGEIVLTQSPATLSLSPGERATLSCRAS...	T	F	IGKV3-11	47S345M170S	NaN	NaN	...	NaN	389	426	427	562	11867	73	T	IGK	standard
TTTGGTTTCAGTGTTG-1_contig_1-1	run2_TTTGGTTTCAGTGTTG-1	TTTGGTTTCAGTGTTG-1_contig_1-1	GGGGTCACAAGAGGCAGCGCTCTCGGGACGTCTCCACCATGGCCTG...	MAWALLLLTLLTQDTGSWAQSALTQPASVSGSPGQSITISCTGTSS...	T	F	IGLV2-23	38S340M262S	NaN	NaN	...	NaN	392	429	430	640	6497	58	T	IGL	standard
TTTGGTTTCAGTGTTG-1_contig_2-1	run2_TTTGGTTTCAGTGTTG-1	TTTGGTTTCAGTGTTG-1_contig_2-1	ATATTTCGTATCTGGGGAGTGACTCCTGTGCCCCACCATGGACACA...	MDTLCSTLLLLTIPSWVLSQITLKESGPTLVKPTQTLTLTCTFSGF...	T	F	IGHV2-5	37S358M122S	NaN	NaN	...	NaN	399	446	447	517	3530	33	T	IGH	standard
TTTGGTTTCGGTGTCG-1_contig_2-1	run2_TTTGGTTTCGGTGTCG-1	TTTGGTTTCGGTGTCG-1_contig_2-1	GGGAGAGCCCTGGGGAGGAACTGCTCAGTTAGGACCCAGAGGGAAC...	MEAPAQLLFLLLLWLPDTTGEIVLTQSPATLSLSPGERATLSCRAS...	T	F	IGKV3-11	47S345M176S	NaN	NaN	...	NaN	396	432	433	568	3058	22	T	IGK	standard
TTTGGTTTCGGTGTCG-1_contig_1-1	run2_TTTGGTTTCGGTGTCG-1	TTTGGTTTCGGTGTCG-1_contig_1-1	GAGAGAGGAGCCTTAGCCCTGGATTCCAAGGCCTATCCACTTGGTG...	MELGLRWVFLVAILEGVQCEVQLVESGGGLVKPGGSLRLSCAASGF...	T	F	IGHV3-21	73S353M145S	NaN	NaN	...	NaN	448	500	501	571	1026	12	T	IGH	standard

4186 rows × 33 columns

[26]:

vdj_merged.metadata

[26]:

	locus_VDJ	locus_VJ	productive_VDJ	productive_VJ	v_call_VDJ	d_call_VDJ	j_call_VDJ	v_call_VJ	j_call_VJ	c_call_VDJ	...	d_call_B_VDJ_main	j_call_B_VDJ_main	v_call_B_VJ_main	j_call_B_VJ_main	isotype	isotype_status	locus_status	chain_status	rearrangement_status_VDJ	rearrangement_status_VJ
run1_AAACCTGTCATATCGG-1	None	IGK	None	T	None	None	None	IGKV1-8	IGKJ4	None	...	None	None	IGKV1-8	IGKJ4	None	None	Orphan IGK	Orphan VJ	None	standard
run1_AAACCTGTCCGTTGTC-1	IGH	IGK	T	T	IGHV1-69D	IGHD3-22	IGHJ3	IGKV1-8	IGKJ1	IGHM	...	IGHD3-22	IGHJ3	IGKV1-8	IGKJ1	IgM	IgM	IGH + IGK	Single pair	standard	standard
run1_AAACCTGTCGAGAACG-1	IGH	IGL	T	T	IGHV1-2	None	IGHJ3	IGLV5-45	IGLJ3	IGHM	...	None	IGHJ3	IGLV5-45	IGLJ3	IgM	IgM	IGH + IGL	Single pair	standard	standard
run1_AAACCTGTCTTGAGAC-1	IGH	IGK	T	T	IGHV5-51	None	IGHJ3	IGKV1D-8	IGKJ2	IGHM	...	None	IGHJ3	IGKV1D-8	IGKJ2	IgM	IgM	IGH + IGK	Single pair	standard	standard
run1_AAACGGGAGCGACGTA-1	IGH	IGL	T	T	IGHV4-59	None	IGHJ3	IGLV3-19	IGLJ2	IGHM	...	None	IGHJ3	IGLV3-19	IGLJ2	IgM	IgM	IGH + IGL	Single pair	standard	standard
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
run2_ACGTCAAAGTTTCCTT-1	IGH	None	T	None	IGHV3-21	None	IGHJ4	None	None	IGHM	...	None	IGHJ4	None	None	IgM	IgM	Orphan IGH	Orphan VDJ	standard	None
run2_CACTCCACAGATGGCA-1	IGH	None	T	None	IGHV5-51	None	IGHJ5	None	None	IGHM	...	None	IGHJ5	None	None	IgM	IgM	Orphan IGH	Orphan VDJ	standard	None
run2_CGGTTAAGTTTCGCTC-1	IGH	None	T	None	IGHV1-69D	None	IGHJ4	None	None	IGHM	...	None	IGHJ4	None	None	IgM	IgM	Orphan IGH	Orphan VDJ	standard	None
run2_GTATCTTTCGAGAGCA-1	IGH	None	T	None	IGHV3-23	IGHD3-3	IGHJ4	None	None	IGHD	...	IGHD3-3	IGHJ4	None	None	IgD	IgD	Orphan IGH	Orphan VDJ	standard	None
run2_TGACTTTGTTATCGGT-1	IGH	None	T	None	IGHV1-69D	None	IGHJ3	None	None	IGHM	...	None	IGHJ3	None	None	IgM	IgM	Orphan IGH	Orphan VDJ	standard	None

1988 rows × 44 columns
