Dandelion class

Much of the functionality of the dandelion package revolves around the Dandelion class. The class acts as an intermediary object for storage and for flexible interaction with other tools. This section gives a quick primer on the Dandelion class.

Import modules

[1]:
import os

os.chdir("dandelion_tutorial/")
import dandelion as ddl

ddl.set_backend("base")

ddl.logging.print_versions()
dandelion==1.0.0a1.dev36 pandas==2.3.3 numpy==2.3.5 matplotlib==3.10.6 networkx==3.6.1 scipy==1.15.2
[2]:
vdj = ddl.read_h5ddl("dandelion_results.h5ddl")
# let's run find_clones again as this was not stored.
ddl.tl.find_clones(vdj)
vdj
Finding clones based on B cell VDJ chains using junction_aa: 100%|██████████| 228/228 [00:00<00:00, 3802.21it/s]
Finding clones based on B cell VJ chains using junction_aa: 100%|██████████| 213/213 [00:00<00:00, 4478.26it/s]
Refining clone assignment based on VJ chain pairing : 100%|██████████| 2493/2493 [00:00<00:00, 333420.49it/s]
[2]:
Dandelion class object with n_obs = 2493 and n_contigs = 7355
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 
'j_sequence_alignment_aa', 'complete_vdj', 'j_call_multimappers', 'j_call_multiplicity', 'j_call_sequence_start_multimappers', 'j_call_sequence_end_multimappers', 'j_call_support_multimappers', 'mu_count', 'ambiguous', 'extra', 'rearrangement_status', 'clone_id'
    metadata: 'clone_id', 'clone_id_rank', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'

Essentially, the .data slot holds the AIRR contig table (one row per contig), while .metadata holds a collapsed, one-row-per-cell version that can be combined with AnnData’s .obs slot. You can access these slots like any attribute of a class object; for example, to view the metadata:

[3]:
vdj.metadata
[3]:
clone_id clone_id_rank sample_id locus_VDJ locus_VJ productive_VDJ productive_VJ v_call_VDJ d_call_VDJ j_call_VDJ ... d_call_B_VDJ_main j_call_B_VDJ_main v_call_B_VJ_main j_call_B_VJ_main isotype isotype_status locus_status chain_status rearrangement_status_VDJ rearrangement_status_VJ
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG B_VJ_187_2_3 135 sc5p_v2_hs_PBMC_10k_b None IGK None T None None None ... None None IGKV1-33,IGKV1D-33 IGKJ4 None None Orphan IGK Orphan VJ None standard
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC B_VDJ_220_3_2_VJ_4_2_1 2196 sc5p_v2_hs_PBMC_10k_b IGH IGK T T IGHV1-69D,IGHV1-69 IGHD3-22 IGHJ3 ... IGHD3-22 IGHJ3 IGKV1-8 IGKJ1 IgM IgM IGH + IGK Single pair standard standard
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG B_VDJ_85_1_1_VJ_194_1_1 1750 sc5p_v2_hs_PBMC_10k_b IGH IGL T T IGHV1-2 None IGHJ3 ... None IGHJ3 IGLV5-45 IGLJ3 IgM IgM IGH + IGL Single pair standard standard
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCTTGAGAC B_VDJ_147_4_6_VJ_50_1_1 1751 sc5p_v2_hs_PBMC_10k_b IGH IGK T T IGHV5-51 None IGHJ3 ... None IGHJ3 IGKV1D-8 IGKJ2 IgM IgM IGH + IGK Single pair standard standard
sc5p_v2_hs_PBMC_10k_b_AAACGGGAGCGACGTA B_VDJ_145_2_1_VJ_56_2_1 1752 sc5p_v2_hs_PBMC_10k_b IGH IGL T T IGHV4-4 IGHD6-13 IGHJ3 ... IGHD6-13 IGHJ3 IGLV3-19 IGLJ2,IGLJ3 IgM IgM IGH + IGL Single pair standard standard
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
vdj_v1_hs_pbmc3_b_TTTCCTCAGCGCTTAT B_VDJ_66_5_4_VJ_183_1_1 895 vdj_v1_hs_pbmc3_b IGH IGK T T IGHV3-30 IGHD4-17 IGHJ6 ... IGHD4-17 IGHJ6 IGKV2-30 IGKJ2 IgM IgM IGH + IGK Single pair standard standard
vdj_v1_hs_pbmc3_b_TTTCCTCAGGGAAACA B_VDJ_205_2_1_VJ_105_4_1 896 vdj_v1_hs_pbmc3_b IGH IGK T T IGHV4-61 IGHD6-13 IGHJ2 ... IGHD6-13 IGHJ2 IGKV1-39,IGKV1D-39 IGKJ1 IgM IgM IGH + IGK Single pair standard standard
vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC B_VDJ_120_5_2_VJ_58_2_1 897 vdj_v1_hs_pbmc3_b IGH IGK T T IGHV1-46 IGHD2-15 IGHJ5 ... IGHD2-15 IGHJ5 IGKV1-39,IGKV1D-39 IGKJ2 IgM IgM IGH + IGK Single pair standard standard
vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG B_VDJ_197_4_2_VJ_70_3_1 898 vdj_v1_hs_pbmc3_b IGH IGL T T IGHV1-69D,IGHV1-69 IGHD2-15 IGHJ6 ... IGHD2-15 IGHJ6 IGLV1-47 IGLJ3 IgM IgM IGH + IGL Single pair standard standard
vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG B_VDJ_206_6_7_VJ_87_3_1 2598 vdj_v1_hs_pbmc3_b IGH IGL T T IGHV3-23,IGHV3-23D None IGHJ4 ... None IGHJ4 IGLV2-11 IGLJ2,IGLJ3 IgM IgM IGH + IGL Single pair standard standard

2493 rows × 47 columns
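As a rough illustration of how a contig-level table can be collapsed into a per-cell table, the plain-pandas sketch below groups toy contigs by cell and by chain. It joins multiple values with |, which gives columns with _VDJ and _VJ suffixes like those in .metadata. Several details are assumptions for illustration: the helper collapse, the toy values, and the mapping of IGH to VDJ and of IGK/IGL to VJ. This is not dandelion's implementation.

```python
import pandas as pd

# Toy AIRR-like contig table (one row per contig); values are made up.
data = pd.DataFrame(
    {
        "sequence_id": ["c1_contig_1", "c1_contig_2", "c2_contig_1"],
        "cell_id": ["c1", "c1", "c2"],
        "locus": ["IGH", "IGK", "IGL"],
        "v_call": ["IGHV1-2", "IGKV1-8", "IGLV3-19"],
    }
).set_index("sequence_id")

# Assumed chain classification: heavy chain is VDJ, light chains are VJ.
data["chain"] = data["locus"].map({"IGH": "VDJ", "IGK": "VJ", "IGL": "VJ"})


def collapse(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical helper: one row per cell, one column per chain, values joined by |."""
    wide = df.groupby(["cell_id", "chain"])[column].agg("|".join).unstack("chain")
    wide.columns = [f"{column}_{chain}" for chain in wide.columns]
    return wide


metadata = collapse(data, "v_call")
print(metadata)  # cell c2 has no VDJ contig, so v_call_VDJ is missing there
```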

slicing

You can slice the Dandelion object with boolean masks built from either .data or .metadata, including masks on their indices. This works much as slicing does for a pandas DataFrame or an AnnData object.

slicing .data

[4]:
# get the largest clone (by number of contigs in .data)
largest_clone = vdj.data["clone_id"].value_counts().idxmax()

vdj[vdj.data["clone_id"] == largest_clone]
[4]:
Dandelion class object with n_obs = 626 and n_contigs = 714
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 
'j_sequence_alignment_aa', 'complete_vdj', 'j_call_multimappers', 'j_call_multiplicity', 'j_call_sequence_start_multimappers', 'j_call_sequence_end_multimappers', 'j_call_support_multimappers', 'mu_count', 'ambiguous', 'extra', 'rearrangement_status', 'clone_id'
    metadata: 'clone_id', 'clone_id_rank', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
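Note that value_counts() on .data counts contigs. A clone whose cells each carry several contigs can therefore outrank a clone with more cells. To rank clones by number of cells instead, one option is to count on the per-cell table, for example vdj.metadata["clone_id"].value_counts().idxmax(). The toy pandas comparison below uses made-up values to show how the two counts can disagree.

```python
import pandas as pd

# Toy contig-level clone assignments: clone "X" has 2 cells with 2 contigs each,
# clone "Y" has 3 cells with 1 contig each.
data = pd.DataFrame(
    {
        "cell_id": ["a", "a", "b", "b", "c", "d", "e"],
        "clone_id": ["X", "X", "X", "X", "Y", "Y", "Y"],
    }
)
# One row per cell, analogous to .metadata.
metadata = data.drop_duplicates("cell_id").set_index("cell_id")

largest_by_contigs = data["clone_id"].value_counts().idxmax()
largest_by_cells = metadata["clone_id"].value_counts().idxmax()
print(largest_by_contigs, largest_by_cells)  # X Y
```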
[5]:
vdj[
    vdj.data_names.isin(
        [
            "sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_contig_1",
            "sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_2",
            "sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_1",
            "sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_contig_1",
            "sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_contig_2",
        ]
    )
]
[5]:
Dandelion class object with n_obs = 3 and n_contigs = 5
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 
'j_sequence_alignment_aa', 'complete_vdj', 'j_call_multimappers', 'j_call_multiplicity', 'j_call_sequence_start_multimappers', 'j_call_sequence_end_multimappers', 'j_call_support_multimappers', 'mu_count', 'ambiguous', 'extra', 'rearrangement_status', 'clone_id'
    metadata: 'clone_id', 'clone_id_rank', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'

slicing .metadata

[6]:
vdj[vdj.metadata["productive_VDJ"].isin(["T", "T|T"])]
[6]:
Dandelion class object with n_obs = 2334 and n_contigs = 5557
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 
'j_sequence_alignment_aa', 'complete_vdj', 'j_call_multimappers', 'j_call_multiplicity', 'j_call_sequence_start_multimappers', 'j_call_sequence_end_multimappers', 'j_call_support_multimappers', 'mu_count', 'ambiguous', 'extra', 'rearrangement_status', 'clone_id'
    metadata: 'clone_id', 'clone_id_rank', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
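The "T|T" value presumably corresponds to cells with two productive VDJ contigs, since .metadata joins multiple values with |. Rather than listing every combination, a more general mask keeps cells where every |-separated entry is "T". The sketch below uses plain pandas on made-up values and assumes missing chains are stored as None, as in the table above. The helper all_productive is hypothetical; the resulting boolean Series would be passed to vdj[...] in the same way as in cell [6].

```python
import pandas as pd

# Toy productive_VDJ column; None marks cells without a VDJ contig.
productive_vdj = pd.Series(
    ["T", "T|T", "T|F", None, "F"], index=["c1", "c2", "c3", "c4", "c5"]
)


def all_productive(value) -> bool:
    """True only if the value is present and every |-separated entry is 'T'."""
    if not isinstance(value, str):
        return False
    return all(entry == "T" for entry in value.split("|"))


mask = productive_vdj.map(all_productive)
print(mask.tolist())  # [True, True, False, False, False]
```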
[7]:
vdj[vdj.metadata_names == "vdj_v1_hs_pbmc3_b_TTTCCTCAGCGCTTAT"]
[7]:
Dandelion class object with n_obs = 1 and n_contigs = 2
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 
'j_sequence_alignment_aa', 'complete_vdj', 'j_call_multimappers', 'j_call_multiplicity', 'j_call_sequence_start_multimappers', 'j_call_sequence_end_multimappers', 'j_call_support_multimappers', 'mu_count', 'ambiguous', 'extra', 'rearrangement_status', 'clone_id'
    metadata: 'clone_id', 'clone_id_rank', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
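Judging from the outputs above, slicing on .data appears to keep the selected contigs together with the cells they belong to. Slicing on .metadata appears to keep the selected cells together with all of their contigs. The pandas sketch below mimics those inferred semantics on toy tables; slice_by_data and slice_by_metadata are hypothetical helpers, not part of dandelion.

```python
import pandas as pd

# Toy contig table (.data-like) and cell table (.metadata-like); illustrative values.
data = pd.DataFrame(
    {"cell_id": ["a", "a", "b", "c"], "clone_id": ["X", "X", "X", "Y"]},
    index=["a_contig_1", "a_contig_2", "b_contig_1", "c_contig_1"],
)
metadata = pd.DataFrame({"clone_id": ["X", "X", "Y"]}, index=["a", "b", "c"])


def slice_by_data(data, metadata, mask):
    """Hypothetical: keep the masked contigs and the cells they come from."""
    kept = data[mask]
    return kept, metadata[metadata.index.isin(kept["cell_id"])]


def slice_by_metadata(data, metadata, mask):
    """Hypothetical: keep the masked cells and every contig belonging to them."""
    kept = metadata[mask]
    return data[data["cell_id"].isin(kept.index)], kept


d1, m1 = slice_by_data(data, metadata, data.index == "a_contig_1")
d2, m2 = slice_by_metadata(data, metadata, metadata["clone_id"] == "X")
print(len(m1), len(d1))  # 1 1
print(len(m2), len(d2))  # 2 3
```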

copy

You can make a deep copy of the Dandelion object. The new variable inherits all slots and can be modified without affecting the original:

[8]:
vdj2 = vdj.copy()
vdj2.metadata
[8]:
clone_id clone_id_rank sample_id locus_VDJ locus_VJ productive_VDJ productive_VJ v_call_VDJ d_call_VDJ j_call_VDJ ... d_call_B_VDJ_main j_call_B_VDJ_main v_call_B_VJ_main j_call_B_VJ_main isotype isotype_status locus_status chain_status rearrangement_status_VDJ rearrangement_status_VJ
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG B_VJ_187_2_3 135 sc5p_v2_hs_PBMC_10k_b None IGK None T None None None ... None None IGKV1-33,IGKV1D-33 IGKJ4 None None Orphan IGK Orphan VJ None standard
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC B_VDJ_220_3_2_VJ_4_2_1 2196 sc5p_v2_hs_PBMC_10k_b IGH IGK T T IGHV1-69D,IGHV1-69 IGHD3-22 IGHJ3 ... IGHD3-22 IGHJ3 IGKV1-8 IGKJ1 IgM IgM IGH + IGK Single pair standard standard
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG B_VDJ_85_1_1_VJ_194_1_1 1750 sc5p_v2_hs_PBMC_10k_b IGH IGL T T IGHV1-2 None IGHJ3 ... None IGHJ3 IGLV5-45 IGLJ3 IgM IgM IGH + IGL Single pair standard standard
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCTTGAGAC B_VDJ_147_4_6_VJ_50_1_1 1751 sc5p_v2_hs_PBMC_10k_b IGH IGK T T IGHV5-51 None IGHJ3 ... None IGHJ3 IGKV1D-8 IGKJ2 IgM IgM IGH + IGK Single pair standard standard
sc5p_v2_hs_PBMC_10k_b_AAACGGGAGCGACGTA B_VDJ_145_2_1_VJ_56_2_1 1752 sc5p_v2_hs_PBMC_10k_b IGH IGL T T IGHV4-4 IGHD6-13 IGHJ3 ... IGHD6-13 IGHJ3 IGLV3-19 IGLJ2,IGLJ3 IgM IgM IGH + IGL Single pair standard standard
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
vdj_v1_hs_pbmc3_b_TTTCCTCAGCGCTTAT B_VDJ_66_5_4_VJ_183_1_1 895 vdj_v1_hs_pbmc3_b IGH IGK T T IGHV3-30 IGHD4-17 IGHJ6 ... IGHD4-17 IGHJ6 IGKV2-30 IGKJ2 IgM IgM IGH + IGK Single pair standard standard
vdj_v1_hs_pbmc3_b_TTTCCTCAGGGAAACA B_VDJ_205_2_1_VJ_105_4_1 896 vdj_v1_hs_pbmc3_b IGH IGK T T IGHV4-61 IGHD6-13 IGHJ2 ... IGHD6-13 IGHJ2 IGKV1-39,IGKV1D-39 IGKJ1 IgM IgM IGH + IGK Single pair standard standard
vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC B_VDJ_120_5_2_VJ_58_2_1 897 vdj_v1_hs_pbmc3_b IGH IGK T T IGHV1-46 IGHD2-15 IGHJ5 ... IGHD2-15 IGHJ5 IGKV1-39,IGKV1D-39 IGKJ2 IgM IgM IGH + IGK Single pair standard standard
vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG B_VDJ_197_4_2_VJ_70_3_1 898 vdj_v1_hs_pbmc3_b IGH IGL T T IGHV1-69D,IGHV1-69 IGHD2-15 IGHJ6 ... IGHD2-15 IGHJ6 IGLV1-47 IGLJ3 IgM IgM IGH + IGL Single pair standard standard
vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG B_VDJ_206_6_7_VJ_87_3_1 2598 vdj_v1_hs_pbmc3_b IGH IGL T T IGHV3-23,IGHV3-23D None IGHJ4 ... None IGHJ4 IGLV2-11 IGLJ2,IGLJ3 IgM IgM IGH + IGL Single pair standard standard

2493 rows × 47 columns
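The practical point of a deep copy is that edits to vdj2 do not leak back into vdj. The sketch below shows the same behaviour with copy.deepcopy on a hypothetical stand-in class, ToyDandelion, which simply holds two DataFrames. It is not the real Dandelion class.

```python
import copy

import pandas as pd


class ToyDandelion:
    """Hypothetical stand-in holding .data and .metadata DataFrames."""

    def __init__(self, data: pd.DataFrame, metadata: pd.DataFrame):
        self.data = data
        self.metadata = metadata

    def copy(self) -> "ToyDandelion":
        # deepcopy recursively copies the DataFrames held in each slot.
        return copy.deepcopy(self)


vdj = ToyDandelion(
    pd.DataFrame({"clone_id": ["X"]}, index=["a_contig_1"]),
    pd.DataFrame({"clone_id": ["X"]}, index=["a"]),
)
vdj2 = vdj.copy()
vdj2.metadata.loc["a", "clone_id"] = "Y"  # modify the copy only
print(vdj.metadata.loc["a", "clone_id"], vdj2.metadata.loc["a", "clone_id"])  # X Y
```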

Retrieving entries with update_metadata

The .metadata slot of a Dandelion object is initialized automatically whenever the .data slot is filled. However, it only contains a pre-specified set of columns. To retrieve other columns from the .data slot, update the metadata with the update_metadata method (vdj.update_metadata) and specify the retrieve and retrieve_mode options.

The following modes determine how the retrieval is completed:

split and unique only - splits the retrieval into VDJ and VJ chains. A | separates each unique element.

split and merge - splits the retrieval into VDJ and VJ chains. A | separates every element.

merge and unique only - like split and unique only (a | separates unique elements), but merged into a single column.

split - splits the retrieval into individual columns, one for each contig.

merge - merges the retrieval into a single column where a | separates every element.

For numerical columns, there are additional options:

split and sum - splits the retrieval into VDJ and VJ chains and sums each separately.

split and average - similar to split and sum, but averages instead of summing.

sum - sums the retrievals into a single column.

average - averages the retrievals into a single column.

If retrieve_mode is not specified, it defaults to split and merge.
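To make the string modes concrete, here is a rough plain-pandas emulation of what each listed mode produces for a single cell's contigs. Several details are assumptions for illustration, not dandelion's implementation: the function retrieve_for_cell, the toy values, the column naming used for split, and the ordering of elements. The numerical modes (sum, average) are omitted.

```python
import pandas as pd

# Toy contigs for one cell: two VDJ (heavy) contigs and one VJ (light) contig.
contigs = pd.DataFrame({"chain": ["VDJ", "VDJ", "VJ"], "c_call": ["IGHM", "IGHM", "IGKC"]})


def unique_join(values) -> str:
    # Drop duplicates while keeping first-seen order, then join with |.
    return "|".join(dict.fromkeys(values))


def retrieve_for_cell(df: pd.DataFrame, column: str, mode: str) -> dict:
    """Hypothetical emulation of the string retrieve_mode options for one cell."""
    values = df[column].tolist()
    if mode == "merge":
        return {column: "|".join(values)}
    if mode == "merge and unique only":
        return {column: unique_join(values)}
    if mode == "split":
        # Assumed naming: one numbered column per contig.
        return {f"{column}_{i}": v for i, v in enumerate(values, start=1)}
    out = {}
    for chain, group in df.groupby("chain"):
        chain_values = group[column].tolist()
        if mode == "split and merge":
            out[f"{column}_{chain}"] = "|".join(chain_values)
        elif mode == "split and unique only":
            out[f"{column}_{chain}"] = unique_join(chain_values)
        else:
            raise ValueError(f"unsupported mode: {mode}")
    return out


modes = ["split and unique only", "split and merge", "merge and unique only", "split", "merge"]
for mode in modes:
    print(mode, "->", retrieve_for_cell(contigs, "c_call", mode))
```

The difference between the "unique only" and "merge" variants only shows up when a cell has repeated values, which is why the toy cell carries two identical IGHM contigs.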

Example: retrieving fwr1 sequences

[9]:
vdj.update_metadata(retrieve="fwr1")
vdj
[9]:
Dandelion class object with n_obs = 2493 and n_contigs = 7355
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 
'j_sequence_alignment_aa', 'complete_vdj', 'j_call_multimappers', 'j_call_multiplicity', 'j_call_sequence_start_multimappers', 'j_call_sequence_end_multimappers', 'j_call_support_multimappers', 'mu_count', 'ambiguous', 'extra', 'rearrangement_status', 'clone_id'
    metadata: 'clone_id', 'clone_id_rank', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ', 'fwr1_VJ', 'fwr1_VDJ'

Note the additional fwr1_VDJ and fwr1_VJ columns in the metadata slot.

By default, dandelion does not try to merge numerical columns, because doing so can create columns of mixed dtype.
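The dtype problem is easy to reproduce in plain pandas. Joining numbers with | turns them into strings, so a column where some cells have one contig and others have two ends up holding a mix of numbers and strings (object dtype), which breaks numeric operations. The helper merge_numbers and the values below are hypothetical.

```python
import pandas as pd

# Toy per-contig numeric values (e.g. junction lengths) for two cells.
data = pd.DataFrame({"cell_id": ["a", "b", "b"], "junction_length": [48, 45, 51]})


def merge_numbers(values):
    # Single values stay numeric; multiple values are joined into a string.
    values = [int(v) for v in values]
    return values[0] if len(values) == 1 else "|".join(map(str, values))


merged = pd.Series(
    {cell: merge_numbers(group) for cell, group in data.groupby("cell_id")["junction_length"]}
)
print(merged.tolist(), merged.dtype)  # [48, '45|51'] object

# Summing or averaging per cell keeps the column numeric instead.
print(data.groupby("cell_id")["junction_length"].mean().tolist())  # [48.0, 48.0]
```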

The update_plus method goes a step further and tries to retrieve frequently used columns, such as mu_count, junction_length, np1_length and np2_length:

[10]:
vdj.update_plus()
vdj
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.12/site-packages/numpy/_core/fromnumeric.py:3860: RuntimeWarning: Mean of empty slice.
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.12/site-packages/numpy/_core/_methods.py:144: RuntimeWarning: invalid value encountered in scalar divide
[10]:
Dandelion class object with n_obs = 2493 and n_contigs = 7355
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 
'j_sequence_alignment_aa', 'complete_vdj', 'j_call_multimappers', 'j_call_multiplicity', 'j_call_sequence_start_multimappers', 'j_call_sequence_end_multimappers', 'j_call_support_multimappers', 'mu_count', 'ambiguous', 'extra', 'rearrangement_status', 'clone_id'
    metadata: 'clone_id', 'clone_id_rank', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ', 'fwr1_VJ', 'fwr1_VDJ', 'mu_count_VDJ', 'mu_count_VJ', 'mu_count', 'junction_length_VDJ', 'junction_length_VJ', 'junction_aa_length_VDJ', 'junction_aa_length_VJ', 'np1_length_VDJ', 'np1_length_VJ', 'np2_length_VDJ'
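The two RuntimeWarnings printed above come from NumPy. One plausible source, which the output does not confirm, is averaging over an empty selection, for example a cell with no contig for a given chain. In that case np.mean of an empty array returns nan and emits this kind of warning, as the short check below shows. If that is the cause, the resulting NaN values simply mark cells that lack that chain.

```python
import warnings

import numpy as np

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # An empty selection, e.g. a cell with no contig for one chain (assumed scenario).
    result = np.mean(np.array([], dtype=float))

print(result)  # nan
print([str(w.message) for w in caught])
```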

Renaming barcodes

You can use a simple function to rename the barcodes, updating both the sequence IDs and the cell IDs at the same time. This is useful when you want to give the barcodes more meaningful names. Renaming always works from the indices originally used to create the Dandelion object. If you run the function more than once, it does not keep adding the prefix/suffix to the already-renamed indices; it recomputes the names from the original indices.
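The idea behind this behaviour can be sketched with plain pandas: keep a snapshot of the original IDs and always derive new names from it, so that repeated calls do not stack prefixes. The class RenamableTable, its add_prefix method and the separator are hypothetical illustrations, not dandelion's API. The sketch also assumes rows are not reordered or filtered after creation.

```python
import pandas as pd


class RenamableTable:
    """Hypothetical: remembers original IDs so repeated renaming never stacks."""

    def __init__(self, data: pd.DataFrame):
        # Snapshot of the IDs as they were when the object was created.
        self._orig_sequence_ids = data["sequence_id"].tolist()
        self._orig_cell_ids = data["cell_id"].tolist()
        self.data = data.copy()
        self.data.index = pd.Index(self._orig_sequence_ids, name="sequence_id")

    def add_prefix(self, prefix: str, sep: str = "_") -> None:
        # Always derive new names from the original IDs, never the current ones.
        # Assumes row order matches the snapshot (no reordering or filtering).
        new_sequence_ids = [f"{prefix}{sep}{s}" for s in self._orig_sequence_ids]
        self.data["sequence_id"] = new_sequence_ids
        self.data["cell_id"] = [f"{prefix}{sep}{c}" for c in self._orig_cell_ids]
        self.data.index = pd.Index(new_sequence_ids, name="sequence_id")


table = RenamableTable(
    pd.DataFrame({"sequence_id": ["AAAC_contig_1", "AAAC_contig_2"], "cell_id": ["AAAC", "AAAC"]})
)
table.add_prefix("donor1")
table.add_prefix("donor1")  # running twice gives the same result
print(table.data["cell_id"].tolist())  # ['donor1_AAAC', 'donor1_AAAC']
```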

[11]:
# original
print(vdj.data[["sequence_id", "cell_id"]])
print(vdj.metadata_names)
                                                                                     sequence_id  \
sequence_id
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_contig_1  sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_contig_1
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_2  sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_2
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_1  sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_1
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_contig_1  sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_contig_1
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_contig_2  sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_contig_2
...                                                                                          ...
vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1          vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1
vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2          vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2
vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1          vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1
vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2          vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2
vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1          vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1

                                                                                cell_id
sequence_id
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_contig_1  sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_2  sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_1  sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_contig_1  sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_contig_2  sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG
...                                                                                 ...
vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1          vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC
vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2          vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG
vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1          vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG
vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2          vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG
vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1          vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG

[7355 rows x 2 columns]
Index(['sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG',
       'sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC',
       'sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG',
       'sc5p_v2_hs_PBMC_10k_b_AAACCTGTCTTGAGAC',
       'sc5p_v2_hs_PBMC_10k_b_AAACGGGAGCGACGTA',
       'sc5p_v2_hs_PBMC_10k_b_AAACGGGCACTGTTAG',
       'sc5p_v2_hs_PBMC_10k_b_AAAGATGAGGATGCGT',
       'sc5p_v2_hs_PBMC_10k_b_AAAGATGGTCGAATCT',
       'sc5p_v2_hs_PBMC_10k_b_AAAGATGGTGAGGGAG',
       'sc5p_v2_hs_PBMC_10k_b_AAAGTAGCAGATCCAT',
       ...
       'vdj_v1_hs_pbmc3_b_TTTACTGTCAGCTGGC',
       'vdj_v1_hs_pbmc3_b_TTTATGCGTCAGAATA',
       'vdj_v1_hs_pbmc3_b_TTTATGCTCAGGATCT',
       'vdj_v1_hs_pbmc3_b_TTTATGCTCCTAGAAC',
       'vdj_v1_hs_pbmc3_b_TTTCCTCAGCAATATG',
       'vdj_v1_hs_pbmc3_b_TTTCCTCAGCGCTTAT',
       'vdj_v1_hs_pbmc3_b_TTTCCTCAGGGAAACA',
       'vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC',
       'vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG',
       'vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG'],
      dtype='object', length=2493)
[11]:
(None, None)
[12]:
# add 'test' as a prefix, joined with '-' (a suffix option is also available)
vdj.add_sequence_prefix("test", sep="-")
print(vdj.data[["sequence_id", "cell_id"]]), print(vdj.metadata_names)
                                                                                          sequence_id  \
sequence_id
test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_con...  test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_co...
test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_con...  test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_co...
test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_con...  test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_co...
test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_con...  test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_co...
test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_con...  test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_co...
...                                                                                               ...
test-vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1     test-vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1
test-vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2     test-vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2
test-vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1     test-vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1
test-vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2     test-vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2
test-vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1     test-vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1

                                                                                        cell_id
sequence_id
test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_con...  test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG
test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_con...  test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC
test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_con...  test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC
test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_con...  test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG
test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_con...  test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG
...                                                                                         ...
test-vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1        test-vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC
test-vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2        test-vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG
test-vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1        test-vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG
test-vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2        test-vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG
test-vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1        test-vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG

[7355 rows x 2 columns]
Index(['test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG',
       'test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC',
       'test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG',
       'test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCTTGAGAC',
       'test-sc5p_v2_hs_PBMC_10k_b_AAACGGGAGCGACGTA',
       'test-sc5p_v2_hs_PBMC_10k_b_AAACGGGCACTGTTAG',
       'test-sc5p_v2_hs_PBMC_10k_b_AAAGATGAGGATGCGT',
       'test-sc5p_v2_hs_PBMC_10k_b_AAAGATGGTCGAATCT',
       'test-sc5p_v2_hs_PBMC_10k_b_AAAGATGGTGAGGGAG',
       'test-sc5p_v2_hs_PBMC_10k_b_AAAGTAGCAGATCCAT',
       ...
       'test-vdj_v1_hs_pbmc3_b_TTTACTGTCAGCTGGC',
       'test-vdj_v1_hs_pbmc3_b_TTTATGCGTCAGAATA',
       'test-vdj_v1_hs_pbmc3_b_TTTATGCTCAGGATCT',
       'test-vdj_v1_hs_pbmc3_b_TTTATGCTCCTAGAAC',
       'test-vdj_v1_hs_pbmc3_b_TTTCCTCAGCAATATG',
       'test-vdj_v1_hs_pbmc3_b_TTTCCTCAGCGCTTAT',
       'test-vdj_v1_hs_pbmc3_b_TTTCCTCAGGGAAACA',
       'test-vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC',
       'test-vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG',
       'test-vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG'],
      dtype='object', length=2493)
[12]:
(None, None)
[13]:
len(vdj._original_cell_ids.unique())
[13]:
3158
[14]:
vdj._metadata.index
[14]:
Index(['test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG',
       'test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC',
       'test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG',
       'test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCTTGAGAC',
       'test-sc5p_v2_hs_PBMC_10k_b_AAACGGGAGCGACGTA',
       'test-sc5p_v2_hs_PBMC_10k_b_AAACGGGCACTGTTAG',
       'test-sc5p_v2_hs_PBMC_10k_b_AAAGATGAGGATGCGT',
       'test-sc5p_v2_hs_PBMC_10k_b_AAAGATGGTCGAATCT',
       'test-sc5p_v2_hs_PBMC_10k_b_AAAGATGGTGAGGGAG',
       'test-sc5p_v2_hs_PBMC_10k_b_AAAGTAGCAGATCCAT',
       ...
       'test-vdj_v1_hs_pbmc3_b_TTTACTGTCAGCTGGC',
       'test-vdj_v1_hs_pbmc3_b_TTTATGCGTCAGAATA',
       'test-vdj_v1_hs_pbmc3_b_TTTATGCTCAGGATCT',
       'test-vdj_v1_hs_pbmc3_b_TTTATGCTCCTAGAAC',
       'test-vdj_v1_hs_pbmc3_b_TTTCCTCAGCAATATG',
       'test-vdj_v1_hs_pbmc3_b_TTTCCTCAGCGCTTAT',
       'test-vdj_v1_hs_pbmc3_b_TTTCCTCAGGGAAACA',
       'test-vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC',
       'test-vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG',
       'test-vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG'],
      dtype='object', length=2493)
[15]:
# same renaming via add_cell_prefix; this replaces the earlier 'test-' prefix rather than adding to it
vdj.add_cell_prefix("test2", sep="_")
print(vdj.data[["sequence_id", "cell_id"]]), print(vdj.metadata_names)
                                                                                          sequence_id  \
sequence_id
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_co...  test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_c...
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_co...  test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_c...
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_co...  test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_c...
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_co...  test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_c...
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_co...  test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_c...
...                                                                                               ...
test2_vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1   test2_vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1
test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2   test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2
test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1   test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1
test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2   test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2
test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1   test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1

                                                                                         cell_id
sequence_id
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_co...  test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_co...  test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_co...  test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_co...  test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_co...  test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG
...                                                                                          ...
test2_vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1       test2_vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC
test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2       test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG
test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1       test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG
test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2       test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG
test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1       test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG

[7355 rows x 2 columns]
Index(['test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG',
       'test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC',
       'test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG',
       'test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCTTGAGAC',
       'test2_sc5p_v2_hs_PBMC_10k_b_AAACGGGAGCGACGTA',
       'test2_sc5p_v2_hs_PBMC_10k_b_AAACGGGCACTGTTAG',
       'test2_sc5p_v2_hs_PBMC_10k_b_AAAGATGAGGATGCGT',
       'test2_sc5p_v2_hs_PBMC_10k_b_AAAGATGGTCGAATCT',
       'test2_sc5p_v2_hs_PBMC_10k_b_AAAGATGGTGAGGGAG',
       'test2_sc5p_v2_hs_PBMC_10k_b_AAAGTAGCAGATCCAT',
       ...
       'test2_vdj_v1_hs_pbmc3_b_TTTACTGTCAGCTGGC',
       'test2_vdj_v1_hs_pbmc3_b_TTTATGCGTCAGAATA',
       'test2_vdj_v1_hs_pbmc3_b_TTTATGCTCAGGATCT',
       'test2_vdj_v1_hs_pbmc3_b_TTTATGCTCCTAGAAC',
       'test2_vdj_v1_hs_pbmc3_b_TTTCCTCAGCAATATG',
       'test2_vdj_v1_hs_pbmc3_b_TTTCCTCAGCGCTTAT',
       'test2_vdj_v1_hs_pbmc3_b_TTTCCTCAGGGAAACA',
       'test2_vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC',
       'test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG',
       'test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG'],
      dtype='object', length=2493)
[15]:
(None, None)

Simplifying the V/D/J/C call annotations

Sometimes the V/D/J/C call annotations can be quite verbose. You can simplify them in place with the .simplify() method. When a call lists several comma-separated genes, it keeps only the first one, and it also strips the allele (e.g. IGKV1-33*01,IGKV1D-33*01 becomes IGKV1-33). This is useful when you want simpler V/D/J/C calls for plotting.
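The string transformation described above can be sketched in plain Python as follows. `simplify_call` is a hypothetical helper, not part of Dandelion, and it only covers individual contig-level calls. Note that in the output below the metadata call `IGHV1-69D,IGHV1-69` ends up as `IGHV1-69`, so the cell-level metadata is presumably rebuilt from the simplified contig calls rather than simplified string by string.

```python
def simplify_call(call):
    """Hypothetical helper: keep the first comma-separated call and drop its allele."""
    if not isinstance(call, str) or not call:
        return call  # leave missing values (None/NaN/empty) untouched
    first = call.split(",")[0].strip()  # 'IGKV1-33*01,IGKV1D-33*01' -> 'IGKV1-33*01'
    return first.split("*")[0]          # 'IGKV1-33*01' -> 'IGKV1-33'


for raw in ["IGKV1-33*01,IGKV1D-33*01", "IGLJ2*01,IGLJ3*01,IGLJ3*02", "IGHJ4*02", None]:
    print(raw, "->", simplify_call(raw))
# IGKV1-33*01,IGKV1D-33*01 -> IGKV1-33
# IGLJ2*01,IGLJ3*01,IGLJ3*02 -> IGLJ2
# IGHJ4*02 -> IGHJ4
# None -> None
```

The same function could be applied column-wise to a pandas DataFrame with `Series.map`.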

[16]:
# before
(
    vdj.data[["v_call", "j_call"]],
    vdj.metadata[["v_call_VDJ", "j_call_VDJ"]],
)
[16]:
(                                                                      v_call  \
 sequence_id
 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_co...  IGKV1-33*01,IGKV1D-33*01
 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_co...  IGHV1-69*01,IGHV1-69D*01
 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_co...                IGKV1-8*01
 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_co...               IGLV5-45*02
 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_co...                IGHV1-2*02
 ...                                                                      ...
 test2_vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1                IGHV1-46*01
 test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2   IGHV1-69*01,IGHV1-69D*01
 test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1                IGLV1-47*01
 test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2                IGLV2-11*01
 test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1   IGHV3-23*01,IGHV3-23D*01

                                                                         j_call
 sequence_id
 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_co...                    IGKJ4*01
 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_co...                    IGHJ3*02
 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_co...                    IGKJ1*01
 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_co...                    IGLJ3*02
 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_co...                    IGHJ3*02
 ...                                                                        ...
 test2_vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1                     IGHJ5*02
 test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2                     IGHJ6*02
 test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1                     IGLJ3*02
 test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2   IGLJ2*01,IGLJ3*01,IGLJ3*02
 test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1                     IGHJ4*02

 [7355 rows x 2 columns],
                                                       v_call_VDJ j_call_VDJ
 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG                None       None
 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC  IGHV1-69D,IGHV1-69      IGHJ3
 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG             IGHV1-2      IGHJ3
 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCTTGAGAC            IGHV5-51      IGHJ3
 test2_sc5p_v2_hs_PBMC_10k_b_AAACGGGAGCGACGTA             IGHV4-4      IGHJ3
 ...                                                          ...        ...
 test2_vdj_v1_hs_pbmc3_b_TTTCCTCAGCGCTTAT                IGHV3-30      IGHJ6
 test2_vdj_v1_hs_pbmc3_b_TTTCCTCAGGGAAACA                IGHV4-61      IGHJ2
 test2_vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC                IGHV1-46      IGHJ5
 test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG      IGHV1-69D,IGHV1-69      IGHJ6
 test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG      IGHV3-23,IGHV3-23D      IGHJ4

 [2493 rows x 2 columns])
[17]:
# after
vdj.simplify()
(
    vdj.data[["v_call", "j_call"]],
    vdj.metadata[["v_call_VDJ", "j_call_VDJ"]],
)
[17]:
(                                                      v_call j_call
 sequence_id
 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_co...  IGKV1-33  IGKJ4
 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_co...  IGHV1-69  IGHJ3
 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_co...   IGKV1-8  IGKJ1
 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_co...  IGLV5-45  IGLJ3
 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_co...   IGHV1-2  IGHJ3
 ...                                                      ...    ...
 test2_vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1   IGHV1-46  IGHJ5
 test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2   IGHV1-69  IGHJ6
 test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1   IGLV1-47  IGLJ3
 test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2   IGLV2-11  IGLJ2
 test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1   IGHV3-23  IGHJ4

 [7355 rows x 2 columns],
                                              v_call_VDJ j_call_VDJ
 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG       None       None
 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC   IGHV1-69      IGHJ3
 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG    IGHV1-2      IGHJ3
 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCTTGAGAC   IGHV5-51      IGHJ3
 test2_sc5p_v2_hs_PBMC_10k_b_AAACGGGAGCGACGTA    IGHV4-4      IGHJ3
 ...                                                 ...        ...
 test2_vdj_v1_hs_pbmc3_b_TTTCCTCAGCGCTTAT       IGHV3-30      IGHJ6
 test2_vdj_v1_hs_pbmc3_b_TTTCCTCAGGGAAACA       IGHV4-61      IGHJ2
 test2_vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC       IGHV1-46      IGHJ5
 test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG       IGHV1-69      IGHJ6
 test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG       IGHV3-23      IGHJ4

 [2493 rows x 2 columns])

Concatenating multiple objects

ddl.tl.concat is a simple function to concatenate (append) two or more Dandelion objects or pandas DataFrames. Note that it operates on the .data slot, not the .metadata slot.

[18]:
vdj
[18]:
Dandelion class object with n_obs = 2493 and n_contigs = 7355
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 
'j_sequence_alignment_aa', 'complete_vdj', 'j_call_multimappers', 'j_call_multiplicity', 'j_call_sequence_start_multimappers', 'j_call_sequence_end_multimappers', 'j_call_support_multimappers', 'mu_count', 'ambiguous', 'extra', 'rearrangement_status', 'clone_id'
    metadata: 'clone_id', 'clone_id_rank', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
[19]:
# simple concatenation of three copies of vdj; compare the cell (n_obs) and contig (n_contigs) counts with those of vdj
vdj_concat = ddl.tl.concat([vdj, vdj, vdj])
vdj_concat
[19]:
Dandelion class object with n_obs = 2493 and n_contigs = 22065
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 
'j_sequence_alignment_aa', 'complete_vdj', 'j_call_multimappers', 'j_call_multiplicity', 'j_call_sequence_start_multimappers', 'j_call_sequence_end_multimappers', 'j_call_support_multimappers', 'mu_count', 'ambiguous', 'extra', 'rearrangement_status', 'clone_id'
    metadata: 'clone_id', 'clone_id_rank', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
[20]:
vdj_concat.data[["sequence_id", "cell_id"]].head()
[20]:
sequence_id cell_id
sequence_id
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_contig_1_0 sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_contig_1_0 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_contig_1_1 sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_contig_1_1 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_contig_1_2 sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_contig_1_2 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_2_0 sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_2_0 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_2_1 sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_2_1 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC

ddl.tl.concat also lets you supply custom prefixes/suffixes to append to the sequence IDs. If none are provided and it detects that the sequence IDs are not unique, it appends _0, _1, etc. as a suffix, as seen above.
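The default suffixing behaviour can be sketched with pandas as below. `concat_with_suffix` is a hypothetical helper that only mirrors what the output above shows (each input gets `_<position>` appended when the IDs collide); it does not cover the custom prefix/suffix options of ddl.tl.concat, and its row order may differ from Dandelion's.

```python
import pandas as pd


def concat_with_suffix(frames, id_col="sequence_id"):
    """Hypothetical sketch: stack tables, appending _0, _1, ... per input if IDs collide."""
    stacked = pd.concat(frames, ignore_index=True)
    if not stacked[id_col].is_unique:
        # tag every row with the position of the table it came from
        suffixes = [f"_{i}" for i, f in enumerate(frames) for _ in range(len(f))]
        stacked[id_col] = stacked[id_col].astype(str) + pd.Series(suffixes, index=stacked.index)
    # index by the (now unique) IDs, keeping the column as well
    return stacked.set_index(id_col, drop=False)


df = pd.DataFrame({"sequence_id": ["c1_contig_1", "c1_contig_2"], "cell_id": ["c1", "c1"]})
combined = concat_with_suffix([df, df, df])
print(combined["sequence_id"].tolist())
# ['c1_contig_1_0', 'c1_contig_2_0', 'c1_contig_1_1', 'c1_contig_2_1', 'c1_contig_1_2', 'c1_contig_2_2']
print(combined["cell_id"].nunique())  # 1
```

Because only the sequence IDs are changed, the number of contigs triples while the number of cells stays the same, which is consistent with n_obs = 2493 and n_contigs = 22065 above.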

Read/write

A Dandelion object can be saved with the .write_h5ddl and .write_pkl methods, which accept a compression method (e.g. gzip). .write_h5ddl primarily uses the h5py library, while .write_pkl simply uses pickle. The read_h5ddl and read_pkl functions read the respective file formats back in.
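As a rough illustration of what pickling with gzip compression involves, the sketch below round-trips a plain dict through the standard-library gzip and pickle modules. write_pkl_gz and read_pkl_gz are hypothetical names, and this is not how .write_pkl is implemented internally. As with any pickle file, only load pickles from sources you trust.

```python
import gzip
import os
import pickle
import tempfile


def write_pkl_gz(obj, path):
    """Pickle obj and gzip-compress the byte stream."""
    with gzip.open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def read_pkl_gz(path):
    """Reverse of write_pkl_gz: decompress, then unpickle."""
    with gzip.open(path, "rb") as fh:
        return pickle.load(fh)


# a plain dict stands in for a Dandelion object here
obj = {
    "data": {"sequence_id": ["c1_contig_1"], "cell_id": ["c1"]},
    "metadata": {"c1": {"clone_id": "B_VJ_1"}},
}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.pkl.gz")
    write_pkl_gz(obj, path)
    restored = read_pkl_gz(path)
print(restored == obj)  # True
```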

[21]:
%time vdj.write_h5ddl('dandelion_results_test.h5ddl', compression="gzip")
CPU times: user 8.01 s, sys: 403 ms, total: 8.41 s
Wall time: 9.49 s

If you see any warnings above, they are due to mixed dtypes somewhere in the object. Check the affected columns if you think they could interfere with downstream usage.
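One way to do that checking is to look for object-dtype columns whose values mix Python types. find_mixed_type_columns below is a hypothetical pandas helper, not part of Dandelion; you could call it on vdj.data or vdj.metadata to see which columns might be behind such warnings.

```python
import pandas as pd


def find_mixed_type_columns(df):
    """Hypothetical helper: list object columns whose non-null values span several Python types."""
    mixed = {}
    for col in df.select_dtypes(include="object").columns:
        # ignore missing values, then collect the distinct Python types present
        types = df[col].dropna().map(type).unique()
        if len(types) > 1:
            mixed[col] = sorted(t.__name__ for t in types)
    return mixed


example = pd.DataFrame(
    {
        "clone_id": ["B_VJ_1", "B_VDJ_2", None],
        "mu_count": [0, "8", 22],  # int and str mixed in one column
    }
)
print(find_mixed_type_columns(example))  # {'mu_count': ['int', 'str']}
```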

[22]:
%time vdj_1 = ddl.read_h5ddl('dandelion_results_test.h5ddl')
vdj_1
CPU times: user 600 ms, sys: 78.1 ms, total: 678 ms
Wall time: 774 ms
[22]:
Dandelion class object with n_obs = 2493 and n_contigs = 7355
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 
'j_sequence_alignment_aa', 'complete_vdj', 'j_call_multimappers', 'j_call_multiplicity', 'j_call_sequence_start_multimappers', 'j_call_sequence_end_multimappers', 'j_call_support_multimappers', 'mu_count', 'ambiguous', 'extra', 'rearrangement_status', 'clone_id'
    metadata: 'clone_id', 'clone_id_rank', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'

There are also other writers, such as .write_airr and .write_10x, which write the object to a .tsv or .csv file compatible with the AIRR and 10x formats, respectively. .write_10x is useful, for example, if you want to reannotate VDJ data that did not come from the 10x platform using the preprocessing workflow. Note that .write_10x writes only the contig annotation table and the FASTA file; it does not include the metadata, graph, or distances.
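For reference, a FASTA file is just a header line (`>` plus an ID) followed by the sequence for each record. The sketch below writes one record per row of a contig table. write_fasta is a hypothetical helper, and the only assumption about the input is that it has the sequence_id and sequence columns seen in the AIRR table below; it is not Dandelion's implementation of .write_10x.

```python
import io

import pandas as pd


def write_fasta(df, handle, id_col="sequence_id", seq_col="sequence"):
    """Write one FASTA record (header line + sequence line) per row of df."""
    for seq_id, seq in zip(df[id_col], df[seq_col]):
        # skip rows without a usable sequence (e.g. None, NaN or empty string)
        if not isinstance(seq, str) or not seq:
            continue
        handle.write(f">{seq_id}\n{seq}\n")


contigs = pd.DataFrame(
    {
        "sequence_id": ["cell1_contig_1", "cell1_contig_2"],
        "sequence": ["TGGGGAGGAGTCAGTC", None],
    }
)
buf = io.StringIO()
write_fasta(contigs, buf)
fasta_text = buf.getvalue()
print(fasta_text)  # only cell1_contig_1 is written; cell1_contig_2 has no sequence
```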

[23]:
import pandas as pd

vdj_1.write_airr("test.airr.tsv")
df = pd.read_csv("test.airr.tsv", sep="\t")
df
[23]:
sequence_id sequence rev_comp productive v_call d_call j_call sequence_alignment germline_alignment junction ... j_call_multimappers j_call_multiplicity j_call_sequence_start_multimappers j_call_sequence_end_multimappers j_call_support_multimappers mu_count ambiguous extra rearrangement_status clone_id
0 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_c... TGGGGAGGAGTCAGTCCCAACCAGGACACGGCCTGGACATGAGGGT... F T IGKV1-33 NaN IGKJ4 GACATCCAGATGACCCAGTCTCCATCCTCCCTGTCTGCATCTGTGG... GACATCCAGATGACCCAGTCTCCATCCTCCCTGTCTGCATCTGTAG... TGTCAACAATATGACGAACTTCCCGTCACTTTC ... ["IGKJ4*01"] 1 [385] [412] [3.56e-09] 27 F F standard B_VJ_187_2_3
1 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_c... ATCACATAACAACCACATTCCTCCTCTAAAGAAGCCCCTGGGAGCA... F T IGHV1-69 IGHD3-22 IGHJ3 CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG... CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG... TGTGCGACTACGTATTACTATGATAGTAGTGGTTATTACCAGAATG... ... ["IGHJ3*02"] 1 [445] [494] [4.5799999999999995e-23] 0 F F standard B_VDJ_220_3_2_VJ_4_2_1
2 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_c... AGGAGTCAGACCCTGTCAGGACACAGCATAGACATGAGGGTCCCCG... F T IGKV1-8 NaN IGKJ1 GCCATCCGGATGACCCAGTCTCCATCCTCATTCTCTGCATCTACAG... GCCATCCGGATGACCCAGTCTCCATCCTCATTCTCTGCATCTACAG... TGTCAACAGTATTATAGTTACCCTCGGACGTTC ... ["IGKJ1*01"] 1 [380] [415] [2.7e-15] 0 F F standard B_VDJ_220_3_2_VJ_4_2_1
3 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_c... ACTGTGGGGGTAAGAGGTTGTGTCCACCATGGCCTGGACTCCTCTC... F T IGLV5-45 NaN IGLJ3 CAGGCTGTGCTGACTCAGCCGTCTTCC...CTCTCTGCATCTCCTG... CAGGCTGTGCTGACTCAGCCGTCTTCC...CTCTCTGCATCTCCTG... TGTATGATTTGGCACAGCAGCGCTTGGGTGGTC ... ["IGLJ3*01"] 1 [402] [431] [6.84e-12] 8 F F standard B_VDJ_85_1_1_VJ_194_1_1
4 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_c... GGGAGCATCACCCAGCAACCACATCTGTCCTCTAGAGAATCCCCTG... F T IGHV1-2 NaN IGHJ3 CAGGTGCAACTGGTGCAGTCTGGGGGT...GAGGTAAAGAAGCCTG... CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG... TGTGCGAGAGAGATAGAGGGGGACGGTGTTTTTGAAATCTGG ... ["IGHJ3*02"] 1 [433] [479] [4.48e-18] 22 F F standard B_VDJ_85_1_1_VJ_194_1_1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7350 test2_vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1 ATCATCCAACAACCACATCCCTTCTCTACAGAAGCCTCTGAGAGGA... F T IGHV1-46 IGHD2-15 IGHJ5 CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG... CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG... TGTGCGAGAGAGGGATATTGTAGTGGTGGTAGCTGCTACTCCCCCG... ... ["IGHJ5*02"] 1 [461] [506] [7.83e-21] 0 F F standard B_VDJ_120_5_2_VJ_58_2_1
7351 test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2 ATCACATAACAACCACATTCCTCCTCTAAAGAAGCCCCTGGGAGCA... F T IGHV1-69 IGHD2-15 IGHJ6 CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG... CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG... TGTGCGAGATCTCTGGATATTGTAGTGGTGGTAGCACTCTACTACT... ... ["IGHJ6*02"] 1 [439] [497] [4.57e-28] 0 F F standard B_VDJ_197_4_2_VJ_70_3_1
7352 test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1 AGCTTCAGCTGTGGTAGAGAAGACAGGATTCAGGACAATCTCCAGC... F T IGLV1-47 NaN IGLJ3 CAGTCTGTGCTGACTCAGCCACCCTCA...GCGTCTGGGACCCCCG... CAGTCTGTGCTGACTCAGCCACCCTCA...GCGTCTGGGACCCCCG... TGTGCAGCATGGGATGACAGCCTGAGTGGTTGGGTGTTC ... ["IGLJ3*02"] 1 [397] [434] [2.46e-16] 0 F F standard B_VDJ_197_4_2_VJ_70_3_1
7353 test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2 GGCTGGGGTCTCAGGAGGCAGCACTCTCGGGACGTCTCCACCATGG... F T IGLV2-11 NaN IGLJ2 CAGTCTGCCCTGACTCAGCCTCGCTCA...GTGTCCGGGTCTCCTG... CAGTCTGCCCTGACTCAGCCTCGCTCA...GTGTCCGGGTCTCCTG... TGCTGCTCATATGCAGGCAGCTACACTGTGTTTTTC ... ["IGLJ3*01"] 1 [393] [430] [2.46e-11] 4 F F standard B_VDJ_206_6_7_VJ_87_3_1
7354 test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1 AGCTCTGAGAGAGGAGCCCAGCCCTGGGATTTTCAGGTGTTTTCAT... F T IGHV3-23 NaN IGHJ4 GAGGTGCAGGTGTTGGAGTCTGGGGGA...GGCTTGGAACAGCCTG... GAGGTGCAGCTGTTGGAGTCTGGGGGA...GGCTTGGTACAGCCTG... TGTGCGGGGAGTCGGTGGTTATATTCTTTTGACTACTGG ... ["IGHJ4*02"] 1 [449] [491] [1.65e-17] 8 F F standard B_VDJ_206_6_7_VJ_87_3_1

7355 rows × 124 columns

[24]:
import pandas as pd

# writes both the contig_annotations.csv and contig.fasta files,
# each prefixed with filename_prefix, into the 10x_test folder
vdj_1.write_10x(
    folder="10x_test",
    filename_prefix="all",
)
df = pd.read_csv("10x_test/all_contig_annotations.csv")
df
[24]:
barcode contig_id length chain v_gene d_gene j_gene c_gene full_length productive cdr3 cdr3_nt reads umis raw_clonotype_id raw_consensus_id
0 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_c... 556 IGK IGKV1-33 NaN IGKJ4 IGKC NaN True CQQYDELPVTF TGTCAACAATATGACGAACTTCCCGTCACTTTC 9139 68 B_VJ_187_2_3 B_VJ_187_2_3
1 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_c... 565 IGH IGHV1-69 IGHD3-22 IGHJ3 IGHM NaN True CATTYYYDSSGYYQNDAFDIW TGTGCGACTACGTATTACTATGATAGTAGTGGTTATTACCAGAATG... 4161 51 B_VDJ_220_3_2_VJ_4_2_1 B_VDJ_220_3_2_VJ_4_2_1
2 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_c... 551 IGK IGKV1-8 NaN IGKJ1 IGKC NaN True CQQYYSYPRTF TGTCAACAGTATTATAGTTACCCTCGGACGTTC 5679 43 B_VDJ_220_3_2_VJ_4_2_1 B_VDJ_220_3_2_VJ_4_2_1
3 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_c... 642 IGL IGLV5-45 NaN IGLJ3 IGLC3 NaN True CMIWHSSAWVV TGTATGATTTGGCACAGCAGCGCTTGGGTGGTC 13160 90 B_VDJ_85_1_1_VJ_194_1_1 B_VDJ_85_1_1_VJ_194_1_1
4 test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_c... 550 IGH IGHV1-2 NaN IGHJ3 IGHM NaN True CAREIEGDGVFEIW TGTGCGAGAGAGATAGAGGGGGACGGTGTTTTTGAAATCTGG 5080 47 B_VDJ_85_1_1_VJ_194_1_1 B_VDJ_85_1_1_VJ_194_1_1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7350 test2_vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC test2_vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1 577 IGH IGHV1-46 IGHD2-15 IGHJ5 IGHM NaN True CAREGYCSGGSCYSPDPNNGWFDPW TGTGCGAGAGAGGGATATTGTAGTGGTGGTAGCTGCTACTCCCCCG... 2960 28 B_VDJ_120_5_2_VJ_58_2_1 B_VDJ_120_5_2_VJ_58_2_1
7351 test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2 568 IGH IGHV1-69 IGHD2-15 IGHJ6 IGHM NaN True CARSLDIVVVVALYYYYGMDVW TGTGCGAGATCTCTGGATATTGTAGTGGTGGTAGCACTCTACTACT... 2464 32 B_VDJ_197_4_2_VJ_70_3_1 B_VDJ_197_4_2_VJ_70_3_1
7352 test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1 645 IGL IGLV1-47 NaN IGLJ3 IGLC3 NaN True CAAWDDSLSGWVF TGTGCAGCATGGGATGACAGCCTGAGTGGTTGGGTGTTC 2457 28 B_VDJ_197_4_2_VJ_70_3_1 B_VDJ_197_4_2_VJ_70_3_1
7353 test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2 641 IGL IGLV2-11 NaN IGLJ2 IGLC NaN True CCSYAGSYTVFF TGCTGCTCATATGCAGGCAGCTACACTGTGTTTTTC 2744 36 B_VDJ_206_6_7_VJ_87_3_1 B_VDJ_206_6_7_VJ_87_3_1
7354 test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1 562 IGH IGHV3-23 NaN IGHJ4 IGHM NaN True CAGSRWLYSFDYW TGTGCGGGGAGTCGGTGGTTATATTCTTTTGACTACTGG 1915 22 B_VDJ_206_6_7_VJ_87_3_1 B_VDJ_206_6_7_VJ_87_3_1

7355 rows × 16 columns
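Because the exported file is a plain CSV with one row per contig, it can be inspected without dandelion. Below is a minimal sketch that uses only the Python standard library. It groups contigs by `barcode`, counts how many cells share each `raw_clonotype_id`, and lists cells that carry both a heavy (IGH) and a light (IGK/IGL) contig. In the table above, `raw_clonotype_id` appears to carry the clone_id assigned by `find_clones`. The helper names `summarise_contigs` and `paired_cells` are hypothetical. The inline `EXCERPT` is a column subset of the last five rows shown above, so the pairing result only describes that excerpt.

```python
import csv
import io
from collections import defaultdict

# Column subset of the last five rows of the table above. The real
# all_contig_annotations.csv written by write_10x has all 16 columns.
EXCERPT = """barcode,contig_id,chain,raw_clonotype_id
test2_vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC,test2_vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1,IGH,B_VDJ_120_5_2_VJ_58_2_1
test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG,test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2,IGH,B_VDJ_197_4_2_VJ_70_3_1
test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG,test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1,IGL,B_VDJ_197_4_2_VJ_70_3_1
test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG,test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2,IGL,B_VDJ_206_6_7_VJ_87_3_1
test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG,test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1,IGH,B_VDJ_206_6_7_VJ_87_3_1
"""

LIGHT_CHAINS = {"IGK", "IGL"}


def summarise_contigs(handle):
    """Return the chains seen per barcode and the number of barcodes per clonotype."""
    chains_per_cell = defaultdict(list)
    cells_per_clonotype = defaultdict(set)
    for row in csv.DictReader(handle):
        chains_per_cell[row["barcode"]].append(row["chain"])
        cells_per_clonotype[row["raw_clonotype_id"]].add(row["barcode"])
    clonotype_sizes = {cid: len(cells) for cid, cells in cells_per_clonotype.items()}
    return dict(chains_per_cell), clonotype_sizes


def paired_cells(chains_per_cell):
    """Sorted barcodes with at least one IGH contig and at least one IGK/IGL contig."""
    return sorted(
        bc
        for bc, chains in chains_per_cell.items()
        if "IGH" in chains and LIGHT_CHAINS.intersection(chains)
    )


# For the full export, pass a file handle instead:
# with open("10x_test/all_contig_annotations.csv", newline="") as fh:
#     chains, sizes = summarise_contigs(fh)
chains, sizes = summarise_contigs(io.StringIO(EXCERPT))
print(chains)
print(sizes)
print(paired_cells(chains))
```

In this excerpt, the TTTCCTCTCGACAGCC barcode is reported as unpaired only because its other contigs, if any, fall outside the five rows shown. Running the same functions on the full file gives the complete picture.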