Dandelion class
Much of the functionality of the dandelion package revolves around the Dandelion class. The class acts as an intermediary object for storage and for flexible interaction with other tools. This section is a quick primer on the Dandelion class.
Import modules
[1]:
import os
os.chdir("dandelion_tutorial/")
import dandelion as ddl
ddl.set_backend("base")
ddl.logging.print_versions()
dandelion==1.0.0a1.dev36 pandas==2.3.3 numpy==2.3.5 matplotlib==3.10.6 networkx==3.6.1 scipy==1.15.2
[2]:
vdj = ddl.read_h5ddl("dandelion_results.h5ddl")
# let's run find_clones again as this was not stored.
ddl.tl.find_clones(vdj)
vdj
Finding clones based on B cell VDJ chains using junction_aa: 100%|██████████| 228/228 [00:00<00:00, 3802.21it/s]
Finding clones based on B cell VJ chains using junction_aa: 100%|██████████| 213/213 [00:00<00:00, 4478.26it/s]
Refining clone assignment based on VJ chain pairing : 100%|██████████| 2493/2493 [00:00<00:00, 333420.49it/s]
[2]:
Dandelion class object with n_obs = 2493 and n_contigs = 7355
data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 
'j_sequence_alignment_aa', 'complete_vdj', 'j_call_multimappers', 'j_call_multiplicity', 'j_call_sequence_start_multimappers', 'j_call_sequence_end_multimappers', 'j_call_support_multimappers', 'mu_count', 'ambiguous', 'extra', 'rearrangement_status', 'clone_id'
metadata: 'clone_id', 'clone_id_rank', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
Essentially, the .data slot holds the AIRR contig table, while the .metadata slot holds a collapsed, per-cell version that can be combined with AnnData’s .obs slot. You can retrieve these slots as you would the attributes of any class object; for example, to get the metadata:
[3]:
vdj.metadata
[3]:
| | clone_id | clone_id_rank | sample_id | locus_VDJ | locus_VJ | productive_VDJ | productive_VJ | v_call_VDJ | d_call_VDJ | j_call_VDJ | ... | d_call_B_VDJ_main | j_call_B_VDJ_main | v_call_B_VJ_main | j_call_B_VJ_main | isotype | isotype_status | locus_status | chain_status | rearrangement_status_VDJ | rearrangement_status_VJ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG | B_VJ_187_2_3 | 135 | sc5p_v2_hs_PBMC_10k_b | None | IGK | None | T | None | None | None | ... | None | None | IGKV1-33,IGKV1D-33 | IGKJ4 | None | None | Orphan IGK | Orphan VJ | None | standard |
| sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC | B_VDJ_220_3_2_VJ_4_2_1 | 2196 | sc5p_v2_hs_PBMC_10k_b | IGH | IGK | T | T | IGHV1-69D,IGHV1-69 | IGHD3-22 | IGHJ3 | ... | IGHD3-22 | IGHJ3 | IGKV1-8 | IGKJ1 | IgM | IgM | IGH + IGK | Single pair | standard | standard |
| sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG | B_VDJ_85_1_1_VJ_194_1_1 | 1750 | sc5p_v2_hs_PBMC_10k_b | IGH | IGL | T | T | IGHV1-2 | None | IGHJ3 | ... | None | IGHJ3 | IGLV5-45 | IGLJ3 | IgM | IgM | IGH + IGL | Single pair | standard | standard |
| sc5p_v2_hs_PBMC_10k_b_AAACCTGTCTTGAGAC | B_VDJ_147_4_6_VJ_50_1_1 | 1751 | sc5p_v2_hs_PBMC_10k_b | IGH | IGK | T | T | IGHV5-51 | None | IGHJ3 | ... | None | IGHJ3 | IGKV1D-8 | IGKJ2 | IgM | IgM | IGH + IGK | Single pair | standard | standard |
| sc5p_v2_hs_PBMC_10k_b_AAACGGGAGCGACGTA | B_VDJ_145_2_1_VJ_56_2_1 | 1752 | sc5p_v2_hs_PBMC_10k_b | IGH | IGL | T | T | IGHV4-4 | IGHD6-13 | IGHJ3 | ... | IGHD6-13 | IGHJ3 | IGLV3-19 | IGLJ2,IGLJ3 | IgM | IgM | IGH + IGL | Single pair | standard | standard |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| vdj_v1_hs_pbmc3_b_TTTCCTCAGCGCTTAT | B_VDJ_66_5_4_VJ_183_1_1 | 895 | vdj_v1_hs_pbmc3_b | IGH | IGK | T | T | IGHV3-30 | IGHD4-17 | IGHJ6 | ... | IGHD4-17 | IGHJ6 | IGKV2-30 | IGKJ2 | IgM | IgM | IGH + IGK | Single pair | standard | standard |
| vdj_v1_hs_pbmc3_b_TTTCCTCAGGGAAACA | B_VDJ_205_2_1_VJ_105_4_1 | 896 | vdj_v1_hs_pbmc3_b | IGH | IGK | T | T | IGHV4-61 | IGHD6-13 | IGHJ2 | ... | IGHD6-13 | IGHJ2 | IGKV1-39,IGKV1D-39 | IGKJ1 | IgM | IgM | IGH + IGK | Single pair | standard | standard |
| vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC | B_VDJ_120_5_2_VJ_58_2_1 | 897 | vdj_v1_hs_pbmc3_b | IGH | IGK | T | T | IGHV1-46 | IGHD2-15 | IGHJ5 | ... | IGHD2-15 | IGHJ5 | IGKV1-39,IGKV1D-39 | IGKJ2 | IgM | IgM | IGH + IGK | Single pair | standard | standard |
| vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG | B_VDJ_197_4_2_VJ_70_3_1 | 898 | vdj_v1_hs_pbmc3_b | IGH | IGL | T | T | IGHV1-69D,IGHV1-69 | IGHD2-15 | IGHJ6 | ... | IGHD2-15 | IGHJ6 | IGLV1-47 | IGLJ3 | IgM | IgM | IGH + IGL | Single pair | standard | standard |
| vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG | B_VDJ_206_6_7_VJ_87_3_1 | 2598 | vdj_v1_hs_pbmc3_b | IGH | IGL | T | T | IGHV3-23,IGHV3-23D | None | IGHJ4 | ... | None | IGHJ4 | IGLV2-11 | IGLJ2,IGLJ3 | IgM | IgM | IGH + IGL | Single pair | standard | standard |
2493 rows × 47 columns
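To make the relationship between the two slots more concrete, here is a toy pandas sketch of collapsing a contig-level table into a per-cell table. It is only an illustration under simplifying assumptions, not dandelion's implementation: the three-contig table, the `chain` helper column and the locus-to-chain mapping are all made up for the example.

```python
import pandas as pd

# Toy AIRR-style contig table: one row per contig, indexed by sequence_id.
data = pd.DataFrame(
    {
        "cell_id": ["cellA", "cellA", "cellB"],
        "locus": ["IGH", "IGK", "IGK"],
        "v_call": ["IGHV1-69", "IGKV1-8", "IGKV1-33"],
    },
    index=["cellA_contig_1", "cellA_contig_2", "cellB_contig_1"],
)

# Assumption for this sketch: IGH contigs are VDJ chains, IGK/IGL contigs are VJ chains.
data["chain"] = data["locus"].map({"IGH": "VDJ", "IGK": "VJ", "IGL": "VJ"})

# Collapse to one row per cell with one column per chain type.
metadata = (
    data.groupby(["cell_id", "chain"])["v_call"]
    .agg("|".join)
    .unstack("chain")
    .add_prefix("v_call_")
)
metadata.columns.name = None
print(metadata)
```

A cell without a VDJ contig (cellB here) simply has a missing value in the VDJ column, which is loosely analogous to the None entries in the orphan-chain row of the table above.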
Slicing
You can slice the Dandelion object by indexing it with a boolean mask built from either the .data or the .metadata slot; the behavior is similar to slicing a pandas DataFrame or an AnnData object.
Slicing .data
[4]:
# get the largest clone
largest_clone = vdj.data["clone_id"].value_counts().idxmax()
vdj[vdj.data["clone_id"] == largest_clone]
[4]:
Dandelion class object with n_obs = 626 and n_contigs = 714
data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 
'j_sequence_alignment_aa', 'complete_vdj', 'j_call_multimappers', 'j_call_multiplicity', 'j_call_sequence_start_multimappers', 'j_call_sequence_end_multimappers', 'j_call_support_multimappers', 'mu_count', 'ambiguous', 'extra', 'rearrangement_status', 'clone_id'
metadata: 'clone_id', 'clone_id_rank', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
[5]:
vdj[
vdj.data_names.isin(
[
"sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_contig_1",
"sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_2",
"sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_1",
"sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_contig_1",
"sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_contig_2",
]
)
]
[5]:
Dandelion class object with n_obs = 3 and n_contigs = 5
data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 
'j_sequence_alignment_aa', 'complete_vdj', 'j_call_multimappers', 'j_call_multiplicity', 'j_call_sequence_start_multimappers', 'j_call_sequence_end_multimappers', 'j_call_support_multimappers', 'mu_count', 'ambiguous', 'extra', 'rearrangement_status', 'clone_id'
metadata: 'clone_id', 'clone_id_rank', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
Slicing .metadata
[6]:
vdj[vdj.metadata["productive_VDJ"].isin(["T", "T|T"])]
[6]:
Dandelion class object with n_obs = 2334 and n_contigs = 5557
data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 
'j_sequence_alignment_aa', 'complete_vdj', 'j_call_multimappers', 'j_call_multiplicity', 'j_call_sequence_start_multimappers', 'j_call_sequence_end_multimappers', 'j_call_support_multimappers', 'mu_count', 'ambiguous', 'extra', 'rearrangement_status', 'clone_id'
metadata: 'clone_id', 'clone_id_rank', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
[7]:
vdj[vdj.metadata_names == "vdj_v1_hs_pbmc3_b_TTTCCTCAGCGCTTAT"]
[7]:
Dandelion class object with n_obs = 1 and n_contigs = 2
data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 
'j_sequence_alignment_aa', 'complete_vdj', 'j_call_multimappers', 'j_call_multiplicity', 'j_call_sequence_start_multimappers', 'j_call_sequence_end_multimappers', 'j_call_support_multimappers', 'mu_count', 'ambiguous', 'extra', 'rearrangement_status', 'clone_id'
metadata: 'clone_id', 'clone_id_rank', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
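The outputs above suggest that a slice on one slot carries over to the other: selecting five contigs kept three cells, and selecting one cell kept its two contigs. The following toy pandas sketch mimics that bookkeeping with two linked tables. It is an assumption-laden illustration of the idea, not how dandelion implements slicing, and all names in it are invented.

```python
import pandas as pd

# Toy contig-level table (like .data) and cell-level table (like .metadata).
data = pd.DataFrame(
    {
        "cell_id": ["cellA", "cellA", "cellB", "cellC"],
        "productive": ["T", "T", "F", "T"],
    },
    index=["cellA_contig_1", "cellA_contig_2", "cellB_contig_1", "cellC_contig_1"],
)
metadata = pd.DataFrame(
    {"productive_VJ": ["T", "F", "T"]}, index=["cellA", "cellB", "cellC"]
)

# Cell-level mask: keep the matching cells, then every contig that belongs to them.
keep_cells = metadata.index[metadata["productive_VJ"] == "T"]
meta_by_cell = metadata.loc[keep_cells]
data_by_cell = data[data["cell_id"].isin(keep_cells)]

# Contig-level mask: keep the matching contigs, then the cells they come from.
data_by_contig = data[data.index.isin(["cellA_contig_1", "cellB_contig_1"])]
meta_by_contig = metadata.loc[metadata.index.isin(data_by_contig["cell_id"])]

print(len(meta_by_cell), len(data_by_cell))
print(len(meta_by_contig), len(data_by_contig))
```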
Copy
You can deep copy the Dandelion object to another variable, which will inherit all slots:
[8]:
vdj2 = vdj.copy()
vdj2.metadata
[8]:
| | clone_id | clone_id_rank | sample_id | locus_VDJ | locus_VJ | productive_VDJ | productive_VJ | v_call_VDJ | d_call_VDJ | j_call_VDJ | ... | d_call_B_VDJ_main | j_call_B_VDJ_main | v_call_B_VJ_main | j_call_B_VJ_main | isotype | isotype_status | locus_status | chain_status | rearrangement_status_VDJ | rearrangement_status_VJ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG | B_VJ_187_2_3 | 135 | sc5p_v2_hs_PBMC_10k_b | None | IGK | None | T | None | None | None | ... | None | None | IGKV1-33,IGKV1D-33 | IGKJ4 | None | None | Orphan IGK | Orphan VJ | None | standard |
| sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC | B_VDJ_220_3_2_VJ_4_2_1 | 2196 | sc5p_v2_hs_PBMC_10k_b | IGH | IGK | T | T | IGHV1-69D,IGHV1-69 | IGHD3-22 | IGHJ3 | ... | IGHD3-22 | IGHJ3 | IGKV1-8 | IGKJ1 | IgM | IgM | IGH + IGK | Single pair | standard | standard |
| sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG | B_VDJ_85_1_1_VJ_194_1_1 | 1750 | sc5p_v2_hs_PBMC_10k_b | IGH | IGL | T | T | IGHV1-2 | None | IGHJ3 | ... | None | IGHJ3 | IGLV5-45 | IGLJ3 | IgM | IgM | IGH + IGL | Single pair | standard | standard |
| sc5p_v2_hs_PBMC_10k_b_AAACCTGTCTTGAGAC | B_VDJ_147_4_6_VJ_50_1_1 | 1751 | sc5p_v2_hs_PBMC_10k_b | IGH | IGK | T | T | IGHV5-51 | None | IGHJ3 | ... | None | IGHJ3 | IGKV1D-8 | IGKJ2 | IgM | IgM | IGH + IGK | Single pair | standard | standard |
| sc5p_v2_hs_PBMC_10k_b_AAACGGGAGCGACGTA | B_VDJ_145_2_1_VJ_56_2_1 | 1752 | sc5p_v2_hs_PBMC_10k_b | IGH | IGL | T | T | IGHV4-4 | IGHD6-13 | IGHJ3 | ... | IGHD6-13 | IGHJ3 | IGLV3-19 | IGLJ2,IGLJ3 | IgM | IgM | IGH + IGL | Single pair | standard | standard |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| vdj_v1_hs_pbmc3_b_TTTCCTCAGCGCTTAT | B_VDJ_66_5_4_VJ_183_1_1 | 895 | vdj_v1_hs_pbmc3_b | IGH | IGK | T | T | IGHV3-30 | IGHD4-17 | IGHJ6 | ... | IGHD4-17 | IGHJ6 | IGKV2-30 | IGKJ2 | IgM | IgM | IGH + IGK | Single pair | standard | standard |
| vdj_v1_hs_pbmc3_b_TTTCCTCAGGGAAACA | B_VDJ_205_2_1_VJ_105_4_1 | 896 | vdj_v1_hs_pbmc3_b | IGH | IGK | T | T | IGHV4-61 | IGHD6-13 | IGHJ2 | ... | IGHD6-13 | IGHJ2 | IGKV1-39,IGKV1D-39 | IGKJ1 | IgM | IgM | IGH + IGK | Single pair | standard | standard |
| vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC | B_VDJ_120_5_2_VJ_58_2_1 | 897 | vdj_v1_hs_pbmc3_b | IGH | IGK | T | T | IGHV1-46 | IGHD2-15 | IGHJ5 | ... | IGHD2-15 | IGHJ5 | IGKV1-39,IGKV1D-39 | IGKJ2 | IgM | IgM | IGH + IGK | Single pair | standard | standard |
| vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG | B_VDJ_197_4_2_VJ_70_3_1 | 898 | vdj_v1_hs_pbmc3_b | IGH | IGL | T | T | IGHV1-69D,IGHV1-69 | IGHD2-15 | IGHJ6 | ... | IGHD2-15 | IGHJ6 | IGLV1-47 | IGLJ3 | IgM | IgM | IGH + IGL | Single pair | standard | standard |
| vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG | B_VDJ_206_6_7_VJ_87_3_1 | 2598 | vdj_v1_hs_pbmc3_b | IGH | IGL | T | T | IGHV3-23,IGHV3-23D | None | IGHJ4 | ... | None | IGHJ4 | IGLV2-11 | IGLJ2,IGLJ3 | IgM | IgM | IGH + IGL | Single pair | standard | standard |
2493 rows × 47 columns
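If the difference between a deep copy and a plain assignment is unfamiliar, the toy example below may help. It uses a hypothetical `ToyContainer` class and Python's `copy.deepcopy` rather than the Dandelion object itself, so it only illustrates the general idea: an alias shares its tables with the original, while a deep copy does not.

```python
import copy

import pandas as pd


class ToyContainer:
    """Hypothetical stand-in for an object that stores a DataFrame in a slot."""

    def __init__(self, metadata):
        self.metadata = metadata


original = ToyContainer(
    pd.DataFrame({"isotype": ["IgM", "IgM"]}, index=["cellA", "cellB"])
)
alias = original                       # same object under a second name
independent = copy.deepcopy(original)  # separate object with its own table

# Editing the deep copy leaves the original untouched.
independent.metadata.loc["cellA", "isotype"] = "IgG"
# Editing through the alias changes the original, because both names share one table.
alias.metadata.loc["cellB", "isotype"] = "IgA"

print(original.metadata)
```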
Retrieving entries with update_metadata
The .metadata slot of the Dandelion class is initialized automatically whenever the .data slot is filled. However, it only contains a pre-specified set of standard columns. To retrieve other columns from the .data slot, update the metadata with vdj.update_metadata and specify the retrieve and retrieve_mode options.
The following modes determine how the retrieval is completed:
split and unique only - splits the retrieval into VDJ and VJ chains. A | separates the unique elements.
split and merge - splits the retrieval into VDJ and VJ chains. A | separates every element.
merge and unique only - similar to split and unique only, but merged into a single column.
split - splits the retrieval into individual columns for each contig.
merge - merges the retrieval into a single column, where a | separates every element.
For numerical columns, there are additional options:
split and sum - splits the retrieval into VDJ and VJ chains and sums each separately.
split and average - similar to split and sum, but averages instead of summing.
sum - sums the retrievals into a single column.
average - averages the retrievals into a single column.
If retrieve_mode is not specified, it defaults to split and merge.
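The modes listed above can be pictured with plain pandas group-by operations. The sketch below follows the descriptions in the list on a toy table for a single cell. It does not call dandelion, and the table, the `chain` column and the helper functions are assumptions made for the illustration, so the actual output columns produced by dandelion may be named and formatted differently.

```python
import pandas as pd

# One cell with one VDJ contig and two VJ contigs.
contigs = pd.DataFrame(
    {
        "cell_id": ["cellA", "cellA", "cellA"],
        "chain": ["VDJ", "VJ", "VJ"],
        "c_call": ["IGHM", "IGKC", "IGKC"],
        "umi_count": [10, 4, 6],
    }
)


def join_all(values):
    # "merge"-style: keep every element, separated by |
    return "|".join(values)


def join_unique(values):
    # "unique only"-style: drop duplicates while keeping first-seen order
    return "|".join(dict.fromkeys(values))


by_chain = contigs.groupby(["cell_id", "chain"])  # "split": VDJ and VJ kept apart
by_cell = contigs.groupby("cell_id")              # no split: a single column

split_and_merge = by_chain["c_call"].agg(join_all).unstack("chain")
split_and_unique = by_chain["c_call"].agg(join_unique).unstack("chain")
merge_and_unique = by_cell["c_call"].agg(join_unique)
split_and_sum = by_chain["umi_count"].sum().unstack("chain")
average = by_cell["umi_count"].mean()

print(split_and_merge)
print(split_and_unique)
print(merge_and_unique)
print(split_and_sum)
print(average)
```

The string modes differ only in whether duplicates are kept and whether the VDJ and VJ chains get separate columns; the numerical modes swap the string join for a sum or a mean.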
Example: retrieving fwr1 sequences
[9]:
vdj.update_metadata(retrieve="fwr1")
vdj
[9]:
Dandelion class object with n_obs = 2493 and n_contigs = 7355
data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 
'j_sequence_alignment_aa', 'complete_vdj', 'j_call_multimappers', 'j_call_multiplicity', 'j_call_sequence_start_multimappers', 'j_call_sequence_end_multimappers', 'j_call_support_multimappers', 'mu_count', 'ambiguous', 'extra', 'rearrangement_status', 'clone_id'
metadata: 'clone_id', 'clone_id_rank', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ', 'fwr1_VJ', 'fwr1_VDJ'
Note the additional fwr1_VDJ and fwr1_VJ columns in the metadata slot.
By default, dandelion will not try to merge numerical columns, as doing so can create columns with mixed dtypes.
There is also a newer method, update_plus, that tries to retrieve frequently used columns such as np1_length and np2_length:
[10]:
vdj.update_plus()
vdj
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.12/site-packages/numpy/_core/fromnumeric.py:3860: RuntimeWarning: Mean of empty slice.
/opt/homebrew/Caskroom/miniforge/base/envs/dandelion/lib/python3.12/site-packages/numpy/_core/_methods.py:144: RuntimeWarning: invalid value encountered in scalar divide
[10]:
Dandelion class object with n_obs = 2493 and n_contigs = 7355
data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 
'j_sequence_alignment_aa', 'complete_vdj', 'j_call_multimappers', 'j_call_multiplicity', 'j_call_sequence_start_multimappers', 'j_call_sequence_end_multimappers', 'j_call_support_multimappers', 'mu_count', 'ambiguous', 'extra', 'rearrangement_status', 'clone_id'
metadata: 'clone_id', 'clone_id_rank', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ', 'fwr1_VJ', 'fwr1_VDJ', 'mu_count_VDJ', 'mu_count_VJ', 'mu_count', 'junction_length_VDJ', 'junction_length_VJ', 'junction_aa_length_VDJ', 'junction_aa_length_VJ', 'np1_length_VDJ', 'np1_length_VJ', 'np2_length_VDJ'
Renaming barcodes
You can rename the barcodes with a single function call, which updates the sequence ids and cell ids at the same time. This is useful when you want to give the barcodes more meaningful names. The renaming always starts from the indices that were originally used to create the Dandelion object, so calling the function repeatedly does not stack prefixes/suffixes on top of each other. Each call simply replaces the result of the previous one.
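To make the "always rename from the original indices" behaviour concrete, here is a minimal pandas-only sketch. The `BarcodeRenamer` class and its method names are hypothetical and are not part of dandelion. They only mimic the idea of storing the original ids once and rebuilding the current ids from them on every call.

```python
import pandas as pd


class BarcodeRenamer:
    """Toy stand-in (hypothetical, not dandelion code) for renaming ids."""

    def __init__(self, cell_ids):
        # Keep an untouched copy of the ids the object was created with.
        self._original = pd.Index(cell_ids)
        self.current = self._original.copy()

    def add_prefix(self, prefix, sep="_"):
        # Always rebuild from the original ids, never from self.current.
        self.current = pd.Index([f"{prefix}{sep}{x}" for x in self._original])

    def add_suffix(self, suffix, sep="_"):
        self.current = pd.Index([f"{x}{sep}{suffix}" for x in self._original])


renamer = BarcodeRenamer(["AAACCTGTCATATCGG", "TTTGGTTGTAGGCATG"])
renamer.add_prefix("test", sep="-")
renamer.add_prefix("test2", sep="_")  # replaces "test-", does not stack on it
print(list(renamer.current))
```

Because the second call starts again from the stored originals, the ids end up with only the `test2_` prefix, which is the same pattern the cells in this section show.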
[11]:
# original
print(vdj.data[["sequence_id", "cell_id"]]), print(vdj.metadata_names)
sequence_id \
sequence_id
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_contig_1 sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_contig_1
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_2 sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_2
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_1 sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_1
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_contig_1 sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_contig_1
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_contig_2 sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_contig_2
... ...
vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1 vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1
vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2 vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2
vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1 vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1
vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2 vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2
vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1 vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1
cell_id
sequence_id
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_contig_1 sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_2 sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_1 sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_contig_1 sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG
sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_contig_2 sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG
... ...
vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1 vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC
vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2 vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG
vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1 vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG
vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2 vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG
vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1 vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG
[7355 rows x 2 columns]
Index(['sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG',
'sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC',
'sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG',
'sc5p_v2_hs_PBMC_10k_b_AAACCTGTCTTGAGAC',
'sc5p_v2_hs_PBMC_10k_b_AAACGGGAGCGACGTA',
'sc5p_v2_hs_PBMC_10k_b_AAACGGGCACTGTTAG',
'sc5p_v2_hs_PBMC_10k_b_AAAGATGAGGATGCGT',
'sc5p_v2_hs_PBMC_10k_b_AAAGATGGTCGAATCT',
'sc5p_v2_hs_PBMC_10k_b_AAAGATGGTGAGGGAG',
'sc5p_v2_hs_PBMC_10k_b_AAAGTAGCAGATCCAT',
...
'vdj_v1_hs_pbmc3_b_TTTACTGTCAGCTGGC',
'vdj_v1_hs_pbmc3_b_TTTATGCGTCAGAATA',
'vdj_v1_hs_pbmc3_b_TTTATGCTCAGGATCT',
'vdj_v1_hs_pbmc3_b_TTTATGCTCCTAGAAC',
'vdj_v1_hs_pbmc3_b_TTTCCTCAGCAATATG',
'vdj_v1_hs_pbmc3_b_TTTCCTCAGCGCTTAT',
'vdj_v1_hs_pbmc3_b_TTTCCTCAGGGAAACA',
'vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC',
'vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG',
'vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG'],
dtype='object', length=2493)
[11]:
(None, None)
[12]:
# let's add a 'test-' as a prefix. There's also the suffix option
vdj.add_sequence_prefix("test", sep="-")
print(vdj.data[["sequence_id", "cell_id"]]), print(vdj.metadata_names)
sequence_id \
sequence_id
test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_con... test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_co...
test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_con... test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_co...
test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_con... test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_co...
test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_con... test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_co...
test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_con... test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_co...
... ...
test-vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1 test-vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1
test-vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2 test-vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2
test-vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1 test-vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1
test-vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2 test-vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2
test-vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1 test-vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1
cell_id
sequence_id
test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_con... test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG
test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_con... test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC
test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_con... test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC
test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_con... test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG
test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_con... test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG
... ...
test-vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1 test-vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC
test-vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2 test-vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG
test-vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1 test-vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG
test-vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2 test-vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG
test-vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1 test-vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG
[7355 rows x 2 columns]
Index(['test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG',
'test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC',
'test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG',
'test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCTTGAGAC',
'test-sc5p_v2_hs_PBMC_10k_b_AAACGGGAGCGACGTA',
'test-sc5p_v2_hs_PBMC_10k_b_AAACGGGCACTGTTAG',
'test-sc5p_v2_hs_PBMC_10k_b_AAAGATGAGGATGCGT',
'test-sc5p_v2_hs_PBMC_10k_b_AAAGATGGTCGAATCT',
'test-sc5p_v2_hs_PBMC_10k_b_AAAGATGGTGAGGGAG',
'test-sc5p_v2_hs_PBMC_10k_b_AAAGTAGCAGATCCAT',
...
'test-vdj_v1_hs_pbmc3_b_TTTACTGTCAGCTGGC',
'test-vdj_v1_hs_pbmc3_b_TTTATGCGTCAGAATA',
'test-vdj_v1_hs_pbmc3_b_TTTATGCTCAGGATCT',
'test-vdj_v1_hs_pbmc3_b_TTTATGCTCCTAGAAC',
'test-vdj_v1_hs_pbmc3_b_TTTCCTCAGCAATATG',
'test-vdj_v1_hs_pbmc3_b_TTTCCTCAGCGCTTAT',
'test-vdj_v1_hs_pbmc3_b_TTTCCTCAGGGAAACA',
'test-vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC',
'test-vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG',
'test-vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG'],
dtype='object', length=2493)
[12]:
(None, None)
[13]:
len(vdj._original_cell_ids.unique())
[13]:
3158
[14]:
vdj._metadata.index
[14]:
Index(['test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG',
'test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC',
'test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG',
'test-sc5p_v2_hs_PBMC_10k_b_AAACCTGTCTTGAGAC',
'test-sc5p_v2_hs_PBMC_10k_b_AAACGGGAGCGACGTA',
'test-sc5p_v2_hs_PBMC_10k_b_AAACGGGCACTGTTAG',
'test-sc5p_v2_hs_PBMC_10k_b_AAAGATGAGGATGCGT',
'test-sc5p_v2_hs_PBMC_10k_b_AAAGATGGTCGAATCT',
'test-sc5p_v2_hs_PBMC_10k_b_AAAGATGGTGAGGGAG',
'test-sc5p_v2_hs_PBMC_10k_b_AAAGTAGCAGATCCAT',
...
'test-vdj_v1_hs_pbmc3_b_TTTACTGTCAGCTGGC',
'test-vdj_v1_hs_pbmc3_b_TTTATGCGTCAGAATA',
'test-vdj_v1_hs_pbmc3_b_TTTATGCTCAGGATCT',
'test-vdj_v1_hs_pbmc3_b_TTTATGCTCCTAGAAC',
'test-vdj_v1_hs_pbmc3_b_TTTCCTCAGCAATATG',
'test-vdj_v1_hs_pbmc3_b_TTTCCTCAGCGCTTAT',
'test-vdj_v1_hs_pbmc3_b_TTTCCTCAGGGAAACA',
'test-vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC',
'test-vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG',
'test-vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG'],
dtype='object', length=2493)
[15]:
# same functionality as above
vdj.add_cell_prefix("test2", sep="_")
print(vdj.data[["sequence_id", "cell_id"]]), print(vdj.metadata_names)
sequence_id \
sequence_id
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_co... test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_c...
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_co... test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_c...
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_co... test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_c...
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_co... test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_c...
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_co... test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_c...
... ...
test2_vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1 test2_vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1
test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2 test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2
test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1 test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1
test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2 test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2
test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1 test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1
cell_id
sequence_id
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_co... test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_co... test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_co... test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_co... test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_co... test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG
... ...
test2_vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1 test2_vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC
test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2 test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG
test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1 test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG
test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2 test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG
test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1 test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG
[7355 rows x 2 columns]
Index(['test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG',
'test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC',
'test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG',
'test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCTTGAGAC',
'test2_sc5p_v2_hs_PBMC_10k_b_AAACGGGAGCGACGTA',
'test2_sc5p_v2_hs_PBMC_10k_b_AAACGGGCACTGTTAG',
'test2_sc5p_v2_hs_PBMC_10k_b_AAAGATGAGGATGCGT',
'test2_sc5p_v2_hs_PBMC_10k_b_AAAGATGGTCGAATCT',
'test2_sc5p_v2_hs_PBMC_10k_b_AAAGATGGTGAGGGAG',
'test2_sc5p_v2_hs_PBMC_10k_b_AAAGTAGCAGATCCAT',
...
'test2_vdj_v1_hs_pbmc3_b_TTTACTGTCAGCTGGC',
'test2_vdj_v1_hs_pbmc3_b_TTTATGCGTCAGAATA',
'test2_vdj_v1_hs_pbmc3_b_TTTATGCTCAGGATCT',
'test2_vdj_v1_hs_pbmc3_b_TTTATGCTCCTAGAAC',
'test2_vdj_v1_hs_pbmc3_b_TTTCCTCAGCAATATG',
'test2_vdj_v1_hs_pbmc3_b_TTTCCTCAGCGCTTAT',
'test2_vdj_v1_hs_pbmc3_b_TTTCCTCAGGGAAACA',
'test2_vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC',
'test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG',
'test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG'],
dtype='object', length=2493)
[15]:
(None, None)
Simplifying the V/D/J/C call annotations
Sometimes the V/D/J/C call annotations can be quite verbose. You can simplify them with the .simplify() function. Where a call contains several comma-separated entries, it keeps only the first entry, and it strips the allele designation (e.g. *01). This is useful when you want to simplify the calls for plotting purposes.
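The rule described above (keep the first comma-separated entry, then drop the allele) can be sketched on plain strings. This is an illustrative sketch only, assuming calls are formatted like the contig-level `v_call`/`j_call` values in this tutorial. The helper `simplify_call` is hypothetical and is not how `.simplify()` is implemented internally.

```python
import pandas as pd


def simplify_call(call):
    """Keep the first comma-separated call and drop the allele suffix."""
    if not isinstance(call, str):
        return call  # leave missing values (None/NaN) untouched
    first = call.split(",")[0]  # "IGKV1-33*01,IGKV1D-33*01" -> "IGKV1-33*01"
    return first.split("*")[0]  # "IGKV1-33*01" -> "IGKV1-33"


calls = pd.Series(
    ["IGKV1-33*01,IGKV1D-33*01", "IGLJ2*01,IGLJ3*01,IGLJ3*02", "IGHJ4*02", None]
)
simplified = calls.map(simplify_call)
print(simplified)
```

The same helper could be mapped over any call column of a pandas DataFrame if you want a simplified copy for plotting without modifying the original object.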
[16]:
# before
(
vdj.data[["v_call", "j_call"]],
vdj.metadata[["v_call_VDJ", "j_call_VDJ"]],
)
[16]:
( v_call \
sequence_id
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_co... IGKV1-33*01,IGKV1D-33*01
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_co... IGHV1-69*01,IGHV1-69D*01
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_co... IGKV1-8*01
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_co... IGLV5-45*02
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_co... IGHV1-2*02
... ...
test2_vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1 IGHV1-46*01
test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2 IGHV1-69*01,IGHV1-69D*01
test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1 IGLV1-47*01
test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2 IGLV2-11*01
test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1 IGHV3-23*01,IGHV3-23D*01
j_call
sequence_id
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_co... IGKJ4*01
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_co... IGHJ3*02
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_co... IGKJ1*01
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_co... IGLJ3*02
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_co... IGHJ3*02
... ...
test2_vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1 IGHJ5*02
test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2 IGHJ6*02
test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1 IGLJ3*02
test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2 IGLJ2*01,IGLJ3*01,IGLJ3*02
test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1 IGHJ4*02
[7355 rows x 2 columns],
v_call_VDJ j_call_VDJ
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG None None
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC IGHV1-69D,IGHV1-69 IGHJ3
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG IGHV1-2 IGHJ3
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCTTGAGAC IGHV5-51 IGHJ3
test2_sc5p_v2_hs_PBMC_10k_b_AAACGGGAGCGACGTA IGHV4-4 IGHJ3
... ... ...
test2_vdj_v1_hs_pbmc3_b_TTTCCTCAGCGCTTAT IGHV3-30 IGHJ6
test2_vdj_v1_hs_pbmc3_b_TTTCCTCAGGGAAACA IGHV4-61 IGHJ2
test2_vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC IGHV1-46 IGHJ5
test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG IGHV1-69D,IGHV1-69 IGHJ6
test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG IGHV3-23,IGHV3-23D IGHJ4
[2493 rows x 2 columns])
[17]:
# after
vdj.simplify()
(
vdj.data[["v_call", "j_call"]],
vdj.metadata[["v_call_VDJ", "j_call_VDJ"]],
)
[17]:
( v_call j_call
sequence_id
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_co... IGKV1-33 IGKJ4
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_co... IGHV1-69 IGHJ3
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_co... IGKV1-8 IGKJ1
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_co... IGLV5-45 IGLJ3
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_co... IGHV1-2 IGHJ3
... ... ...
test2_vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1 IGHV1-46 IGHJ5
test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2 IGHV1-69 IGHJ6
test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1 IGLV1-47 IGLJ3
test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2 IGLV2-11 IGLJ2
test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1 IGHV3-23 IGHJ4
[7355 rows x 2 columns],
v_call_VDJ j_call_VDJ
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG None None
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC IGHV1-69 IGHJ3
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG IGHV1-2 IGHJ3
test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCTTGAGAC IGHV5-51 IGHJ3
test2_sc5p_v2_hs_PBMC_10k_b_AAACGGGAGCGACGTA IGHV4-4 IGHJ3
... ... ...
test2_vdj_v1_hs_pbmc3_b_TTTCCTCAGCGCTTAT IGHV3-30 IGHJ6
test2_vdj_v1_hs_pbmc3_b_TTTCCTCAGGGAAACA IGHV4-61 IGHJ2
test2_vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC IGHV1-46 IGHJ5
test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG IGHV1-69 IGHJ6
test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG IGHV3-23 IGHJ4
[2493 rows x 2 columns])
Concatenating multiple objects
This is a simple function to concatenate (append) two or more Dandelion objects or pandas DataFrames. Note that it operates on the .data slot and not the .metadata slot.
[18]:
vdj
[18]:
Dandelion class object with n_obs = 2493 and n_contigs = 7355
data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 
'j_sequence_alignment_aa', 'complete_vdj', 'j_call_multimappers', 'j_call_multiplicity', 'j_call_sequence_start_multimappers', 'j_call_sequence_end_multimappers', 'j_call_support_multimappers', 'mu_count', 'ambiguous', 'extra', 'rearrangement_status', 'clone_id'
metadata: 'clone_id', 'clone_id_rank', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
[19]:
# just simple concatenation x 3. check the difference between the cell and contig numbers between this object and just vdj
vdj_concat = ddl.tl.concat([vdj, vdj, vdj])
vdj_concat
[19]:
Dandelion class object with n_obs = 2493 and n_contigs = 22065
data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 
'j_sequence_alignment_aa', 'complete_vdj', 'j_call_multimappers', 'j_call_multiplicity', 'j_call_sequence_start_multimappers', 'j_call_sequence_end_multimappers', 'j_call_support_multimappers', 'mu_count', 'ambiguous', 'extra', 'rearrangement_status', 'clone_id'
metadata: 'clone_id', 'clone_id_rank', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
[20]:
vdj_concat.data[["sequence_id", "cell_id"]].head()
[20]:
| | sequence_id | cell_id |
|---|---|---|
| sequence_id | | |
| sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_contig_1_0 | sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_contig_1_0 | test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG |
| sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_contig_1_1 | sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_contig_1_1 | test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG |
| sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_contig_1_2 | sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_contig_1_2 | test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG |
| sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_2_0 | sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_2_0 | test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC |
| sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_2_1 | sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_2_1 | test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC |
ddl.tl.concat also lets you supply custom prefixes/suffixes to add to the sequence ids. If none are provided and it detects that the sequence ids are not unique, it adds _0, _1, etc. as a suffix, as seen above.
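The uniqueness handling can be illustrated with plain pandas DataFrames. This is a rough sketch under stated assumptions, not the library's implementation: the function `concat_with_unique_ids` is hypothetical, and it simply appends each table's position as a suffix whenever the combined `sequence_id` values would collide.

```python
import pandas as pd


def concat_with_unique_ids(frames, sep="_"):
    """Stack contig tables; suffix sequence ids by table position if they clash."""
    all_ids = pd.concat([frame["sequence_id"] for frame in frames])
    needs_suffix = all_ids.duplicated().any()
    renamed = []
    for position, frame in enumerate(frames):
        frame = frame.copy()  # never modify the caller's table
        if needs_suffix:
            frame["sequence_id"] = frame["sequence_id"] + f"{sep}{position}"
        # Mirror the layout used here: sequence_id is both index and column.
        renamed.append(frame.set_index("sequence_id", drop=False))
    return pd.concat(renamed)


contigs = pd.DataFrame(
    {
        "sequence_id": ["cellA_contig_1", "cellA_contig_2"],
        "cell_id": ["cellA", "cellA"],
    }
)
merged = concat_with_unique_ids([contigs, contigs, contigs])
print(merged)
```

In this toy example the number of contigs triples while the number of distinct cells stays the same, which is the difference between `n_obs` and `n_contigs` that the concatenated object above shows.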
Read/write
A Dandelion object can be saved with the .write_h5ddl and .write_pkl functions, with optional compression (e.g. gzip). write_h5ddl primarily uses the h5py library, while write_pkl simply uses pickle. The read_h5ddl and read_pkl functions read the respective file formats.
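For readers unfamiliar with compressed pickles, the following standard-library sketch shows what a gzip-compressed pickle round trip looks like in general. It is only an illustration of the technique: the helpers `save_pickle_gz` and `load_pickle_gz` are hypothetical, and nothing here is taken from dandelion's own write_pkl/read_pkl code.

```python
import gzip
import pickle
import tempfile
from pathlib import Path

import pandas as pd


def save_pickle_gz(obj, path):
    """Pickle any Python object into a gzip-compressed file."""
    with gzip.open(path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle_gz(path):
    """Read back an object written by save_pickle_gz."""
    with gzip.open(path, "rb") as handle:
        return pickle.load(handle)


table = pd.DataFrame({"cell_id": ["cellA", "cellB"], "umi_count": [68, 51]})
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "toy.pkl.gz"
    save_pickle_gz(table, target)
    restored = load_pickle_gz(target)

print(restored.equals(table))
```

Pickle files are convenient but tied to the Python environment that wrote them, which is one reason to prefer an HDF5-based format for longer-term storage.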
[21]:
%time vdj.write_h5ddl('dandelion_results_test.h5ddl', compression="gzip")
CPU times: user 8.01 s, sys: 403 ms, total: 8.41 s
Wall time: 9.49 s
If you see any warnings above, they are due to mixed dtypes somewhere in the object. Check the affected columns if you think this will interfere with downstream usage.
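One way to do that checking is to look for columns that hold more than one Python type. The sketch below works on any pandas DataFrame. The function name `find_mixed_columns` and the toy table are made up for illustration, and you would pass in your own table (for example the object's `.data`) instead.

```python
import pandas as pd


def find_mixed_columns(df):
    """Return the names of columns whose non-missing values have more than one type."""
    mixed = []
    for name in df.columns:
        n_types = df[name].dropna().map(type).nunique()
        if n_types > 1:
            mixed.append(name)
    return mixed


toy = pd.DataFrame(
    {
        "umi_count": [68, "51", 43],  # integers mixed with a string
        "locus": ["IGK", "IGH", "IGK"],  # strings only
        "v_identity": [0.98, None, 1.0],  # floats with a missing value
    }
)
mixed_columns = find_mixed_columns(toy)
print(mixed_columns)
```

Missing values are dropped before counting types, so a numeric column with gaps is not reported as mixed.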
[22]:
%time vdj_1 = ddl.read_h5ddl('dandelion_results_test.h5ddl')
vdj_1
CPU times: user 600 ms, sys: 78.1 ms, total: 678 ms
Wall time: 774 ms
[22]:
Dandelion class object with n_obs = 2493 and n_contigs = 7355
data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 
'j_sequence_alignment_aa', 'complete_vdj', 'j_call_multimappers', 'j_call_multiplicity', 'j_call_sequence_start_multimappers', 'j_call_sequence_end_multimappers', 'j_call_support_multimappers', 'mu_count', 'ambiguous', 'extra', 'rearrangement_status', 'clone_id'
metadata: 'clone_id', 'clone_id_rank', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'umi_count_B_VDJ', 'umi_count_B_VJ', 'v_call_VDJ_main', 'v_call_VJ_main', 'd_call_VDJ_main', 'j_call_VDJ_main', 'j_call_VJ_main', 'c_call_VDJ_main', 'c_call_VJ_main', 'v_call_B_VDJ_main', 'd_call_B_VDJ_main', 'j_call_B_VDJ_main', 'v_call_B_VJ_main', 'j_call_B_VJ_main', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
There are also other writing functions such as .write_airr and .write_10x, which write the object to a .tsv or .csv file compatible with the AIRR and 10x formats respectively. One use case for .write_10x is reannotating your VDJ data with the preprocessing workflow when the data is not actually from 10x’s platform. Note that .write_10x only writes the contig table and fasta files and does not include metadata, graph, or distances.
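To see how the two layouts relate, the sketch below renames a few AIRR-style columns to 10x-style names. The mapping is an assumption inferred from the column names and values of the two example tables in this section (for instance, `junction` lines up with `cdr3_nt`). It is not taken from the `.write_10x` source, and the helper `airr_to_10x_like` is hypothetical.

```python
import pandas as pd

# Assumed correspondence, read off the example tables in this section.
AIRR_TO_10X = {
    "cell_id": "barcode",
    "sequence_id": "contig_id",
    "locus": "chain",
    "v_call": "v_gene",
    "d_call": "d_gene",
    "j_call": "j_gene",
    "c_call": "c_gene",
    "junction_aa": "cdr3",
    "junction": "cdr3_nt",
    "clone_id": "raw_clonotype_id",
}


def airr_to_10x_like(airr):
    """Select the mapped AIRR columns that exist and give them 10x-style names."""
    present = [column for column in AIRR_TO_10X if column in airr.columns]
    return airr[present].rename(columns=AIRR_TO_10X)


airr = pd.DataFrame(
    {
        "sequence_id": ["cellA_contig_1"],
        "cell_id": ["cellA"],
        "locus": ["IGK"],
        "v_call": ["IGKV1-33"],
        "j_call": ["IGKJ4"],
        "junction": ["TGTCAACAATATGACGAACTTCCCGTCACTTTC"],
        "junction_aa": ["CQQYDELPVTF"],
    }
)
tenx_like = airr_to_10x_like(airr)
print(tenx_like.columns.tolist())
```

This only covers renaming. A real 10x contig table has further columns (such as `reads`, `umis`, and `full_length`) that this sketch does not attempt to fill.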
[23]:
import pandas as pd
vdj_1.write_airr("test.airr.tsv")
df = pd.read_csv("test.airr.tsv", sep="\t")
df
[23]:
| | sequence_id | sequence | rev_comp | productive | v_call | d_call | j_call | sequence_alignment | germline_alignment | junction | ... | j_call_multimappers | j_call_multiplicity | j_call_sequence_start_multimappers | j_call_sequence_end_multimappers | j_call_support_multimappers | mu_count | ambiguous | extra | rearrangement_status | clone_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_c... | TGGGGAGGAGTCAGTCCCAACCAGGACACGGCCTGGACATGAGGGT... | F | T | IGKV1-33 | NaN | IGKJ4 | GACATCCAGATGACCCAGTCTCCATCCTCCCTGTCTGCATCTGTGG... | GACATCCAGATGACCCAGTCTCCATCCTCCCTGTCTGCATCTGTAG... | TGTCAACAATATGACGAACTTCCCGTCACTTTC | ... | ["IGKJ4*01"] | 1 | [385] | [412] | [3.56e-09] | 27 | F | F | standard | B_VJ_187_2_3 |
| 1 | test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_c... | ATCACATAACAACCACATTCCTCCTCTAAAGAAGCCCCTGGGAGCA... | F | T | IGHV1-69 | IGHD3-22 | IGHJ3 | CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG... | CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG... | TGTGCGACTACGTATTACTATGATAGTAGTGGTTATTACCAGAATG... | ... | ["IGHJ3*02"] | 1 | [445] | [494] | [4.5799999999999995e-23] | 0 | F | F | standard | B_VDJ_220_3_2_VJ_4_2_1 |
| 2 | test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_c... | AGGAGTCAGACCCTGTCAGGACACAGCATAGACATGAGGGTCCCCG... | F | T | IGKV1-8 | NaN | IGKJ1 | GCCATCCGGATGACCCAGTCTCCATCCTCATTCTCTGCATCTACAG... | GCCATCCGGATGACCCAGTCTCCATCCTCATTCTCTGCATCTACAG... | TGTCAACAGTATTATAGTTACCCTCGGACGTTC | ... | ["IGKJ1*01"] | 1 | [380] | [415] | [2.7e-15] | 0 | F | F | standard | B_VDJ_220_3_2_VJ_4_2_1 |
| 3 | test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_c... | ACTGTGGGGGTAAGAGGTTGTGTCCACCATGGCCTGGACTCCTCTC... | F | T | IGLV5-45 | NaN | IGLJ3 | CAGGCTGTGCTGACTCAGCCGTCTTCC...CTCTCTGCATCTCCTG... | CAGGCTGTGCTGACTCAGCCGTCTTCC...CTCTCTGCATCTCCTG... | TGTATGATTTGGCACAGCAGCGCTTGGGTGGTC | ... | ["IGLJ3*01"] | 1 | [402] | [431] | [6.84e-12] | 8 | F | F | standard | B_VDJ_85_1_1_VJ_194_1_1 |
| 4 | test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_c... | GGGAGCATCACCCAGCAACCACATCTGTCCTCTAGAGAATCCCCTG... | F | T | IGHV1-2 | NaN | IGHJ3 | CAGGTGCAACTGGTGCAGTCTGGGGGT...GAGGTAAAGAAGCCTG... | CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG... | TGTGCGAGAGAGATAGAGGGGGACGGTGTTTTTGAAATCTGG | ... | ["IGHJ3*02"] | 1 | [433] | [479] | [4.48e-18] | 22 | F | F | standard | B_VDJ_85_1_1_VJ_194_1_1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7350 | test2_vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1 | ATCATCCAACAACCACATCCCTTCTCTACAGAAGCCTCTGAGAGGA... | F | T | IGHV1-46 | IGHD2-15 | IGHJ5 | CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG... | CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG... | TGTGCGAGAGAGGGATATTGTAGTGGTGGTAGCTGCTACTCCCCCG... | ... | ["IGHJ5*02"] | 1 | [461] | [506] | [7.83e-21] | 0 | F | F | standard | B_VDJ_120_5_2_VJ_58_2_1 |
| 7351 | test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2 | ATCACATAACAACCACATTCCTCCTCTAAAGAAGCCCCTGGGAGCA... | F | T | IGHV1-69 | IGHD2-15 | IGHJ6 | CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG... | CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG... | TGTGCGAGATCTCTGGATATTGTAGTGGTGGTAGCACTCTACTACT... | ... | ["IGHJ6*02"] | 1 | [439] | [497] | [4.57e-28] | 0 | F | F | standard | B_VDJ_197_4_2_VJ_70_3_1 |
| 7352 | test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1 | AGCTTCAGCTGTGGTAGAGAAGACAGGATTCAGGACAATCTCCAGC... | F | T | IGLV1-47 | NaN | IGLJ3 | CAGTCTGTGCTGACTCAGCCACCCTCA...GCGTCTGGGACCCCCG... | CAGTCTGTGCTGACTCAGCCACCCTCA...GCGTCTGGGACCCCCG... | TGTGCAGCATGGGATGACAGCCTGAGTGGTTGGGTGTTC | ... | ["IGLJ3*02"] | 1 | [397] | [434] | [2.46e-16] | 0 | F | F | standard | B_VDJ_197_4_2_VJ_70_3_1 |
| 7353 | test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2 | GGCTGGGGTCTCAGGAGGCAGCACTCTCGGGACGTCTCCACCATGG... | F | T | IGLV2-11 | NaN | IGLJ2 | CAGTCTGCCCTGACTCAGCCTCGCTCA...GTGTCCGGGTCTCCTG... | CAGTCTGCCCTGACTCAGCCTCGCTCA...GTGTCCGGGTCTCCTG... | TGCTGCTCATATGCAGGCAGCTACACTGTGTTTTTC | ... | ["IGLJ3*01"] | 1 | [393] | [430] | [2.46e-11] | 4 | F | F | standard | B_VDJ_206_6_7_VJ_87_3_1 |
| 7354 | test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1 | AGCTCTGAGAGAGGAGCCCAGCCCTGGGATTTTCAGGTGTTTTCAT... | F | T | IGHV3-23 | NaN | IGHJ4 | GAGGTGCAGGTGTTGGAGTCTGGGGGA...GGCTTGGAACAGCCTG... | GAGGTGCAGCTGTTGGAGTCTGGGGGA...GGCTTGGTACAGCCTG... | TGTGCGGGGAGTCGGTGGTTATATTCTTTTGACTACTGG | ... | ["IGHJ4*02"] | 1 | [449] | [491] | [1.65e-17] | 8 | F | F | standard | B_VDJ_206_6_7_VJ_87_3_1 |
7355 rows × 124 columns
[24]:
vdj_1.write_10x(
folder="10x_test",
filename_prefix="all",
) # this writes both the contig_annotations.csv and contig.fasta
df = pd.read_csv("10x_test/all_contig_annotations.csv")
df
[24]:
| | barcode | contig_id | length | chain | v_gene | d_gene | j_gene | c_gene | full_length | productive | cdr3 | cdr3_nt | reads | umis | raw_clonotype_id | raw_consensus_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG | test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_c... | 556 | IGK | IGKV1-33 | NaN | IGKJ4 | IGKC | NaN | True | CQQYDELPVTF | TGTCAACAATATGACGAACTTCCCGTCACTTTC | 9139 | 68 | B_VJ_187_2_3 | B_VJ_187_2_3 |
| 1 | test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC | test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_c... | 565 | IGH | IGHV1-69 | IGHD3-22 | IGHJ3 | IGHM | NaN | True | CATTYYYDSSGYYQNDAFDIW | TGTGCGACTACGTATTACTATGATAGTAGTGGTTATTACCAGAATG... | 4161 | 51 | B_VDJ_220_3_2_VJ_4_2_1 | B_VDJ_220_3_2_VJ_4_2_1 |
| 2 | test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC | test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_c... | 551 | IGK | IGKV1-8 | NaN | IGKJ1 | IGKC | NaN | True | CQQYYSYPRTF | TGTCAACAGTATTATAGTTACCCTCGGACGTTC | 5679 | 43 | B_VDJ_220_3_2_VJ_4_2_1 | B_VDJ_220_3_2_VJ_4_2_1 |
| 3 | test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG | test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_c... | 642 | IGL | IGLV5-45 | NaN | IGLJ3 | IGLC3 | NaN | True | CMIWHSSAWVV | TGTATGATTTGGCACAGCAGCGCTTGGGTGGTC | 13160 | 90 | B_VDJ_85_1_1_VJ_194_1_1 | B_VDJ_85_1_1_VJ_194_1_1 |
| 4 | test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG | test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_c... | 550 | IGH | IGHV1-2 | NaN | IGHJ3 | IGHM | NaN | True | CAREIEGDGVFEIW | TGTGCGAGAGAGATAGAGGGGGACGGTGTTTTTGAAATCTGG | 5080 | 47 | B_VDJ_85_1_1_VJ_194_1_1 | B_VDJ_85_1_1_VJ_194_1_1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7350 | test2_vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC | test2_vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1 | 577 | IGH | IGHV1-46 | IGHD2-15 | IGHJ5 | IGHM | NaN | True | CAREGYCSGGSCYSPDPNNGWFDPW | TGTGCGAGAGAGGGATATTGTAGTGGTGGTAGCTGCTACTCCCCCG... | 2960 | 28 | B_VDJ_120_5_2_VJ_58_2_1 | B_VDJ_120_5_2_VJ_58_2_1 |
| 7351 | test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG | test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2 | 568 | IGH | IGHV1-69 | IGHD2-15 | IGHJ6 | IGHM | NaN | True | CARSLDIVVVVALYYYYGMDVW | TGTGCGAGATCTCTGGATATTGTAGTGGTGGTAGCACTCTACTACT... | 2464 | 32 | B_VDJ_197_4_2_VJ_70_3_1 | B_VDJ_197_4_2_VJ_70_3_1 |
| 7352 | test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG | test2_vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1 | 645 | IGL | IGLV1-47 | NaN | IGLJ3 | IGLC3 | NaN | True | CAAWDDSLSGWVF | TGTGCAGCATGGGATGACAGCCTGAGTGGTTGGGTGTTC | 2457 | 28 | B_VDJ_197_4_2_VJ_70_3_1 | B_VDJ_197_4_2_VJ_70_3_1 |
| 7353 | test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG | test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2 | 641 | IGL | IGLV2-11 | NaN | IGLJ2 | IGLC | NaN | True | CCSYAGSYTVFF | TGCTGCTCATATGCAGGCAGCTACACTGTGTTTTTC | 2744 | 36 | B_VDJ_206_6_7_VJ_87_3_1 | B_VDJ_206_6_7_VJ_87_3_1 |
| 7354 | test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG | test2_vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1 | 562 | IGH | IGHV3-23 | NaN | IGHJ4 | IGHM | NaN | True | CAGSRWLYSFDYW | TGTGCGGGGAGTCGGTGGTTATATTCTTTTGACTACTGG | 1915 | 22 | B_VDJ_206_6_7_VJ_87_3_1 | B_VDJ_206_6_7_VJ_87_3_1 |
7355 rows × 16 columns
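Once the annotations are back in a pandas DataFrame, a quick sanity check is to summarise contigs per cell barcode. The snippet below is a minimal sketch, not part of dandelion: it assumes only the column names shown in the output above (`barcode`, `chain`, `productive`, `umis`) and uses a small stand-in frame copied from the first three rows, so that it runs without the tutorial files. In practice, `df` would be the result of `pd.read_csv("10x_test/all_contig_annotations.csv")`.

```python
import pandas as pd

# Stand-in for the exported table (assumption: values copied from the
# first three rows displayed above; only a subset of columns is kept).
df = pd.DataFrame(
    {
        "barcode": [
            "test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG",
            "test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC",
            "test2_sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC",
        ],
        "chain": ["IGK", "IGH", "IGK"],
        "productive": [True, True, True],
        "umis": [68, 51, 43],
    }
)

# Number of distinct cells represented in the table
n_cells = df["barcode"].nunique()

# Number of contigs recorded for each cell barcode
contigs_per_cell = df.groupby("barcode").size()

# Chain composition per cell, e.g. "IGH+IGK" for a paired heavy/light cell
chains_per_cell = df.groupby("barcode")["chain"].agg(
    lambda chains: "+".join(sorted(chains))
)

print(n_cells)
print(contigs_per_cell)
print(chains_per_cell)
```

The same three summaries can be applied unchanged to the full 7355-row table to see how many cells carry one, two, or more contigs.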