DandelionPolars class

Much of the functionality of the dandelion.polars backend revolves around the DandelionPolars class. The class acts as an intermediary object for storage and for flexible interaction with other tools. This section is a quick primer on the DandelionPolars class.

DandelionPolars Overview

DandelionPolars is the core data container for the dandelion.polars backend, replacing the original pandas-based Dandelion class. It displays as “Lazy Dandelion object” (when lazy=True, the default) or “Dandelion object” (when lazy=False).

Key changes:

  • data accepts pl.LazyFrame, pl.DataFrame, pd.DataFrame, or a file path. Pandas DataFrames are automatically converted to Polars on input.

  • lazy=True (default): .data and .metadata are stored as LazyFrames with deferred query execution.

  • lazy=False: stored as eager DataFrames.

LazyFrame behavior

When .data is backed by a LazyFrame:

  • vdj.data.column_name returns pl.col("column_name") (an expression), not a concrete Series.

  • To get concrete data, call vdj.data.collect() to materialize to a DataFrame.

  • Methods like write_csv are not available on a LazyFrame, so you must call .collect() first:

vdj.data.collect().write_csv("output.tsv", separator="\t")

Backend Conversion

Here are some additional convenience functions for converting between lazy and eager modes, and to pandas if needed. In lazy mode, .data returns a LazyFrame; in eager mode, it returns a DataFrame. After converting to pandas, .data returns a pd.DataFrame.

vdj.to_pandas()
# vdj.data now returns a pd.DataFrame

vdj.to_polars(lazy=True)
# vdj.data now returns a DataFrameAccessor wrapping a LazyFrame

vdj.to_eager()
# vdj.data is now a polars DataFrame (not lazy)

vdj.to_lazy()
# vdj.data is now a LazyFrame again

Import modules

[1]:
import os

os.chdir("dandelion_tutorial/")
import dandelion as ddl

ddl.logging.print_versions()
dandelion==1.0.0a1.dev36 pandas==2.3.3 numpy==2.3.5 matplotlib==3.10.6 networkx==3.6.1 scipy==1.15.2
[ ]:
vdj = ddl.read_zipddl("dandelion_results_simplified.zipddl")
vdj
Using PyTorch backend with Apple Metal GPU
Finding clones based on B cell VDJ chains using junction_aa: 100%|██████████| 1233/1233 [00:01<00:00, 803.50it/s]
Finding clones based on B cell VJ chains using junction_aa: 100%|██████████| 576/576 [00:00<00:00, 769.57it/s]
Lazy Dandelion object with n_obs = 2496 and n_contigs = 5767
    data: sequence_id, sequence, rev_comp, productive, v_call, d_call, j_call, sequence_alignment, germline_alignment, junction, junction_aa, v_cigar, d_cigar, j_cigar, stop_codon, vj_in_frame, locus, junction_length, np1_length, np2_length, v_sequence_start, v_sequence_end, v_germline_start, v_germline_end, d_sequence_start, d_sequence_end, d_germline_start, d_germline_end, j_sequence_start, j_sequence_end, j_germline_start, j_germline_end, v_score, v_identity, v_support, d_score, d_identity, d_support, j_score, j_identity, j_support, fwr1, fwr2, fwr3, fwr4, cdr1, cdr2, cdr3, cell_id, consensus_count, umi_count, v_call_10x, d_call_10x, j_call_10x, junction_10x, junction_10x_aa, j_call_blastn, j_identity_blastn, j_alignment_length_blastn, j_number_of_mismatches_blastn, j_number_of_gap_openings_blastn, j_sequence_start_blastn, j_sequence_end_blastn, j_germline_start_blastn, j_germline_end_blastn, j_support_blastn, j_score_blastn, j_sequence_alignment_blastn, j_germline_alignment_blastn, j_call_igblastn, j_source, j_support_igblastn, j_score_igblastn, d_call_blastn, d_identity_blastn, d_alignment_length_blastn, d_number_of_mismatches_blastn, d_number_of_gap_openings_blastn, d_sequence_start_blastn, d_sequence_end_blastn, d_germline_start_blastn, d_germline_end_blastn, d_support_blastn, d_score_blastn, d_sequence_alignment_blastn, d_germline_alignment_blastn, d_call_igblastn, d_source, d_support_igblastn, d_score_igblastn, v_call_genotyped, germline_alignment_d_mask, sample_id, c_call, c_sequence_alignment, c_germline_alignment, c_sequence_start, c_sequence_end, c_score, c_identity, c_call_10x, junction_aa_length, fwr1_aa, fwr2_aa, fwr3_aa, fwr4_aa, cdr1_aa, cdr2_aa, cdr3_aa, sequence_alignment_aa, v_sequence_alignment_aa, d_sequence_alignment_aa, j_sequence_alignment_aa, complete_vdj, j_call_multimappers, j_call_multiplicity, j_call_sequence_start_multimappers, j_call_sequence_end_multimappers, j_call_support_multimappers, mu_count, extra, ambiguous, 
rearrangement_status, clone_id
    metadata: cell_id, clone_id, clone_id_rank, sample_id, productive_VDJ, productive_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, junction_VDJ, junction_VJ, junction_aa_VDJ, junction_aa_VJ, locus_VDJ, locus_VJ, v_call_VDJ, v_call_VJ, c_call_VDJ, c_call_VJ, umi_count_VDJ, umi_count_VJ, productive_VDJ_main, productive_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, junction_VDJ_main, junction_VJ_main, junction_aa_VDJ_main, junction_aa_VJ_main, locus_VDJ_main, locus_VJ_main, v_call_genotyped_VDJ_main, v_call_genotyped_VJ_main, c_call_VDJ_main, c_call_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, isotype, isotype_main, isotype_status, locus_status, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ
    layout: layout for 2351 vertices, layout for 148 vertices
    graph: networkx graph of 2351 vertices, networkx graph of 148 vertices

Essentially, the .data slot holds the AIRR contig table, while the .metadata slot holds a collapsed, cell-level version that can be combined with AnnData’s .obs slot. In the default lazy mode, both slots are backed by LazyFrames and require .collect() to retrieve the actual data. You can access these slots like ordinary attributes; for example, to view the metadata:

[3]:
vdj.metadata.collect()
[3]:
shape: (2_496, 45)
cell_id | clone_id | clone_id_rank | sample_id | productive_VDJ | productive_VJ | d_call_VDJ | j_call_VDJ | j_call_VJ | junction_VDJ | junction_VJ | junction_aa_VDJ | junction_aa_VJ | locus_VDJ | locus_VJ | v_call_VDJ | v_call_VJ | c_call_VDJ | c_call_VJ | umi_count_VDJ | umi_count_VJ | productive_VDJ_main | productive_VJ_main | d_call_VDJ_main | j_call_VDJ_main | j_call_VJ_main | junction_VDJ_main | junction_VJ_main | junction_aa_VDJ_main | junction_aa_VJ_main | locus_VDJ_main | locus_VJ_main | v_call_genotyped_VDJ_main | v_call_genotyped_VJ_main | c_call_VDJ_main | c_call_VJ_main | umi_count_VDJ_main | umi_count_VJ_main | isotype | isotype_main | isotype_status | locus_status | chain_status | rearrangement_status_VDJ | rearrangement_status_VJ
str | str | cat | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | i64 | i64 | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | f64 | f64 | str | str | str | str | str | str | str
"sc5p_v2_hs_PBMC_10k_b_AAACCTGT… | "B_VJ_36_2_3" | "1286" | "sc5p_v2_hs_PBMC_10k_b" | null | "T" | null | null | "IGKJ4" | null | "TGTCAACAATATGACGAACTTCCCGTCACT… | null | "CQQYDELPVTF" | null | "IGK" | null | "IGKV1-33*01,IGKV1D-33*01" | null | "IGKC" | 0 | 68 | null | "T" | null | null | "IGKJ4" | null | "TGTCAACAATATGACGAACTTCCCGTCACT… | null | "CQQYDELPVTF" | null | "IGK" | null | "IGKV1-33*01,IGKV1D-33*01" | null | "IGKC" | null | 68.0 | null | null | null | "Orphan IGK" | "Orphan VJ" | null | "Standard"
"sc5p_v2_hs_PBMC_10k_b_AAACCTGT… | "B_VDJ_42_3_1_VJ_59_2_1" | "1983" | "sc5p_v2_hs_PBMC_10k_b" | "T" | "T" | "IGHD3-22" | "IGHJ3" | "IGKJ1" | "TGTGCGACTACGTATTACTATGATAGTAGT… | "TGTCAACAGTATTATAGTTACCCTCGGACG… | "CATTYYYDSSGYYQNDAFDIW" | "CQQYYSYPRTF" | "IGH" | "IGK" | "IGHV1-69*01,IGHV1-69D*01" | "IGKV1-8*01" | "IGHM" | "IGKC" | 51 | 43 | "T" | "T" | "IGHD3-22" | "IGHJ3" | "IGKJ1" | "TGTGCGACTACGTATTACTATGATAGTAGT… | "TGTCAACAGTATTATAGTTACCCTCGGACG… | "CATTYYYDSSGYYQNDAFDIW" | "CQQYYSYPRTF" | "IGH" | "IGK" | "IGHV1-69*01,IGHV1-69D*01" | "IGKV1-8*01" | "IGHM" | "IGKC" | 51.0 | 43.0 | "IgM" | "IgM" | "IgM" | "IGH + IGK" | "Single pair" | "Standard" | "Standard"
"sc5p_v2_hs_PBMC_10k_b_AAACCTGT… | "B_VDJ_9_1_2_VJ_253_1_1" | "1406" | "sc5p_v2_hs_PBMC_10k_b" | "T" | "T" | null | "IGHJ3" | "IGLJ3" | "TGTGCGAGAGAGATAGAGGGGGACGGTGTT… | "TGTATGATTTGGCACAGCAGCGCTTGGGTG… | "CAREIEGDGVFEIW" | "CMIWHSSAWVV" | "IGH" | "IGL" | "IGHV1-2*02" | "IGLV5-45*02" | "IGHM" | "IGLC3" | 47 | 90 | "T" | "T" | null | "IGHJ3" | "IGLJ3" | "TGTGCGAGAGAGATAGAGGGGGACGGTGTT… | "TGTATGATTTGGCACAGCAGCGCTTGGGTG… | "CAREIEGDGVFEIW" | "CMIWHSSAWVV" | "IGH" | "IGL" | "IGHV1-2*02" | "IGLV5-45*02" | "IGHM" | "IGLC3" | 47.0 | 90.0 | "IgM" | "IgM" | "IgM" | "IGH + IGL" | "Single pair" | "Standard" | "Standard"
"sc5p_v2_hs_PBMC_10k_b_AAACCTGT… | "B_VDJ_246_4_2_VJ_82_1_1" | "2175" | "sc5p_v2_hs_PBMC_10k_b" | "T" | "T" | null | "IGHJ3" | "IGKJ2" | "TGTGCGAGACATATCCGTGGGAACAGATTT… | "TGTCAACAGTATTATAGTTTCCCGTACACT… | "CARHIRGNRFGNDAFDIW" | "CQQYYSFPYTF" | "IGH" | "IGK" | "IGHV5-51*03" | "IGKV1D-8*01" | "IGHM" | "IGKC" | 80 | 22 | "T" | "T" | null | "IGHJ3" | "IGKJ2" | "TGTGCGAGACATATCCGTGGGAACAGATTT… | "TGTCAACAGTATTATAGTTTCCCGTACACT… | "CARHIRGNRFGNDAFDIW" | "CQQYYSFPYTF" | "IGH" | "IGK" | "IGHV5-51*03" | "IGKV1D-8*01" | "IGHM" | "IGKC" | 80.0 | 22.0 | "IgM" | "IgM" | "IgM" | "IGH + IGK" | "Single pair" | "Standard" | "Standard"
"sc5p_v2_hs_PBMC_10k_b_AAACGGGA… | "B_VDJ_217_2_1_VJ_222_2_1" | "234" | "sc5p_v2_hs_PBMC_10k_b" | "T" | "T" | "IGHD6-13" | "IGHJ3" | "IGLJ2" | "TGTGCGAGAGTAGGCTATAGAGCAGCAGCT… | "TGTAACTCCCGGGACAGCAGTGGTAACCAT… | "CARVGYRAAAGTDAFDIW" | "CNSRDSSGNHVVF" | "IGH" | "IGL" | "IGHV4-4*07" | "IGLV3-19*01" | "IGHM" | "IGLC" | 18 | 14 | "T" | "T" | "IGHD6-13" | "IGHJ3" | "IGLJ2" | "TGTGCGAGAGTAGGCTATAGAGCAGCAGCT… | "TGTAACTCCCGGGACAGCAGTGGTAACCAT… | "CARVGYRAAAGTDAFDIW" | "CNSRDSSGNHVVF" | "IGH" | "IGL" | "IGHV4-4*07" | "IGLV3-19*01" | "IGHM" | "IGLC" | 18.0 | 14.0 | "IgM" | "IgM" | "IgM" | "IGH + IGL" | "Single pair" | "Standard" | "Standard"
"vdj_v1_hs_pbmc3_b_TTTCCTCAGCGC… | "B_VDJ_109_5_5_VJ_98_1_1" | "1638" | "vdj_v1_hs_pbmc3_b" | "T" | "T" | "IGHD4-17" | "IGHJ6" | "IGKJ2" | "TGTGCGAAAGCCGCCTACGGTGAGGGGCTC… | "TGCATGCAAGGTACACACTGGCCGTACACT… | "CAKAAYGEGLRYYYYGMDVW" | "CMQGTHWPYTF" | "IGH" | "IGK" | "IGHV3-30*18" | "IGKV2-30*01" | "IGHM" | "IGKC" | 11 | 28 | "T" | "T" | "IGHD4-17" | "IGHJ6" | "IGKJ2" | "TGTGCGAAAGCCGCCTACGGTGAGGGGCTC… | "TGCATGCAAGGTACACACTGGCCGTACACT… | "CAKAAYGEGLRYYYYGMDVW" | "CMQGTHWPYTF" | "IGH" | "IGK" | "IGHV3-30*18" | "IGKV2-30*01" | "IGHM" | "IGKC" | 11.0 | 28.0 | "IgM" | "IgM" | "IgM" | "IGH + IGK" | "Single pair" | "Standard" | "Standard"
"vdj_v1_hs_pbmc3_b_TTTCCTCAGGGA… | "B_VDJ_232_2_1_VJ_39_4_1" | "238" | "vdj_v1_hs_pbmc3_b" | "T" | "T" | "IGHD6-13" | "IGHJ2" | "IGKJ1" | "TGTGCGAGACCCCGTATAGCAGGATCTGGG… | "TGTCAACAGAGTTACAGTACCCCGTGGACG… | "CARPRIAGSGWYFDLW" | "CQQSYSTPWTF" | "IGH" | "IGK" | "IGHV4-61*12" | "IGKV1-39*01,IGKV1D-39*01" | "IGHM" | "IGKC" | 14 | 159 | "T" | "T" | "IGHD6-13" | "IGHJ2" | "IGKJ1" | "TGTGCGAGACCCCGTATAGCAGGATCTGGG… | "TGTCAACAGAGTTACAGTACCCCGTGGACG… | "CARPRIAGSGWYFDLW" | "CQQSYSTPWTF" | "IGH" | "IGK" | "IGHV4-61*12" | "IGKV1-39*01,IGKV1D-39*01" | "IGHM" | "IGKC" | 14.0 | 159.0 | "IgM" | "IgM" | "IgM" | "IGH + IGK" | "Single pair" | "Standard" | "Standard"
"vdj_v1_hs_pbmc3_b_TTTCCTCTCGAC… | "B_VDJ_33_5_1_VJ_41_1_1" | "1887" | "vdj_v1_hs_pbmc3_b" | "T" | "T" | "IGHD2-15" | "IGHJ5" | "IGKJ2" | "TGTGCGAGAGAGGGATATTGTAGTGGTGGT… | "TGTCAACAGAGTTACAGTACCCCTCGGACT… | "CAREGYCSGGSCYSPDPNNGWFDPW" | "CQQSYSTPRTF" | "IGH" | "IGK" | "IGHV1-46*01" | "IGKV1-39*01,IGKV1D-39*01" | "IGHM" | "IGKC" | 28 | 35 | "T" | "T" | "IGHD2-15" | "IGHJ5" | "IGKJ2" | "TGTGCGAGAGAGGGATATTGTAGTGGTGGT… | "TGTCAACAGAGTTACAGTACCCCTCGGACT… | "CAREGYCSGGSCYSPDPNNGWFDPW" | "CQQSYSTPRTF" | "IGH" | "IGK" | "IGHV1-46*01" | "IGKV1-39*01,IGKV1D-39*01" | "IGHM" | "IGKC" | 28.0 | 35.0 | "IgM" | "IgM" | "IgM" | "IGH + IGK" | "Single pair" | "Standard" | "Standard"
"vdj_v1_hs_pbmc3_b_TTTGCGCCATAC… | "B_VDJ_45_4_2_VJ_181_3_1" | "953" | "vdj_v1_hs_pbmc3_b" | "T" | "T" | "IGHD2-15" | "IGHJ6" | "IGLJ3" | "TGTGCGAGATCTCTGGATATTGTAGTGGTG… | "TGTGCAGCATGGGATGACAGCCTGAGTGGT… | "CARSLDIVVVVALYYYYGMDVW" | "CAAWDDSLSGWVF" | "IGH" | "IGL" | "IGHV1-69*01,IGHV1-69D*01" | "IGLV1-47*01" | "IGHM" | "IGLC3" | 32 | 28 | "T" | "T" | "IGHD2-15" | "IGHJ6" | "IGLJ3" | "TGTGCGAGATCTCTGGATATTGTAGTGGTG… | "TGTGCAGCATGGGATGACAGCCTGAGTGGT… | "CARSLDIVVVVALYYYYGMDVW" | "CAAWDDSLSGWVF" | "IGH" | "IGL" | "IGHV1-69*01,IGHV1-69D*01" | "IGLV1-47*01" | "IGHM" | "IGLC3" | 32.0 | 28.0 | "IgM" | "IgM" | "IgM" | "IGH + IGL" | "Single pair" | "Standard" | "Standard"
"vdj_v1_hs_pbmc3_b_TTTGGTTGTAGG… | "B_VDJ_94_6_3_VJ_190_1_1" | "1392" | "vdj_v1_hs_pbmc3_b" | "T" | "T" | null | "IGHJ4" | "IGLJ2" | "TGTGCGGGGAGTCGGTGGTTATATTCTTTT… | "TGCTGCTCATATGCAGGCAGCTACACTGTG… | "CAGSRWLYSFDYW" | "CCSYAGSYTVFF" | "IGH" | "IGL" | "IGHV3-23*01,IGHV3-23D*01" | "IGLV2-11*01" | "IGHM" | "IGLC" | 22 | 36 | "T" | "T" | null | "IGHJ4" | "IGLJ2" | "TGTGCGGGGAGTCGGTGGTTATATTCTTTT… | "TGCTGCTCATATGCAGGCAGCTACACTGTG… | "CAGSRWLYSFDYW" | "CCSYAGSYTVFF" | "IGH" | "IGL" | "IGHV3-23*01,IGHV3-23D*01" | "IGLV2-11*01" | "IGHM" | "IGLC" | 22.0 | 36.0 | "IgM" | "IgM" | "IgM" | "IGH + IGL" | "Single pair" | "Standard" | "Standard"

Slicing

You can slice the DandelionPolars object by indexing it with a boolean mask built from .data or .metadata, with behavior similar to a pandas DataFrame or AnnData. Because both slots are lazy by default, call .collect() before using standard Polars indexing operations on them.

Slicing .data

[4]:
# get the largest clone; Polars value_counts() is unsorted by default, so sort explicitly
largest_clone = (
    vdj.data.collect()["clone_id"].drop_nulls().value_counts(sort=True)["clone_id"][0]
)

vdj[vdj.data.collect()["clone_id"] == largest_clone]
[4]:
Lazy Dandelion object with n_obs = 1 and n_contigs = 2
    data: sequence_id, sequence, rev_comp, productive, v_call, d_call, j_call, sequence_alignment, germline_alignment, junction, junction_aa, v_cigar, d_cigar, j_cigar, stop_codon, vj_in_frame, locus, junction_length, np1_length, np2_length, v_sequence_start, v_sequence_end, v_germline_start, v_germline_end, d_sequence_start, d_sequence_end, d_germline_start, d_germline_end, j_sequence_start, j_sequence_end, j_germline_start, j_germline_end, v_score, v_identity, v_support, d_score, d_identity, d_support, j_score, j_identity, j_support, fwr1, fwr2, fwr3, fwr4, cdr1, cdr2, cdr3, cell_id, consensus_count, umi_count, v_call_10x, d_call_10x, j_call_10x, junction_10x, junction_10x_aa, j_call_blastn, j_identity_blastn, j_alignment_length_blastn, j_number_of_mismatches_blastn, j_number_of_gap_openings_blastn, j_sequence_start_blastn, j_sequence_end_blastn, j_germline_start_blastn, j_germline_end_blastn, j_support_blastn, j_score_blastn, j_sequence_alignment_blastn, j_germline_alignment_blastn, j_call_igblastn, j_source, j_support_igblastn, j_score_igblastn, d_call_blastn, d_identity_blastn, d_alignment_length_blastn, d_number_of_mismatches_blastn, d_number_of_gap_openings_blastn, d_sequence_start_blastn, d_sequence_end_blastn, d_germline_start_blastn, d_germline_end_blastn, d_support_blastn, d_score_blastn, d_sequence_alignment_blastn, d_germline_alignment_blastn, d_call_igblastn, d_source, d_support_igblastn, d_score_igblastn, v_call_genotyped, germline_alignment_d_mask, sample_id, c_call, c_sequence_alignment, c_germline_alignment, c_sequence_start, c_sequence_end, c_score, c_identity, c_call_10x, junction_aa_length, fwr1_aa, fwr2_aa, fwr3_aa, fwr4_aa, cdr1_aa, cdr2_aa, cdr3_aa, sequence_alignment_aa, v_sequence_alignment_aa, d_sequence_alignment_aa, j_sequence_alignment_aa, complete_vdj, j_call_multimappers, j_call_multiplicity, j_call_sequence_start_multimappers, j_call_sequence_end_multimappers, j_call_support_multimappers, mu_count, extra, ambiguous, 
rearrangement_status, clone_id
    metadata: cell_id, clone_id, clone_id_rank, sample_id, productive_VDJ, productive_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, junction_VDJ, junction_VJ, junction_aa_VDJ, junction_aa_VJ, locus_VDJ, locus_VJ, v_call_VDJ, v_call_VJ, c_call_VDJ, c_call_VJ, umi_count_VDJ, umi_count_VJ, productive_VDJ_main, productive_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, junction_VDJ_main, junction_VJ_main, junction_aa_VDJ_main, junction_aa_VJ_main, locus_VDJ_main, locus_VJ_main, v_call_genotyped_VDJ_main, v_call_genotyped_VJ_main, c_call_VDJ_main, c_call_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, isotype, isotype_main, isotype_status, locus_status, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ
    layout: layout for 1 vertices, layout for 0 vertices
    graph: networkx graph of 1 vertices, networkx graph of 0 vertices
[5]:
vdj[
    vdj.data_names.is_in(
        [
            "sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_contig_1",
            "sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_2",
            "sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_1",
            "sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_contig_1",
            "sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_contig_2",
        ]
    )
]
[5]:
Lazy Dandelion object with n_obs = 3 and n_contigs = 5
    data: sequence_id, sequence, rev_comp, productive, v_call, d_call, j_call, sequence_alignment, germline_alignment, junction, junction_aa, v_cigar, d_cigar, j_cigar, stop_codon, vj_in_frame, locus, junction_length, np1_length, np2_length, v_sequence_start, v_sequence_end, v_germline_start, v_germline_end, d_sequence_start, d_sequence_end, d_germline_start, d_germline_end, j_sequence_start, j_sequence_end, j_germline_start, j_germline_end, v_score, v_identity, v_support, d_score, d_identity, d_support, j_score, j_identity, j_support, fwr1, fwr2, fwr3, fwr4, cdr1, cdr2, cdr3, cell_id, consensus_count, umi_count, v_call_10x, d_call_10x, j_call_10x, junction_10x, junction_10x_aa, j_call_blastn, j_identity_blastn, j_alignment_length_blastn, j_number_of_mismatches_blastn, j_number_of_gap_openings_blastn, j_sequence_start_blastn, j_sequence_end_blastn, j_germline_start_blastn, j_germline_end_blastn, j_support_blastn, j_score_blastn, j_sequence_alignment_blastn, j_germline_alignment_blastn, j_call_igblastn, j_source, j_support_igblastn, j_score_igblastn, d_call_blastn, d_identity_blastn, d_alignment_length_blastn, d_number_of_mismatches_blastn, d_number_of_gap_openings_blastn, d_sequence_start_blastn, d_sequence_end_blastn, d_germline_start_blastn, d_germline_end_blastn, d_support_blastn, d_score_blastn, d_sequence_alignment_blastn, d_germline_alignment_blastn, d_call_igblastn, d_source, d_support_igblastn, d_score_igblastn, v_call_genotyped, germline_alignment_d_mask, sample_id, c_call, c_sequence_alignment, c_germline_alignment, c_sequence_start, c_sequence_end, c_score, c_identity, c_call_10x, junction_aa_length, fwr1_aa, fwr2_aa, fwr3_aa, fwr4_aa, cdr1_aa, cdr2_aa, cdr3_aa, sequence_alignment_aa, v_sequence_alignment_aa, d_sequence_alignment_aa, j_sequence_alignment_aa, complete_vdj, j_call_multimappers, j_call_multiplicity, j_call_sequence_start_multimappers, j_call_sequence_end_multimappers, j_call_support_multimappers, mu_count, extra, ambiguous, 
rearrangement_status, clone_id
    metadata: cell_id, clone_id, clone_id_rank, sample_id, productive_VDJ, productive_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, junction_VDJ, junction_VJ, junction_aa_VDJ, junction_aa_VJ, locus_VDJ, locus_VJ, v_call_VDJ, v_call_VJ, c_call_VDJ, c_call_VJ, umi_count_VDJ, umi_count_VJ, productive_VDJ_main, productive_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, junction_VDJ_main, junction_VJ_main, junction_aa_VDJ_main, junction_aa_VJ_main, locus_VDJ_main, locus_VJ_main, v_call_genotyped_VDJ_main, v_call_genotyped_VJ_main, c_call_VDJ_main, c_call_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, isotype, isotype_main, isotype_status, locus_status, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ
    layout: layout for 2 vertices, layout for 0 vertices
    graph: networkx graph of 2 vertices, networkx graph of 0 vertices

Slicing .metadata

[6]:
vdj[vdj.metadata.collect()["productive_VDJ"].is_in(["T", "T|T"])]
[6]:
Lazy Dandelion object with n_obs = 2336 and n_contigs = 5557
    data: sequence_id, sequence, rev_comp, productive, v_call, d_call, j_call, sequence_alignment, germline_alignment, junction, junction_aa, v_cigar, d_cigar, j_cigar, stop_codon, vj_in_frame, locus, junction_length, np1_length, np2_length, v_sequence_start, v_sequence_end, v_germline_start, v_germline_end, d_sequence_start, d_sequence_end, d_germline_start, d_germline_end, j_sequence_start, j_sequence_end, j_germline_start, j_germline_end, v_score, v_identity, v_support, d_score, d_identity, d_support, j_score, j_identity, j_support, fwr1, fwr2, fwr3, fwr4, cdr1, cdr2, cdr3, cell_id, consensus_count, umi_count, v_call_10x, d_call_10x, j_call_10x, junction_10x, junction_10x_aa, j_call_blastn, j_identity_blastn, j_alignment_length_blastn, j_number_of_mismatches_blastn, j_number_of_gap_openings_blastn, j_sequence_start_blastn, j_sequence_end_blastn, j_germline_start_blastn, j_germline_end_blastn, j_support_blastn, j_score_blastn, j_sequence_alignment_blastn, j_germline_alignment_blastn, j_call_igblastn, j_source, j_support_igblastn, j_score_igblastn, d_call_blastn, d_identity_blastn, d_alignment_length_blastn, d_number_of_mismatches_blastn, d_number_of_gap_openings_blastn, d_sequence_start_blastn, d_sequence_end_blastn, d_germline_start_blastn, d_germline_end_blastn, d_support_blastn, d_score_blastn, d_sequence_alignment_blastn, d_germline_alignment_blastn, d_call_igblastn, d_source, d_support_igblastn, d_score_igblastn, v_call_genotyped, germline_alignment_d_mask, sample_id, c_call, c_sequence_alignment, c_germline_alignment, c_sequence_start, c_sequence_end, c_score, c_identity, c_call_10x, junction_aa_length, fwr1_aa, fwr2_aa, fwr3_aa, fwr4_aa, cdr1_aa, cdr2_aa, cdr3_aa, sequence_alignment_aa, v_sequence_alignment_aa, d_sequence_alignment_aa, j_sequence_alignment_aa, complete_vdj, j_call_multimappers, j_call_multiplicity, j_call_sequence_start_multimappers, j_call_sequence_end_multimappers, j_call_support_multimappers, mu_count, extra, ambiguous, 
rearrangement_status, clone_id
    metadata: cell_id, clone_id, clone_id_rank, sample_id, productive_VDJ, productive_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, junction_VDJ, junction_VJ, junction_aa_VDJ, junction_aa_VJ, locus_VDJ, locus_VJ, v_call_VDJ, v_call_VJ, c_call_VDJ, c_call_VJ, umi_count_VDJ, umi_count_VJ, productive_VDJ_main, productive_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, junction_VDJ_main, junction_VJ_main, junction_aa_VDJ_main, junction_aa_VJ_main, locus_VDJ_main, locus_VJ_main, v_call_genotyped_VDJ_main, v_call_genotyped_VJ_main, c_call_VDJ_main, c_call_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, isotype, isotype_main, isotype_status, locus_status, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ
    layout: layout for 2336 vertices, layout for 146 vertices
    graph: networkx graph of 2336 vertices, networkx graph of 146 vertices
[7]:
vdj[vdj.metadata_names == "vdj_v1_hs_pbmc3_b_TTTCCTCAGCGCTTAT"]
[7]:
Lazy Dandelion object with n_obs = 1 and n_contigs = 2
    data: sequence_id, sequence, rev_comp, productive, v_call, d_call, j_call, sequence_alignment, germline_alignment, junction, junction_aa, v_cigar, d_cigar, j_cigar, stop_codon, vj_in_frame, locus, junction_length, np1_length, np2_length, v_sequence_start, v_sequence_end, v_germline_start, v_germline_end, d_sequence_start, d_sequence_end, d_germline_start, d_germline_end, j_sequence_start, j_sequence_end, j_germline_start, j_germline_end, v_score, v_identity, v_support, d_score, d_identity, d_support, j_score, j_identity, j_support, fwr1, fwr2, fwr3, fwr4, cdr1, cdr2, cdr3, cell_id, consensus_count, umi_count, v_call_10x, d_call_10x, j_call_10x, junction_10x, junction_10x_aa, j_call_blastn, j_identity_blastn, j_alignment_length_blastn, j_number_of_mismatches_blastn, j_number_of_gap_openings_blastn, j_sequence_start_blastn, j_sequence_end_blastn, j_germline_start_blastn, j_germline_end_blastn, j_support_blastn, j_score_blastn, j_sequence_alignment_blastn, j_germline_alignment_blastn, j_call_igblastn, j_source, j_support_igblastn, j_score_igblastn, d_call_blastn, d_identity_blastn, d_alignment_length_blastn, d_number_of_mismatches_blastn, d_number_of_gap_openings_blastn, d_sequence_start_blastn, d_sequence_end_blastn, d_germline_start_blastn, d_germline_end_blastn, d_support_blastn, d_score_blastn, d_sequence_alignment_blastn, d_germline_alignment_blastn, d_call_igblastn, d_source, d_support_igblastn, d_score_igblastn, v_call_genotyped, germline_alignment_d_mask, sample_id, c_call, c_sequence_alignment, c_germline_alignment, c_sequence_start, c_sequence_end, c_score, c_identity, c_call_10x, junction_aa_length, fwr1_aa, fwr2_aa, fwr3_aa, fwr4_aa, cdr1_aa, cdr2_aa, cdr3_aa, sequence_alignment_aa, v_sequence_alignment_aa, d_sequence_alignment_aa, j_sequence_alignment_aa, complete_vdj, j_call_multimappers, j_call_multiplicity, j_call_sequence_start_multimappers, j_call_sequence_end_multimappers, j_call_support_multimappers, mu_count, extra, ambiguous, 
rearrangement_status, clone_id
    metadata: cell_id, clone_id, clone_id_rank, sample_id, productive_VDJ, productive_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, junction_VDJ, junction_VJ, junction_aa_VDJ, junction_aa_VJ, locus_VDJ, locus_VJ, v_call_VDJ, v_call_VJ, c_call_VDJ, c_call_VJ, umi_count_VDJ, umi_count_VJ, productive_VDJ_main, productive_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, junction_VDJ_main, junction_VJ_main, junction_aa_VDJ_main, junction_aa_VJ_main, locus_VDJ_main, locus_VJ_main, v_call_genotyped_VDJ_main, v_call_genotyped_VJ_main, c_call_VDJ_main, c_call_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, isotype, isotype_main, isotype_status, locus_status, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ
    layout: layout for 1 vertices, layout for 0 vertices
    graph: networkx graph of 1 vertices, networkx graph of 0 vertices

Copy

You can deep copy the DandelionPolars object to another variable which will inherit all slots:

[8]:
vdj2 = vdj.copy()
vdj2.metadata.collect()
[8]:
shape: (2_496, 45)
cell_id | clone_id | clone_id_rank | sample_id | productive_VDJ | productive_VJ | d_call_VDJ | j_call_VDJ | j_call_VJ | junction_VDJ | junction_VJ | junction_aa_VDJ | junction_aa_VJ | locus_VDJ | locus_VJ | v_call_VDJ | v_call_VJ | c_call_VDJ | c_call_VJ | umi_count_VDJ | umi_count_VJ | productive_VDJ_main | productive_VJ_main | d_call_VDJ_main | j_call_VDJ_main | j_call_VJ_main | junction_VDJ_main | junction_VJ_main | junction_aa_VDJ_main | junction_aa_VJ_main | locus_VDJ_main | locus_VJ_main | v_call_genotyped_VDJ_main | v_call_genotyped_VJ_main | c_call_VDJ_main | c_call_VJ_main | umi_count_VDJ_main | umi_count_VJ_main | isotype | isotype_main | isotype_status | locus_status | chain_status | rearrangement_status_VDJ | rearrangement_status_VJ
str | str | cat | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | i64 | i64 | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | f64 | f64 | str | str | str | str | str | str | str
"sc5p_v2_hs_PBMC_10k_b_AAACCTGT… | "B_VJ_36_2_3" | "1286" | "sc5p_v2_hs_PBMC_10k_b" | null | "T" | null | null | "IGKJ4" | null | "TGTCAACAATATGACGAACTTCCCGTCACT… | null | "CQQYDELPVTF" | null | "IGK" | null | "IGKV1-33*01,IGKV1D-33*01" | null | "IGKC" | 0 | 68 | null | "T" | null | null | "IGKJ4" | null | "TGTCAACAATATGACGAACTTCCCGTCACT… | null | "CQQYDELPVTF" | null | "IGK" | null | "IGKV1-33*01,IGKV1D-33*01" | null | "IGKC" | null | 68.0 | null | null | null | "Orphan IGK" | "Orphan VJ" | null | "Standard"
"sc5p_v2_hs_PBMC_10k_b_AAACCTGT… | "B_VDJ_42_3_1_VJ_59_2_1" | "1983" | "sc5p_v2_hs_PBMC_10k_b" | "T" | "T" | "IGHD3-22" | "IGHJ3" | "IGKJ1" | "TGTGCGACTACGTATTACTATGATAGTAGT… | "TGTCAACAGTATTATAGTTACCCTCGGACG… | "CATTYYYDSSGYYQNDAFDIW" | "CQQYYSYPRTF" | "IGH" | "IGK" | "IGHV1-69*01,IGHV1-69D*01" | "IGKV1-8*01" | "IGHM" | "IGKC" | 51 | 43 | "T" | "T" | "IGHD3-22" | "IGHJ3" | "IGKJ1" | "TGTGCGACTACGTATTACTATGATAGTAGT… | "TGTCAACAGTATTATAGTTACCCTCGGACG… | "CATTYYYDSSGYYQNDAFDIW" | "CQQYYSYPRTF" | "IGH" | "IGK" | "IGHV1-69*01,IGHV1-69D*01" | "IGKV1-8*01" | "IGHM" | "IGKC" | 51.0 | 43.0 | "IgM" | "IgM" | "IgM" | "IGH + IGK" | "Single pair" | "Standard" | "Standard"
"sc5p_v2_hs_PBMC_10k_b_AAACCTGT… | "B_VDJ_9_1_2_VJ_253_1_1" | "1406" | "sc5p_v2_hs_PBMC_10k_b" | "T" | "T" | null | "IGHJ3" | "IGLJ3" | "TGTGCGAGAGAGATAGAGGGGGACGGTGTT… | "TGTATGATTTGGCACAGCAGCGCTTGGGTG… | "CAREIEGDGVFEIW" | "CMIWHSSAWVV" | "IGH" | "IGL" | "IGHV1-2*02" | "IGLV5-45*02" | "IGHM" | "IGLC3" | 47 | 90 | "T" | "T" | null | "IGHJ3" | "IGLJ3" | "TGTGCGAGAGAGATAGAGGGGGACGGTGTT… | "TGTATGATTTGGCACAGCAGCGCTTGGGTG… | "CAREIEGDGVFEIW" | "CMIWHSSAWVV" | "IGH" | "IGL" | "IGHV1-2*02" | "IGLV5-45*02" | "IGHM" | "IGLC3" | 47.0 | 90.0 | "IgM" | "IgM" | "IgM" | "IGH + IGL" | "Single pair" | "Standard" | "Standard"
"sc5p_v2_hs_PBMC_10k_b_AAACCTGT… | "B_VDJ_246_4_2_VJ_82_1_1" | "2175" | "sc5p_v2_hs_PBMC_10k_b" | "T" | "T" | null | "IGHJ3" | "IGKJ2" | "TGTGCGAGACATATCCGTGGGAACAGATTT… | "TGTCAACAGTATTATAGTTTCCCGTACACT… | "CARHIRGNRFGNDAFDIW" | "CQQYYSFPYTF" | "IGH" | "IGK" | "IGHV5-51*03" | "IGKV1D-8*01" | "IGHM" | "IGKC" | 80 | 22 | "T" | "T" | null | "IGHJ3" | "IGKJ2" | "TGTGCGAGACATATCCGTGGGAACAGATTT… | "TGTCAACAGTATTATAGTTTCCCGTACACT… | "CARHIRGNRFGNDAFDIW" | "CQQYYSFPYTF" | "IGH" | "IGK" | "IGHV5-51*03" | "IGKV1D-8*01" | "IGHM" | "IGKC" | 80.0 | 22.0 | "IgM" | "IgM" | "IgM" | "IGH + IGK" | "Single pair" | "Standard" | "Standard"
"sc5p_v2_hs_PBMC_10k_b_AAACGGGA… | "B_VDJ_217_2_1_VJ_222_2_1" | "234" | "sc5p_v2_hs_PBMC_10k_b" | "T" | "T" | "IGHD6-13" | "IGHJ3" | "IGLJ2" | "TGTGCGAGAGTAGGCTATAGAGCAGCAGCT… | "TGTAACTCCCGGGACAGCAGTGGTAACCAT… | "CARVGYRAAAGTDAFDIW" | "CNSRDSSGNHVVF" | "IGH" | "IGL" | "IGHV4-4*07" | "IGLV3-19*01" | "IGHM" | "IGLC" | 18 | 14 | "T" | "T" | "IGHD6-13" | "IGHJ3" | "IGLJ2" | "TGTGCGAGAGTAGGCTATAGAGCAGCAGCT… | "TGTAACTCCCGGGACAGCAGTGGTAACCAT… | "CARVGYRAAAGTDAFDIW" | "CNSRDSSGNHVVF" | "IGH" | "IGL" | "IGHV4-4*07" | "IGLV3-19*01" | "IGHM" | "IGLC" | 18.0 | 14.0 | "IgM" | "IgM" | "IgM" | "IGH + IGL" | "Single pair" | "Standard" | "Standard"
"vdj_v1_hs_pbmc3_b_TTTCCTCAGCGC… | "B_VDJ_109_5_5_VJ_98_1_1" | "1638" | "vdj_v1_hs_pbmc3_b" | "T" | "T" | "IGHD4-17" | "IGHJ6" | "IGKJ2" | "TGTGCGAAAGCCGCCTACGGTGAGGGGCTC… | "TGCATGCAAGGTACACACTGGCCGTACACT… | "CAKAAYGEGLRYYYYGMDVW" | "CMQGTHWPYTF" | "IGH" | "IGK" | "IGHV3-30*18" | "IGKV2-30*01" | "IGHM" | "IGKC" | 11 | 28 | "T" | "T" | "IGHD4-17" | "IGHJ6" | "IGKJ2" | "TGTGCGAAAGCCGCCTACGGTGAGGGGCTC… | "TGCATGCAAGGTACACACTGGCCGTACACT… | "CAKAAYGEGLRYYYYGMDVW" | "CMQGTHWPYTF" | "IGH" | "IGK" | "IGHV3-30*18" | "IGKV2-30*01" | "IGHM" | "IGKC" | 11.0 | 28.0 | "IgM" | "IgM" | "IgM" | "IGH + IGK" | "Single pair" | "Standard" | "Standard"
"vdj_v1_hs_pbmc3_b_TTTCCTCAGGGA… | "B_VDJ_232_2_1_VJ_39_4_1" | "238" | "vdj_v1_hs_pbmc3_b" | "T" | "T" | "IGHD6-13" | "IGHJ2" | "IGKJ1" | "TGTGCGAGACCCCGTATAGCAGGATCTGGG… | "TGTCAACAGAGTTACAGTACCCCGTGGACG… | "CARPRIAGSGWYFDLW" | "CQQSYSTPWTF" | "IGH" | "IGK" | "IGHV4-61*12" | "IGKV1-39*01,IGKV1D-39*01" | "IGHM" | "IGKC" | 14 | 159 | "T" | "T" | "IGHD6-13" | "IGHJ2" | "IGKJ1" | "TGTGCGAGACCCCGTATAGCAGGATCTGGG… | "TGTCAACAGAGTTACAGTACCCCGTGGACG… | "CARPRIAGSGWYFDLW" | "CQQSYSTPWTF" | "IGH" | "IGK" | "IGHV4-61*12" | "IGKV1-39*01,IGKV1D-39*01" | "IGHM" | "IGKC" | 14.0 | 159.0 | "IgM" | "IgM" | "IgM" | "IGH + IGK" | "Single pair" | "Standard" | "Standard"
"vdj_v1_hs_pbmc3_b_TTTCCTCTCGAC… | "B_VDJ_33_5_1_VJ_41_1_1" | "1887" | "vdj_v1_hs_pbmc3_b" | "T" | "T" | "IGHD2-15" | "IGHJ5" | "IGKJ2" | "TGTGCGAGAGAGGGATATTGTAGTGGTGGT… | "TGTCAACAGAGTTACAGTACCCCTCGGACT… | "CAREGYCSGGSCYSPDPNNGWFDPW" | "CQQSYSTPRTF" | "IGH" | "IGK" | "IGHV1-46*01" | "IGKV1-39*01,IGKV1D-39*01" | "IGHM" | "IGKC" | 28 | 35 | "T" | "T" | "IGHD2-15" | "IGHJ5" | "IGKJ2" | "TGTGCGAGAGAGGGATATTGTAGTGGTGGT… | "TGTCAACAGAGTTACAGTACCCCTCGGACT… | "CAREGYCSGGSCYSPDPNNGWFDPW" | "CQQSYSTPRTF" | "IGH" | "IGK" | "IGHV1-46*01" | "IGKV1-39*01,IGKV1D-39*01" | "IGHM" | "IGKC" | 28.0 | 35.0 | "IgM" | "IgM" | "IgM" | "IGH + IGK" | "Single pair" | "Standard" | "Standard"
"vdj_v1_hs_pbmc3_b_TTTGCGCCATAC… | "B_VDJ_45_4_2_VJ_181_3_1" | "953" | "vdj_v1_hs_pbmc3_b" | "T" | "T" | "IGHD2-15" | "IGHJ6" | "IGLJ3" | "TGTGCGAGATCTCTGGATATTGTAGTGGTG… | "TGTGCAGCATGGGATGACAGCCTGAGTGGT… | "CARSLDIVVVVALYYYYGMDVW" | "CAAWDDSLSGWVF" | "IGH" | "IGL" | "IGHV1-69*01,IGHV1-69D*01" | "IGLV1-47*01" | "IGHM" | "IGLC3" | 32 | 28 | "T" | "T" | "IGHD2-15" | "IGHJ6" | "IGLJ3" | "TGTGCGAGATCTCTGGATATTGTAGTGGTG… | "TGTGCAGCATGGGATGACAGCCTGAGTGGT… | "CARSLDIVVVVALYYYYGMDVW" | "CAAWDDSLSGWVF" | "IGH" | "IGL" | "IGHV1-69*01,IGHV1-69D*01" | "IGLV1-47*01" | "IGHM" | "IGLC3" | 32.0 | 28.0 | "IgM" | "IgM" | "IgM" | "IGH + IGL" | "Single pair" | "Standard" | "Standard"
"vdj_v1_hs_pbmc3_b_TTTGGTTGTAGG… | "B_VDJ_94_6_3_VJ_190_1_1" | "1392" | "vdj_v1_hs_pbmc3_b" | "T" | "T" | null | "IGHJ4" | "IGLJ2" | "TGTGCGGGGAGTCGGTGGTTATATTCTTTT… | "TGCTGCTCATATGCAGGCAGCTACACTGTG… | "CAGSRWLYSFDYW" | "CCSYAGSYTVFF" | "IGH" | "IGL" | "IGHV3-23*01,IGHV3-23D*01" | "IGLV2-11*01" | "IGHM" | "IGLC" | 22 | 36 | "T" | "T" | null | "IGHJ4" | "IGLJ2" | "TGTGCGGGGAGTCGGTGGTTATATTCTTTT… | "TGCTGCTCATATGCAGGCAGCTACACTGTG… | "CAGSRWLYSFDYW" | "CCSYAGSYTVFF" | "IGH" | "IGL" | "IGHV3-23*01,IGHV3-23D*01" | "IGLV2-11*01" | "IGHM" | "IGLC" | 22.0 | 36.0 | "IgM" | "IgM" | "IgM" | "IGH + IGL" | "Single pair" | "Standard" | "Standard"

Retrieving entries with update_metadata

The .metadata slot of the DandelionPolars class is initialized automatically whenever the .data slot is filled. However, it only contains a standard, pre-specified set of columns. To retrieve other columns from the .data slot, update the metadata with the update_metadata method and specify the retrieve and retrieve_mode options.

The following modes determine how the retrieval is completed:

  • split and unique only - splits the retrieval into VDJ and VJ chains. A | separates the unique elements.

  • split and merge - splits the retrieval into VDJ and VJ chains. A | separates every element.

  • merge and unique only - similar to split and unique only, but merged into a single column.

  • split - splits the retrieval into individual columns for each contig.

  • merge - merges the retrieval into a single column, where a | separates every element.

For numerical columns, there are additional options:

  • split and sum - splits the retrieval into VDJ and VJ chains and sums each separately.

  • split and average - similar to split and sum, but averages instead of sums.

  • sum - sums the retrievals into a single column.

  • average - averages the retrievals into a single column.

If retrieve_mode is not specified, it defaults to split and merge.
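
To make the modes listed above more concrete, here is a toy sketch in plain Python that mimics them for the contigs of a single cell. It is not dandelion's implementation: the retrieve and combine helpers, the "chain" key, and the name of the merged column are all hypothetical, and the per-contig split mode is left out.

```python
from statistics import mean


def combine(values, how):
    """Collapse a list of per-contig values according to one strategy."""
    if how == "merge":
        return "|".join(values)  # keep every element
    if how == "unique only":
        return "|".join(dict.fromkeys(values))  # drop repeats, keep order
    if how == "sum":
        return sum(values)
    if how == "average":
        return mean(values)
    raise ValueError(f"unsupported mode: {how}")


def retrieve(contigs, column, mode):
    """Hypothetical helper: summarize one column for the contigs of a cell."""
    if mode.startswith("split and "):
        how = mode[len("split and "):]
        out = {}
        for chain in ("VDJ", "VJ"):
            vals = [c[column] for c in contigs if c["chain"] == chain]
            out[f"{column}_{chain}"] = combine(vals, how) if vals else None
        return out
    how = mode[len("merge and "):] if mode.startswith("merge and ") else mode
    return {column: combine([c[column] for c in contigs], how)}


# One cell with a heavy-chain contig and two light-chain contigs (made-up values).
cell = [
    {"chain": "VDJ", "v_call": "IGHV1-2", "umi_count": 10},
    {"chain": "VJ", "v_call": "IGKV1-8", "umi_count": 4},
    {"chain": "VJ", "v_call": "IGKV1-8", "umi_count": 7},
]

print(retrieve(cell, "v_call", "split and unique only"))
print(retrieve(cell, "v_call", "split and merge"))
print(retrieve(cell, "v_call", "merge"))
print(retrieve(cell, "umi_count", "split and sum"))
```

The _VDJ and _VJ suffixes follow the column names that appear in the metadata slot; everything else in the sketch is an assumption made for illustration.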

Example: retrieving fwr1 sequences

[9]:
vdj.update_metadata(retrieve="fwr1")
vdj
[9]:
Lazy Dandelion object with n_obs = 2496 and n_contigs = 5767
    data: sequence_id, sequence, rev_comp, productive, v_call, d_call, j_call, sequence_alignment, germline_alignment, junction, junction_aa, v_cigar, d_cigar, j_cigar, stop_codon, vj_in_frame, locus, junction_length, np1_length, np2_length, v_sequence_start, v_sequence_end, v_germline_start, v_germline_end, d_sequence_start, d_sequence_end, d_germline_start, d_germline_end, j_sequence_start, j_sequence_end, j_germline_start, j_germline_end, v_score, v_identity, v_support, d_score, d_identity, d_support, j_score, j_identity, j_support, fwr1, fwr2, fwr3, fwr4, cdr1, cdr2, cdr3, cell_id, consensus_count, umi_count, v_call_10x, d_call_10x, j_call_10x, junction_10x, junction_10x_aa, j_call_blastn, j_identity_blastn, j_alignment_length_blastn, j_number_of_mismatches_blastn, j_number_of_gap_openings_blastn, j_sequence_start_blastn, j_sequence_end_blastn, j_germline_start_blastn, j_germline_end_blastn, j_support_blastn, j_score_blastn, j_sequence_alignment_blastn, j_germline_alignment_blastn, j_call_igblastn, j_source, j_support_igblastn, j_score_igblastn, d_call_blastn, d_identity_blastn, d_alignment_length_blastn, d_number_of_mismatches_blastn, d_number_of_gap_openings_blastn, d_sequence_start_blastn, d_sequence_end_blastn, d_germline_start_blastn, d_germline_end_blastn, d_support_blastn, d_score_blastn, d_sequence_alignment_blastn, d_germline_alignment_blastn, d_call_igblastn, d_source, d_support_igblastn, d_score_igblastn, v_call_genotyped, germline_alignment_d_mask, sample_id, c_call, c_sequence_alignment, c_germline_alignment, c_sequence_start, c_sequence_end, c_score, c_identity, c_call_10x, junction_aa_length, fwr1_aa, fwr2_aa, fwr3_aa, fwr4_aa, cdr1_aa, cdr2_aa, cdr3_aa, sequence_alignment_aa, v_sequence_alignment_aa, d_sequence_alignment_aa, j_sequence_alignment_aa, complete_vdj, j_call_multimappers, j_call_multiplicity, j_call_sequence_start_multimappers, j_call_sequence_end_multimappers, j_call_support_multimappers, mu_count, extra, ambiguous, 
rearrangement_status, clone_id
    metadata: cell_id, clone_id, clone_id_rank, sample_id, productive_VDJ, productive_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, junction_VDJ, junction_VJ, junction_aa_VDJ, junction_aa_VJ, locus_VDJ, locus_VJ, v_call_VDJ, v_call_VJ, c_call_VDJ, c_call_VJ, umi_count_VDJ, umi_count_VJ, productive_VDJ_main, productive_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, junction_VDJ_main, junction_VJ_main, junction_aa_VDJ_main, junction_aa_VJ_main, locus_VDJ_main, locus_VJ_main, v_call_genotyped_VDJ_main, v_call_genotyped_VJ_main, c_call_VDJ_main, c_call_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, isotype, isotype_main, isotype_status, locus_status, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ, fwr1_VDJ, fwr1_VJ
    layout: layout for 2351 vertices, layout for 148 vertices
    graph: networkx graph of 2351 vertices, networkx graph of 148 vertices

Note the additional fwr1 VDJ and VJ columns in the metadata slot.

By default, dandelion does not try to merge numerical columns, because merging can create mixed-dtype columns.

A new helper, update_plus, tries to retrieve frequently used columns such as np1_length and np2_length:

[10]:
vdj.update_plus()
vdj
[10]:
Lazy Dandelion object with n_obs = 2496 and n_contigs = 5767
    data: sequence_id, sequence, rev_comp, productive, v_call, d_call, j_call, sequence_alignment, germline_alignment, junction, junction_aa, v_cigar, d_cigar, j_cigar, stop_codon, vj_in_frame, locus, junction_length, np1_length, np2_length, v_sequence_start, v_sequence_end, v_germline_start, v_germline_end, d_sequence_start, d_sequence_end, d_germline_start, d_germline_end, j_sequence_start, j_sequence_end, j_germline_start, j_germline_end, v_score, v_identity, v_support, d_score, d_identity, d_support, j_score, j_identity, j_support, fwr1, fwr2, fwr3, fwr4, cdr1, cdr2, cdr3, cell_id, consensus_count, umi_count, v_call_10x, d_call_10x, j_call_10x, junction_10x, junction_10x_aa, j_call_blastn, j_identity_blastn, j_alignment_length_blastn, j_number_of_mismatches_blastn, j_number_of_gap_openings_blastn, j_sequence_start_blastn, j_sequence_end_blastn, j_germline_start_blastn, j_germline_end_blastn, j_support_blastn, j_score_blastn, j_sequence_alignment_blastn, j_germline_alignment_blastn, j_call_igblastn, j_source, j_support_igblastn, j_score_igblastn, d_call_blastn, d_identity_blastn, d_alignment_length_blastn, d_number_of_mismatches_blastn, d_number_of_gap_openings_blastn, d_sequence_start_blastn, d_sequence_end_blastn, d_germline_start_blastn, d_germline_end_blastn, d_support_blastn, d_score_blastn, d_sequence_alignment_blastn, d_germline_alignment_blastn, d_call_igblastn, d_source, d_support_igblastn, d_score_igblastn, v_call_genotyped, germline_alignment_d_mask, sample_id, c_call, c_sequence_alignment, c_germline_alignment, c_sequence_start, c_sequence_end, c_score, c_identity, c_call_10x, junction_aa_length, fwr1_aa, fwr2_aa, fwr3_aa, fwr4_aa, cdr1_aa, cdr2_aa, cdr3_aa, sequence_alignment_aa, v_sequence_alignment_aa, d_sequence_alignment_aa, j_sequence_alignment_aa, complete_vdj, j_call_multimappers, j_call_multiplicity, j_call_sequence_start_multimappers, j_call_sequence_end_multimappers, j_call_support_multimappers, mu_count, extra, ambiguous, 
rearrangement_status, clone_id
    metadata: cell_id, clone_id, clone_id_rank, sample_id, productive_VDJ, productive_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, junction_VDJ, junction_VJ, junction_aa_VDJ, junction_aa_VJ, locus_VDJ, locus_VJ, v_call_VDJ, v_call_VJ, c_call_VDJ, c_call_VJ, umi_count_VDJ, umi_count_VJ, productive_VDJ_main, productive_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, junction_VDJ_main, junction_VJ_main, junction_aa_VDJ_main, junction_aa_VJ_main, locus_VDJ_main, locus_VJ_main, v_call_genotyped_VDJ_main, v_call_genotyped_VJ_main, c_call_VDJ_main, c_call_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, isotype, isotype_main, isotype_status, locus_status, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ, fwr1_VDJ, fwr1_VJ, mu_count_VDJ, mu_count_VJ, mu_count, junction_length_VDJ, junction_length_VJ, junction_aa_length_VDJ, junction_aa_length_VJ, np1_length_VDJ, np1_length_VJ, np2_length_VDJ, np2_length_VJ
    layout: layout for 2351 vertices, layout for 148 vertices
    graph: networkx graph of 2351 vertices, networkx graph of 148 vertices

Renaming barcodes

You can rename the barcodes (sequence ids and cell ids at the same time) with a single function call, for example to give them more meaningful names. Renaming always works from the indices that were initially used to create the DandelionPolars object. If you have already run the function once, running it again does not stack another prefix/suffix on top of the renamed indices; the names are recomputed from the original indices.

[11]:
print(vdj.data.collect()[["sequence_id", "cell_id"]]), print(vdj.metadata_names)
shape: (5_767, 2)
┌─────────────────────────────────┬─────────────────────────────────┐
│ sequence_id                     ┆ cell_id                         │
│ ---                             ┆ ---                             │
│ str                             ┆ str                             │
╞═════════════════════════════════╪═════════════════════════════════╡
│ sc5p_v2_hs_PBMC_10k_b_AAACCTGT… ┆ sc5p_v2_hs_PBMC_10k_b_AAACCTGT… │
│ sc5p_v2_hs_PBMC_10k_b_AAACCTGT… ┆ sc5p_v2_hs_PBMC_10k_b_AAACCTGT… │
│ sc5p_v2_hs_PBMC_10k_b_AAACCTGT… ┆ sc5p_v2_hs_PBMC_10k_b_AAACCTGT… │
│ sc5p_v2_hs_PBMC_10k_b_AAACCTGT… ┆ sc5p_v2_hs_PBMC_10k_b_AAACCTGT… │
│ sc5p_v2_hs_PBMC_10k_b_AAACCTGT… ┆ sc5p_v2_hs_PBMC_10k_b_AAACCTGT… │
│ …                               ┆ …                               │
│ vdj_v1_hs_pbmc3_b_TTTCCTCTCGAC… ┆ vdj_v1_hs_pbmc3_b_TTTCCTCTCGAC… │
│ vdj_v1_hs_pbmc3_b_TTTGCGCCATAC… ┆ vdj_v1_hs_pbmc3_b_TTTGCGCCATAC… │
│ vdj_v1_hs_pbmc3_b_TTTGCGCCATAC… ┆ vdj_v1_hs_pbmc3_b_TTTGCGCCATAC… │
│ vdj_v1_hs_pbmc3_b_TTTGGTTGTAGG… ┆ vdj_v1_hs_pbmc3_b_TTTGGTTGTAGG… │
│ vdj_v1_hs_pbmc3_b_TTTGGTTGTAGG… ┆ vdj_v1_hs_pbmc3_b_TTTGGTTGTAGG… │
└─────────────────────────────────┴─────────────────────────────────┘
shape: (2_496,)
Series: 'cell_id' [str]
[
        "sc5p_v2_hs_PBMC_10k_b_GACGTGCT…
        "sc5p_v2_hs_PBMC_10k_b_CTAACTTA…
        "sc5p_v2_hs_PBMC_10k_b_GGGCATCA…
        "sc5p_v2_hs_PBMC_10k_b_GGGTTGCG…
        "sc5p_v2_hs_PBMC_10k_b_GTACTCCG…
        …
        "sc5p_v2_hs_PBMC_10k_b_AGTGTCAC…
        "sc5p_v2_hs_PBMC_10k_b_AACTGGTT…
        "sc5p_v2_hs_PBMC_10k_b_AGTGTCAC…
        "sc5p_v2_hs_PBMC_10k_b_CCACGGAA…
        "sc5p_v2_hs_PBMC_10k_b_CGACCTTG…
]
[11]:
(None, None)
[12]:
# let's add a 'test-' as a prefix. There's also the suffix option
vdj.add_sequence_prefix("test", sep="-")
print(vdj.data.collect()[["sequence_id", "cell_id"]]), print(vdj.metadata_names)
shape: (5_767, 2)
┌─────────────────────────────────┬─────────────────────────────────┐
│ sequence_id                     ┆ cell_id                         │
│ ---                             ┆ ---                             │
│ str                             ┆ str                             │
╞═════════════════════════════════╪═════════════════════════════════╡
│ test-sc5p_v2_hs_PBMC_10k_b_AAA… ┆ test-sc5p_v2_hs_PBMC_10k_b_AAA… │
│ test-sc5p_v2_hs_PBMC_10k_b_AAA… ┆ test-sc5p_v2_hs_PBMC_10k_b_AAA… │
│ test-sc5p_v2_hs_PBMC_10k_b_AAA… ┆ test-sc5p_v2_hs_PBMC_10k_b_AAA… │
│ test-sc5p_v2_hs_PBMC_10k_b_AAA… ┆ test-sc5p_v2_hs_PBMC_10k_b_AAA… │
│ test-sc5p_v2_hs_PBMC_10k_b_AAA… ┆ test-sc5p_v2_hs_PBMC_10k_b_AAA… │
│ …                               ┆ …                               │
│ test-vdj_v1_hs_pbmc3_b_TTTCCTC… ┆ test-vdj_v1_hs_pbmc3_b_TTTCCTC… │
│ test-vdj_v1_hs_pbmc3_b_TTTGCGC… ┆ test-vdj_v1_hs_pbmc3_b_TTTGCGC… │
│ test-vdj_v1_hs_pbmc3_b_TTTGCGC… ┆ test-vdj_v1_hs_pbmc3_b_TTTGCGC… │
│ test-vdj_v1_hs_pbmc3_b_TTTGGTT… ┆ test-vdj_v1_hs_pbmc3_b_TTTGGTT… │
│ test-vdj_v1_hs_pbmc3_b_TTTGGTT… ┆ test-vdj_v1_hs_pbmc3_b_TTTGGTT… │
└─────────────────────────────────┴─────────────────────────────────┘
shape: (2_496,)
Series: 'cell_id' [str]
[
        "test-sc5p_v2_hs_PBMC_10k_b_AAA…
        "test-sc5p_v2_hs_PBMC_10k_b_AAA…
        "test-sc5p_v2_hs_PBMC_10k_b_AAA…
        "test-sc5p_v2_hs_PBMC_10k_b_AAA…
        "test-sc5p_v2_hs_PBMC_10k_b_AAA…
        …
        "test-vdj_v1_hs_pbmc3_b_TTTCCTC…
        "test-vdj_v1_hs_pbmc3_b_TTTCCTC…
        "test-vdj_v1_hs_pbmc3_b_TTTCCTC…
        "test-vdj_v1_hs_pbmc3_b_TTTGCGC…
        "test-vdj_v1_hs_pbmc3_b_TTTGGTT…
]
[12]:
(None, None)
[13]:
len(vdj._original_cell_ids.unique())
[13]:
2496
[14]:
vdj.metadata_names
[14]:
shape: (2_496,)
Series: 'cell_id' [str]
[
        "test-sc5p_v2_hs_PBMC_10k_b_AAA…
        "test-sc5p_v2_hs_PBMC_10k_b_AAA…
        "test-sc5p_v2_hs_PBMC_10k_b_AAA…
        "test-sc5p_v2_hs_PBMC_10k_b_AAA…
        "test-sc5p_v2_hs_PBMC_10k_b_AAA…
        …
        "test-vdj_v1_hs_pbmc3_b_TTTCCTC…
        "test-vdj_v1_hs_pbmc3_b_TTTCCTC…
        "test-vdj_v1_hs_pbmc3_b_TTTCCTC…
        "test-vdj_v1_hs_pbmc3_b_TTTGCGC…
        "test-vdj_v1_hs_pbmc3_b_TTTGGTT…
]
[15]:
vdj._original_cell_ids
[15]:
shape: (5_767,)
Series: 'cell_id' [str]
[
        "sc5p_v2_hs_PBMC_10k_b_AAACCTGT…
        "sc5p_v2_hs_PBMC_10k_b_AAACCTGT…
        "sc5p_v2_hs_PBMC_10k_b_AAACCTGT…
        "sc5p_v2_hs_PBMC_10k_b_AAACCTGT…
        "sc5p_v2_hs_PBMC_10k_b_AAACCTGT…
        …
        "vdj_v1_hs_pbmc3_b_TTTCCTCTCGAC…
        "vdj_v1_hs_pbmc3_b_TTTGCGCCATAC…
        "vdj_v1_hs_pbmc3_b_TTTGCGCCATAC…
        "vdj_v1_hs_pbmc3_b_TTTGGTTGTAGG…
        "vdj_v1_hs_pbmc3_b_TTTGGTTGTAGG…
]
[16]:
# same functionality as above
vdj.add_cell_prefix("test2", sep="_")
print(vdj.data.collect()[["sequence_id", "cell_id"]]), print(vdj.metadata_names)
shape: (5_767, 2)
┌─────────────────────────────────┬─────────────────────────────────┐
│ sequence_id                     ┆ cell_id                         │
│ ---                             ┆ ---                             │
│ str                             ┆ str                             │
╞═════════════════════════════════╪═════════════════════════════════╡
│ test2_sc5p_v2_hs_PBMC_10k_b_AA… ┆ test2_sc5p_v2_hs_PBMC_10k_b_AA… │
│ test2_sc5p_v2_hs_PBMC_10k_b_AA… ┆ test2_sc5p_v2_hs_PBMC_10k_b_AA… │
│ test2_sc5p_v2_hs_PBMC_10k_b_AA… ┆ test2_sc5p_v2_hs_PBMC_10k_b_AA… │
│ test2_sc5p_v2_hs_PBMC_10k_b_AA… ┆ test2_sc5p_v2_hs_PBMC_10k_b_AA… │
│ test2_sc5p_v2_hs_PBMC_10k_b_AA… ┆ test2_sc5p_v2_hs_PBMC_10k_b_AA… │
│ …                               ┆ …                               │
│ test2_vdj_v1_hs_pbmc3_b_TTTCCT… ┆ test2_vdj_v1_hs_pbmc3_b_TTTCCT… │
│ test2_vdj_v1_hs_pbmc3_b_TTTGCG… ┆ test2_vdj_v1_hs_pbmc3_b_TTTGCG… │
│ test2_vdj_v1_hs_pbmc3_b_TTTGCG… ┆ test2_vdj_v1_hs_pbmc3_b_TTTGCG… │
│ test2_vdj_v1_hs_pbmc3_b_TTTGGT… ┆ test2_vdj_v1_hs_pbmc3_b_TTTGGT… │
│ test2_vdj_v1_hs_pbmc3_b_TTTGGT… ┆ test2_vdj_v1_hs_pbmc3_b_TTTGGT… │
└─────────────────────────────────┴─────────────────────────────────┘
shape: (2_496,)
Series: 'cell_id' [str]
[
        "test2_sc5p_v2_hs_PBMC_10k_b_AA…
        "test2_sc5p_v2_hs_PBMC_10k_b_AA…
        "test2_sc5p_v2_hs_PBMC_10k_b_AA…
        "test2_sc5p_v2_hs_PBMC_10k_b_AA…
        "test2_sc5p_v2_hs_PBMC_10k_b_AA…
        …
        "test2_vdj_v1_hs_pbmc3_b_TTTCCT…
        "test2_vdj_v1_hs_pbmc3_b_TTTCCT…
        "test2_vdj_v1_hs_pbmc3_b_TTTCCT…
        "test2_vdj_v1_hs_pbmc3_b_TTTGCG…
        "test2_vdj_v1_hs_pbmc3_b_TTTGGT…
]
[16]:
(None, None)
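
The outputs above show that the second call replaced the `test-` prefix with `test2_` rather than producing `test2_test-...`. A minimal sketch of that behaviour, assuming the object keeps an untouched copy of its original ids (as `_original_cell_ids` suggests), could look like the following. The class `BarcodeRenamer` and its attribute names are hypothetical and are not part of dandelion.

```python
class BarcodeRenamer:
    """Toy stand-in for the renaming behaviour; not the DandelionPolars code."""

    def __init__(self, cell_ids):
        # Keep an untouched copy of the ids the object was created with.
        self._original = list(cell_ids)
        self.cell_ids = list(cell_ids)

    def add_prefix(self, prefix, sep="_"):
        # Always rebuild from the originals, so repeated calls replace
        # the previous prefix instead of stacking on top of it.
        self.cell_ids = [f"{prefix}{sep}{cid}" for cid in self._original]


renamer = BarcodeRenamer(["AAACCTGT", "TTTGGTTG"])
renamer.add_prefix("test", sep="-")
first = list(renamer.cell_ids)
renamer.add_prefix("test2", sep="_")
second = list(renamer.cell_ids)
print(first)
print(second)
```

Because every call starts from the stored originals, the order and number of earlier renames does not matter.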

Simplifying the V/D/J/C call annotations

Sometimes the V/D/J/C call annotations can be quite verbose. You can simplify them with the .simplify() function, which keeps only the first call of a comma-separated list and strips the allele suffix (for example, IGKV1-33*01,IGKV1D-33*01 becomes IGKV1-33). This is useful when you want simpler V/D/J/C calls for plotting.

[17]:
(
    vdj.data.collect()[["v_call", "j_call"]],
    vdj.metadata.collect()[["v_call_VDJ", "j_call_VDJ"]],
)
[17]:
(shape: (5_767, 2)
 ┌──────────────────────────┬────────────────────────────┐
 │ v_call                   ┆ j_call                     │
 │ ---                      ┆ ---                        │
 │ str                      ┆ str                        │
 ╞══════════════════════════╪════════════════════════════╡
 │ IGKV1-33*01,IGKV1D-33*01 ┆ IGKJ4*01                   │
 │ IGHV1-69*01,IGHV1-69D*01 ┆ IGHJ3*02                   │
 │ IGKV1-8*01               ┆ IGKJ1*01                   │
 │ IGLV5-45*02              ┆ IGLJ3*02                   │
 │ IGHV1-2*02               ┆ IGHJ3*02                   │
 │ …                        ┆ …                          │
 │ IGHV1-46*01              ┆ IGHJ5*02                   │
 │ IGHV1-69*01,IGHV1-69D*01 ┆ IGHJ6*02                   │
 │ IGLV1-47*01              ┆ IGLJ3*02                   │
 │ IGLV2-11*01              ┆ IGLJ2*01,IGLJ3*01,IGLJ3*02 │
 │ IGHV3-23*01,IGHV3-23D*01 ┆ IGHJ4*02                   │
 └──────────────────────────┴────────────────────────────┘,
 shape: (2_496, 2)
 ┌──────────────────────────┬────────────┐
 │ v_call_VDJ               ┆ j_call_VDJ │
 │ ---                      ┆ ---        │
 │ str                      ┆ str        │
 ╞══════════════════════════╪════════════╡
 │ null                     ┆ null       │
 │ IGHV1-69*01,IGHV1-69D*01 ┆ IGHJ3      │
 │ IGHV1-2*02               ┆ IGHJ3      │
 │ IGHV5-51*03              ┆ IGHJ3      │
 │ IGHV4-4*07               ┆ IGHJ3      │
 │ …                        ┆ …          │
 │ IGHV3-30*18              ┆ IGHJ6      │
 │ IGHV4-61*12              ┆ IGHJ2      │
 │ IGHV1-46*01              ┆ IGHJ5      │
 │ IGHV1-69*01,IGHV1-69D*01 ┆ IGHJ6      │
 │ IGHV3-23*01,IGHV3-23D*01 ┆ IGHJ4      │
 └──────────────────────────┴────────────┘)
[18]:
# after
vdj.simplify()
(
    vdj.data.collect()[["v_call", "j_call"]],
    vdj.metadata.collect()[["v_call_VDJ", "j_call_VDJ"]],
)
[18]:
(shape: (5_767, 2)
 ┌──────────┬────────┐
 │ v_call   ┆ j_call │
 │ ---      ┆ ---    │
 │ str      ┆ str    │
 ╞══════════╪════════╡
 │ IGKV1-33 ┆ IGKJ4  │
 │ IGHV1-69 ┆ IGHJ3  │
 │ IGKV1-8  ┆ IGKJ1  │
 │ IGLV5-45 ┆ IGLJ3  │
 │ IGHV1-2  ┆ IGHJ3  │
 │ …        ┆ …      │
 │ IGHV1-46 ┆ IGHJ5  │
 │ IGHV1-69 ┆ IGHJ6  │
 │ IGLV1-47 ┆ IGLJ3  │
 │ IGLV2-11 ┆ IGLJ2  │
 │ IGHV3-23 ┆ IGHJ4  │
 └──────────┴────────┘,
 shape: (2_496, 2)
 ┌────────────┬────────────┐
 │ v_call_VDJ ┆ j_call_VDJ │
 │ ---        ┆ ---        │
 │ str        ┆ str        │
 ╞════════════╪════════════╡
 │ null       ┆ null       │
 │ IGHV1-69   ┆ IGHJ3      │
 │ IGHV1-2    ┆ IGHJ3      │
 │ IGHV5-51   ┆ IGHJ3      │
 │ IGHV4-4    ┆ IGHJ3      │
 │ …          ┆ …          │
 │ IGHV3-30   ┆ IGHJ6      │
 │ IGHV4-61   ┆ IGHJ2      │
 │ IGHV1-46   ┆ IGHJ5      │
 │ IGHV1-69   ┆ IGHJ6      │
 │ IGHV3-23   ┆ IGHJ4      │
 └────────────┴────────────┘)
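
The rule visible in the before/after tables (take the first comma-separated call, then drop everything from the `*` onwards) can be sketched in plain Python. This is only an illustration of the string rule; the helper `simplify_call` is a hypothetical name and says nothing about how `.simplify()` is implemented internally.

```python
def simplify_call(call):
    """Keep the first comma-separated call and drop the allele suffix after '*'."""
    if call is None:
        return None  # missing calls stay missing
    first = call.split(",")[0]
    return first.split("*")[0]


examples = ["IGKV1-33*01,IGKV1D-33*01", "IGLJ2*01,IGLJ3*01,IGLJ3*02", "IGHJ3", None]
simplified = [simplify_call(c) for c in examples]
print(simplified)
```

Calls that are already simplified, such as `IGHJ3`, pass through unchanged.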

Concatenating multiple objects

This is a simple function to concatenate (append) two or more DandelionPolars objects or pandas DataFrames. Note that it operates on the .data slot and not the .metadata slot.

[19]:
vdj
[19]:
Lazy Dandelion object with n_obs = 2496 and n_contigs = 5767
    data: sequence_id, sequence, rev_comp, productive, v_call, d_call, j_call, sequence_alignment, germline_alignment, junction, junction_aa, v_cigar, d_cigar, j_cigar, stop_codon, vj_in_frame, locus, junction_length, np1_length, np2_length, v_sequence_start, v_sequence_end, v_germline_start, v_germline_end, d_sequence_start, d_sequence_end, d_germline_start, d_germline_end, j_sequence_start, j_sequence_end, j_germline_start, j_germline_end, v_score, v_identity, v_support, d_score, d_identity, d_support, j_score, j_identity, j_support, fwr1, fwr2, fwr3, fwr4, cdr1, cdr2, cdr3, cell_id, consensus_count, umi_count, v_call_10x, d_call_10x, j_call_10x, junction_10x, junction_10x_aa, j_call_blastn, j_identity_blastn, j_alignment_length_blastn, j_number_of_mismatches_blastn, j_number_of_gap_openings_blastn, j_sequence_start_blastn, j_sequence_end_blastn, j_germline_start_blastn, j_germline_end_blastn, j_support_blastn, j_score_blastn, j_sequence_alignment_blastn, j_germline_alignment_blastn, j_call_igblastn, j_source, j_support_igblastn, j_score_igblastn, d_call_blastn, d_identity_blastn, d_alignment_length_blastn, d_number_of_mismatches_blastn, d_number_of_gap_openings_blastn, d_sequence_start_blastn, d_sequence_end_blastn, d_germline_start_blastn, d_germline_end_blastn, d_support_blastn, d_score_blastn, d_sequence_alignment_blastn, d_germline_alignment_blastn, d_call_igblastn, d_source, d_support_igblastn, d_score_igblastn, v_call_genotyped, germline_alignment_d_mask, sample_id, c_call, c_sequence_alignment, c_germline_alignment, c_sequence_start, c_sequence_end, c_score, c_identity, c_call_10x, junction_aa_length, fwr1_aa, fwr2_aa, fwr3_aa, fwr4_aa, cdr1_aa, cdr2_aa, cdr3_aa, sequence_alignment_aa, v_sequence_alignment_aa, d_sequence_alignment_aa, j_sequence_alignment_aa, complete_vdj, j_call_multimappers, j_call_multiplicity, j_call_sequence_start_multimappers, j_call_sequence_end_multimappers, j_call_support_multimappers, mu_count, extra, ambiguous, 
rearrangement_status, clone_id
    metadata: cell_id, clone_id, clone_id_rank, sample_id, productive_VDJ, productive_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, junction_VDJ, junction_VJ, junction_aa_VDJ, junction_aa_VJ, locus_VDJ, locus_VJ, v_call_VDJ, v_call_VJ, c_call_VDJ, c_call_VJ, umi_count_VDJ, umi_count_VJ, productive_VDJ_main, productive_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, junction_VDJ_main, junction_VJ_main, junction_aa_VDJ_main, junction_aa_VJ_main, locus_VDJ_main, locus_VJ_main, v_call_genotyped_VDJ_main, v_call_genotyped_VJ_main, c_call_VDJ_main, c_call_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, isotype, isotype_main, isotype_status, locus_status, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ
    layout: layout for 2351 vertices, layout for 148 vertices
    graph: networkx graph of 2351 vertices, networkx graph of 148 vertices
[20]:
# just simple concatenation x 3. check the difference between the cell and contig numbers between this object and just vdj
vdj_concat = ddl.tl.concat([vdj, vdj, vdj])
vdj_concat
[20]:
Lazy Dandelion object with n_obs = 2496 and n_contigs = 17301
    data: sequence_id, sequence, rev_comp, productive, v_call, d_call, j_call, sequence_alignment, germline_alignment, junction, junction_aa, v_cigar, d_cigar, j_cigar, stop_codon, vj_in_frame, locus, junction_length, np1_length, np2_length, v_sequence_start, v_sequence_end, v_germline_start, v_germline_end, d_sequence_start, d_sequence_end, d_germline_start, d_germline_end, j_sequence_start, j_sequence_end, j_germline_start, j_germline_end, v_score, v_identity, v_support, d_score, d_identity, d_support, j_score, j_identity, j_support, fwr1, fwr2, fwr3, fwr4, cdr1, cdr2, cdr3, cell_id, consensus_count, umi_count, v_call_10x, d_call_10x, j_call_10x, junction_10x, junction_10x_aa, j_call_blastn, j_identity_blastn, j_alignment_length_blastn, j_number_of_mismatches_blastn, j_number_of_gap_openings_blastn, j_sequence_start_blastn, j_sequence_end_blastn, j_germline_start_blastn, j_germline_end_blastn, j_support_blastn, j_score_blastn, j_sequence_alignment_blastn, j_germline_alignment_blastn, j_call_igblastn, j_source, j_support_igblastn, j_score_igblastn, d_call_blastn, d_identity_blastn, d_alignment_length_blastn, d_number_of_mismatches_blastn, d_number_of_gap_openings_blastn, d_sequence_start_blastn, d_sequence_end_blastn, d_germline_start_blastn, d_germline_end_blastn, d_support_blastn, d_score_blastn, d_sequence_alignment_blastn, d_germline_alignment_blastn, d_call_igblastn, d_source, d_support_igblastn, d_score_igblastn, v_call_genotyped, germline_alignment_d_mask, sample_id, c_call, c_sequence_alignment, c_germline_alignment, c_sequence_start, c_sequence_end, c_score, c_identity, c_call_10x, junction_aa_length, fwr1_aa, fwr2_aa, fwr3_aa, fwr4_aa, cdr1_aa, cdr2_aa, cdr3_aa, sequence_alignment_aa, v_sequence_alignment_aa, d_sequence_alignment_aa, j_sequence_alignment_aa, complete_vdj, j_call_multimappers, j_call_multiplicity, j_call_sequence_start_multimappers, j_call_sequence_end_multimappers, j_call_support_multimappers, mu_count, extra, ambiguous, 
rearrangement_status, clone_id
    metadata: cell_id, clone_id, clone_id_rank, sample_id, productive_VDJ, productive_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, junction_VDJ, junction_VJ, junction_aa_VDJ, junction_aa_VJ, locus_VDJ, locus_VJ, v_call_VDJ, v_call_VJ, c_call_VDJ, c_call_VJ, umi_count_VDJ, umi_count_VJ, productive_VDJ_main, productive_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, junction_VDJ_main, junction_VJ_main, junction_aa_VDJ_main, junction_aa_VJ_main, locus_VDJ_main, locus_VJ_main, v_call_genotyped_VDJ_main, v_call_genotyped_VJ_main, c_call_VDJ_main, c_call_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, isotype, isotype_main, isotype_status, locus_status, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ
[21]:
vdj_concat.data.collect()[["sequence_id", "cell_id"]].head()
[21]:
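shape: (5, 2)
┌─────────────────────────────────┬─────────────────────────────────┐
│ sequence_id                     ┆ cell_id                         │
│ ---                             ┆ ---                             │
│ str                             ┆ str                             │
╞═════════════════════════════════╪═════════════════════════════════╡
│ sc5p_v2_hs_PBMC_10k_b_AAACCTGT… ┆ test2_sc5p_v2_hs_PBMC_10k_b_AA… │
│ sc5p_v2_hs_PBMC_10k_b_AAACCTGT… ┆ test2_sc5p_v2_hs_PBMC_10k_b_AA… │
│ sc5p_v2_hs_PBMC_10k_b_AAACCTGT… ┆ test2_sc5p_v2_hs_PBMC_10k_b_AA… │
│ sc5p_v2_hs_PBMC_10k_b_AAACCTGT… ┆ test2_sc5p_v2_hs_PBMC_10k_b_AA… │
│ sc5p_v2_hs_PBMC_10k_b_AAACCTGT… ┆ test2_sc5p_v2_hs_PBMC_10k_b_AA… │
└─────────────────────────────────┴─────────────────────────────────┘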

ddl.concat also lets you supply custom prefixes/suffixes to add to the sequence ids. If none are provided and it detects that the sequence ids are not unique, as in the example above, it appends -0, -1, etc. as a suffix.
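
As a rough illustration of the suffixing described above, the sketch below concatenates lists of sequence ids and appends `-0`, `-1`, ... per input only when the combined ids collide. The function `concat_sequence_ids` and its parameters are hypothetical; the exact rules concat applies (for instance how it treats inputs that are only partly overlapping) are an assumption here.

```python
def concat_sequence_ids(batches, suffixes=None, sep="-"):
    """Concatenate lists of sequence ids, suffixing each batch if ids collide."""
    flat = [sid for batch in batches for sid in batch]
    if suffixes is None:
        if len(set(flat)) == len(flat):
            return flat  # already unique, leave the ids untouched
        # Fall back to 0, 1, 2, ... (one suffix per input batch).
        suffixes = [str(i) for i in range(len(batches))]
    return [
        f"{sid}{sep}{suffix}"
        for batch, suffix in zip(batches, suffixes)
        for sid in batch
    ]


ids = ["cellA_contig_1", "cellA_contig_2"]
merged = concat_sequence_ids([ids, ids, ids])
custom = concat_sequence_ids([ids, ids], suffixes=["s1", "s2"], sep="_")
print(merged)
print(custom)
```

Suffixing per input keeps every contig traceable to the object it came from, which is why the contig count triples in the example while the ids stay distinct.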

Read/write

DandelionPolars supports multiple read/write formats. The primary format is .zipddl, which stores data as Parquet blobs inside a Zarr v3 ZipStore container with optional Blosc/Zstd compression. The legacy .h5ddl is still supported.

| Format / Input | Read | Write | Notes |
| --- | --- | --- | --- |
| .zipddl | read() / read_ddl() / read_zipddl() | .write() / .write_ddl() / .write_zipddl() | Default format (Parquet + Zarr v3 + HDF5 hybrid). Use for all new workflows. Preserves all slots (data, metadata, distances, graph, layout, germline) with Blosc/Zstd compression (compress=True) by default. |
| .h5ddl | read_h5ddl() | .write_h5ddl() | Legacy HDF5 format. Use only when interoperating with files saved by the base Dandelion backend. A companion .zarr file (same stem) is picked up automatically for distances on read. |
| .tsv (AIRR) | read_airr() | .write_airr() | Use for standard AIRR-compliant files, including dandelion’s own preprocessing output (all_contig_dandelion.tsv). Write exports the contig table only; metadata, graph, and distances are not included. |
| .tsv (Cell Ranger ≥ V4) | read_10x_airr() | | Use only for the tab-separated AIRR rearrangement table shipped by Cell Ranger (airr_rearrangement.tsv). For all other AIRR files use read_airr(). |
| folder / .csv / .json / DataFrame (10x / SeekGene) | read_10x_vdj() / read_seekgene_vdj() | .write_vdj() / .write_10x() | Use when starting directly from Cell Ranger output without prior dandelion preprocessing. Accepts a folder, file path, or DataFrame; merges extra fields from a companion .json or .fasta automatically. read_seekgene_vdj strips the _10x suffix from column names. |
| .tsv (BD Rhapsody) | read_bd_airr() | | Use for the tab-separated AIRR table (_AIRR.tsv) from BD Rhapsody. |
| .tsv (Parse Biosciences) | read_parse_airr() | | Use for the tab-separated annotation table (_annotation_airr.tsv) from Parse Biosciences Evercode. Parse-specific column names (cell_barcode, transcript_count, cdr3, etc.) are renamed to AIRR equivalents on read. |
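
To make the format listing above easier to apply, here is a small sketch that maps a file name to the name of the reader the listing suggests. The helper `suggest_reader` is hypothetical and not part of dandelion; it only returns reader names as strings, and the file-name patterns it checks are the ones quoted in the notes above. Folders and DataFrames, which the listing also covers, are not handled here.

```python
from pathlib import Path


def suggest_reader(path):
    """Return the name of the reader suggested for `path`, or None if unknown."""
    p = Path(path)
    name = p.name
    if p.suffix == ".zipddl":
        return "read_zipddl"
    if p.suffix == ".h5ddl":
        return "read_h5ddl"
    # Vendor-specific AIRR tables are recognised by their file names.
    if name == "airr_rearrangement.tsv":
        return "read_10x_airr"
    if name.endswith("_annotation_airr.tsv"):
        return "read_parse_airr"
    if name.endswith("_AIRR.tsv"):
        return "read_bd_airr"
    if p.suffix == ".tsv":
        return "read_airr"  # any other AIRR-compliant table
    if p.suffix in {".csv", ".json"}:
        return "read_10x_vdj"
    return None


for example in ["dandelion_results_test.zipddl", "all_contig_dandelion.tsv", "airr_rearrangement.tsv"]:
    print(example, "->", suggest_reader(example))
```

The vendor-specific checks come before the generic `.tsv` case because all of those files share the same extension.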

[22]:
%time vdj.write_zipddl('dandelion_results_test.zipddl')
CPU times: user 713 ms, sys: 198 ms, total: 911 ms
Wall time: 968 ms

If you see any warnings above, they are due to mixed dtypes somewhere in the object. Check the affected columns if you think this will interfere with downstream usage.
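
One generic way to do that kind of check is to look for columns that hold more than one Python type. The sketch below does this on a plain dict of lists standing in for a table; the function `mixed_type_columns` and the example values are hypothetical and are not taken from the dataset above.

```python
def mixed_type_columns(columns):
    """Return {column: sorted type names} for columns with more than one non-null type."""
    mixed = {}
    for name, values in columns.items():
        # Ignore missing values; they are compatible with any dtype.
        kinds = {type(v).__name__ for v in values if v is not None}
        if len(kinds) > 1:
            mixed[name] = sorted(kinds)
    return mixed


table = {
    "umi_count": [12, 7, None],
    "mu_count": [27.0, "0", 8.0],  # a stray string among floats
    "locus": ["IGK", "IGH", "IGL"],
}
report = mixed_type_columns(table)
print(report)
```

Any column that shows up in the report is a candidate for an explicit cast before writing.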

[23]:
%time vdj_1 = ddl.read_zipddl('dandelion_results_test.zipddl')
vdj_1
CPU times: user 246 ms, sys: 79.8 ms, total: 326 ms
Wall time: 352 ms
[23]:
Lazy Dandelion object with n_obs = 2496 and n_contigs = 5767
    data: sequence_id, sequence, rev_comp, productive, v_call, d_call, j_call, sequence_alignment, germline_alignment, junction, junction_aa, v_cigar, d_cigar, j_cigar, stop_codon, vj_in_frame, locus, junction_length, np1_length, np2_length, v_sequence_start, v_sequence_end, v_germline_start, v_germline_end, d_sequence_start, d_sequence_end, d_germline_start, d_germline_end, j_sequence_start, j_sequence_end, j_germline_start, j_germline_end, v_score, v_identity, v_support, d_score, d_identity, d_support, j_score, j_identity, j_support, fwr1, fwr2, fwr3, fwr4, cdr1, cdr2, cdr3, cell_id, consensus_count, umi_count, v_call_10x, d_call_10x, j_call_10x, junction_10x, junction_10x_aa, j_call_blastn, j_identity_blastn, j_alignment_length_blastn, j_number_of_mismatches_blastn, j_number_of_gap_openings_blastn, j_sequence_start_blastn, j_sequence_end_blastn, j_germline_start_blastn, j_germline_end_blastn, j_support_blastn, j_score_blastn, j_sequence_alignment_blastn, j_germline_alignment_blastn, j_call_igblastn, j_source, j_support_igblastn, j_score_igblastn, d_call_blastn, d_identity_blastn, d_alignment_length_blastn, d_number_of_mismatches_blastn, d_number_of_gap_openings_blastn, d_sequence_start_blastn, d_sequence_end_blastn, d_germline_start_blastn, d_germline_end_blastn, d_support_blastn, d_score_blastn, d_sequence_alignment_blastn, d_germline_alignment_blastn, d_call_igblastn, d_source, d_support_igblastn, d_score_igblastn, v_call_genotyped, germline_alignment_d_mask, sample_id, c_call, c_sequence_alignment, c_germline_alignment, c_sequence_start, c_sequence_end, c_score, c_identity, c_call_10x, junction_aa_length, fwr1_aa, fwr2_aa, fwr3_aa, fwr4_aa, cdr1_aa, cdr2_aa, cdr3_aa, sequence_alignment_aa, v_sequence_alignment_aa, d_sequence_alignment_aa, j_sequence_alignment_aa, complete_vdj, j_call_multimappers, j_call_multiplicity, j_call_sequence_start_multimappers, j_call_sequence_end_multimappers, j_call_support_multimappers, mu_count, extra, ambiguous, 
rearrangement_status, clone_id
    metadata: cell_id, clone_id, clone_id_rank, sample_id, productive_VDJ, productive_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, junction_VDJ, junction_VJ, junction_aa_VDJ, junction_aa_VJ, locus_VDJ, locus_VJ, v_call_VDJ, v_call_VJ, c_call_VDJ, c_call_VJ, umi_count_VDJ, umi_count_VJ, productive_VDJ_main, productive_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, junction_VDJ_main, junction_VJ_main, junction_aa_VDJ_main, junction_aa_VJ_main, locus_VDJ_main, locus_VJ_main, v_call_genotyped_VDJ_main, v_call_genotyped_VJ_main, c_call_VDJ_main, c_call_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, isotype, isotype_main, isotype_status, locus_status, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ
    layout: layout for 2351 vertices, layout for 148 vertices
    graph: networkx graph of 2351 vertices, networkx graph of 148 vertices

The legacy .write_h5ddl / read_h5ddl format is still supported and accepts compression options such as "gzip".

[24]:
%time vdj.write_h5ddl('dandelion_results_test.h5ddl', compression="gzip")
CPU times: user 5.37 s, sys: 202 ms, total: 5.57 s
Wall time: 5.75 s
[25]:
%time vdj_2 = ddl.read_h5ddl('dandelion_results_test.h5ddl')
vdj_2
CPU times: user 584 ms, sys: 76 ms, total: 660 ms
Wall time: 728 ms
[25]:
Lazy Dandelion object with n_obs = 2496 and n_contigs = 5767
    data: sequence_id, sequence, rev_comp, productive, v_call, d_call, j_call, sequence_alignment, germline_alignment, junction, junction_aa, v_cigar, d_cigar, j_cigar, stop_codon, vj_in_frame, locus, junction_length, np1_length, np2_length, v_sequence_start, v_sequence_end, v_germline_start, v_germline_end, d_sequence_start, d_sequence_end, d_germline_start, d_germline_end, j_sequence_start, j_sequence_end, j_germline_start, j_germline_end, v_score, v_identity, v_support, d_score, d_identity, d_support, j_score, j_identity, j_support, fwr1, fwr2, fwr3, fwr4, cdr1, cdr2, cdr3, cell_id, consensus_count, umi_count, v_call_10x, d_call_10x, j_call_10x, junction_10x, junction_10x_aa, j_call_blastn, j_identity_blastn, j_alignment_length_blastn, j_number_of_mismatches_blastn, j_number_of_gap_openings_blastn, j_sequence_start_blastn, j_sequence_end_blastn, j_germline_start_blastn, j_germline_end_blastn, j_support_blastn, j_score_blastn, j_sequence_alignment_blastn, j_germline_alignment_blastn, j_call_igblastn, j_source, j_support_igblastn, j_score_igblastn, d_call_blastn, d_identity_blastn, d_alignment_length_blastn, d_number_of_mismatches_blastn, d_number_of_gap_openings_blastn, d_sequence_start_blastn, d_sequence_end_blastn, d_germline_start_blastn, d_germline_end_blastn, d_support_blastn, d_score_blastn, d_sequence_alignment_blastn, d_germline_alignment_blastn, d_call_igblastn, d_source, d_support_igblastn, d_score_igblastn, v_call_genotyped, germline_alignment_d_mask, sample_id, c_call, c_sequence_alignment, c_germline_alignment, c_sequence_start, c_sequence_end, c_score, c_identity, c_call_10x, junction_aa_length, fwr1_aa, fwr2_aa, fwr3_aa, fwr4_aa, cdr1_aa, cdr2_aa, cdr3_aa, sequence_alignment_aa, v_sequence_alignment_aa, d_sequence_alignment_aa, j_sequence_alignment_aa, complete_vdj, j_call_multimappers, j_call_multiplicity, j_call_sequence_start_multimappers, j_call_sequence_end_multimappers, j_call_support_multimappers, mu_count, extra, ambiguous, 
rearrangement_status, clone_id
    metadata: cell_id, clone_id, clone_id_rank, sample_id, productive_VDJ, productive_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, junction_VDJ, junction_VJ, junction_aa_VDJ, junction_aa_VJ, locus_VDJ, locus_VJ, v_call_VDJ, v_call_VJ, c_call_VDJ, c_call_VJ, umi_count_VDJ, umi_count_VJ, productive_VDJ_main, productive_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, junction_VDJ_main, junction_VJ_main, junction_aa_VDJ_main, junction_aa_VJ_main, locus_VDJ_main, locus_VJ_main, v_call_genotyped_VDJ_main, v_call_genotyped_VJ_main, c_call_VDJ_main, c_call_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, isotype, isotype_main, isotype_status, locus_status, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ
    layout: layout for 2351 vertices, layout for 148 vertices
    graph: networkx graph of 2351 vertices, networkx graph of 148 vertices

There are also other writing functions, such as .write_airr and .write_10x, which write the object to a .tsv or .csv file compatible with the AIRR and 10x formats respectively. One use case for .write_10x is reannotating your VDJ data with the preprocessing workflow when the data does not actually come from 10x’s platform. Note that .write_10x only writes the contig table and fasta files and does not include metadata, graph, or distances.

[26]:
import pandas as pd

vdj2.write_airr("test.airr.tsv")
df = pd.read_csv("test.airr.tsv", sep="\t")
df
[26]:
sequence_id sequence rev_comp productive v_call d_call j_call sequence_alignment germline_alignment junction ... j_call_multimappers j_call_multiplicity j_call_sequence_start_multimappers j_call_sequence_end_multimappers j_call_support_multimappers mu_count extra ambiguous rearrangement_status clone_id
0 sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_contig_1 TGGGGAGGAGTCAGTCCCAACCAGGACACGGCCTGGACATGAGGGT... F T IGKV1-33*01,IGKV1D-33*01 NaN IGKJ4*01 GACATCCAGATGACCCAGTCTCCATCCTCCCTGTCTGCATCTGTGG... GACATCCAGATGACCCAGTCTCCATCCTCCCTGTCTGCATCTGTAG... TGTCAACAATATGACGAACTTCCCGTCACTTTC ... ["IGKJ4*01"] 1.0 [385] [412] [3.56e-09] 27.0 F F Standard B_VJ_36_2_3
1 sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_2 ATCACATAACAACCACATTCCTCCTCTAAAGAAGCCCCTGGGAGCA... F T IGHV1-69*01,IGHV1-69D*01 IGHD3-22*01 IGHJ3*02 CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG... CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG... TGTGCGACTACGTATTACTATGATAGTAGTGGTTATTACCAGAATG... ... ["IGHJ3*02"] 1.0 [445] [494] [4.58e-23] 0.0 F F Standard B_VDJ_42_3_1_VJ_59_2_1
2 sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_1 AGGAGTCAGACCCTGTCAGGACACAGCATAGACATGAGGGTCCCCG... F T IGKV1-8*01 NaN IGKJ1*01 GCCATCCGGATGACCCAGTCTCCATCCTCATTCTCTGCATCTACAG... GCCATCCGGATGACCCAGTCTCCATCCTCATTCTCTGCATCTACAG... TGTCAACAGTATTATAGTTACCCTCGGACGTTC ... ["IGKJ1*01"] 1.0 [380] [415] [2.7e-15] 0.0 F F Standard B_VDJ_42_3_1_VJ_59_2_1
3 sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_contig_1 ACTGTGGGGGTAAGAGGTTGTGTCCACCATGGCCTGGACTCCTCTC... F T IGLV5-45*02 NaN IGLJ3*02 CAGGCTGTGCTGACTCAGCCGTCTTCC...CTCTCTGCATCTCCTG... CAGGCTGTGCTGACTCAGCCGTCTTCC...CTCTCTGCATCTCCTG... TGTATGATTTGGCACAGCAGCGCTTGGGTGGTC ... ["IGLJ3*01"] 1.0 [402] [431] [6.84e-12] 8.0 F F Standard B_VDJ_9_1_2_VJ_253_1_1
4 sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_contig_2 GGGAGCATCACCCAGCAACCACATCTGTCCTCTAGAGAATCCCCTG... F T IGHV1-2*02 NaN IGHJ3*02 CAGGTGCAACTGGTGCAGTCTGGGGGT...GAGGTAAAGAAGCCTG... CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG... TGTGCGAGAGAGATAGAGGGGGACGGTGTTTTTGAAATCTGG ... ["IGHJ3*02"] 1.0 [433] [479] [4.48e-18] 22.0 F F Standard B_VDJ_9_1_2_VJ_253_1_1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5762 vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1 ATCATCCAACAACCACATCCCTTCTCTACAGAAGCCTCTGAGAGGA... F T IGHV1-46*01 IGHD2-15*01 IGHJ5*02 CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG... CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG... TGTGCGAGAGAGGGATATTGTAGTGGTGGTAGCTGCTACTCCCCCG... ... ["IGHJ5*02"] 1.0 [461] [506] [7.83e-21] 0.0 F F Standard B_VDJ_33_5_1_VJ_41_1_1
5763 vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2 ATCACATAACAACCACATTCCTCCTCTAAAGAAGCCCCTGGGAGCA... F T IGHV1-69*01,IGHV1-69D*01 IGHD2-15*01 IGHJ6*02 CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG... CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG... TGTGCGAGATCTCTGGATATTGTAGTGGTGGTAGCACTCTACTACT... ... ["IGHJ6*02"] 1.0 [439] [497] [4.57e-28] 0.0 F F Standard B_VDJ_45_4_2_VJ_181_3_1
5764 vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1 AGCTTCAGCTGTGGTAGAGAAGACAGGATTCAGGACAATCTCCAGC... F T IGLV1-47*01 NaN IGLJ3*02 CAGTCTGTGCTGACTCAGCCACCCTCA...GCGTCTGGGACCCCCG... CAGTCTGTGCTGACTCAGCCACCCTCA...GCGTCTGGGACCCCCG... TGTGCAGCATGGGATGACAGCCTGAGTGGTTGGGTGTTC ... ["IGLJ3*02"] 1.0 [397] [434] [2.46e-16] 0.0 F F Standard B_VDJ_45_4_2_VJ_181_3_1
5765 vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2 GGCTGGGGTCTCAGGAGGCAGCACTCTCGGGACGTCTCCACCATGG... F T IGLV2-11*01 NaN IGLJ2*01,IGLJ3*01,IGLJ3*02 CAGTCTGCCCTGACTCAGCCTCGCTCA...GTGTCCGGGTCTCCTG... CAGTCTGCCCTGACTCAGCCTCGCTCA...GTGTCCGGGTCTCCTG... TGCTGCTCATATGCAGGCAGCTACACTGTGTTTTTC ... ["IGLJ3*01"] 1.0 [393] [430] [2.46e-11] 4.0 F F Standard B_VDJ_94_6_3_VJ_190_1_1
5766 vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1 AGCTCTGAGAGAGGAGCCCAGCCCTGGGATTTTCAGGTGTTTTCAT... F T IGHV3-23*01,IGHV3-23D*01 NaN IGHJ4*02 GAGGTGCAGGTGTTGGAGTCTGGGGGA...GGCTTGGAACAGCCTG... GAGGTGCAGCTGTTGGAGTCTGGGGGA...GGCTTGGTACAGCCTG... TGTGCGGGGAGTCGGTGGTTATATTCTTTTGACTACTGG ... ["IGHJ4*02"] 1.0 [449] [491] [1.65e-17] 8.0 F F Standard B_VDJ_94_6_3_VJ_190_1_1

5767 rows × 124 columns
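
As the table above shows, reading the AIRR file back with a plain pd.read_csv leaves boolean fields such as rev_comp and productive as the strings "F"/"T", and the *_multimappers columns as JSON-like strings. The following is a minimal sketch of one way to convert them on read, assuming those columns always follow that layout. The helper names (parse_airr_bool, parse_json_list) and the two sample rows are made up for illustration and are not part of dandelion.

```python
import io
import json

import pandas as pd

# Made-up rows that mimic a few columns of the AIRR table above (not real output).
sample = pd.DataFrame(
    {
        "sequence_id": ["contig_1", "contig_2"],
        "rev_comp": ["F", "F"],
        "productive": ["T", "T"],
        "j_call_multimappers": ['["IGKJ4*01"]', '["IGHJ3*02"]'],
    }
)

# Round-trip through an in-memory TSV so the sketch does not touch the disk.
buffer = io.StringIO()
sample.to_csv(buffer, sep="\t", index=False)
buffer.seek(0)


def parse_airr_bool(value):
    """Hypothetical helper: map 'T'/'F' to booleans, anything else to None."""
    return {"T": True, "F": False}.get(value)


def parse_json_list(value):
    """Hypothetical helper: decode a JSON list string; empty fields become []."""
    return json.loads(value) if value else []


df = pd.read_csv(
    buffer,
    sep="\t",
    converters={
        "rev_comp": parse_airr_bool,
        "productive": parse_airr_bool,
        "j_call_multimappers": parse_json_list,
    },
)
print(df)
```

The same converters mapping could be passed when reading "test.airr.tsv", extended to whichever boolean and list-like columns you need.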

[27]:
vdj2.write_10x(
    folder="10x_test",
    filename_prefix="all",
)  # this writes both the contig_annotations.csv and contig.fasta
df = pd.read_csv("10x_test/all_contig_annotations.csv")
df
/Users/uqztuong/Documents/GitHub/dandelion/src/dandelion/polars/core/_core.py:3496: PerformanceWarning: Determining the column names of a LazyFrame requires resolving its schema, which is a potentially expensive operation. Use `LazyFrame.collect_schema().names()` to get the column names without this warning.
/Users/uqztuong/Documents/GitHub/dandelion/src/dandelion/polars/core/_core.py:3500: PerformanceWarning: Determining the column names of a LazyFrame requires resolving its schema, which is a potentially expensive operation. Use `LazyFrame.collect_schema().names()` to get the column names without this warning.
/Users/uqztuong/Documents/GitHub/dandelion/src/dandelion/polars/core/_core.py:3511: PerformanceWarning: Determining the column names of a LazyFrame requires resolving its schema, which is a potentially expensive operation. Use `LazyFrame.collect_schema().names()` to get the column names without this warning.
[27]:
barcode contig_id length chain v_gene d_gene j_gene c_gene full_length productive cdr3 cdr3_nt reads umis raw_clonotype_id raw_consensus_id
0 sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_contig_1 556 IGK IGKV1-33*01,IGKV1D-33*01 NaN IGKJ4*01 IGKC NaN True CQQYDELPVTF TGTCAACAATATGACGAACTTCCCGTCACTTTC 9139 68 B_VJ_36_2_3 B_VJ_36_2_3
1 sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_2 565 IGH IGHV1-69*01,IGHV1-69D*01 IGHD3-22*01 IGHJ3*02 IGHM NaN True CATTYYYDSSGYYQNDAFDIW TGTGCGACTACGTATTACTATGATAGTAGTGGTTATTACCAGAATG... 4161 51 B_VDJ_42_3_1_VJ_59_2_1 B_VDJ_42_3_1_VJ_59_2_1
2 sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_1 551 IGK IGKV1-8*01 NaN IGKJ1*01 IGKC NaN True CQQYYSYPRTF TGTCAACAGTATTATAGTTACCCTCGGACGTTC 5679 43 B_VDJ_42_3_1_VJ_59_2_1 B_VDJ_42_3_1_VJ_59_2_1
3 sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_contig_1 642 IGL IGLV5-45*02 NaN IGLJ3*02 IGLC3 NaN True CMIWHSSAWVV TGTATGATTTGGCACAGCAGCGCTTGGGTGGTC 13160 90 B_VDJ_9_1_2_VJ_253_1_1 B_VDJ_9_1_2_VJ_253_1_1
4 sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_contig_2 550 IGH IGHV1-2*02 NaN IGHJ3*02 IGHM NaN True CAREIEGDGVFEIW TGTGCGAGAGAGATAGAGGGGGACGGTGTTTTTGAAATCTGG 5080 47 B_VDJ_9_1_2_VJ_253_1_1 B_VDJ_9_1_2_VJ_253_1_1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5762 vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC vdj_v1_hs_pbmc3_b_TTTCCTCTCGACAGCC_contig_1 577 IGH IGHV1-46*01 IGHD2-15*01 IGHJ5*02 IGHM NaN True CAREGYCSGGSCYSPDPNNGWFDPW TGTGCGAGAGAGGGATATTGTAGTGGTGGTAGCTGCTACTCCCCCG... 2960 28 B_VDJ_33_5_1_VJ_41_1_1 B_VDJ_33_5_1_VJ_41_1_1
5763 vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_2 568 IGH IGHV1-69*01,IGHV1-69D*01 IGHD2-15*01 IGHJ6*02 IGHM NaN True CARSLDIVVVVALYYYYGMDVW TGTGCGAGATCTCTGGATATTGTAGTGGTGGTAGCACTCTACTACT... 2464 32 B_VDJ_45_4_2_VJ_181_3_1 B_VDJ_45_4_2_VJ_181_3_1
5764 vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG vdj_v1_hs_pbmc3_b_TTTGCGCCATACCATG_contig_1 645 IGL IGLV1-47*01 NaN IGLJ3*02 IGLC3 NaN True CAAWDDSLSGWVF TGTGCAGCATGGGATGACAGCCTGAGTGGTTGGGTGTTC 2457 28 B_VDJ_45_4_2_VJ_181_3_1 B_VDJ_45_4_2_VJ_181_3_1
5765 vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_2 641 IGL IGLV2-11*01 NaN IGLJ2*01,IGLJ3*01,IGLJ3*02 IGLC NaN True CCSYAGSYTVFF TGCTGCTCATATGCAGGCAGCTACACTGTGTTTTTC 2744 36 B_VDJ_94_6_3_VJ_190_1_1 B_VDJ_94_6_3_VJ_190_1_1
5766 vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG vdj_v1_hs_pbmc3_b_TTTGGTTGTAGGCATG_contig_1 562 IGH IGHV3-23*01,IGHV3-23D*01 NaN IGHJ4*02 IGHM NaN True CAGSRWLYSFDYW TGTGCGGGGAGTCGGTGGTTATATTCTTTTGACTACTGG 1915 22 B_VDJ_94_6_3_VJ_190_1_1 B_VDJ_94_6_3_VJ_190_1_1

5767 rows × 16 columns
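
Because the 10x-style contig table has one row per contig, a quick sanity check after writing is to count chains per barcode. The sketch below uses only pandas and a handful of made-up rows that mimic the barcode and chain columns shown above. It is not a dandelion function, and the barcode names are placeholders.

```python
import pandas as pd

# Made-up rows mimicking the barcode/chain columns of all_contig_annotations.csv.
contigs = pd.DataFrame(
    {
        "barcode": ["cell_A", "cell_B", "cell_B", "cell_C", "cell_C"],
        "chain": ["IGK", "IGH", "IGK", "IGL", "IGH"],
    }
)

# One row per barcode, one column per chain; values are contig counts.
chain_counts = pd.crosstab(contigs["barcode"], contigs["chain"])

# Barcodes with at least one heavy-chain and at least one light-chain contig.
has_heavy = chain_counts["IGH"] > 0
has_light = chain_counts[["IGK", "IGL"]].sum(axis=1) > 0
paired = chain_counts.index[has_heavy & has_light].tolist()

print(chain_counts)
print(paired)
```

With the real file, contigs would be the DataFrame returned by pd.read_csv("10x_test/all_contig_annotations.csv").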