DandelionPolars class
Much of the functionality of the dandelion.polars backend revolves around the DandelionPolars class. The class acts as an intermediary object for storage and flexible interaction with other tools. This section is a quick primer on the DandelionPolars class.
DandelionPolars Overview
DandelionPolars is the core data container for the dandelion.polars backend, replacing the original pandas-based Dandelion class. It displays as “Lazy Dandelion object” (when lazy=True, the default) or “Dandelion object” (when lazy=False).
Key changes:
data accepts pl.LazyFrame, pl.DataFrame, pd.DataFrame, or a file path. Pandas DataFrames are automatically converted to Polars on input.
lazy=True (default): .data and .metadata are stored as LazyFrames with deferred query execution.
lazy=False: .data and .metadata are stored as eager DataFrames.
LazyFrame behavior
When .data is backed by a LazyFrame:
vdj.data.column_name returns pl.col("column_name") (an expression), not a concrete Series.
To get concrete data, call vdj.data.collect() to materialize to a DataFrame.
Methods like write_csv are not available on LazyFrame; you must .collect() first.
vdj.data.collect().write_csv("output.tsv", separator="\t")
Backend Conversion
The following convenience methods convert between lazy and eager modes, and to pandas if needed. In lazy mode, .data returns a LazyFrame; in eager mode, it returns a DataFrame. After converting to pandas, .data returns a pd.DataFrame.
vdj.to_pandas()
# vdj.data now returns a pd.DataFrame
vdj.to_polars(lazy=True)
# vdj.data now returns a DataFrameAccessor wrapping a LazyFrame
vdj.to_eager()
# vdj.data is now a polars DataFrame (not lazy)
vdj.to_lazy()
# vdj.data is now a LazyFrame again
Import modules
[1]:
import os
os.chdir("dandelion_tutorial/")
import dandelion as ddl
ddl.logging.print_versions()
dandelion==1.0.0a1.dev36 pandas==2.3.3 numpy==2.3.5 matplotlib==3.10.6 networkx==3.6.1 scipy==1.15.2
[ ]:
vdj = ddl.read_zipddl("dandelion_results_simplified.zipddl")
vdj
Using PyTorch backend with Apple Metal GPU
Finding clones based on B cell VDJ chains using junction_aa: 100%|██████████| 1233/1233 [00:01<00:00, 803.50it/s]
Finding clones based on B cell VJ chains using junction_aa: 100%|██████████| 576/576 [00:00<00:00, 769.57it/s]
Lazy Dandelion object with n_obs = 2496 and n_contigs = 5767
data: sequence_id, sequence, rev_comp, productive, v_call, d_call, j_call, sequence_alignment, germline_alignment, junction, junction_aa, v_cigar, d_cigar, j_cigar, stop_codon, vj_in_frame, locus, junction_length, np1_length, np2_length, v_sequence_start, v_sequence_end, v_germline_start, v_germline_end, d_sequence_start, d_sequence_end, d_germline_start, d_germline_end, j_sequence_start, j_sequence_end, j_germline_start, j_germline_end, v_score, v_identity, v_support, d_score, d_identity, d_support, j_score, j_identity, j_support, fwr1, fwr2, fwr3, fwr4, cdr1, cdr2, cdr3, cell_id, consensus_count, umi_count, v_call_10x, d_call_10x, j_call_10x, junction_10x, junction_10x_aa, j_call_blastn, j_identity_blastn, j_alignment_length_blastn, j_number_of_mismatches_blastn, j_number_of_gap_openings_blastn, j_sequence_start_blastn, j_sequence_end_blastn, j_germline_start_blastn, j_germline_end_blastn, j_support_blastn, j_score_blastn, j_sequence_alignment_blastn, j_germline_alignment_blastn, j_call_igblastn, j_source, j_support_igblastn, j_score_igblastn, d_call_blastn, d_identity_blastn, d_alignment_length_blastn, d_number_of_mismatches_blastn, d_number_of_gap_openings_blastn, d_sequence_start_blastn, d_sequence_end_blastn, d_germline_start_blastn, d_germline_end_blastn, d_support_blastn, d_score_blastn, d_sequence_alignment_blastn, d_germline_alignment_blastn, d_call_igblastn, d_source, d_support_igblastn, d_score_igblastn, v_call_genotyped, germline_alignment_d_mask, sample_id, c_call, c_sequence_alignment, c_germline_alignment, c_sequence_start, c_sequence_end, c_score, c_identity, c_call_10x, junction_aa_length, fwr1_aa, fwr2_aa, fwr3_aa, fwr4_aa, cdr1_aa, cdr2_aa, cdr3_aa, sequence_alignment_aa, v_sequence_alignment_aa, d_sequence_alignment_aa, j_sequence_alignment_aa, complete_vdj, j_call_multimappers, j_call_multiplicity, j_call_sequence_start_multimappers, j_call_sequence_end_multimappers, j_call_support_multimappers, mu_count, extra, ambiguous, 
rearrangement_status, clone_id
metadata: cell_id, clone_id, clone_id_rank, sample_id, productive_VDJ, productive_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, junction_VDJ, junction_VJ, junction_aa_VDJ, junction_aa_VJ, locus_VDJ, locus_VJ, v_call_VDJ, v_call_VJ, c_call_VDJ, c_call_VJ, umi_count_VDJ, umi_count_VJ, productive_VDJ_main, productive_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, junction_VDJ_main, junction_VJ_main, junction_aa_VDJ_main, junction_aa_VJ_main, locus_VDJ_main, locus_VJ_main, v_call_genotyped_VDJ_main, v_call_genotyped_VJ_main, c_call_VDJ_main, c_call_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, isotype, isotype_main, isotype_status, locus_status, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ
layout: layout for 2351 vertices, layout for 148 vertices
graph: networkx graph of 2351 vertices, networkx graph of 148 vertices
Essentially, the .data slot holds the AIRR contig table, while the .metadata slot holds a collapsed version that is compatible with AnnData’s .obs slot. In the default lazy mode, both slots return LazyFrames and require .collect() to retrieve the actual data. You can access these slots like any other attribute; for example, to get the metadata:
[3]:
vdj.metadata.collect()
[3]:
| cell_id | clone_id | clone_id_rank | sample_id | productive_VDJ | productive_VJ | d_call_VDJ | j_call_VDJ | j_call_VJ | junction_VDJ | junction_VJ | junction_aa_VDJ | junction_aa_VJ | locus_VDJ | locus_VJ | v_call_VDJ | v_call_VJ | c_call_VDJ | c_call_VJ | umi_count_VDJ | umi_count_VJ | productive_VDJ_main | productive_VJ_main | d_call_VDJ_main | j_call_VDJ_main | j_call_VJ_main | junction_VDJ_main | junction_VJ_main | junction_aa_VDJ_main | junction_aa_VJ_main | locus_VDJ_main | locus_VJ_main | v_call_genotyped_VDJ_main | v_call_genotyped_VJ_main | c_call_VDJ_main | c_call_VJ_main | umi_count_VDJ_main | umi_count_VJ_main | isotype | isotype_main | isotype_status | locus_status | chain_status | rearrangement_status_VDJ | rearrangement_status_VJ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| str | str | cat | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | i64 | i64 | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | f64 | f64 | str | str | str | str | str | str | str |
| "sc5p_v2_hs_PBMC_10k_b_AAACCTGT… | "B_VJ_36_2_3" | "1286" | "sc5p_v2_hs_PBMC_10k_b" | null | "T" | null | null | "IGKJ4" | null | "TGTCAACAATATGACGAACTTCCCGTCACT… | null | "CQQYDELPVTF" | null | "IGK" | null | "IGKV1-33*01,IGKV1D-33*01" | null | "IGKC" | 0 | 68 | null | "T" | null | null | "IGKJ4" | null | "TGTCAACAATATGACGAACTTCCCGTCACT… | null | "CQQYDELPVTF" | null | "IGK" | null | "IGKV1-33*01,IGKV1D-33*01" | null | "IGKC" | null | 68.0 | null | null | null | "Orphan IGK" | "Orphan VJ" | null | "Standard" |
| "sc5p_v2_hs_PBMC_10k_b_AAACCTGT… | "B_VDJ_42_3_1_VJ_59_2_1" | "1983" | "sc5p_v2_hs_PBMC_10k_b" | "T" | "T" | "IGHD3-22" | "IGHJ3" | "IGKJ1" | "TGTGCGACTACGTATTACTATGATAGTAGT… | "TGTCAACAGTATTATAGTTACCCTCGGACG… | "CATTYYYDSSGYYQNDAFDIW" | "CQQYYSYPRTF" | "IGH" | "IGK" | "IGHV1-69*01,IGHV1-69D*01" | "IGKV1-8*01" | "IGHM" | "IGKC" | 51 | 43 | "T" | "T" | "IGHD3-22" | "IGHJ3" | "IGKJ1" | "TGTGCGACTACGTATTACTATGATAGTAGT… | "TGTCAACAGTATTATAGTTACCCTCGGACG… | "CATTYYYDSSGYYQNDAFDIW" | "CQQYYSYPRTF" | "IGH" | "IGK" | "IGHV1-69*01,IGHV1-69D*01" | "IGKV1-8*01" | "IGHM" | "IGKC" | 51.0 | 43.0 | "IgM" | "IgM" | "IgM" | "IGH + IGK" | "Single pair" | "Standard" | "Standard" |
| "sc5p_v2_hs_PBMC_10k_b_AAACCTGT… | "B_VDJ_9_1_2_VJ_253_1_1" | "1406" | "sc5p_v2_hs_PBMC_10k_b" | "T" | "T" | null | "IGHJ3" | "IGLJ3" | "TGTGCGAGAGAGATAGAGGGGGACGGTGTT… | "TGTATGATTTGGCACAGCAGCGCTTGGGTG… | "CAREIEGDGVFEIW" | "CMIWHSSAWVV" | "IGH" | "IGL" | "IGHV1-2*02" | "IGLV5-45*02" | "IGHM" | "IGLC3" | 47 | 90 | "T" | "T" | null | "IGHJ3" | "IGLJ3" | "TGTGCGAGAGAGATAGAGGGGGACGGTGTT… | "TGTATGATTTGGCACAGCAGCGCTTGGGTG… | "CAREIEGDGVFEIW" | "CMIWHSSAWVV" | "IGH" | "IGL" | "IGHV1-2*02" | "IGLV5-45*02" | "IGHM" | "IGLC3" | 47.0 | 90.0 | "IgM" | "IgM" | "IgM" | "IGH + IGL" | "Single pair" | "Standard" | "Standard" |
| "sc5p_v2_hs_PBMC_10k_b_AAACCTGT… | "B_VDJ_246_4_2_VJ_82_1_1" | "2175" | "sc5p_v2_hs_PBMC_10k_b" | "T" | "T" | null | "IGHJ3" | "IGKJ2" | "TGTGCGAGACATATCCGTGGGAACAGATTT… | "TGTCAACAGTATTATAGTTTCCCGTACACT… | "CARHIRGNRFGNDAFDIW" | "CQQYYSFPYTF" | "IGH" | "IGK" | "IGHV5-51*03" | "IGKV1D-8*01" | "IGHM" | "IGKC" | 80 | 22 | "T" | "T" | null | "IGHJ3" | "IGKJ2" | "TGTGCGAGACATATCCGTGGGAACAGATTT… | "TGTCAACAGTATTATAGTTTCCCGTACACT… | "CARHIRGNRFGNDAFDIW" | "CQQYYSFPYTF" | "IGH" | "IGK" | "IGHV5-51*03" | "IGKV1D-8*01" | "IGHM" | "IGKC" | 80.0 | 22.0 | "IgM" | "IgM" | "IgM" | "IGH + IGK" | "Single pair" | "Standard" | "Standard" |
| "sc5p_v2_hs_PBMC_10k_b_AAACGGGA… | "B_VDJ_217_2_1_VJ_222_2_1" | "234" | "sc5p_v2_hs_PBMC_10k_b" | "T" | "T" | "IGHD6-13" | "IGHJ3" | "IGLJ2" | "TGTGCGAGAGTAGGCTATAGAGCAGCAGCT… | "TGTAACTCCCGGGACAGCAGTGGTAACCAT… | "CARVGYRAAAGTDAFDIW" | "CNSRDSSGNHVVF" | "IGH" | "IGL" | "IGHV4-4*07" | "IGLV3-19*01" | "IGHM" | "IGLC" | 18 | 14 | "T" | "T" | "IGHD6-13" | "IGHJ3" | "IGLJ2" | "TGTGCGAGAGTAGGCTATAGAGCAGCAGCT… | "TGTAACTCCCGGGACAGCAGTGGTAACCAT… | "CARVGYRAAAGTDAFDIW" | "CNSRDSSGNHVVF" | "IGH" | "IGL" | "IGHV4-4*07" | "IGLV3-19*01" | "IGHM" | "IGLC" | 18.0 | 14.0 | "IgM" | "IgM" | "IgM" | "IGH + IGL" | "Single pair" | "Standard" | "Standard" |
| … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
| "vdj_v1_hs_pbmc3_b_TTTCCTCAGCGC… | "B_VDJ_109_5_5_VJ_98_1_1" | "1638" | "vdj_v1_hs_pbmc3_b" | "T" | "T" | "IGHD4-17" | "IGHJ6" | "IGKJ2" | "TGTGCGAAAGCCGCCTACGGTGAGGGGCTC… | "TGCATGCAAGGTACACACTGGCCGTACACT… | "CAKAAYGEGLRYYYYGMDVW" | "CMQGTHWPYTF" | "IGH" | "IGK" | "IGHV3-30*18" | "IGKV2-30*01" | "IGHM" | "IGKC" | 11 | 28 | "T" | "T" | "IGHD4-17" | "IGHJ6" | "IGKJ2" | "TGTGCGAAAGCCGCCTACGGTGAGGGGCTC… | "TGCATGCAAGGTACACACTGGCCGTACACT… | "CAKAAYGEGLRYYYYGMDVW" | "CMQGTHWPYTF" | "IGH" | "IGK" | "IGHV3-30*18" | "IGKV2-30*01" | "IGHM" | "IGKC" | 11.0 | 28.0 | "IgM" | "IgM" | "IgM" | "IGH + IGK" | "Single pair" | "Standard" | "Standard" |
| "vdj_v1_hs_pbmc3_b_TTTCCTCAGGGA… | "B_VDJ_232_2_1_VJ_39_4_1" | "238" | "vdj_v1_hs_pbmc3_b" | "T" | "T" | "IGHD6-13" | "IGHJ2" | "IGKJ1" | "TGTGCGAGACCCCGTATAGCAGGATCTGGG… | "TGTCAACAGAGTTACAGTACCCCGTGGACG… | "CARPRIAGSGWYFDLW" | "CQQSYSTPWTF" | "IGH" | "IGK" | "IGHV4-61*12" | "IGKV1-39*01,IGKV1D-39*01" | "IGHM" | "IGKC" | 14 | 159 | "T" | "T" | "IGHD6-13" | "IGHJ2" | "IGKJ1" | "TGTGCGAGACCCCGTATAGCAGGATCTGGG… | "TGTCAACAGAGTTACAGTACCCCGTGGACG… | "CARPRIAGSGWYFDLW" | "CQQSYSTPWTF" | "IGH" | "IGK" | "IGHV4-61*12" | "IGKV1-39*01,IGKV1D-39*01" | "IGHM" | "IGKC" | 14.0 | 159.0 | "IgM" | "IgM" | "IgM" | "IGH + IGK" | "Single pair" | "Standard" | "Standard" |
| "vdj_v1_hs_pbmc3_b_TTTCCTCTCGAC… | "B_VDJ_33_5_1_VJ_41_1_1" | "1887" | "vdj_v1_hs_pbmc3_b" | "T" | "T" | "IGHD2-15" | "IGHJ5" | "IGKJ2" | "TGTGCGAGAGAGGGATATTGTAGTGGTGGT… | "TGTCAACAGAGTTACAGTACCCCTCGGACT… | "CAREGYCSGGSCYSPDPNNGWFDPW" | "CQQSYSTPRTF" | "IGH" | "IGK" | "IGHV1-46*01" | "IGKV1-39*01,IGKV1D-39*01" | "IGHM" | "IGKC" | 28 | 35 | "T" | "T" | "IGHD2-15" | "IGHJ5" | "IGKJ2" | "TGTGCGAGAGAGGGATATTGTAGTGGTGGT… | "TGTCAACAGAGTTACAGTACCCCTCGGACT… | "CAREGYCSGGSCYSPDPNNGWFDPW" | "CQQSYSTPRTF" | "IGH" | "IGK" | "IGHV1-46*01" | "IGKV1-39*01,IGKV1D-39*01" | "IGHM" | "IGKC" | 28.0 | 35.0 | "IgM" | "IgM" | "IgM" | "IGH + IGK" | "Single pair" | "Standard" | "Standard" |
| "vdj_v1_hs_pbmc3_b_TTTGCGCCATAC… | "B_VDJ_45_4_2_VJ_181_3_1" | "953" | "vdj_v1_hs_pbmc3_b" | "T" | "T" | "IGHD2-15" | "IGHJ6" | "IGLJ3" | "TGTGCGAGATCTCTGGATATTGTAGTGGTG… | "TGTGCAGCATGGGATGACAGCCTGAGTGGT… | "CARSLDIVVVVALYYYYGMDVW" | "CAAWDDSLSGWVF" | "IGH" | "IGL" | "IGHV1-69*01,IGHV1-69D*01" | "IGLV1-47*01" | "IGHM" | "IGLC3" | 32 | 28 | "T" | "T" | "IGHD2-15" | "IGHJ6" | "IGLJ3" | "TGTGCGAGATCTCTGGATATTGTAGTGGTG… | "TGTGCAGCATGGGATGACAGCCTGAGTGGT… | "CARSLDIVVVVALYYYYGMDVW" | "CAAWDDSLSGWVF" | "IGH" | "IGL" | "IGHV1-69*01,IGHV1-69D*01" | "IGLV1-47*01" | "IGHM" | "IGLC3" | 32.0 | 28.0 | "IgM" | "IgM" | "IgM" | "IGH + IGL" | "Single pair" | "Standard" | "Standard" |
| "vdj_v1_hs_pbmc3_b_TTTGGTTGTAGG… | "B_VDJ_94_6_3_VJ_190_1_1" | "1392" | "vdj_v1_hs_pbmc3_b" | "T" | "T" | null | "IGHJ4" | "IGLJ2" | "TGTGCGGGGAGTCGGTGGTTATATTCTTTT… | "TGCTGCTCATATGCAGGCAGCTACACTGTG… | "CAGSRWLYSFDYW" | "CCSYAGSYTVFF" | "IGH" | "IGL" | "IGHV3-23*01,IGHV3-23D*01" | "IGLV2-11*01" | "IGHM" | "IGLC" | 22 | 36 | "T" | "T" | null | "IGHJ4" | "IGLJ2" | "TGTGCGGGGAGTCGGTGGTTATATTCTTTT… | "TGCTGCTCATATGCAGGCAGCTACACTGTG… | "CAGSRWLYSFDYW" | "CCSYAGSYTVFF" | "IGH" | "IGL" | "IGHV3-23*01,IGHV3-23D*01" | "IGLV2-11*01" | "IGHM" | "IGLC" | 22.0 | 36.0 | "IgM" | "IgM" | "IgM" | "IGH + IGL" | "Single pair" | "Standard" | "Standard" |
slicing
You can slice the DandelionPolars object by the indices of .data or .metadata, with behavior similar to slicing a pandas DataFrame or an AnnData object. Since both slots are lazy by default, you need to call .collect() before using standard Polars indexing operations on them.
slicing .data
[4]:
# get the largest clone
largest_clone = vdj.data.collect()["clone_id"].value_counts()["clone_id"][0]
vdj[vdj.data.collect()["clone_id"] == largest_clone]
[4]:
Lazy Dandelion object with n_obs = 1 and n_contigs = 2
data: sequence_id, sequence, rev_comp, productive, v_call, d_call, j_call, sequence_alignment, germline_alignment, junction, junction_aa, v_cigar, d_cigar, j_cigar, stop_codon, vj_in_frame, locus, junction_length, np1_length, np2_length, v_sequence_start, v_sequence_end, v_germline_start, v_germline_end, d_sequence_start, d_sequence_end, d_germline_start, d_germline_end, j_sequence_start, j_sequence_end, j_germline_start, j_germline_end, v_score, v_identity, v_support, d_score, d_identity, d_support, j_score, j_identity, j_support, fwr1, fwr2, fwr3, fwr4, cdr1, cdr2, cdr3, cell_id, consensus_count, umi_count, v_call_10x, d_call_10x, j_call_10x, junction_10x, junction_10x_aa, j_call_blastn, j_identity_blastn, j_alignment_length_blastn, j_number_of_mismatches_blastn, j_number_of_gap_openings_blastn, j_sequence_start_blastn, j_sequence_end_blastn, j_germline_start_blastn, j_germline_end_blastn, j_support_blastn, j_score_blastn, j_sequence_alignment_blastn, j_germline_alignment_blastn, j_call_igblastn, j_source, j_support_igblastn, j_score_igblastn, d_call_blastn, d_identity_blastn, d_alignment_length_blastn, d_number_of_mismatches_blastn, d_number_of_gap_openings_blastn, d_sequence_start_blastn, d_sequence_end_blastn, d_germline_start_blastn, d_germline_end_blastn, d_support_blastn, d_score_blastn, d_sequence_alignment_blastn, d_germline_alignment_blastn, d_call_igblastn, d_source, d_support_igblastn, d_score_igblastn, v_call_genotyped, germline_alignment_d_mask, sample_id, c_call, c_sequence_alignment, c_germline_alignment, c_sequence_start, c_sequence_end, c_score, c_identity, c_call_10x, junction_aa_length, fwr1_aa, fwr2_aa, fwr3_aa, fwr4_aa, cdr1_aa, cdr2_aa, cdr3_aa, sequence_alignment_aa, v_sequence_alignment_aa, d_sequence_alignment_aa, j_sequence_alignment_aa, complete_vdj, j_call_multimappers, j_call_multiplicity, j_call_sequence_start_multimappers, j_call_sequence_end_multimappers, j_call_support_multimappers, mu_count, extra, ambiguous, 
rearrangement_status, clone_id
metadata: cell_id, clone_id, clone_id_rank, sample_id, productive_VDJ, productive_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, junction_VDJ, junction_VJ, junction_aa_VDJ, junction_aa_VJ, locus_VDJ, locus_VJ, v_call_VDJ, v_call_VJ, c_call_VDJ, c_call_VJ, umi_count_VDJ, umi_count_VJ, productive_VDJ_main, productive_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, junction_VDJ_main, junction_VJ_main, junction_aa_VDJ_main, junction_aa_VJ_main, locus_VDJ_main, locus_VJ_main, v_call_genotyped_VDJ_main, v_call_genotyped_VJ_main, c_call_VDJ_main, c_call_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, isotype, isotype_main, isotype_status, locus_status, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ
layout: layout for 1 vertices, layout for 0 vertices
graph: networkx graph of 1 vertices, networkx graph of 0 vertices
[5]:
vdj[
    vdj.data_names.is_in(
        [
            "sc5p_v2_hs_PBMC_10k_b_AAACCTGTCATATCGG_contig_1",
            "sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_2",
            "sc5p_v2_hs_PBMC_10k_b_AAACCTGTCCGTTGTC_contig_1",
            "sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_contig_1",
            "sc5p_v2_hs_PBMC_10k_b_AAACCTGTCGAGAACG_contig_2",
        ]
    )
]
[5]:
Lazy Dandelion object with n_obs = 3 and n_contigs = 5
data: sequence_id, sequence, rev_comp, productive, v_call, d_call, j_call, sequence_alignment, germline_alignment, junction, junction_aa, v_cigar, d_cigar, j_cigar, stop_codon, vj_in_frame, locus, junction_length, np1_length, np2_length, v_sequence_start, v_sequence_end, v_germline_start, v_germline_end, d_sequence_start, d_sequence_end, d_germline_start, d_germline_end, j_sequence_start, j_sequence_end, j_germline_start, j_germline_end, v_score, v_identity, v_support, d_score, d_identity, d_support, j_score, j_identity, j_support, fwr1, fwr2, fwr3, fwr4, cdr1, cdr2, cdr3, cell_id, consensus_count, umi_count, v_call_10x, d_call_10x, j_call_10x, junction_10x, junction_10x_aa, j_call_blastn, j_identity_blastn, j_alignment_length_blastn, j_number_of_mismatches_blastn, j_number_of_gap_openings_blastn, j_sequence_start_blastn, j_sequence_end_blastn, j_germline_start_blastn, j_germline_end_blastn, j_support_blastn, j_score_blastn, j_sequence_alignment_blastn, j_germline_alignment_blastn, j_call_igblastn, j_source, j_support_igblastn, j_score_igblastn, d_call_blastn, d_identity_blastn, d_alignment_length_blastn, d_number_of_mismatches_blastn, d_number_of_gap_openings_blastn, d_sequence_start_blastn, d_sequence_end_blastn, d_germline_start_blastn, d_germline_end_blastn, d_support_blastn, d_score_blastn, d_sequence_alignment_blastn, d_germline_alignment_blastn, d_call_igblastn, d_source, d_support_igblastn, d_score_igblastn, v_call_genotyped, germline_alignment_d_mask, sample_id, c_call, c_sequence_alignment, c_germline_alignment, c_sequence_start, c_sequence_end, c_score, c_identity, c_call_10x, junction_aa_length, fwr1_aa, fwr2_aa, fwr3_aa, fwr4_aa, cdr1_aa, cdr2_aa, cdr3_aa, sequence_alignment_aa, v_sequence_alignment_aa, d_sequence_alignment_aa, j_sequence_alignment_aa, complete_vdj, j_call_multimappers, j_call_multiplicity, j_call_sequence_start_multimappers, j_call_sequence_end_multimappers, j_call_support_multimappers, mu_count, extra, ambiguous, 
rearrangement_status, clone_id
metadata: cell_id, clone_id, clone_id_rank, sample_id, productive_VDJ, productive_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, junction_VDJ, junction_VJ, junction_aa_VDJ, junction_aa_VJ, locus_VDJ, locus_VJ, v_call_VDJ, v_call_VJ, c_call_VDJ, c_call_VJ, umi_count_VDJ, umi_count_VJ, productive_VDJ_main, productive_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, junction_VDJ_main, junction_VJ_main, junction_aa_VDJ_main, junction_aa_VJ_main, locus_VDJ_main, locus_VJ_main, v_call_genotyped_VDJ_main, v_call_genotyped_VJ_main, c_call_VDJ_main, c_call_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, isotype, isotype_main, isotype_status, locus_status, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ
layout: layout for 2 vertices, layout for 0 vertices
graph: networkx graph of 2 vertices, networkx graph of 0 vertices
slicing .metadata
[6]:
vdj[vdj.metadata.collect()["productive_VDJ"].is_in(["T", "T|T"])]
[6]:
Lazy Dandelion object with n_obs = 2336 and n_contigs = 5557
data: sequence_id, sequence, rev_comp, productive, v_call, d_call, j_call, sequence_alignment, germline_alignment, junction, junction_aa, v_cigar, d_cigar, j_cigar, stop_codon, vj_in_frame, locus, junction_length, np1_length, np2_length, v_sequence_start, v_sequence_end, v_germline_start, v_germline_end, d_sequence_start, d_sequence_end, d_germline_start, d_germline_end, j_sequence_start, j_sequence_end, j_germline_start, j_germline_end, v_score, v_identity, v_support, d_score, d_identity, d_support, j_score, j_identity, j_support, fwr1, fwr2, fwr3, fwr4, cdr1, cdr2, cdr3, cell_id, consensus_count, umi_count, v_call_10x, d_call_10x, j_call_10x, junction_10x, junction_10x_aa, j_call_blastn, j_identity_blastn, j_alignment_length_blastn, j_number_of_mismatches_blastn, j_number_of_gap_openings_blastn, j_sequence_start_blastn, j_sequence_end_blastn, j_germline_start_blastn, j_germline_end_blastn, j_support_blastn, j_score_blastn, j_sequence_alignment_blastn, j_germline_alignment_blastn, j_call_igblastn, j_source, j_support_igblastn, j_score_igblastn, d_call_blastn, d_identity_blastn, d_alignment_length_blastn, d_number_of_mismatches_blastn, d_number_of_gap_openings_blastn, d_sequence_start_blastn, d_sequence_end_blastn, d_germline_start_blastn, d_germline_end_blastn, d_support_blastn, d_score_blastn, d_sequence_alignment_blastn, d_germline_alignment_blastn, d_call_igblastn, d_source, d_support_igblastn, d_score_igblastn, v_call_genotyped, germline_alignment_d_mask, sample_id, c_call, c_sequence_alignment, c_germline_alignment, c_sequence_start, c_sequence_end, c_score, c_identity, c_call_10x, junction_aa_length, fwr1_aa, fwr2_aa, fwr3_aa, fwr4_aa, cdr1_aa, cdr2_aa, cdr3_aa, sequence_alignment_aa, v_sequence_alignment_aa, d_sequence_alignment_aa, j_sequence_alignment_aa, complete_vdj, j_call_multimappers, j_call_multiplicity, j_call_sequence_start_multimappers, j_call_sequence_end_multimappers, j_call_support_multimappers, mu_count, extra, ambiguous, 
rearrangement_status, clone_id
metadata: cell_id, clone_id, clone_id_rank, sample_id, productive_VDJ, productive_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, junction_VDJ, junction_VJ, junction_aa_VDJ, junction_aa_VJ, locus_VDJ, locus_VJ, v_call_VDJ, v_call_VJ, c_call_VDJ, c_call_VJ, umi_count_VDJ, umi_count_VJ, productive_VDJ_main, productive_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, junction_VDJ_main, junction_VJ_main, junction_aa_VDJ_main, junction_aa_VJ_main, locus_VDJ_main, locus_VJ_main, v_call_genotyped_VDJ_main, v_call_genotyped_VJ_main, c_call_VDJ_main, c_call_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, isotype, isotype_main, isotype_status, locus_status, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ
layout: layout for 2336 vertices, layout for 146 vertices
graph: networkx graph of 2336 vertices, networkx graph of 146 vertices
[7]:
vdj[vdj.metadata_names == "vdj_v1_hs_pbmc3_b_TTTCCTCAGCGCTTAT"]
[7]:
Lazy Dandelion object with n_obs = 1 and n_contigs = 2
data: sequence_id, sequence, rev_comp, productive, v_call, d_call, j_call, sequence_alignment, germline_alignment, junction, junction_aa, v_cigar, d_cigar, j_cigar, stop_codon, vj_in_frame, locus, junction_length, np1_length, np2_length, v_sequence_start, v_sequence_end, v_germline_start, v_germline_end, d_sequence_start, d_sequence_end, d_germline_start, d_germline_end, j_sequence_start, j_sequence_end, j_germline_start, j_germline_end, v_score, v_identity, v_support, d_score, d_identity, d_support, j_score, j_identity, j_support, fwr1, fwr2, fwr3, fwr4, cdr1, cdr2, cdr3, cell_id, consensus_count, umi_count, v_call_10x, d_call_10x, j_call_10x, junction_10x, junction_10x_aa, j_call_blastn, j_identity_blastn, j_alignment_length_blastn, j_number_of_mismatches_blastn, j_number_of_gap_openings_blastn, j_sequence_start_blastn, j_sequence_end_blastn, j_germline_start_blastn, j_germline_end_blastn, j_support_blastn, j_score_blastn, j_sequence_alignment_blastn, j_germline_alignment_blastn, j_call_igblastn, j_source, j_support_igblastn, j_score_igblastn, d_call_blastn, d_identity_blastn, d_alignment_length_blastn, d_number_of_mismatches_blastn, d_number_of_gap_openings_blastn, d_sequence_start_blastn, d_sequence_end_blastn, d_germline_start_blastn, d_germline_end_blastn, d_support_blastn, d_score_blastn, d_sequence_alignment_blastn, d_germline_alignment_blastn, d_call_igblastn, d_source, d_support_igblastn, d_score_igblastn, v_call_genotyped, germline_alignment_d_mask, sample_id, c_call, c_sequence_alignment, c_germline_alignment, c_sequence_start, c_sequence_end, c_score, c_identity, c_call_10x, junction_aa_length, fwr1_aa, fwr2_aa, fwr3_aa, fwr4_aa, cdr1_aa, cdr2_aa, cdr3_aa, sequence_alignment_aa, v_sequence_alignment_aa, d_sequence_alignment_aa, j_sequence_alignment_aa, complete_vdj, j_call_multimappers, j_call_multiplicity, j_call_sequence_start_multimappers, j_call_sequence_end_multimappers, j_call_support_multimappers, mu_count, extra, ambiguous, 
rearrangement_status, clone_id
metadata: cell_id, clone_id, clone_id_rank, sample_id, productive_VDJ, productive_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, junction_VDJ, junction_VJ, junction_aa_VDJ, junction_aa_VJ, locus_VDJ, locus_VJ, v_call_VDJ, v_call_VJ, c_call_VDJ, c_call_VJ, umi_count_VDJ, umi_count_VJ, productive_VDJ_main, productive_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, junction_VDJ_main, junction_VJ_main, junction_aa_VDJ_main, junction_aa_VJ_main, locus_VDJ_main, locus_VJ_main, v_call_genotyped_VDJ_main, v_call_genotyped_VJ_main, c_call_VDJ_main, c_call_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, isotype, isotype_main, isotype_status, locus_status, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ
layout: layout for 1 vertices, layout for 0 vertices
graph: networkx graph of 1 vertices, networkx graph of 0 vertices
copy
You can deep copy the DandelionPolars object to another variable which will inherit all slots:
[8]:
vdj2 = vdj.copy()
vdj2.metadata.collect()
[8]:
| cell_id | clone_id | clone_id_rank | sample_id | productive_VDJ | productive_VJ | d_call_VDJ | j_call_VDJ | j_call_VJ | junction_VDJ | junction_VJ | junction_aa_VDJ | junction_aa_VJ | locus_VDJ | locus_VJ | v_call_VDJ | v_call_VJ | c_call_VDJ | c_call_VJ | umi_count_VDJ | umi_count_VJ | productive_VDJ_main | productive_VJ_main | d_call_VDJ_main | j_call_VDJ_main | j_call_VJ_main | junction_VDJ_main | junction_VJ_main | junction_aa_VDJ_main | junction_aa_VJ_main | locus_VDJ_main | locus_VJ_main | v_call_genotyped_VDJ_main | v_call_genotyped_VJ_main | c_call_VDJ_main | c_call_VJ_main | umi_count_VDJ_main | umi_count_VJ_main | isotype | isotype_main | isotype_status | locus_status | chain_status | rearrangement_status_VDJ | rearrangement_status_VJ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| str | str | cat | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | i64 | i64 | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | f64 | f64 | str | str | str | str | str | str | str |
| "sc5p_v2_hs_PBMC_10k_b_AAACCTGT… | "B_VJ_36_2_3" | "1286" | "sc5p_v2_hs_PBMC_10k_b" | null | "T" | null | null | "IGKJ4" | null | "TGTCAACAATATGACGAACTTCCCGTCACT… | null | "CQQYDELPVTF" | null | "IGK" | null | "IGKV1-33*01,IGKV1D-33*01" | null | "IGKC" | 0 | 68 | null | "T" | null | null | "IGKJ4" | null | "TGTCAACAATATGACGAACTTCCCGTCACT… | null | "CQQYDELPVTF" | null | "IGK" | null | "IGKV1-33*01,IGKV1D-33*01" | null | "IGKC" | null | 68.0 | null | null | null | "Orphan IGK" | "Orphan VJ" | null | "Standard" |
| "sc5p_v2_hs_PBMC_10k_b_AAACCTGT… | "B_VDJ_42_3_1_VJ_59_2_1" | "1983" | "sc5p_v2_hs_PBMC_10k_b" | "T" | "T" | "IGHD3-22" | "IGHJ3" | "IGKJ1" | "TGTGCGACTACGTATTACTATGATAGTAGT… | "TGTCAACAGTATTATAGTTACCCTCGGACG… | "CATTYYYDSSGYYQNDAFDIW" | "CQQYYSYPRTF" | "IGH" | "IGK" | "IGHV1-69*01,IGHV1-69D*01" | "IGKV1-8*01" | "IGHM" | "IGKC" | 51 | 43 | "T" | "T" | "IGHD3-22" | "IGHJ3" | "IGKJ1" | "TGTGCGACTACGTATTACTATGATAGTAGT… | "TGTCAACAGTATTATAGTTACCCTCGGACG… | "CATTYYYDSSGYYQNDAFDIW" | "CQQYYSYPRTF" | "IGH" | "IGK" | "IGHV1-69*01,IGHV1-69D*01" | "IGKV1-8*01" | "IGHM" | "IGKC" | 51.0 | 43.0 | "IgM" | "IgM" | "IgM" | "IGH + IGK" | "Single pair" | "Standard" | "Standard" |
| "sc5p_v2_hs_PBMC_10k_b_AAACCTGT… | "B_VDJ_9_1_2_VJ_253_1_1" | "1406" | "sc5p_v2_hs_PBMC_10k_b" | "T" | "T" | null | "IGHJ3" | "IGLJ3" | "TGTGCGAGAGAGATAGAGGGGGACGGTGTT… | "TGTATGATTTGGCACAGCAGCGCTTGGGTG… | "CAREIEGDGVFEIW" | "CMIWHSSAWVV" | "IGH" | "IGL" | "IGHV1-2*02" | "IGLV5-45*02" | "IGHM" | "IGLC3" | 47 | 90 | "T" | "T" | null | "IGHJ3" | "IGLJ3" | "TGTGCGAGAGAGATAGAGGGGGACGGTGTT… | "TGTATGATTTGGCACAGCAGCGCTTGGGTG… | "CAREIEGDGVFEIW" | "CMIWHSSAWVV" | "IGH" | "IGL" | "IGHV1-2*02" | "IGLV5-45*02" | "IGHM" | "IGLC3" | 47.0 | 90.0 | "IgM" | "IgM" | "IgM" | "IGH + IGL" | "Single pair" | "Standard" | "Standard" |
| "sc5p_v2_hs_PBMC_10k_b_AAACCTGT… | "B_VDJ_246_4_2_VJ_82_1_1" | "2175" | "sc5p_v2_hs_PBMC_10k_b" | "T" | "T" | null | "IGHJ3" | "IGKJ2" | "TGTGCGAGACATATCCGTGGGAACAGATTT… | "TGTCAACAGTATTATAGTTTCCCGTACACT… | "CARHIRGNRFGNDAFDIW" | "CQQYYSFPYTF" | "IGH" | "IGK" | "IGHV5-51*03" | "IGKV1D-8*01" | "IGHM" | "IGKC" | 80 | 22 | "T" | "T" | null | "IGHJ3" | "IGKJ2" | "TGTGCGAGACATATCCGTGGGAACAGATTT… | "TGTCAACAGTATTATAGTTTCCCGTACACT… | "CARHIRGNRFGNDAFDIW" | "CQQYYSFPYTF" | "IGH" | "IGK" | "IGHV5-51*03" | "IGKV1D-8*01" | "IGHM" | "IGKC" | 80.0 | 22.0 | "IgM" | "IgM" | "IgM" | "IGH + IGK" | "Single pair" | "Standard" | "Standard" |
| "sc5p_v2_hs_PBMC_10k_b_AAACGGGA… | "B_VDJ_217_2_1_VJ_222_2_1" | "234" | "sc5p_v2_hs_PBMC_10k_b" | "T" | "T" | "IGHD6-13" | "IGHJ3" | "IGLJ2" | "TGTGCGAGAGTAGGCTATAGAGCAGCAGCT… | "TGTAACTCCCGGGACAGCAGTGGTAACCAT… | "CARVGYRAAAGTDAFDIW" | "CNSRDSSGNHVVF" | "IGH" | "IGL" | "IGHV4-4*07" | "IGLV3-19*01" | "IGHM" | "IGLC" | 18 | 14 | "T" | "T" | "IGHD6-13" | "IGHJ3" | "IGLJ2" | "TGTGCGAGAGTAGGCTATAGAGCAGCAGCT… | "TGTAACTCCCGGGACAGCAGTGGTAACCAT… | "CARVGYRAAAGTDAFDIW" | "CNSRDSSGNHVVF" | "IGH" | "IGL" | "IGHV4-4*07" | "IGLV3-19*01" | "IGHM" | "IGLC" | 18.0 | 14.0 | "IgM" | "IgM" | "IgM" | "IGH + IGL" | "Single pair" | "Standard" | "Standard" |
| … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
| "vdj_v1_hs_pbmc3_b_TTTCCTCAGCGC… | "B_VDJ_109_5_5_VJ_98_1_1" | "1638" | "vdj_v1_hs_pbmc3_b" | "T" | "T" | "IGHD4-17" | "IGHJ6" | "IGKJ2" | "TGTGCGAAAGCCGCCTACGGTGAGGGGCTC… | "TGCATGCAAGGTACACACTGGCCGTACACT… | "CAKAAYGEGLRYYYYGMDVW" | "CMQGTHWPYTF" | "IGH" | "IGK" | "IGHV3-30*18" | "IGKV2-30*01" | "IGHM" | "IGKC" | 11 | 28 | "T" | "T" | "IGHD4-17" | "IGHJ6" | "IGKJ2" | "TGTGCGAAAGCCGCCTACGGTGAGGGGCTC… | "TGCATGCAAGGTACACACTGGCCGTACACT… | "CAKAAYGEGLRYYYYGMDVW" | "CMQGTHWPYTF" | "IGH" | "IGK" | "IGHV3-30*18" | "IGKV2-30*01" | "IGHM" | "IGKC" | 11.0 | 28.0 | "IgM" | "IgM" | "IgM" | "IGH + IGK" | "Single pair" | "Standard" | "Standard" |
| "vdj_v1_hs_pbmc3_b_TTTCCTCAGGGA… | "B_VDJ_232_2_1_VJ_39_4_1" | "238" | "vdj_v1_hs_pbmc3_b" | "T" | "T" | "IGHD6-13" | "IGHJ2" | "IGKJ1" | "TGTGCGAGACCCCGTATAGCAGGATCTGGG… | "TGTCAACAGAGTTACAGTACCCCGTGGACG… | "CARPRIAGSGWYFDLW" | "CQQSYSTPWTF" | "IGH" | "IGK" | "IGHV4-61*12" | "IGKV1-39*01,IGKV1D-39*01" | "IGHM" | "IGKC" | 14 | 159 | "T" | "T" | "IGHD6-13" | "IGHJ2" | "IGKJ1" | "TGTGCGAGACCCCGTATAGCAGGATCTGGG… | "TGTCAACAGAGTTACAGTACCCCGTGGACG… | "CARPRIAGSGWYFDLW" | "CQQSYSTPWTF" | "IGH" | "IGK" | "IGHV4-61*12" | "IGKV1-39*01,IGKV1D-39*01" | "IGHM" | "IGKC" | 14.0 | 159.0 | "IgM" | "IgM" | "IgM" | "IGH + IGK" | "Single pair" | "Standard" | "Standard" |
| "vdj_v1_hs_pbmc3_b_TTTCCTCTCGAC… | "B_VDJ_33_5_1_VJ_41_1_1" | "1887" | "vdj_v1_hs_pbmc3_b" | "T" | "T" | "IGHD2-15" | "IGHJ5" | "IGKJ2" | "TGTGCGAGAGAGGGATATTGTAGTGGTGGT… | "TGTCAACAGAGTTACAGTACCCCTCGGACT… | "CAREGYCSGGSCYSPDPNNGWFDPW" | "CQQSYSTPRTF" | "IGH" | "IGK" | "IGHV1-46*01" | "IGKV1-39*01,IGKV1D-39*01" | "IGHM" | "IGKC" | 28 | 35 | "T" | "T" | "IGHD2-15" | "IGHJ5" | "IGKJ2" | "TGTGCGAGAGAGGGATATTGTAGTGGTGGT… | "TGTCAACAGAGTTACAGTACCCCTCGGACT… | "CAREGYCSGGSCYSPDPNNGWFDPW" | "CQQSYSTPRTF" | "IGH" | "IGK" | "IGHV1-46*01" | "IGKV1-39*01,IGKV1D-39*01" | "IGHM" | "IGKC" | 28.0 | 35.0 | "IgM" | "IgM" | "IgM" | "IGH + IGK" | "Single pair" | "Standard" | "Standard" |
| "vdj_v1_hs_pbmc3_b_TTTGCGCCATAC… | "B_VDJ_45_4_2_VJ_181_3_1" | "953" | "vdj_v1_hs_pbmc3_b" | "T" | "T" | "IGHD2-15" | "IGHJ6" | "IGLJ3" | "TGTGCGAGATCTCTGGATATTGTAGTGGTG… | "TGTGCAGCATGGGATGACAGCCTGAGTGGT… | "CARSLDIVVVVALYYYYGMDVW" | "CAAWDDSLSGWVF" | "IGH" | "IGL" | "IGHV1-69*01,IGHV1-69D*01" | "IGLV1-47*01" | "IGHM" | "IGLC3" | 32 | 28 | "T" | "T" | "IGHD2-15" | "IGHJ6" | "IGLJ3" | "TGTGCGAGATCTCTGGATATTGTAGTGGTG… | "TGTGCAGCATGGGATGACAGCCTGAGTGGT… | "CARSLDIVVVVALYYYYGMDVW" | "CAAWDDSLSGWVF" | "IGH" | "IGL" | "IGHV1-69*01,IGHV1-69D*01" | "IGLV1-47*01" | "IGHM" | "IGLC3" | 32.0 | 28.0 | "IgM" | "IgM" | "IgM" | "IGH + IGL" | "Single pair" | "Standard" | "Standard" |
| "vdj_v1_hs_pbmc3_b_TTTGGTTGTAGG… | "B_VDJ_94_6_3_VJ_190_1_1" | "1392" | "vdj_v1_hs_pbmc3_b" | "T" | "T" | null | "IGHJ4" | "IGLJ2" | "TGTGCGGGGAGTCGGTGGTTATATTCTTTT… | "TGCTGCTCATATGCAGGCAGCTACACTGTG… | "CAGSRWLYSFDYW" | "CCSYAGSYTVFF" | "IGH" | "IGL" | "IGHV3-23*01,IGHV3-23D*01" | "IGLV2-11*01" | "IGHM" | "IGLC" | 22 | 36 | "T" | "T" | null | "IGHJ4" | "IGLJ2" | "TGTGCGGGGAGTCGGTGGTTATATTCTTTT… | "TGCTGCTCATATGCAGGCAGCTACACTGTG… | "CAGSRWLYSFDYW" | "CCSYAGSYTVFF" | "IGH" | "IGL" | "IGHV3-23*01,IGHV3-23D*01" | "IGLV2-11*01" | "IGHM" | "IGLC" | 22.0 | 36.0 | "IgM" | "IgM" | "IgM" | "IGH + IGL" | "Single pair" | "Standard" | "Standard" |
Retrieving entries with update_metadata
The .metadata slot of the DandelionPolars class is initialized automatically whenever the .data slot is filled. However, it only contains a pre-specified set of standard columns. To retrieve other columns from the .data slot, we can update the metadata with the update_metadata method and specify the options retrieve and retrieve_mode.
The following modes determine how the retrieval is completed:
split and unique only - splits the retrieval into VDJ and VJ chains. A | separates the unique elements.
split and merge - splits the retrieval into VDJ and VJ chains. A | separates every element.
merge and unique only - similar to split and unique only, but merged into a single column.
split - splits the retrieval into individual columns for each contig.
merge - merges the retrieval into a single column where a | separates every element.
For numerical columns, there are additional options:
split and sum - splits the retrieval into VDJ and VJ chains and sums each separately.
split and average - similar to split and sum, but averages instead of summing.
sum - sum the retrievals into a single column.
average - averages the retrievals into a single column.
If retrieve_mode is not specified, it defaults to split and merge.
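To make the mode names above more concrete, here is a minimal plain-Python sketch of what each one describes for a single cell. It is a toy reading of the descriptions, not dandelion's implementation: the retrieve helper, the output keys (VDJ, VJ, merged, VDJ_0, ...) and the ordering of merged elements are all assumptions made for illustration.

```python
# Toy illustration only: plain Python, not dandelion's implementation.
# Each contig is (chain_type, value); one cell with two VDJ contigs and one VJ contig.
contigs = [("VDJ", "IGH"), ("VDJ", "IGH"), ("VJ", "IGK")]

def join(values, unique=False):
    """Join values with '|', optionally keeping only the first occurrence of each."""
    if unique:
        values = list(dict.fromkeys(values))  # de-duplicate, preserving order
    return "|".join(values)

def retrieve(contigs, mode="split and merge"):
    """Return {hypothetical_column_suffix: value} for one cell under the named mode."""
    vdj = [v for chain, v in contigs if chain == "VDJ"]
    vj = [v for chain, v in contigs if chain == "VJ"]
    if mode == "split and unique only":
        return {"VDJ": join(vdj, unique=True), "VJ": join(vj, unique=True)}
    if mode == "split and merge":
        return {"VDJ": join(vdj), "VJ": join(vj)}
    if mode == "merge and unique only":
        return {"merged": join(vdj + vj, unique=True)}
    if mode == "merge":
        return {"merged": join(vdj + vj)}
    if mode == "split":
        # One entry per contig, numbered within each chain type (naming is made up).
        out = {f"VDJ_{i}": v for i, v in enumerate(vdj)}
        out.update({f"VJ_{i}": v for i, v in enumerate(vj)})
        return out
    raise ValueError(f"unknown mode: {mode}")

for mode in ["split and unique only", "split and merge",
             "merge and unique only", "split", "merge"]:
    print(mode, "->", retrieve(contigs, mode))
```

The actual column names and element ordering produced by update_metadata may differ; check the metadata slot after calling it with a given retrieve_mode.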
Example: retrieving fwr1 sequences
[9]:
vdj.update_metadata(retrieve="fwr1")
vdj
[9]:
Lazy Dandelion object with n_obs = 2496 and n_contigs = 5767
data: sequence_id, sequence, rev_comp, productive, v_call, d_call, j_call, sequence_alignment, germline_alignment, junction, junction_aa, v_cigar, d_cigar, j_cigar, stop_codon, vj_in_frame, locus, junction_length, np1_length, np2_length, v_sequence_start, v_sequence_end, v_germline_start, v_germline_end, d_sequence_start, d_sequence_end, d_germline_start, d_germline_end, j_sequence_start, j_sequence_end, j_germline_start, j_germline_end, v_score, v_identity, v_support, d_score, d_identity, d_support, j_score, j_identity, j_support, fwr1, fwr2, fwr3, fwr4, cdr1, cdr2, cdr3, cell_id, consensus_count, umi_count, v_call_10x, d_call_10x, j_call_10x, junction_10x, junction_10x_aa, j_call_blastn, j_identity_blastn, j_alignment_length_blastn, j_number_of_mismatches_blastn, j_number_of_gap_openings_blastn, j_sequence_start_blastn, j_sequence_end_blastn, j_germline_start_blastn, j_germline_end_blastn, j_support_blastn, j_score_blastn, j_sequence_alignment_blastn, j_germline_alignment_blastn, j_call_igblastn, j_source, j_support_igblastn, j_score_igblastn, d_call_blastn, d_identity_blastn, d_alignment_length_blastn, d_number_of_mismatches_blastn, d_number_of_gap_openings_blastn, d_sequence_start_blastn, d_sequence_end_blastn, d_germline_start_blastn, d_germline_end_blastn, d_support_blastn, d_score_blastn, d_sequence_alignment_blastn, d_germline_alignment_blastn, d_call_igblastn, d_source, d_support_igblastn, d_score_igblastn, v_call_genotyped, germline_alignment_d_mask, sample_id, c_call, c_sequence_alignment, c_germline_alignment, c_sequence_start, c_sequence_end, c_score, c_identity, c_call_10x, junction_aa_length, fwr1_aa, fwr2_aa, fwr3_aa, fwr4_aa, cdr1_aa, cdr2_aa, cdr3_aa, sequence_alignment_aa, v_sequence_alignment_aa, d_sequence_alignment_aa, j_sequence_alignment_aa, complete_vdj, j_call_multimappers, j_call_multiplicity, j_call_sequence_start_multimappers, j_call_sequence_end_multimappers, j_call_support_multimappers, mu_count, extra, ambiguous, 
rearrangement_status, clone_id
metadata: cell_id, clone_id, clone_id_rank, sample_id, productive_VDJ, productive_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, junction_VDJ, junction_VJ, junction_aa_VDJ, junction_aa_VJ, locus_VDJ, locus_VJ, v_call_VDJ, v_call_VJ, c_call_VDJ, c_call_VJ, umi_count_VDJ, umi_count_VJ, productive_VDJ_main, productive_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, junction_VDJ_main, junction_VJ_main, junction_aa_VDJ_main, junction_aa_VJ_main, locus_VDJ_main, locus_VJ_main, v_call_genotyped_VDJ_main, v_call_genotyped_VJ_main, c_call_VDJ_main, c_call_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, isotype, isotype_main, isotype_status, locus_status, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ, fwr1_VDJ, fwr1_VJ
layout: layout for 2351 vertices, layout for 148 vertices
graph: networkx graph of 2351 vertices, networkx graph of 148 vertices
Note the additional fwr1_VDJ and fwr1_VJ columns in the metadata slot.
By default, dandelion will not try to merge numerical columns as it can create mixed dtype columns.
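The numerical modes listed earlier, and the mixed-dtype concern, can be illustrated with a small plain-Python sketch. It is a toy model only: the retrieve_numeric helper, the output keys and the umi_count values are hypothetical and are not taken from dandelion or from the dataset.

```python
# Toy illustration only: plain Python, not dandelion's implementation.
# Hypothetical umi_count values for one cell: two VDJ contigs and one VJ contig.
contigs = [("VDJ", 12), ("VDJ", 4), ("VJ", 29)]

def average(values):
    """Arithmetic mean, or None when there is nothing to average."""
    return sum(values) / len(values) if values else None

def retrieve_numeric(contigs, mode):
    """Return {hypothetical_column_suffix: value} for one cell under the named mode."""
    vdj = [v for chain, v in contigs if chain == "VDJ"]
    vj = [v for chain, v in contigs if chain == "VJ"]
    if mode == "split and sum":
        return {"VDJ": sum(vdj), "VJ": sum(vj)}
    if mode == "split and average":
        return {"VDJ": average(vdj), "VJ": average(vj)}
    if mode == "sum":
        return {"merged": sum(vdj + vj)}
    if mode == "average":
        return {"merged": average(vdj + vj)}
    raise ValueError(f"unknown mode: {mode}")

for mode in ["split and sum", "split and average", "sum", "average"]:
    print(mode, "->", retrieve_numeric(contigs, mode))

# A string-style merge turns the numbers into text, while a cell with a single
# contig could stay numeric: the resulting column would mix str and int values.
merged_text = "|".join(str(v) for _, v in contigs)
print(repr(merged_text))
```

Choosing a sum or average mode keeps the retrieved column numeric, which is the situation the default behaviour is trying to protect.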
There is also a new convenience method, update_plus, that tries to retrieve frequently used columns such as mu_count, junction_length, np1_length and np2_length:
[10]:
vdj.update_plus()
vdj
[10]:
Lazy Dandelion object with n_obs = 2496 and n_contigs = 5767
data: sequence_id, sequence, rev_comp, productive, v_call, d_call, j_call, sequence_alignment, germline_alignment, junction, junction_aa, v_cigar, d_cigar, j_cigar, stop_codon, vj_in_frame, locus, junction_length, np1_length, np2_length, v_sequence_start, v_sequence_end, v_germline_start, v_germline_end, d_sequence_start, d_sequence_end, d_germline_start, d_germline_end, j_sequence_start, j_sequence_end, j_germline_start, j_germline_end, v_score, v_identity, v_support, d_score, d_identity, d_support, j_score, j_identity, j_support, fwr1, fwr2, fwr3, fwr4, cdr1, cdr2, cdr3, cell_id, consensus_count, umi_count, v_call_10x, d_call_10x, j_call_10x, junction_10x, junction_10x_aa, j_call_blastn, j_identity_blastn, j_alignment_length_blastn, j_number_of_mismatches_blastn, j_number_of_gap_openings_blastn, j_sequence_start_blastn, j_sequence_end_blastn, j_germline_start_blastn, j_germline_end_blastn, j_support_blastn, j_score_blastn, j_sequence_alignment_blastn, j_germline_alignment_blastn, j_call_igblastn, j_source, j_support_igblastn, j_score_igblastn, d_call_blastn, d_identity_blastn, d_alignment_length_blastn, d_number_of_mismatches_blastn, d_number_of_gap_openings_blastn, d_sequence_start_blastn, d_sequence_end_blastn, d_germline_start_blastn, d_germline_end_blastn, d_support_blastn, d_score_blastn, d_sequence_alignment_blastn, d_germline_alignment_blastn, d_call_igblastn, d_source, d_support_igblastn, d_score_igblastn, v_call_genotyped, germline_alignment_d_mask, sample_id, c_call, c_sequence_alignment, c_germline_alignment, c_sequence_start, c_sequence_end, c_score, c_identity, c_call_10x, junction_aa_length, fwr1_aa, fwr2_aa, fwr3_aa, fwr4_aa, cdr1_aa, cdr2_aa, cdr3_aa, sequence_alignment_aa, v_sequence_alignment_aa, d_sequence_alignment_aa, j_sequence_alignment_aa, complete_vdj, j_call_multimappers, j_call_multiplicity, j_call_sequence_start_multimappers, j_call_sequence_end_multimappers, j_call_support_multimappers, mu_count, extra, ambiguous, 
rearrangement_status, clone_id
metadata: cell_id, clone_id, clone_id_rank, sample_id, productive_VDJ, productive_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, junction_VDJ, junction_VJ, junction_aa_VDJ, junction_aa_VJ, locus_VDJ, locus_VJ, v_call_VDJ, v_call_VJ, c_call_VDJ, c_call_VJ, umi_count_VDJ, umi_count_VJ, productive_VDJ_main, productive_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, junction_VDJ_main, junction_VJ_main, junction_aa_VDJ_main, junction_aa_VJ_main, locus_VDJ_main, locus_VJ_main, v_call_genotyped_VDJ_main, v_call_genotyped_VJ_main, c_call_VDJ_main, c_call_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, isotype, isotype_main, isotype_status, locus_status, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ, fwr1_VDJ, fwr1_VJ, mu_count_VDJ, mu_count_VJ, mu_count, junction_length_VDJ, junction_length_VJ, junction_aa_length_VDJ, junction_aa_length_VJ, np1_length_VDJ, np1_length_VJ, np2_length_VDJ, np2_length_VJ
layout: layout for 2351 vertices, layout for 148 vertices
graph: networkx graph of 2351 vertices, networkx graph of 148 vertices
Renaming barcodes
You can rename the barcodes (both sequence and cell IDs at the same time) with a simple function, which is useful when you want to give the barcodes more meaningful names. The renaming always works from the indices that were originally used to create the DandelionPolars object. So if you have already run the function once, running it again does not stack another prefix/suffix onto the renamed indices; it re-derives the names from the original indices.
[11]:
print(vdj.data.collect()[["sequence_id", "cell_id"]]), print(vdj.metadata_names)
shape: (5_767, 2)
┌─────────────────────────────────┬─────────────────────────────────┐
│ sequence_id ┆ cell_id │
│ --- ┆ --- │
│ str ┆ str │
╞═════════════════════════════════╪═════════════════════════════════╡
│ sc5p_v2_hs_PBMC_10k_b_AAACCTGT… ┆ sc5p_v2_hs_PBMC_10k_b_AAACCTGT… │
│ sc5p_v2_hs_PBMC_10k_b_AAACCTGT… ┆ sc5p_v2_hs_PBMC_10k_b_AAACCTGT… │
│ sc5p_v2_hs_PBMC_10k_b_AAACCTGT… ┆ sc5p_v2_hs_PBMC_10k_b_AAACCTGT… │
│ sc5p_v2_hs_PBMC_10k_b_AAACCTGT… ┆ sc5p_v2_hs_PBMC_10k_b_AAACCTGT… │
│ sc5p_v2_hs_PBMC_10k_b_AAACCTGT… ┆ sc5p_v2_hs_PBMC_10k_b_AAACCTGT… │
│ … ┆ … │
│ vdj_v1_hs_pbmc3_b_TTTCCTCTCGAC… ┆ vdj_v1_hs_pbmc3_b_TTTCCTCTCGAC… │
│ vdj_v1_hs_pbmc3_b_TTTGCGCCATAC… ┆ vdj_v1_hs_pbmc3_b_TTTGCGCCATAC… │
│ vdj_v1_hs_pbmc3_b_TTTGCGCCATAC… ┆ vdj_v1_hs_pbmc3_b_TTTGCGCCATAC… │
│ vdj_v1_hs_pbmc3_b_TTTGGTTGTAGG… ┆ vdj_v1_hs_pbmc3_b_TTTGGTTGTAGG… │
│ vdj_v1_hs_pbmc3_b_TTTGGTTGTAGG… ┆ vdj_v1_hs_pbmc3_b_TTTGGTTGTAGG… │
└─────────────────────────────────┴─────────────────────────────────┘
shape: (2_496,)
Series: 'cell_id' [str]
[
"sc5p_v2_hs_PBMC_10k_b_GACGTGCT…
"sc5p_v2_hs_PBMC_10k_b_CTAACTTA…
"sc5p_v2_hs_PBMC_10k_b_GGGCATCA…
"sc5p_v2_hs_PBMC_10k_b_GGGTTGCG…
"sc5p_v2_hs_PBMC_10k_b_GTACTCCG…
…
"sc5p_v2_hs_PBMC_10k_b_AGTGTCAC…
"sc5p_v2_hs_PBMC_10k_b_AACTGGTT…
"sc5p_v2_hs_PBMC_10k_b_AGTGTCAC…
"sc5p_v2_hs_PBMC_10k_b_CCACGGAA…
"sc5p_v2_hs_PBMC_10k_b_CGACCTTG…
]
[11]:
(None, None)
[12]:
# let's add a 'test-' as a prefix. There's also the suffix option
vdj.add_sequence_prefix("test", sep="-")
print(vdj.data.collect()[["sequence_id", "cell_id"]]), print(vdj.metadata_names)
shape: (5_767, 2)
┌─────────────────────────────────┬─────────────────────────────────┐
│ sequence_id ┆ cell_id │
│ --- ┆ --- │
│ str ┆ str │
╞═════════════════════════════════╪═════════════════════════════════╡
│ test-sc5p_v2_hs_PBMC_10k_b_AAA… ┆ test-sc5p_v2_hs_PBMC_10k_b_AAA… │
│ test-sc5p_v2_hs_PBMC_10k_b_AAA… ┆ test-sc5p_v2_hs_PBMC_10k_b_AAA… │
│ test-sc5p_v2_hs_PBMC_10k_b_AAA… ┆ test-sc5p_v2_hs_PBMC_10k_b_AAA… │
│ test-sc5p_v2_hs_PBMC_10k_b_AAA… ┆ test-sc5p_v2_hs_PBMC_10k_b_AAA… │
│ test-sc5p_v2_hs_PBMC_10k_b_AAA… ┆ test-sc5p_v2_hs_PBMC_10k_b_AAA… │
│ … ┆ … │
│ test-vdj_v1_hs_pbmc3_b_TTTCCTC… ┆ test-vdj_v1_hs_pbmc3_b_TTTCCTC… │
│ test-vdj_v1_hs_pbmc3_b_TTTGCGC… ┆ test-vdj_v1_hs_pbmc3_b_TTTGCGC… │
│ test-vdj_v1_hs_pbmc3_b_TTTGCGC… ┆ test-vdj_v1_hs_pbmc3_b_TTTGCGC… │
│ test-vdj_v1_hs_pbmc3_b_TTTGGTT… ┆ test-vdj_v1_hs_pbmc3_b_TTTGGTT… │
│ test-vdj_v1_hs_pbmc3_b_TTTGGTT… ┆ test-vdj_v1_hs_pbmc3_b_TTTGGTT… │
└─────────────────────────────────┴─────────────────────────────────┘
shape: (2_496,)
Series: 'cell_id' [str]
[
"test-sc5p_v2_hs_PBMC_10k_b_AAA…
"test-sc5p_v2_hs_PBMC_10k_b_AAA…
"test-sc5p_v2_hs_PBMC_10k_b_AAA…
"test-sc5p_v2_hs_PBMC_10k_b_AAA…
"test-sc5p_v2_hs_PBMC_10k_b_AAA…
…
"test-vdj_v1_hs_pbmc3_b_TTTCCTC…
"test-vdj_v1_hs_pbmc3_b_TTTCCTC…
"test-vdj_v1_hs_pbmc3_b_TTTCCTC…
"test-vdj_v1_hs_pbmc3_b_TTTGCGC…
"test-vdj_v1_hs_pbmc3_b_TTTGGTT…
]
[12]:
(None, None)
[13]:
len(vdj._original_cell_ids.unique())
[13]:
2496
[14]:
vdj.metadata_names
[14]:
| cell_id |
|---|
| str |
| "test-sc5p_v2_hs_PBMC_10k_b_AAA… |
| "test-sc5p_v2_hs_PBMC_10k_b_AAA… |
| "test-sc5p_v2_hs_PBMC_10k_b_AAA… |
| "test-sc5p_v2_hs_PBMC_10k_b_AAA… |
| "test-sc5p_v2_hs_PBMC_10k_b_AAA… |
| … |
| "test-vdj_v1_hs_pbmc3_b_TTTCCTC… |
| "test-vdj_v1_hs_pbmc3_b_TTTCCTC… |
| "test-vdj_v1_hs_pbmc3_b_TTTCCTC… |
| "test-vdj_v1_hs_pbmc3_b_TTTGCGC… |
| "test-vdj_v1_hs_pbmc3_b_TTTGGTT… |
[15]:
vdj._original_cell_ids
[15]:
| cell_id |
|---|
| str |
| "sc5p_v2_hs_PBMC_10k_b_AAACCTGT… |
| "sc5p_v2_hs_PBMC_10k_b_AAACCTGT… |
| "sc5p_v2_hs_PBMC_10k_b_AAACCTGT… |
| "sc5p_v2_hs_PBMC_10k_b_AAACCTGT… |
| "sc5p_v2_hs_PBMC_10k_b_AAACCTGT… |
| … |
| "vdj_v1_hs_pbmc3_b_TTTCCTCTCGAC… |
| "vdj_v1_hs_pbmc3_b_TTTGCGCCATAC… |
| "vdj_v1_hs_pbmc3_b_TTTGCGCCATAC… |
| "vdj_v1_hs_pbmc3_b_TTTGGTTGTAGG… |
| "vdj_v1_hs_pbmc3_b_TTTGGTTGTAGG… |
[16]:
# same functionality as above
vdj.add_cell_prefix("test2", sep="_")
print(vdj.data.collect()[["sequence_id", "cell_id"]]), print(vdj.metadata_names)
shape: (5_767, 2)
┌─────────────────────────────────┬─────────────────────────────────┐
│ sequence_id ┆ cell_id │
│ --- ┆ --- │
│ str ┆ str │
╞═════════════════════════════════╪═════════════════════════════════╡
│ test2_sc5p_v2_hs_PBMC_10k_b_AA… ┆ test2_sc5p_v2_hs_PBMC_10k_b_AA… │
│ test2_sc5p_v2_hs_PBMC_10k_b_AA… ┆ test2_sc5p_v2_hs_PBMC_10k_b_AA… │
│ test2_sc5p_v2_hs_PBMC_10k_b_AA… ┆ test2_sc5p_v2_hs_PBMC_10k_b_AA… │
│ test2_sc5p_v2_hs_PBMC_10k_b_AA… ┆ test2_sc5p_v2_hs_PBMC_10k_b_AA… │
│ test2_sc5p_v2_hs_PBMC_10k_b_AA… ┆ test2_sc5p_v2_hs_PBMC_10k_b_AA… │
│ … ┆ … │
│ test2_vdj_v1_hs_pbmc3_b_TTTCCT… ┆ test2_vdj_v1_hs_pbmc3_b_TTTCCT… │
│ test2_vdj_v1_hs_pbmc3_b_TTTGCG… ┆ test2_vdj_v1_hs_pbmc3_b_TTTGCG… │
│ test2_vdj_v1_hs_pbmc3_b_TTTGCG… ┆ test2_vdj_v1_hs_pbmc3_b_TTTGCG… │
│ test2_vdj_v1_hs_pbmc3_b_TTTGGT… ┆ test2_vdj_v1_hs_pbmc3_b_TTTGGT… │
│ test2_vdj_v1_hs_pbmc3_b_TTTGGT… ┆ test2_vdj_v1_hs_pbmc3_b_TTTGGT… │
└─────────────────────────────────┴─────────────────────────────────┘
shape: (2_496,)
Series: 'cell_id' [str]
[
"test2_sc5p_v2_hs_PBMC_10k_b_AA…
"test2_sc5p_v2_hs_PBMC_10k_b_AA…
"test2_sc5p_v2_hs_PBMC_10k_b_AA…
"test2_sc5p_v2_hs_PBMC_10k_b_AA…
"test2_sc5p_v2_hs_PBMC_10k_b_AA…
…
"test2_vdj_v1_hs_pbmc3_b_TTTCCT…
"test2_vdj_v1_hs_pbmc3_b_TTTCCT…
"test2_vdj_v1_hs_pbmc3_b_TTTCCT…
"test2_vdj_v1_hs_pbmc3_b_TTTGCG…
"test2_vdj_v1_hs_pbmc3_b_TTTGGT…
]
[16]:
(None, None)
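The behaviour seen in the outputs above, where test2_ replaces test- instead of stacking on top of it, can be sketched with a small stand-in class. BarcodeRenamer and its methods are hypothetical names used only for illustration; the sketch assumes nothing about how DandelionPolars stores its original IDs beyond what the text describes.

```python
class BarcodeRenamer:
    """Toy stand-in (not part of dandelion) that always renames from the original ids."""

    def __init__(self, ids):
        self._original = list(ids)  # kept untouched, like the original indices
        self.ids = list(ids)        # the current, possibly renamed, ids

    def add_prefix(self, prefix, sep="_"):
        # Rebuild from the originals so repeated calls replace rather than stack.
        self.ids = [f"{prefix}{sep}{i}" for i in self._original]

    def add_suffix(self, suffix, sep="_"):
        self.ids = [f"{i}{sep}{suffix}" for i in self._original]


renamer = BarcodeRenamer(["cellA", "cellB"])
renamer.add_prefix("test", sep="-")
print(renamer.ids)
renamer.add_prefix("test2", sep="_")
print(renamer.ids)
```

Keeping the originals around is what makes the operation safe to repeat: every call starts from the same baseline.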
Simplifying the V/D/J/C call annotations
Sometimes the V/D/J/C call annotations can be quite verbose. You can simplify them with the .simplify() function, which keeps only the first element of a comma-separated call and strips the allele information. This is useful when you want simpler V/D/J/C calls for plotting.
[17]:
(
vdj.data.collect()[["v_call", "j_call"]],
vdj.metadata.collect()[["v_call_VDJ", "j_call_VDJ"]],
)
[17]:
(shape: (5_767, 2)
┌──────────────────────────┬────────────────────────────┐
│ v_call ┆ j_call │
│ --- ┆ --- │
│ str ┆ str │
╞══════════════════════════╪════════════════════════════╡
│ IGKV1-33*01,IGKV1D-33*01 ┆ IGKJ4*01 │
│ IGHV1-69*01,IGHV1-69D*01 ┆ IGHJ3*02 │
│ IGKV1-8*01 ┆ IGKJ1*01 │
│ IGLV5-45*02 ┆ IGLJ3*02 │
│ IGHV1-2*02 ┆ IGHJ3*02 │
│ … ┆ … │
│ IGHV1-46*01 ┆ IGHJ5*02 │
│ IGHV1-69*01,IGHV1-69D*01 ┆ IGHJ6*02 │
│ IGLV1-47*01 ┆ IGLJ3*02 │
│ IGLV2-11*01 ┆ IGLJ2*01,IGLJ3*01,IGLJ3*02 │
│ IGHV3-23*01,IGHV3-23D*01 ┆ IGHJ4*02 │
└──────────────────────────┴────────────────────────────┘,
shape: (2_496, 2)
┌──────────────────────────┬────────────┐
│ v_call_VDJ ┆ j_call_VDJ │
│ --- ┆ --- │
│ str ┆ str │
╞══════════════════════════╪════════════╡
│ null ┆ null │
│ IGHV1-69*01,IGHV1-69D*01 ┆ IGHJ3 │
│ IGHV1-2*02 ┆ IGHJ3 │
│ IGHV5-51*03 ┆ IGHJ3 │
│ IGHV4-4*07 ┆ IGHJ3 │
│ … ┆ … │
│ IGHV3-30*18 ┆ IGHJ6 │
│ IGHV4-61*12 ┆ IGHJ2 │
│ IGHV1-46*01 ┆ IGHJ5 │
│ IGHV1-69*01,IGHV1-69D*01 ┆ IGHJ6 │
│ IGHV3-23*01,IGHV3-23D*01 ┆ IGHJ4 │
└──────────────────────────┴────────────┘)
[18]:
# after
vdj.simplify()
(
vdj.data.collect()[["v_call", "j_call"]],
vdj.metadata.collect()[["v_call_VDJ", "j_call_VDJ"]],
)
[18]:
(shape: (5_767, 2)
┌──────────┬────────┐
│ v_call ┆ j_call │
│ --- ┆ --- │
│ str ┆ str │
╞══════════╪════════╡
│ IGKV1-33 ┆ IGKJ4 │
│ IGHV1-69 ┆ IGHJ3 │
│ IGKV1-8 ┆ IGKJ1 │
│ IGLV5-45 ┆ IGLJ3 │
│ IGHV1-2 ┆ IGHJ3 │
│ … ┆ … │
│ IGHV1-46 ┆ IGHJ5 │
│ IGHV1-69 ┆ IGHJ6 │
│ IGLV1-47 ┆ IGLJ3 │
│ IGLV2-11 ┆ IGLJ2 │
│ IGHV3-23 ┆ IGHJ4 │
└──────────┴────────┘,
shape: (2_496, 2)
┌────────────┬────────────┐
│ v_call_VDJ ┆ j_call_VDJ │
│ --- ┆ --- │
│ str ┆ str │
╞════════════╪════════════╡
│ null ┆ null │
│ IGHV1-69 ┆ IGHJ3 │
│ IGHV1-2 ┆ IGHJ3 │
│ IGHV5-51 ┆ IGHJ3 │
│ IGHV4-4 ┆ IGHJ3 │
│ … ┆ … │
│ IGHV3-30 ┆ IGHJ6 │
│ IGHV4-61 ┆ IGHJ2 │
│ IGHV1-46 ┆ IGHJ5 │
│ IGHV1-69 ┆ IGHJ6 │
│ IGHV3-23 ┆ IGHJ4 │
└────────────┴────────────┘)
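The transformation visible in the before/after outputs above can be approximated in a few lines of plain Python. This is a sketch of the rule as described (first comma-separated element, allele stripped), not the code that .simplify() actually runs; simplify_call is a hypothetical helper name.

```python
def simplify_call(call):
    """Keep the first comma-separated call and drop its allele suffix (e.g. '*01')."""
    if call is None:
        return None  # missing calls stay missing
    first = call.split(",")[0]
    return first.split("*")[0]


examples = [
    "IGKV1-33*01,IGKV1D-33*01",
    "IGLJ2*01,IGLJ3*01,IGLJ3*02",
    "IGHV3-23*01,IGHV3-23D*01",
    None,
]
for call in examples:
    print(call, "->", simplify_call(call))
```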
Concatenating multiple objects
This is a simple function to concatenate (append) two or more DandelionPolars objects or pandas DataFrames. Note that it operates on the .data slot and not the .metadata slot.
[19]:
vdj
[19]:
Lazy Dandelion object with n_obs = 2496 and n_contigs = 5767
data: sequence_id, sequence, rev_comp, productive, v_call, d_call, j_call, sequence_alignment, germline_alignment, junction, junction_aa, v_cigar, d_cigar, j_cigar, stop_codon, vj_in_frame, locus, junction_length, np1_length, np2_length, v_sequence_start, v_sequence_end, v_germline_start, v_germline_end, d_sequence_start, d_sequence_end, d_germline_start, d_germline_end, j_sequence_start, j_sequence_end, j_germline_start, j_germline_end, v_score, v_identity, v_support, d_score, d_identity, d_support, j_score, j_identity, j_support, fwr1, fwr2, fwr3, fwr4, cdr1, cdr2, cdr3, cell_id, consensus_count, umi_count, v_call_10x, d_call_10x, j_call_10x, junction_10x, junction_10x_aa, j_call_blastn, j_identity_blastn, j_alignment_length_blastn, j_number_of_mismatches_blastn, j_number_of_gap_openings_blastn, j_sequence_start_blastn, j_sequence_end_blastn, j_germline_start_blastn, j_germline_end_blastn, j_support_blastn, j_score_blastn, j_sequence_alignment_blastn, j_germline_alignment_blastn, j_call_igblastn, j_source, j_support_igblastn, j_score_igblastn, d_call_blastn, d_identity_blastn, d_alignment_length_blastn, d_number_of_mismatches_blastn, d_number_of_gap_openings_blastn, d_sequence_start_blastn, d_sequence_end_blastn, d_germline_start_blastn, d_germline_end_blastn, d_support_blastn, d_score_blastn, d_sequence_alignment_blastn, d_germline_alignment_blastn, d_call_igblastn, d_source, d_support_igblastn, d_score_igblastn, v_call_genotyped, germline_alignment_d_mask, sample_id, c_call, c_sequence_alignment, c_germline_alignment, c_sequence_start, c_sequence_end, c_score, c_identity, c_call_10x, junction_aa_length, fwr1_aa, fwr2_aa, fwr3_aa, fwr4_aa, cdr1_aa, cdr2_aa, cdr3_aa, sequence_alignment_aa, v_sequence_alignment_aa, d_sequence_alignment_aa, j_sequence_alignment_aa, complete_vdj, j_call_multimappers, j_call_multiplicity, j_call_sequence_start_multimappers, j_call_sequence_end_multimappers, j_call_support_multimappers, mu_count, extra, ambiguous, 
rearrangement_status, clone_id
metadata: cell_id, clone_id, clone_id_rank, sample_id, productive_VDJ, productive_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, junction_VDJ, junction_VJ, junction_aa_VDJ, junction_aa_VJ, locus_VDJ, locus_VJ, v_call_VDJ, v_call_VJ, c_call_VDJ, c_call_VJ, umi_count_VDJ, umi_count_VJ, productive_VDJ_main, productive_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, junction_VDJ_main, junction_VJ_main, junction_aa_VDJ_main, junction_aa_VJ_main, locus_VDJ_main, locus_VJ_main, v_call_genotyped_VDJ_main, v_call_genotyped_VJ_main, c_call_VDJ_main, c_call_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, isotype, isotype_main, isotype_status, locus_status, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ
layout: layout for 2351 vertices, layout for 148 vertices
graph: networkx graph of 2351 vertices, networkx graph of 148 vertices
[20]:
# just simple concatenation x 3. check the difference between the cell and contig numbers between this object and just vdj
vdj_concat = ddl.tl.concat([vdj, vdj, vdj])
vdj_concat
[20]:
Lazy Dandelion object with n_obs = 2496 and n_contigs = 17301
data: sequence_id, sequence, rev_comp, productive, v_call, d_call, j_call, sequence_alignment, germline_alignment, junction, junction_aa, v_cigar, d_cigar, j_cigar, stop_codon, vj_in_frame, locus, junction_length, np1_length, np2_length, v_sequence_start, v_sequence_end, v_germline_start, v_germline_end, d_sequence_start, d_sequence_end, d_germline_start, d_germline_end, j_sequence_start, j_sequence_end, j_germline_start, j_germline_end, v_score, v_identity, v_support, d_score, d_identity, d_support, j_score, j_identity, j_support, fwr1, fwr2, fwr3, fwr4, cdr1, cdr2, cdr3, cell_id, consensus_count, umi_count, v_call_10x, d_call_10x, j_call_10x, junction_10x, junction_10x_aa, j_call_blastn, j_identity_blastn, j_alignment_length_blastn, j_number_of_mismatches_blastn, j_number_of_gap_openings_blastn, j_sequence_start_blastn, j_sequence_end_blastn, j_germline_start_blastn, j_germline_end_blastn, j_support_blastn, j_score_blastn, j_sequence_alignment_blastn, j_germline_alignment_blastn, j_call_igblastn, j_source, j_support_igblastn, j_score_igblastn, d_call_blastn, d_identity_blastn, d_alignment_length_blastn, d_number_of_mismatches_blastn, d_number_of_gap_openings_blastn, d_sequence_start_blastn, d_sequence_end_blastn, d_germline_start_blastn, d_germline_end_blastn, d_support_blastn, d_score_blastn, d_sequence_alignment_blastn, d_germline_alignment_blastn, d_call_igblastn, d_source, d_support_igblastn, d_score_igblastn, v_call_genotyped, germline_alignment_d_mask, sample_id, c_call, c_sequence_alignment, c_germline_alignment, c_sequence_start, c_sequence_end, c_score, c_identity, c_call_10x, junction_aa_length, fwr1_aa, fwr2_aa, fwr3_aa, fwr4_aa, cdr1_aa, cdr2_aa, cdr3_aa, sequence_alignment_aa, v_sequence_alignment_aa, d_sequence_alignment_aa, j_sequence_alignment_aa, complete_vdj, j_call_multimappers, j_call_multiplicity, j_call_sequence_start_multimappers, j_call_sequence_end_multimappers, j_call_support_multimappers, mu_count, extra, ambiguous, 
rearrangement_status, clone_id
metadata: cell_id, clone_id, clone_id_rank, sample_id, productive_VDJ, productive_VJ, d_call_VDJ, j_call_VDJ, j_call_VJ, junction_VDJ, junction_VJ, junction_aa_VDJ, junction_aa_VJ, locus_VDJ, locus_VJ, v_call_VDJ, v_call_VJ, c_call_VDJ, c_call_VJ, umi_count_VDJ, umi_count_VJ, productive_VDJ_main, productive_VJ_main, d_call_VDJ_main, j_call_VDJ_main, j_call_VJ_main, junction_VDJ_main, junction_VJ_main, junction_aa_VDJ_main, junction_aa_VJ_main, locus_VDJ_main, locus_VJ_main, v_call_genotyped_VDJ_main, v_call_genotyped_VJ_main, c_call_VDJ_main, c_call_VJ_main, umi_count_VDJ_main, umi_count_VJ_main, isotype, isotype_main, isotype_status, locus_status, chain_status, rearrangement_status_VDJ, rearrangement_status_VJ
[21]:
vdj_concat.data.collect()[["sequence_id", "cell_id"]].head()
[21]:
| sequence_id | cell_id |
|---|---|
| str | str |
| "sc5p_v2_hs_PBMC_10k_b_AAACCTGT… | "test2_sc5p_v2_hs_PBMC_10k_b_AA… |
| "sc5p_v2_hs_PBMC_10k_b_AAACCTGT… | "test2_sc5p_v2_hs_PBMC_10k_b_AA… |
| "sc5p_v2_hs_PBMC_10k_b_AAACCTGT… | "test2_sc5p_v2_hs_PBMC_10k_b_AA… |
| "sc5p_v2_hs_PBMC_10k_b_AAACCTGT… | "test2_sc5p_v2_hs_PBMC_10k_b_AA… |
| "sc5p_v2_hs_PBMC_10k_b_AAACCTGT… | "test2_sc5p_v2_hs_PBMC_10k_b_AA… |
ddl.tl.concat also lets you add custom prefixes/suffixes to the sequence IDs. If none are provided, it adds -0, -1, etc. as a suffix when it detects that the sequence IDs are not unique, as is the case above.
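The suffixing rule described here can be sketched in plain Python. This is only a toy model of the stated behaviour; concat_ids and its suffixes and sep parameters are hypothetical and do not reflect the actual signature of the concat function.

```python
def concat_ids(id_lists, suffixes=None, sep="-"):
    """Concatenate lists of sequence ids, suffixing each list only when needed.

    Hypothetical helper: with no suffixes given, ids are left alone if they are
    already unique; otherwise list n gets the suffix str(n).
    """
    flat = [i for ids in id_lists for i in ids]
    if suffixes is None:
        if len(set(flat)) == len(flat):
            return flat  # no collisions, nothing to rename
        suffixes = [str(n) for n in range(len(id_lists))]
    return [f"{i}{sep}{s}" for ids, s in zip(id_lists, suffixes) for i in ids]


ids = ["contig_1", "contig_2"]
print(concat_ids([ids, ids, ids]))                # collisions: -0, -1, -2 are added
print(concat_ids([ids, ["contig_3"]]))            # already unique: left untouched
print(concat_ids([ids, ids], suffixes=["a", "b"]))  # custom suffixes
```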
Read/write
DandelionPolars supports multiple read/write formats. The primary format is .zipddl, which stores data as Parquet blobs inside a Zarr v3 ZipStore container with optional Blosc/Zstd compression. The legacy .h5ddl is still supported.