dandelion.preprocessing.create_germlines
- dandelion.preprocessing.create_germlines(vdj_data, germline=None, org='human', db='imgt', strain=None, genotyped_fasta=None, additional_args=[], save=None)
Run CreateGermlines.py to reconstruct the germline V(D)J sequence.
- Parameters:
vdj_data (Dandelion | pd.DataFrame | str) – Dandelion object, pandas DataFrame in changeo/airr format, or file path to changeo/airr file after clones have been determined.
germline (str | None, optional) – path to the germline database folder. If None, the path is taken from the environment variable.
org (Literal["human", "mouse"], optional) – organism of germline database.
db (Literal["imgt", "ogrdb"], optional) – imgt or ogrdb reference database.
strain (Literal["c57bl6", "balbc", "129S1_SvImJ", "AKR_J", "A_J", "BALB_c_ByJ", "BALB_c", "C3H_HeJ", "C57BL_6J", "C57BL_6", "CAST_EiJ", "CBA_J", "DBA_1J", "DBA_2J", "LEWES_EiJ", "MRL_MpJ", "MSM_MsJ", "NOD_ShiLtJ", "NOR_LtJ", "NZB_BlNJ", "PWD_PhJ", "SJL_J"] | None, optional) – mouse strain to use for germline sequences. Only applies when db="ogrdb". Note that only "c57bl6", "balbc", "CAST_EiJ", "LEWES_EiJ", "MSM_MsJ", "NOD_ShiLtJ" and "PWD_PhJ" contain both heavy chain and light chain germline sequences as a set. With any other strain, igblastn and MakeDB.py will not generate a successful airr table (check the failed file). "c57bl6" and "balbc" are merged databases of "C57BL_6" with "C57BL_6J" and "BALB_c" with "BALB_c_ByJ", respectively. None defaults to all strains combined.
genotyped_fasta (str | None, optional) – path to the corrected, genotyped V gene fasta file.
additional_args (list[str], optional) – additional arguments to pass to CreateGermlines.py.
save (str | None, optional) – if provided, saves to specified file path.
- Returns:
Dandelion object with .germlines slot populated.
- Return type:
Dandelion
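The strain restrictions described for the parameters can be checked before calling the function. The following is a minimal sketch, not part of dandelion: `germline_kwargs` and `run_create_germlines` are hypothetical helpers, the set of "complete" strains is copied from the note on `strain`, and the check that `strain` requires org="mouse" is an assumption based on the parameter being described as a mouse strain.

```python
# Strains that, per the note on `strain`, provide heavy and light chain
# germline sequences as a set (spelling follows the Literal list).
COMPLETE_OGRDB_STRAINS = {
    "c57bl6", "balbc", "CAST_EiJ", "LEWES_EiJ",
    "MSM_MsJ", "NOD_ShiLtJ", "PWD_PhJ",
}


def germline_kwargs(org="human", db="imgt", strain=None):
    """Hypothetical helper: validate and collect org/db/strain arguments."""
    if strain is not None:
        # Assumption: strain is only meaningful for mouse with the ogrdb database.
        if org != "mouse" or db != "ogrdb":
            raise ValueError("strain requires org='mouse' and db='ogrdb'")
        if strain not in COMPLETE_OGRDB_STRAINS:
            raise ValueError(
                f"{strain!r} lacks a combined heavy/light chain germline set"
            )
    return {"org": org, "db": db, "strain": strain}


def run_create_germlines(vdj_data, save=None, **options):
    """Hypothetical wrapper: validate options, then call create_germlines."""
    # Imported lazily so the validation helper works without dandelion installed.
    from dandelion.preprocessing import create_germlines

    return create_germlines(vdj_data, save=save, **germline_kwargs(**options))
```

With dandelion installed, a call such as `run_create_germlines(vdj, org="mouse", db="ogrdb", strain="c57bl6")` would forward the validated arguments, where `vdj` stands for a Dandelion object, DataFrame, or file path as described under `vdj_data`.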