dandelion.base.preprocessing.reassign_alleles
- dandelion.base.preprocessing.reassign_alleles(data, combined_folder, v_germline=None, germline=None, org='human', db='imgt', strain=None, novel=True, plot=True, save_plot=False, show_plot=True, figsize=(4, 3), sample_id_dictionary=None, filename_prefix=None, additional_args={'creategermlines': [], 'tigger': []})
Correct allele calls based on a personalized genotype using tigger.
It uses a subject-specific genotype to correct preliminary allele assignments of a set of sequences derived from a single subject.
- Parameters:
data (list[str]) – list of data folders containing the .tsv files. If provided as a single string, it is first converted to a list, so the function can be run on one or more samples.
combined_folder (str) – name of the folder for the concatenated data file and the genotyped files.
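The single-or-multiple-sample handling described for data can be sketched as below. as_folder_list is a hypothetical helper written for illustration; it is not part of dandelion's API.

```python
from typing import Iterable, Union


def as_folder_list(data: Union[str, Iterable[str]]) -> list[str]:
    """Hypothetical helper: wrap a single folder path in a list, copy other iterables."""
    # A bare string is one folder, not an iterable of characters.
    if isinstance(data, str):
        return [data]
    return list(data)


print(as_folder_list("sample1"))  # ['sample1']
print(as_folder_list(["sample1", "sample2"]))  # ['sample1', 'sample2']
```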
v_germline (str | None, optional) – path to heavy chain v germline fasta. Defaults to the IGHV fasta in the folder given by the $GERMLINE environment variable.
germline (str | None, optional) – path to germline database folder. None defaults to the GERMLINE environment variable.
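The germline fallback can be sketched as follows, assuming (as the parameter descriptions imply) that an explicit germline argument takes precedence over $GERMLINE. resolve_germline_folder is a hypothetical helper, and the error it raises is an illustrative choice, not dandelion's documented behaviour.

```python
import os
from typing import Optional


def resolve_germline_folder(germline: Optional[str] = None) -> str:
    """Hypothetical helper: explicit path first, else the GERMLINE environment variable."""
    if germline is not None:
        return germline
    try:
        return os.environ["GERMLINE"]
    except KeyError:
        # Illustrative error; the real function may fail differently.
        raise RuntimeError("germline is None and $GERMLINE is not set") from None
```

In practice, setting GERMLINE in the environment (for example via os.environ["GERMLINE"] = "/path/to/germlines", a placeholder path) before calling reassign_alleles lets both germline and v_germline stay at None.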
org (Literal[“human”, “mouse”], optional) – organism of germline database.
db (Literal[“imgt”, “ogrdb”], optional) – database to use for germline sequences.
strain (Literal[“c57bl6”, “balbc”, “129S1_SvImJ”, “AKR_J”, “A_J”, “BALB_c_ByJ”, “BALB_c”, “C3H_HeJ”, “C57BL_6J”, “C57BL_6”, “CAST_EiJ”, “CBA_J”, “DBA_1J”, “DBA_2J”, “LEWES_EiJ”, “MRL_MpJ”, “MSM_MsJ”, “NOD_ShiLtJ”, “NOR_LtJ”, “NZB_BlNJ”, “PWD_PhJ”, “SJL_J”] | None, optional) – strain of mouse to use for germline sequences. Only used when db=“ogrdb”. Note that only “c57bl6”, “balbc”, “CAST_EiJ”, “LEWES_EiJ”, “MSM_MsJ”, “NOD_ShiLtJ” and “PWD_PhJ” contain both heavy chain and light chain germline sequences as a set. With any other strain, igblastn and MakeDb.py will not be able to generate a successful AIRR table (check the failed file). “c57bl6” and “balbc” are merged databases of “C57BL_6” with “C57BL_6J” and “BALB_c” with “BALB_c_ByJ”, respectively. None uses all strains combined.
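A pre-flight check for the strain caveats above might look like this. check_strain is a hypothetical helper; its strain set is copied from the strain description, using the spelling from the accepted-values list (“NOD_ShiLtJ”), and it only warns rather than reproducing whatever validation dandelion itself performs.

```python
from typing import Optional

# Strains described as shipping both heavy- and light-chain germlines as a set.
FULL_SET_STRAINS = {
    "c57bl6", "balbc", "CAST_EiJ", "LEWES_EiJ", "MSM_MsJ", "NOD_ShiLtJ", "PWD_PhJ",
}


def check_strain(db: str, strain: Optional[str]) -> list[str]:
    """Hypothetical helper: return warnings for a (db, strain) pair; empty means no concerns."""
    warnings: list[str] = []
    if strain is None:
        return warnings  # None means all strains combined
    if db != "ogrdb":
        warnings.append(f"strain={strain!r} is only used when db='ogrdb'")
    elif strain not in FULL_SET_STRAINS:
        warnings.append(
            f"strain={strain!r} lacks a full heavy+light set; expect failed AIRR records"
        )
    return warnings
```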
novel (bool, optional) – whether or not to run novel allele discovery during tigger-genotyping.
plot (bool, optional) – whether or not to plot reassignment summary metrics.
save_plot (bool, optional) – whether or not to save plot.
show_plot (bool, optional) – whether or not to show plot.
figsize (tuple[float, float], optional) – size of figure.
sample_id_dictionary (dict[str, str] | None, optional) – dictionary for creating a sample_id column in the concatenated file.
filename_prefix (list[str] | str | None, optional) – list of prefixes of file names preceding ‘_contig’. None defaults to ‘all’.
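The filename_prefix normalisation can be sketched as below. The None-to-“all” fallback is documented. Broadcasting a single string to every folder and requiring one prefix per folder in a list are assumptions made for this hypothetical helper.

```python
from typing import Optional, Union


def normalise_prefixes(
    filename_prefix: Optional[Union[list[str], str]], n_folders: int
) -> list[str]:
    """Hypothetical helper: return one prefix per data folder."""
    # Documented: None defaults to 'all'.
    if filename_prefix is None:
        filename_prefix = "all"
    # Assumption: a single string applies to every folder.
    if isinstance(filename_prefix, str):
        return [filename_prefix] * n_folders
    # Assumption: a list must supply exactly one prefix per folder.
    if len(filename_prefix) != n_folders:
        raise ValueError("expected one filename prefix per data folder")
    return list(filename_prefix)
```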
additional_args (dict[str, list[str]], optional) – additional arguments to pass to tigger-genotype.R and CreateGermlines.py. This accepts a dictionary whose keys are the names of the sub-functions (tigger or creategermlines) and whose values are lists of arguments to pass to the relevant scripts/tools.
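The expected shape of additional_args is shown below, together with a sketch of how such lists are typically appended to a command line. "-g dmask" is an existing CreateGermlines.py option used only for illustration (check CreateGermlines.py --help for your version). extend_command, its key check and the placeholder "input.tsv" are hypothetical and do not describe how dandelion builds its commands internally.

```python
# Illustrative values: extra CreateGermlines.py arguments; no extra tigger arguments.
additional_args = {
    "creategermlines": ["-g", "dmask"],
    "tigger": [],
}

ALLOWED_KEYS = {"tigger", "creategermlines"}


def extend_command(base_cmd: list[str], extra_args: dict[str, list[str]], key: str) -> list[str]:
    """Hypothetical: append the extra arguments for `key` to a base command line."""
    unknown = set(extra_args) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unexpected additional_args keys: {sorted(unknown)}")
    return [*base_cmd, *extra_args.get(key, [])]


# Not a complete CreateGermlines.py invocation; "input.tsv" is a placeholder.
cmd = extend_command(["CreateGermlines.py", "-d", "input.tsv"], additional_args, "creategermlines")
print(cmd)  # ['CreateGermlines.py', '-d', 'input.tsv', '-g', 'dmask']
```

The dictionary itself is what gets passed to reassign_alleles as additional_args=additional_args.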
- Raises:
FileNotFoundError – if the reannotated file is not found.
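A caller can handle the documented FileNotFoundError as sketched below. reassign_or_report is a hypothetical wrapper that takes the function as an argument, so the sketch runs without dandelion installed. With dandelion available you would pass reassign_alleles imported from dandelion.base.preprocessing.

```python
from typing import Callable, Sequence, Union


def reassign_or_report(
    reassign: Callable[..., object],
    data: Union[str, Sequence[str]],
    combined_folder: str,
) -> bool:
    """Hypothetical wrapper: run `reassign`, returning False if a reannotated file is missing."""
    try:
        reassign(data, combined_folder=combined_folder)
    except FileNotFoundError as err:
        print(f"reassign_alleles could not find a reannotated file: {err}")
        return False
    return True
```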