dandelion.polars.preprocessing.reassign_alleles

dandelion.polars.preprocessing.reassign_alleles(data, combined_folder, v_germline=None, germline=None, org='human', db='imgt', strain=None, novel=True, plot=True, save_plot=False, show_plot=True, figsize=(4, 3), sample_id_dictionary=None, filename_prefix=None, additional_args={'creategermlines': [], 'tigger': []})

Correct allele calls based on a personalized genotype using tigger.

It uses a subject-specific genotype to correct the preliminary allele assignments of a set of sequences derived from a single subject.

Parameters:
  • data (list[str]) – list of data folders containing the .tsv files. A single string is first converted to a one-item list, so the function can be run on a single sample or on multiple samples.

  • combined_folder (str) – name of folder for concatenated data file and genotyped files.

  • v_germline (str | None, optional) – path to heavy chain V germline fasta. None defaults to the IGHV fasta located via the $GERMLINE environment variable.

  • germline (str | None, optional) – path to germline database folder. None defaults to the $GERMLINE environment variable.

  • org (Literal[“human”, “mouse”], optional) – organism of germline database.

  • db (Literal[“imgt”, “ogrdb”], optional) – database to use for germline sequences.

  • strain (Literal[“c57bl6”, “balbc”, “129S1_SvImJ”, “AKR_J”, “A_J”, “BALB_c_ByJ”, “BALB_c”, “C3H_HeJ”, “C57BL_6J”, “C57BL_6”, “CAST_EiJ”, “CBA_J”, “DBA_1J”, “DBA_2J”, “LEWES_EiJ”, “MRL_MpJ”, “MSM_MsJ”, “NOD_ShiLtJ”, “NOR_LtJ”, “NZB_BlNJ”, “PWD_PhJ”, “SJL_J”] | None, optional) – strain of mouse to use for germline sequences. Only used when db=“ogrdb”. Note that only “c57bl6”, “balbc”, “CAST_EiJ”, “LEWES_EiJ”, “MSM_MsJ”, “NOD_ShiLtJ” and “PWD_PhJ” contain both heavy chain and light chain germline sequences as a set. With the other strains, igblastn and MakeDB.py will not generate a successful airr table (check the failed file). “c57bl6” and “balbc” are merged databases of “C57BL_6” with “C57BL_6J” and “BALB_c” with “BALB_c_ByJ” respectively. None defaults to all strains combined.

  • novel (bool, optional) – whether or not to run novel allele discovery during tigger-genotyping.

  • plot (bool, optional) – whether or not to plot reassignment summary metrics.

  • save_plot (bool, optional) – whether or not to save plot.

  • show_plot (bool, optional) – whether or not to show plot.

  • figsize (tuple[float, float], optional) – size of figure.

  • sample_id_dictionary (dict[str, str] | None, optional) – dictionary for creating a sample_id column in the concatenated file.

  • filename_prefix (list[str] | str | None, optional) – list of prefixes of file names preceding ‘_contig’. None defaults to ‘all’.

  • additional_args (dict[str, list[str]], optional) – additional arguments to pass to CreateGermlines.py and tigger-genotype.R. This accepts a dictionary whose keys name the tool (creategermlines or tigger) and whose values are lists of arguments to pass to the corresponding script.
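
The parameters above can be collected into a keyword dictionary before the call. The following is a minimal sketch, not part of dandelion: the helper names build_reassign_kwargs and run_reassignment, the folder names, and the choice of plot=False are assumptions made for illustration. It mirrors the documented conversion of a single string into a list and keeps additional_args at its documented default of two empty lists.

```python
def build_reassign_kwargs(data, combined_folder, org="human", db="imgt", strain=None):
    """Hypothetical helper: assemble keyword arguments for reassign_alleles."""
    # Mirror the documented behaviour: a single string becomes a one-item list.
    folders = [data] if isinstance(data, str) else list(data)
    return {
        "data": folders,
        "combined_folder": combined_folder,
        "org": org,
        "db": db,
        "strain": strain,  # documented as relevant only when db="ogrdb"
        "novel": True,     # run novel allele discovery (documented default)
        "plot": False,     # assumption: skip plotting in a scripted run
        # One list of extra command-line arguments per tool; empty by default.
        "additional_args": {"creategermlines": [], "tigger": []},
    }


def run_reassignment(kwargs):
    """Call dandelion with the assembled arguments (requires dandelion installed)."""
    # Imported lazily so the helper above can be used without dandelion.
    from dandelion.polars.preprocessing import reassign_alleles

    return reassign_alleles(**kwargs)


# Hypothetical folder names; replace with real sample folders.
human_kwargs = build_reassign_kwargs("sample1", "combined")
mouse_kwargs = build_reassign_kwargs(
    ["mouse1", "mouse2"], "combined_mouse", org="mouse", db="ogrdb", strain="c57bl6"
)
```

Passing the result to run_reassignment(human_kwargs) would then perform the actual call, assuming the sample folders exist and the germline database is configured.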

Raises:
  • FileNotFoundError – if the reannotated file is not found.
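
Because the function raises FileNotFoundError when the reannotated file is missing, a batch script may want to catch that case rather than abort. The sketch below is one possible pattern, not dandelion's own: try_reassign is a hypothetical wrapper, and it takes the function to call as an argument so the pattern can be tried with a stand-in before running it on real data.

```python
def try_reassign(reassign_func, data, combined_folder, **kwargs):
    """Hypothetical wrapper: return True on success, False if a file is missing."""
    try:
        reassign_func(data, combined_folder, **kwargs)
    except FileNotFoundError as err:
        # Documented failure mode: the reannotated file was not found.
        print(f"Allele reassignment did not complete: {err}")
        return False
    return True


def _stand_in_missing(data, combined_folder, **kwargs):
    """Stand-in that imitates the documented error, for demonstration only."""
    raise FileNotFoundError("reannotated file not found")


# Demonstration with the stand-in; in practice pass reassign_alleles instead.
succeeded = try_reassign(_stand_in_missing, ["sample1"], "combined")
```

In real use, reassign_func would be dandelion.polars.preprocessing.reassign_alleles, and a False result suggests checking that the earlier reannotation step produced its output in each sample folder.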