dandelion.base.io.read_seekgene_vdj

dandelion.base.io.read_seekgene_vdj(data, filename_prefix=None, prefix=None, suffix=None, sep='_', remove_malformed=True, remove_trailing_hyphen_number=False, verbose=False)

A parser that reads .csv and .json files directly from a folder containing SeekGene VDJ outputs, or parses an existing pandas DataFrame.

SeekGene produces contig annotation files in the same format as 10x CellRanger VDJ output. This function is a convenience wrapper around read_10x_vdj() with SeekGene-specific naming for clarity.

When reading from a file path, at least one of {filename_prefix}_contig_annotations.csv or all_contig_annotations.json must be present.

If .fasta or .json files are found in the same folder, the additional information they contain is appended to the final table.
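The file-name rule above can be sketched in plain Python. This is only an illustration of the documented naming convention, not dandelion's actual lookup code: `expected_contig_files` is a hypothetical helper, and it assumes the file names are exactly as stated above, with None mapping to 'all'.

```python
import tempfile
from pathlib import Path


def expected_contig_files(folder, filename_prefix=None):
    """Return the annotation files that exist in `folder`.

    Hypothetical helper for illustration; not part of dandelion.
    """
    # Per the parameter description, None defaults to 'all'.
    prefix = "all" if filename_prefix is None else filename_prefix
    folder = Path(folder)
    candidates = [
        folder / f"{prefix}_contig_annotations.csv",
        folder / "all_contig_annotations.json",
    ]
    # Keep only the candidates that are present on disk.
    return [path for path in candidates if path.is_file()]


# Small demonstration with a temporary folder holding only the .json file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "all_contig_annotations.json").write_text("[]")
    print([path.name for path in expected_contig_files(tmp)])
```

An empty result from a helper like this corresponds to the situation in which the function would have nothing to read from the folder.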

Parameters:
  • data (Path | str | pandas.DataFrame) – path to folder containing .csv and/or .json files, path to files directly, or a pandas DataFrame containing the contig annotations data.

  • filename_prefix (str | None, optional) – prefix of file name preceding ‘_contig’. None defaults to ‘all’. Only used when data is a file/folder.

  • prefix (str | None, optional) – Prefix to prepend to sequence_id and cell_id.

  • suffix (str | None, optional) – Suffix to append to sequence_id and cell_id.

  • sep (str, optional) – the separator placed between the prefix/suffix and the sequence_id/cell_id.

  • remove_malformed (bool, optional) – whether or not to remove malformed contigs.

  • remove_trailing_hyphen_number (bool, optional) – whether or not to remove the trailing hyphen number e.g. ‘-1’ from the cell/contig barcodes.

  • verbose (bool, optional) – whether or not to print messages during creation of the Dandelion object.
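To make the effect of prefix, suffix, sep and remove_trailing_hyphen_number more concrete, here is a minimal sketch applied to a single cell barcode. `format_barcode` is a hypothetical helper written for this illustration; it assumes the trailing number is stripped before the prefix and suffix are attached, and dandelion's own handling (in particular for contig IDs) may differ.

```python
import re


def format_barcode(
    barcode,
    prefix=None,
    suffix=None,
    sep="_",
    remove_trailing_hyphen_number=False,
):
    """Illustrate the ID-related parameters on one barcode.

    Hypothetical helper; not dandelion's implementation.
    """
    if remove_trailing_hyphen_number:
        # Drop a trailing hyphen followed by digits, e.g. '-1'.
        barcode = re.sub(r"-\d+$", "", barcode)
    if prefix is not None:
        barcode = f"{prefix}{sep}{barcode}"
    if suffix is not None:
        barcode = f"{barcode}{sep}{suffix}"
    return barcode


print(format_barcode("AAACCTGAGAACTGTA-1", prefix="sampleA"))
# sampleA_AAACCTGAGAACTGTA-1
print(format_barcode("AAACCTGAGAACTGTA-1", prefix="sampleA",
                     remove_trailing_hyphen_number=True))
# sampleA_AAACCTGAGAACTGTA
```

Adding a sample-specific prefix or suffix in this way is typically what keeps barcodes unique when several samples are later combined.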

Returns:

Dandelion object holding the parsed data.

Return type:

Dandelion

Raises:
  • OSError – if neither a contig_annotations.csv nor an all_contig_annotations.json file is found in the input folder.

  • TypeError – if data is not a valid type (Path, str, or pandas.DataFrame).
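A possible way to call the function and handle the documented OSError is sketched below. `load_seekgene_sample` is a hypothetical wrapper, the import path is taken from the module name shown at the top of this page, and running it requires dandelion to be installed. Returning None for a folder without contig files is a design choice of this sketch, not library behaviour.

```python
from pathlib import Path


def load_seekgene_sample(folder, sample_name):
    """Read one SeekGene VDJ output folder, or return None if no
    contig annotation files are found.

    Hypothetical wrapper for illustration only.
    """
    # Imported lazily so that this module can be imported
    # even where dandelion is not installed.
    from dandelion.base.io import read_seekgene_vdj

    try:
        return read_seekgene_vdj(
            Path(folder),
            prefix=sample_name,  # added to sequence_id and cell_id
            sep="_",
            remove_trailing_hyphen_number=True,
        )
    except OSError:
        # Documented above: raised when the annotation files are missing.
        return None
```

A TypeError is deliberately left uncaught here, since passing an unsupported type for data is usually a programming error rather than a missing-file condition.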