TAXAPUS - your eight-armed taxonomy assistant
main @ 0d8e125

Workflow Type: Nextflow

qcif/taxapus is a modular, reproducible Nextflow workflow for conservative taxonomic assignment of DNA sequences, designed to produce high-confidence, auditable results in biosecurity and biodiversity contexts. The workflow integrates multiple bioinformatics tools and databases, automates best-practice analysis steps, and produces detailed reports with supporting evidence for each taxonomic assignment.

Workflow Overview

The pipeline orchestrates a series of analytical steps, each encapsulated in a dedicated module or subworkflow. The main stages are:

  1. Environment Configuration: Sets up the environment variables and paths required by downstream processes, ensuring reproducibility and portability.

  2. Input Validation: Checks the integrity and compatibility of input files (FASTA sequences, metadata, databases) to prevent downstream errors.

  3. Sequence Search

    • BLAST Core Nucleotide Database (BLASTN): Queries input sequences against the NCBI nucleotide database using BLASTN.
    • BOLD v4 (API): Queries input sequences against the Barcode of Life Data Systems. The taxonomic lineage is included in the results.

  4. Hit Extraction: Parses BLAST results to extract relevant hits for each query.

  5. Taxonomic ID Extraction: Retrieves taxonomic IDs for BLAST hits.

  6. Taxonomic Lineage Extraction: Maps taxonomic IDs to full lineages, enabling downstream filtering and reporting.

  7. Candidate Extraction: Identifies candidate species for each query, applying user-defined thresholds for identity and coverage.

  8. Supporting Evidence Evaluation

    • Publications Diversity: Assesses the diversity of data sources supporting each candidate.
    • Database Coverage: Evaluates the representation of candidates in global databases (GBIF, GenBank, BOLD).

  9. Multiple Sequence Alignment (MAFFT): Aligns candidate and query sequences to prepare for phylogenetic analysis.

  10. Phylogenetic Tree Construction (FastME): Builds a phylogenetic tree to visualise relationships among candidates and queries.

  11. Comprehensive Reporting: Generates detailed HTML and text reports, including sequence alignments, phylogenetic trees, database coverage, and all supporting evidence for each assignment.
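
To make stages 4 and 7 above more concrete, the following is a minimal Python sketch of how BLAST hits might be parsed and then filtered by identity and coverage thresholds. It is not taken from the TAXAPUS code base. It assumes the hits are in BLAST's default tabular layout (`-outfmt 6`), that query lengths are supplied separately, and that coverage is measured as the aligned fraction of the query. The names `Hit`, `parse_hits`, `query_coverage` and `select_candidates` are hypothetical, and the thresholds (97% identity, 0.9 coverage) are illustrative values, not the workflow's defaults.

```python
from dataclasses import dataclass

# Default column order of BLAST tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


@dataclass
class Hit:
    query_id: str
    subject_id: str
    identity: float   # percent identity, 0-100
    query_start: int  # 1-based alignment start on the query
    query_end: int    # 1-based alignment end on the query


def parse_hits(tabular_text):
    """Parse tab-separated BLAST rows into Hit records, skipping blanks and comments."""
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(BLAST_COLUMNS, line.split("\t")))
        hits.append(Hit(
            query_id=row["qseqid"],
            subject_id=row["sseqid"],
            identity=float(row["pident"]),
            query_start=int(row["qstart"]),
            query_end=int(row["qend"]),
        ))
    return hits


def query_coverage(hit, query_length):
    """Fraction of the query covered by the alignment (assumed definition)."""
    aligned_span = abs(hit.query_end - hit.query_start) + 1
    return aligned_span / query_length


def select_candidates(hits, query_lengths, min_identity=97.0, min_coverage=0.9):
    """Group subject IDs by query, keeping only hits that pass both thresholds."""
    candidates = {}
    for hit in hits:
        coverage = query_coverage(hit, query_lengths[hit.query_id])
        if hit.identity >= min_identity and coverage >= min_coverage:
            candidates.setdefault(hit.query_id, []).append(hit.subject_id)
    return candidates


# Made-up rows for illustration only: one strong hit, one low-identity hit,
# and one short (low-coverage) hit against a 658 bp query.
EXAMPLE = "\n".join([
    "q1\tsubjA\t99.5\t650\t3\t0\t1\t650\t10\t659\t0.0\t1180",
    "q1\tsubjB\t92.0\t650\t52\t0\t1\t650\t5\t654\t0.0\t900",
    "q1\tsubjC\t99.0\t300\t3\t0\t1\t300\t1\t300\t1e-150\t540",
])
result = select_candidates(parse_hits(EXAMPLE), {"q1": 658})
print(result)
```

In this sketch only `subjA` passes both filters: `subjB` falls below the identity threshold and `subjC` covers less than half of the query. A conservative workflow would apply further evidence checks to the surviving candidates rather than accept them directly, as stages 8 to 10 describe.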

Version History

main @ 0d8e125 (earliest), created 1st Jul 2025 at 06:23 by Magdalena Antczak

corrections


Creators
  • Magdalena Antczak
  • Cameron Hyde
  • Lanxi (Daisy) Li
  • Valentine Murigneux
  • Sarah Williams
  • Michael Thang
  • Bradley Pease
  • Shaun Bochow
  • Grace Sun
Citation
Antczak, M., Hyde, C., Li, Lanxi (Daisy), Murigneux, V., Williams, S., Thang, M., Pease, B., Bochow, S., & Sun, G. (2025). TAXAPUS - your eight-armed taxonomy assistant. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.1782.1

Created: 1st Jul 2025 at 06:23

Last updated: 1st Jul 2025 at 07:31