TaxoFlow: The tutorial
Learn at your pace
Welcome to TaxoFlow!
Here you will find a carefully designed, step-by-step tutorial on wrapping a metagenomics pipeline with Nextflow, along with plenty of resources to expand your skills.
Paper and citation
TaxoFlow is accompanied by a short paper that discusses its learning goals, structure and scope. Please consider citing it if you find TaxoFlow useful:
- Yepes-García, J.; Falquet, L. TaxoFlow: The Tutorial. An Educational Nextflow Pipeline for Metagenomics Taxonomic Profiling. Preprints 2025, 2025121989. https://doi.org/10.20944/preprints202512.1989.v1
Additional information
Version compatibility
This tutorial uses a pinned version of Nextflow (25.10.4) with parser v2. Because Nextflow evolves rapidly, pinning the version ensures that the pipeline runs as intended.
Environment options
We provide a web-based training environment, available through GitHub Codespaces (requires a free GitHub account), where everything you need for the training is preinstalled.
Please open the link in a new tab!
If this does not suit your needs, please see the other Environment options.
Complementary training
TaxoFlow is part of the streamed course Nextflow in Action: Build Smarter, Faster, Reproducible Pipelines, managed by the Swiss Institute of Bioinformatics (SIB).
- The next edition of this course is scheduled for Nov. 18th-19th, 2026. More information here.
Developers
Open-source license and contribution policy
This training material is developed and maintained by BUGFri and released under an open-source license (CC BY-NC-SA) for the benefit of the community.
We welcome improvements, fixes and bug reports from the community. Please use the GitHub issues section to report problems or propose changes to the training source material. See the README.md in the repository for more details.
Credit to Nextflow training team
TaxoFlow is inspired by the training material developed by the Nextflow training team, and some important sections, such as the Environment options tutorial, are taken directly from their repository. We hereby express our gratitude to them, particularly to Geraldine Van der Auwera for her valuable contribution in conceiving the tutorial and for her insightful feedback during its implementation.
This tutorial is designed for researchers working on shotgun (WGS) metagenomics data analysis who are interested in developing or customizing taxonomic annotation pipelines. It builds on the Hello Nextflow and Nextflow for RNAseq beginner trainings and demonstrates how to use Nextflow in the specific context of metagenomics data analysis.
Specifically, this course demonstrates how to implement a simple taxonomic annotation of reads, from the removal of host sequences, through the re-estimation of species abundance with Bayesian statistics, to the generation of complete reports.
Let’s get started! Click on the “Open in GitHub Codespaces” button below to launch the training environment (preferably in a separate tab), then read on while it loads.
Environment options
This tutorial is fully set up to run on GitHub Codespaces. If you want to use it locally, on an HPC cluster or through CodeSandbox, please check the section Environment options.
Open the link in a new tab
Learning objectives
By the end of this course, you will have learnt how to apply foundational Nextflow concepts and tooling to a typical metagenomics use case.
Concretely, you will be able to:
- Write a linear workflow to perform host removal, taxonomic annotation and species abundance re-estimation.
- Handle domain-specific files and resources, such as Kraken2 and Bracken reports, appropriately.
- Run the analysis for a single sample, or leverage Nextflow's dataflow paradigm to parallelize multi-sample analyses.
- Organize processes and workflows in a more structured, modular way, as a first step towards following nf-core guidelines on reproducibility, portability, modularity, scalability and traceability.
- Use conditionals and operators to control workflow execution.
- Include custom scripts to be run within a given process.
Prerequisites
The course assumes some minimal familiarity with the following:
- Tools and file formats commonly used in this scientific domain. We recommend this Metagenomics data analysis tutorial to get acquainted with taxonomic classification of unassembled reads.
- Experience with the command line. We recommend this online UNIX tutorial.
- Foundational Nextflow concepts and tooling covered in the Hello Nextflow and Nextflow for RNAseq beginner trainings. We also recommend the SIB course Nextflow in Action: Build Smarter, Faster, Reproducible Pipelines.
- Familiarity with VS Code. We recommend this tutorial to get started.
More information about prerequisites and precourse arrangements can be found here.
For technical requirements and environment setup, see the Environment Setup directions.