TaxoFlow: The tutorial

  • Learn at your pace


    Welcome to TaxoFlow!

    Here you will find a carefully designed, step-by-step tutorial on wrapping a metagenomics pipeline with Nextflow, along with plenty of resources to expand your skills.

    Workflow

    Paper and citation

    TaxoFlow is accompanied by a short paper that discusses its learning goals, structure and scope. Please consider citing it if you find TaxoFlow useful:

  • Additional information


    Version compatibility

    This tutorial uses a pinned version of Nextflow (25.10.4) with parser v2. Because Nextflow evolves rapidly, pinning the version ensures that the pipeline runs as intended.
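If you run the tutorial outside the provided environment, the pinning described above can be reproduced with Nextflow's standard environment variables. This is a minimal sketch: `NXF_VER` and `NXF_SYNTAX_PARSER` are documented Nextflow settings, but setting them by hand is an assumption here, since the tutorial's own environment may already configure them for you.

```shell
# Assumption: you are configuring your own shell session; the provided
# training environment may already set these values.

# Pin the Nextflow release named in this tutorial.
export NXF_VER=25.10.4

# Opt in to the v2 language parser.
export NXF_SYNTAX_PARSER=v2

# Show the settings that later `nextflow` commands in this shell will pick up.
echo "Nextflow version: ${NXF_VER}, parser: ${NXF_SYNTAX_PARSER}"
```

Running `nextflow -version` afterwards in the same shell is a quick way to confirm which release is actually in use.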

    Environment options

    We provide a web-based training environment, available through GitHub Codespaces (requires a free GitHub account), where everything you need for the training is preinstalled.

    Open in GitHub Codespaces

    Please open the link in a new tab!

    If this does not suit your needs, please see the other Environment options.

    Complementary training

    TaxoFlow is part of the streamed course Nextflow in Action: Build Smarter, Faster, Reproducible Pipelines, managed by the Swiss Institute of Bioinformatics (SIB).

    • The next edition of this course is scheduled for Nov. 18th-19th, 2026. More information here.

    Developers

    Open-source license and contribution policy

    Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

    This training material is developed and maintained by BUGFri and released under an open-source license (CC BY-NC-SA) for the benefit of the community.

    We welcome improvements, fixes and bug reports from the community. Please use the GitHub issues section to report problems or propose changes to the training source material. See the README.md in the repository for more details.

    Credit to Nextflow training team

    TaxoFlow is inspired by the training material developed by the Nextflow training team, and important sections such as the Environment options tutorial are explicitly taken from their repository. We hereby express our gratitude to them, particularly to Geraldine Van der Auwera for her valuable contribution in conceiving the idea of the tutorial and for her insightful feedback on its implementation.

This tutorial is designed for researchers focused on metagenomics (WGS/shotgun) data analysis who are interested in developing or customizing taxonomic annotation pipelines. It builds on the Hello Nextflow and Nextflow for RNAseq beginner training and demonstrates how to use Nextflow in the specific context of metagenomics data analysis.

Specifically, this course demonstrates how to implement a simple taxonomic annotation of reads: it starts by removing host sequences, continues by re-estimating species abundance with Bayesian statistics, and ends by generating complete reports.

Let’s get started! Click on the “Open in GitHub Codespaces” button below to launch the training environment (preferably in a separate tab), then read on while it loads.

Environment options

This tutorial is fully packaged for use on GitHub Codespaces. If you want to use it locally, on an HPC cluster or through CodeSandbox, please check the section Environment options.

Open in GitHub Codespaces

Open the link in a new tab

Learning objectives

By the end of this course, you will have learnt how to apply foundational Nextflow concepts and tooling to a typical metagenomics use case.

Concretely, you will be able to:

  • Write a linear workflow to perform host removal, taxonomic annotation and species abundance re-estimation.
  • Handle domain-specific files, such as Kraken2 and Bracken reports and resources, appropriately.
  • Run the analysis for a single sample, or leverage Nextflow’s dataflow paradigm to parallelize multi-sample analysis.
  • Separate the processes and the workflow in a more structured manner, as a first step towards following nf-core guidelines on reproducibility, portability, modularity, scalability and traceability.
  • Use conditionals and operators to control workflow execution.
  • Include custom scripts to be run within a given process.

Prerequisites

The course assumes some minimal familiarity with the following:

More information about prerequisites and precourse arrangements can be found here.

For technical requirements and environment setup, see the Environment Setup directions.