
Precourse preparations

Course goal

This course is designed in two practical parts. In the first part (Part 2 - single sample), you will identify key components of the Nextflow dataflow paradigm using a validated pipeline whose purpose is to demonstrate how processes are connected. In the second part (Part 3 - multi-sample), once you can establish how data flows, you’ll collect() the knowledge from the first part to extend the pipeline for multi-sample analysis.

By the end of the course, you will have constructed and understood a functional workflow implemented in Nextflow DSL2, using common features such as processes, channels, modules and configuration profiles. You will also have gained experience running the workflow in a controlled environment, and you will have the information needed to execute pipelines in a High Performance Computing (HPC) environment.

Background knowledge

This workshop assumes that learners have a basic understanding of working with the command line on UNIX-based systems and have a GitHub Codespaces account.

UNIX

You can test your UNIX skills with a quiz here. If you don’t have experience with the UNIX command line, or if you are unsure whether you meet the prerequisites, please follow this online UNIX tutorial.

Software

OS

This is an OS-agnostic course: all you need is a laptop, a modern browser and a GitHub Codespaces account.

All the software needed in this workflow falls into one of three cases:

  • Already installed in the GitHub Codespaces environment.
  • Already available in Docker containers.
  • Installed via containers during today’s exercises.

All information in this course is based on the official Nextflow documentation and uses Nextflow DSL2 syntax.

Nextflow version

This tutorial uses a pinned version of Nextflow (25.10.4) with parser v2. Nextflow evolves rapidly, so pinning the version helps ensure that the pipeline runs as intended.
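
The course environment may already apply this pin for you. As a minimal sketch of how the same pin could be reproduced in another shell session, the snippet below assumes the `NXF_VER` environment variable read by the Nextflow launcher, and assumes that `NXF_SYNTAX_PARSER` is the variable that selects the parser. Neither setting is taken from the course materials, so treat both as illustrative.

```shell
# Pin the Nextflow release named in this tutorial for the current shell session.
export NXF_VER=25.10.4

# Assumption: parser v2 is selected with this variable; verify against the setup pages.
export NXF_SYNTAX_PARSER=v2

# Print the settings so they can be compared with the values stated in the text.
echo "NXF_VER=${NXF_VER}"
echo "NXF_SYNTAX_PARSER=${NXF_SYNTAX_PARSER}"
```

With these variables exported, running `nextflow -version` in the same session should report the pinned release.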

GitHub Codespaces - Code editor

GitHub Codespaces is currently close to a one-of-a-kind service: there are only a few alternatives to fall back on if problems arise. It provides a complete, self-contained execution environment connected to an IDE, for free. However, resources are limited on the free tier that we will use for this course. The good news is that they should be sufficient for the course, and under normal conditions no one should run out of the resources allocated by Codespaces on the free tier.

You can start here:

Open in GitHub Codespaces

Open the link in a new tab

This link will open VS Code in your browser, so you are expected to be familiar with its layout and basic functionality. Otherwise, please check this quick tutorial before the course to understand where everything is.

Setting GitHub Codespaces

More information about setting up Codespaces is available in the Environment setup section.

VS Code video tutorial

You can find a video tutorial to learn about VS Code:

Alternative installations

You can install and run the pipeline from this tutorial on your own machine with a local VS Code installation; to do so, please follow the Local Devcontainers setup. If you wish to use an HPC cluster, you will find specific instructions in the HPC installation setup. An alternative online computing environment is also available through CodeSandbox.

Pipeline-specific tools

The specific versions of the software used by TaxoFlow are detailed here.

Website colour code explanation

We use a colour code throughout the website to make the different kinds of information easy to distinguish. Here’s a quick summary of the colour blocks you will encounter:

This is a supplementary piece of information

This is a tip to help you advance with the course

This is the output on the console

This is a warning about a potential problem

These are directory contents

This is an explanation about a common bug/error