Precourse preparations
Course goal
This course is designed in two practical parts. In the first part (Part 2 - single sample), you will identify key components of the Nextflow dataflow paradigm using a validated pipeline that demonstrates how processes are connected. In the second part (Part 3 - multi-sample), once you can establish how data flows, you'll collect() the knowledge from the first part to extend the pipeline for multi-sample analysis.
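Here is a rough Python analogy of the single-sample vs multi-sample distinction. It is not Nextflow code. A generator stands in for a queue channel that emits one item at a time, and a function applied to each item stands in for a process that runs once per sample. A `collect`-style step then gathers every emitted item into a single list, which is what Nextflow's collect() operator does with a channel. The sample names and the `classify` function are hypothetical placeholders, not part of the course pipeline.

```python
# Hypothetical Python analogy of Nextflow's dataflow model (NOT the Nextflow API).
from typing import Iterable, Iterator


def samples() -> Iterator[str]:
    """Queue-channel stand-in: emits one (hypothetical) sample ID at a time."""
    yield from ["sampleA", "sampleB", "sampleC"]


def classify(sample: str) -> str:
    """Process stand-in: runs once per emitted item, i.e. a per-sample step."""
    return f"{sample}.report"


def collect(items: Iterable[str]) -> list[str]:
    """collect()-like step: gathers every emitted item into one list."""
    return list(items)


per_sample = (classify(s) for s in samples())  # one "task" per sample
merged = collect(per_sample)                   # a single combined emission
print(merged)
```

The per-sample step never sees more than one item. Only after the collect step is there a single value holding all results, which a downstream multi-sample step could consume.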
By the end of the course, you will have built and understood a functional workflow implemented in Nextflow DSL2, using common features such as processes, channels, modules and configuration profiles. You will also have gained experience running the workflow in a controlled environment, and you will have the information you need to run the pipelines in a High Performance Computing (HPC) environment.
Background knowledge
This workshop assumes that learners have a basic understanding of working with the command line on UNIX-based systems, and that they have a GitHub Codespaces account.
UNIX
You can test your UNIX skills with a quiz here. If you don't have experience with the UNIX command line, or if you are unsure whether you meet the prerequisites, please follow this online UNIX tutorial.
Software
OS
This course is OS-agnostic: all you need is a laptop, a modern browser and a GitHub Codespaces account.
All the software needed for this workflow is either:
- Already installed in the GitHub Codespaces environment.
- Already available in Docker containers.
- Installed via containers during today's exercises.
All the information in this course is based on the official Nextflow documentation and uses Nextflow DSL2 syntax.
Nextflow version
This tutorial uses a pinned version of Nextflow (25.10.4) with parser v2. Pinning the version matters because Nextflow evolves rapidly, and a fixed version ensures that the pipeline runs as intended.
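If you ever run Nextflow outside the provided environment, one way to reproduce this pin is through two Nextflow environment variables. `NXF_VER` tells the Nextflow launcher which release to use, and `NXF_SYNTAX_PARSER=v2` opts in to the strict (v2) syntax parser. The minimal Python sketch below assumes that the `nextflow` launcher is on your `PATH`. The helper names `nextflow_env` and `run_nextflow` are hypothetical and not part of the course material. The Codespaces environment may already apply this pin in its own way.

```python
# Sketch (hypothetical helpers): run Nextflow with the pinned version and v2 parser.
import os
import subprocess
from typing import Mapping, Optional, Sequence

PINNED_NXF_VER = "25.10.4"  # version stated for this course


def nextflow_env(base: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return a copy of the environment with the Nextflow pins applied."""
    env = dict(os.environ if base is None else base)
    env["NXF_VER"] = PINNED_NXF_VER   # launcher uses (and downloads if needed) this release
    env["NXF_SYNTAX_PARSER"] = "v2"   # opt in to the strict (v2) syntax parser
    return env


def run_nextflow(args: Sequence[str]) -> int:
    """Run `nextflow <args>` with the pinned environment and return its exit code."""
    return subprocess.run(["nextflow", *args], env=nextflow_env()).returncode
```

For example, `run_nextflow(["-version"])` should report the pinned release once the launcher has fetched it. Setting the variables per call, rather than exporting them globally, keeps other Nextflow projects on your machine unaffected.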
GitHub Codespaces - Code editor
GitHub Codespaces is one of the few services of its kind available today, so there are only a handful of alternatives if something goes wrong. It provides a complete, self-contained execution environment connected to an IDE, for free! However, resources are limited on the free tier we will be using for this course. The good news is that they should be sufficient for the course, and under normal conditions nobody should (hopefully) run out of the resources allocated by Codespaces on the free tier.
You can start here:
Open the link in a new tab
This link opens VS Code in your browser, so we expect you to be familiar with its layout and basic functionality. If you are not, please check this quick tutorial before the course to learn where everything is.
Setting up GitHub Codespaces
More information about setting up Codespaces is available in the Environment setup section.
VS Code video tutorial
You can find a video tutorial to learn about VS Code here:
Alternative installations
You can also install and run this tutorial's pipeline on your own machine with a local VS Code installation; to do so, please follow the Local Devcontainers setup. If you wish to use an HPC cluster, you will find specific instructions in the HPC installation setup. Alternatively, another online computing environment is available through CodeSandbox.
Pipeline-specific tools
The specific versions of the software used by TaxoFlow are detailed here.
Website colour code explanation
We use a colour code throughout the website to make the different pieces of information easy to distinguish. Here is a quick summary of the colour blocks you will encounter:
This is a supplementary piece of information
This is a tip to help you advance with the course
This is the output on the console
This is a warning about a potential problem
These are directory contents
This is an explanation about a common bug/error