
4. Configuration


The configuration file defines the resources each process in the pipeline requires. Phylogenetic analysis is resource-intensive, and it is generally recommended to run the pipeline on an HPC system due to the CPU and memory requirements. However, smaller datasets may be run on local computers.

Base configuration

A base configuration file, called base.config, can be found in the conf directory in the main repository. It runs the pipeline with the local executor and serves as a baseline for use on a local computer. However, it will likely not provide the resources necessary for larger datasets.
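As an illustration, a local baseline of this kind usually amounts to little more than setting the executor and modest per-process defaults. The values below are illustrative and not the exact contents of base.config.

	// Illustrative local baseline (example values, not the shipped base.config)
	process {
	    executor = 'local'
	
	    // Modest defaults suitable for a workstation
	    cpus   = 4
	    memory = 8.GB
	    time   = 2.h
	}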

HPC configuration

The hpc.config is an example configuration for use on HPC systems, where the requested resources reflect larger datasets. The configuration is set to run with SLURM as the executor, but this can be changed by the user to match their system's executor (see the Nextflow documentation for more information on executors). Please note that hpc.config has to be edited to reflect your own system before it can be used.
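Switching the executor typically only requires changing the executor setting in the process scope. The snippet below is a sketch of that one change, not part of the shipped hpc.config.

	process {
	    // SLURM is the default in hpc.config; change this to match your scheduler,
	    // e.g. 'sge', 'pbs' or 'lsf' (see the Nextflow executor documentation)
	    executor = 'slurm'
	}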

Process resource definitions

Each labelled process has a specific resource requirement, and the labels are defined in the config file. These requirements mainly matter when running larger datasets. For example, the label process_high_memory_time is used for the most resource-demanding processes and requests 200 GB of memory and 48 hours of walltime (relevant for queue systems such as SLURM).

	withLabel: process_high_memory_time {
	    clusterOptions  = '--job-name=nxf --account=<account> --partition=bigmem'
	    memory          = 200.GB
	    time            = { params.time_multiplier * 48.h * task.attempt }
	}

The --account=<account> and --partition=bigmem values are specific to each HPC system, and users need to change them to reflect their own system.
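For instance, on a hypothetical cluster with an account named nn1234k and a high-memory partition named hugemem, the label could be edited along these lines (both names are placeholders, not values from the repository):

	withLabel: process_high_memory_time {
	    // Placeholder account and partition names; replace with your own
	    clusterOptions  = '--job-name=nxf --account=nn1234k --partition=hugemem'
	    memory          = 200.GB
	    // With params.time_multiplier = 2, the first attempt requests 96 hours
	    time            = { params.time_multiplier * 48.h * task.attempt }
	}

The params.time_multiplier parameter scales the base walltime, and task.attempt increases the request further if Nextflow retries a failed task.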
