4. Configuration
The configuration file defines the resources that each process in the pipeline requires. Phylogenetic analysis is resource-intensive, and it is generally recommended to run the pipeline on an HPC system due to the CPU and memory requirements. However, smaller datasets may be run on local computers.
A base configuration file, called `base.config`, can be found in the `conf` directory of the main repository. This configuration executes the pipeline in local mode and serves as a baseline for running on a local computer. However, it will likely not provide the resources necessary for larger datasets.
The `hpc.config` file is an example configuration for HPC systems, where the requested resources reflect larger datasets. It is set up to use SLURM as the executor, but this can be changed to match your system's scheduler (see the Nextflow documentation for more information on executors). Please note that the `hpc` configuration must be edited to reflect your own system before it can be used.
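For example, on a cluster that uses PBS Pro rather than SLURM, the executor could be swapped with a single setting (a hypothetical snippet; valid executor names are listed in the Nextflow documentation):

```groovy
// Hypothetical: replace the SLURM executor with PBS Pro
process {
    executor = 'pbspro'  // was 'slurm' in hpc.config
}
```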
Each process with a `label` has a specific resource requirement; these are usually needed when running bigger datasets. The labels are defined in the config file. For example, the label `process_high_memory_time` is used for the most resource-demanding processes and requests 200 GB of memory and 48 hours of wall time (this is relevant for queue systems such as SLURM).
```groovy
withLabel: process_high_memory_time {
    clusterOptions = '--job-name=nxf --account=<account> --partition=bigmem'
    memory = 200.GB
    time = { params.time_multiplier * 48.h * task.attempt }
}
```
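The `task.attempt` factor means the time request grows when a task is retried: with `params.time_multiplier = 1`, the first attempt asks for 48 hours and a second attempt for 96. For `task.attempt` to increment, the pipeline must retry failed tasks; a sketch of the settings that enable this (these exact values are an assumption, not taken from the repository config):

```groovy
// Hypothetical retry settings so task.attempt increments on failure
process {
    errorStrategy = 'retry'  // resubmit a failed task instead of aborting
    maxRetries    = 2        // allow up to two resubmissions
}
```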
The `--account=<account>` and `--partition=bigmem` options may be specific to each HPC system, and users need to change these values to reflect their own systems.
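For instance, on a system where the accounting project is called `myproject` and the large-memory partition is named `highmem` (both hypothetical names used purely for illustration), the line would become:

```groovy
// Hypothetical values -- substitute your own account and partition names
clusterOptions = '--job-name=nxf --account=myproject --partition=highmem'
```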