add nextflow d30e48d

2026-04-29 23:01:54 +02:00
parent d0b12d668d
commit 97cc9058d3
2840 changed files with 730250 additions and 0 deletions
--- a/nextflow/docs/executor.md
+++ b/nextflow/docs/executor.md
@@ -0,0 +1,598 @@
+(executor-page)=
+
+# Executors
+
+In the Nextflow framework architecture, the *executor* is the component that determines the system where a pipeline process is run and supervises its execution.
+
+The executor provides an abstraction between the pipeline processes and the underlying execution system. This allows you to write the pipeline functional logic independently from the actual processing platform.
+
+In other words, you can write your pipeline script once and have it running on your computer, a cluster resource manager, or the cloud — simply change the executor definition in the Nextflow configuration file.
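+
+As a minimal sketch of this idea, the configuration below defines one profile per execution target. The profile names and the `long` queue name are hypothetical placeholders, not values taken from this page:
+
+```groovy
+profiles {
+    // Run tasks on the machine where Nextflow is launched (the default executor)
+    standard {
+        process.executor = 'local'
+    }
+
+    // Submit tasks to a SLURM cluster; 'long' is a placeholder partition name
+    cluster {
+        process.executor = 'slurm'
+        process.queue = 'long'
+    }
+}
+```
+
+With a configuration along these lines, the same pipeline script can be launched with `-profile standard` or `-profile cluster` without any change to the script itself.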
+
+(awsbatch-executor)=
+
+## AWS Batch
+
+Nextflow supports the [AWS Batch](https://aws.amazon.com/batch/) service, which allows job submission in the cloud without having to spin up and manage a cluster of virtual machines. AWS Batch uses Docker containers to run tasks, which greatly simplifies pipeline deployment.
+
+The pipeline processes must specify the Docker image to use by defining the `container` directive, either in the pipeline script or the `nextflow.config` file.
+
+To enable this executor, set `process.executor = 'awsbatch'` in the `nextflow.config` file.
+
+The pipeline can be launched either on a local computer or on an EC2 instance. EC2 is suggested for heavy or long-running workloads. Additionally, an S3 bucket must be used as the pipeline work directory.
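+
+The requirements above can be combined into a configuration such as the following. This is only a sketch: the queue name, container image, bucket path, and region are hypothetical and must be replaced with values from your own AWS account.
+
+```groovy
+process {
+    executor = 'awsbatch'
+    queue = 'my-batch-queue'        // hypothetical AWS Batch job queue
+    container = 'ubuntu:22.04'      // hypothetical Docker image used by all processes
+}
+
+// The work directory must be located in an S3 bucket (placeholder path)
+workDir = 's3://my-bucket/work'
+
+aws {
+    region = 'eu-west-1'            // placeholder region
+}
+```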
+
+Resource requests and other job characteristics can be controlled via the following process directives:
+
+- {ref}`process-accelerator`
+- {ref}`process-arch` (only when using Fargate platform type for AWS Batch)
+- {ref}`process-container`
+- {ref}`process-containerOptions`
+- {ref}`process-cpus`
+- {ref}`process-disk` (only when using Fargate platform type for AWS Batch)
+- {ref}`process-memory`
+- {ref}`process-queue`
+- {ref}`process-resourcelabels`
+- {ref}`process-time`
+
+The following {ref}`hints <process-hints>` are supported:
+
+- `consumableResources`: Specify [AWS Batch consumable resources](https://docs.aws.amazon.com/batch/latest/userguide/resource-aware-scheduling.html) as a list of name-value pairs. For example:
+
+  ```nextflow
+  hints consumableResources: ['my-license-a': 1, 'my-license-b': 2]
+  ```
+
+See {ref}`aws-batch` for more information.
+
+(azurebatch-executor)=
+
+## Azure Batch
+
+Nextflow supports the [Azure Batch](https://azure.microsoft.com/en-us/services/batch/) service, which allows job submission in the cloud without having to spin up and manage a cluster of virtual machines. Azure Batch uses Docker containers to run tasks, which greatly simplifies pipeline deployment.
+
+The pipeline processes must specify the Docker image to use by defining the `container` directive, either in the pipeline script or the `nextflow.config` file.
+
+To enable this executor, set `process.executor = 'azurebatch'` in the `nextflow.config` file.
+
+The pipeline can be launched either on a local computer or on a cloud virtual machine. A cloud VM is suggested for heavy or long-running workloads. Additionally, an Azure Blob storage container must be used as the pipeline work directory.
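+
+A configuration following these requirements might look like the sketch below. The container image, storage container path, location, and all account names and keys are placeholders; this page does not specify them, so check the Azure Batch page for the authoritative list of options.
+
+```groovy
+process {
+    executor = 'azurebatch'
+    container = 'ubuntu:22.04'                  // hypothetical Docker image
+}
+
+// The work directory must be an Azure Blob storage container (placeholder path)
+workDir = 'az://my-container/work'
+
+azure {
+    storage {
+        accountName = '<STORAGE_ACCOUNT_NAME>'  // placeholder
+        accountKey = '<STORAGE_ACCOUNT_KEY>'    // placeholder
+    }
+    batch {
+        location = 'westeurope'                 // placeholder region
+        accountName = '<BATCH_ACCOUNT_NAME>'    // placeholder
+        accountKey = '<BATCH_ACCOUNT_KEY>'      // placeholder
+    }
+}
+```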
+
+Resource requests and other job characteristics can be controlled via the following process directives:
+
+- {ref}`process-container`
+- {ref}`process-containerOptions`
+- {ref}`process-cpus`
+- {ref}`process-disk`
+- {ref}`process-machineType`
+- {ref}`process-memory`
+- {ref}`process-queue`
+- {ref}`process-resourcelabels`
+- {ref}`process-time`
+
+See {ref}`azure-batch` for more information.
+
+(bridge-executor)=
+
+## Bridge
+
+:::{versionadded} 22.09.1-edge
+:::
+
+[Bridge](https://github.com/cea-hpc/bridge) is an abstraction layer to ease batch system and resource manager usage in heterogeneous HPC environments.
+
+It is open source software that can be installed on top of existing job schedulers such as Slurm or LSF. Bridge allows you to submit jobs, stop jobs, and query information about running jobs and the cluster system.
+
+For more details on how to install the Bridge system, see the [documentation](https://github.com/cea-hpc/bridge).
+
+To enable the Bridge executor, set `process.executor = 'bridge'` in the `nextflow.config` file.
+
+Resource requests and other job characteristics can be controlled via the following process directives:
+
+- {ref}`process-clusterOptions`
+- {ref}`process-cpus`
+- {ref}`process-memory`
+- {ref}`process-queue`
+- {ref}`process-time`
+
+(flux-executor)=
+
+## Flux Executor
+
+:::{versionadded} 22.11.0-edge
+:::
+
+The `flux` executor allows you to run your pipeline script using the [Flux Framework](https://flux-framework.org).
+
+Nextflow submits each process to the cluster as a separate job using the `flux submit` command.
+
+To enable the Flux executor, set `process.executor = 'flux'` in the `nextflow.config` file.
+
+Resource requests and other job characteristics can be controlled via the following process directives:
+
+- {ref}`process-clusterOptions`
+- {ref}`process-cpus`
+- {ref}`process-queue`
+- {ref}`process-time`
+
+:::{note}
+Flux does not support the `memory` directive.
+:::
+
+:::{note}
+By default, Flux will send all output to the `.command.log` file. To send this output to stdout and stderr instead, set `flux.terminalOutput = true` in your config file.
+:::
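+
+For illustration, a configuration that enables the Flux executor and redirects task output to the terminal could look like this. The `cpus` and `time` values are arbitrary examples:
+
+```groovy
+process {
+    executor = 'flux'
+    cpus = 2            // example value
+    time = '1h'         // example value
+    // Note: no 'memory' setting, since Flux does not support that directive
+}
+
+flux {
+    // Send task output to stdout and stderr instead of .command.log
+    terminalOutput = true
+}
+```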
+
+(google-batch-executor)=
+
+## Google Cloud Batch
+
+:::{versionadded} 22.07.1-edge
+:::
+
+[Google Cloud Batch](https://cloud.google.com/batch) is a managed computing service that allows the execution of containerized workloads in the Google Cloud Platform infrastructure.
+
+Nextflow provides built-in support for the Cloud Batch API, which lets you deploy Nextflow pipelines in the cloud by offloading process executions to the Batch service.
+
+The pipeline processes must specify the Docker image to use by defining the `container` directive, either in the pipeline script or the `nextflow.config` file. Additionally, the pipeline work directory must be located in a Google Storage bucket.
+
+To enable this executor, set `process.executor = 'google-batch'` in the `nextflow.config` file.
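+
+Putting these settings together, a minimal configuration could resemble the following sketch. The container image, bucket path, project ID, and location are hypothetical values:
+
+```groovy
+process {
+    executor = 'google-batch'
+    container = 'ubuntu:22.04'      // hypothetical Docker image
+}
+
+// The work directory must be located in a Google Storage bucket (placeholder path)
+workDir = 'gs://my-bucket/work'
+
+google {
+    project = 'my-project-id'       // placeholder project ID
+    location = 'europe-west2'       // placeholder location
+}
+```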
+
+Resource requests and other job characteristics can be controlled via the following process directives:
+
+- {ref}`process-accelerator`
+- {ref}`process-container`
+- {ref}`process-containerOptions`
+- {ref}`process-cpus`
+- {ref}`process-disk`
+- {ref}`process-machineType`
+- {ref}`process-memory`
+- {ref}`process-resourcelabels`
+- {ref}`process-time`
+
+See the {ref}`Google Cloud Batch <google-batch>` page for further configuration details.
+
+(htcondor-executor)=
+
+## HTCondor
+
+:::{warning} *Experimental: may change in a future release.*
+:::
+
+The `condor` executor allows you to run your pipeline script by using the [HTCondor](https://research.cs.wisc.edu/htcondor/) resource manager.
+
+Nextflow manages each process as a separate job that is submitted to the cluster using the `condor_submit` command.
+
+The pipeline must be launched from a node where the `condor_submit` command is available, which is typically the cluster login node.
+
+:::{note}
+The HTCondor executor for Nextflow does not currently support HTCondor's ability to transfer input/output data to the corresponding job's compute node. Therefore, the data must be accessible to the compute nodes through a shared file system directory: either the directory from which the Nextflow workflow is launched, or the work directory specified via the `-w` option.
+:::
+
+To enable the HTCondor executor, set `process.executor = 'condor'` in the `nextflow.config` file.
+
+Resource requests and other job characteristics can be controlled via the following process directives:
+
+- {ref}`process-clusterOptions`
+- {ref}`process-cpus`
+- {ref}`process-disk`
+- {ref}`process-memory`
+- {ref}`process-time`
+
+(hyperqueue-executor)=
+
+## HyperQueue
+
+:::{versionadded} 22.05.0-edge
+:::
+
+:::{versionchanged} 24.06.0-edge
+HyperQueue 0.17.0 or later is required.
+:::
+
+:::{versionchanged} 25.01.0-edge
+HyperQueue 0.20.0 or later is required.
+:::
+
+The `hq` executor allows you to run your pipeline script by using the [HyperQueue](https://github.com/It4innovations/hyperqueue) job scheduler.
+
+Nextflow manages each process as a separate job that is submitted to the cluster using the `hq` command line tool.
+
+The pipeline must be launched from a node where the `hq` command is available, which is typically the cluster login node.
+
+To enable the HyperQueue executor, set `process.executor = 'hq'` in the `nextflow.config` file.
+
+Resource requests and other job characteristics can be controlled via the following process directives:
+
+- {ref}`process-accelerator`
+- {ref}`process-clusterOptions`
+- {ref}`process-cpus`
+- {ref}`process-memory`
+- {ref}`process-time`
+
+(k8s-executor)=
+
+## Kubernetes
+
+The `k8s` executor allows you to run a pipeline on a [Kubernetes](http://kubernetes.io/) cluster.
+
+Resource requests and other job characteristics can be controlled via the following process directives:
+
+- {ref}`process-accelerator`
+- {ref}`process-cpus`
+- {ref}`process-disk`
+- {ref}`process-memory`
+- {ref}`process-pod`
+- {ref}`process-resourcelabels`
+- {ref}`process-time`
+
+See the {ref}`Kubernetes <k8s-page>` page to learn how to set up a Kubernetes cluster to run Nextflow pipelines.
+
+(local-executor)=
+
+## Local
+
+The `local` executor is used by default. It runs the pipeline processes on the computer where Nextflow is launched. The processes are parallelized by spawning multiple threads, taking advantage of the multi-core architecture of the CPU.
+
+The `local` executor is useful for developing and testing a pipeline script on your computer, before switching to a cluster or cloud environment with production data.
+
+Resource requests and other job characteristics can be controlled via the following process directives:
+
+- {ref}`process-cpus`
+- {ref}`process-memory`
+- {ref}`process-time`
+- {ref}`process-container`
+- {ref}`process-containerOptions`
+
+:::{note}
+While the `local` executor limits the number of concurrent tasks based on requested vs available resources, it does not enforce task resource requests. In other words, it is possible for a local task to use more CPUs and memory than it requested, in which case it may starve other tasks. An exception to this behavior is when using {ref}`container-docker` or {ref}`container-podman` containers, in which case the resource requests are enforced by the container runtime.
+:::
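+
+To illustrate how requested and available resources interact, consider the sketch below. It assumes the `cpus` and `memory` options of the `executor` config scope, which are not described on this page, and uses arbitrary example values:
+
+```groovy
+executor {
+    name = 'local'
+    cpus = 8            // assumed option: CPUs made available to the local executor
+    memory = '32 GB'    // assumed option: memory made available to the local executor
+}
+
+process {
+    cpus = 2            // each task requests 2 CPUs
+    memory = '4 GB'     // each task requests 4 GB
+}
+```
+
+Under these assumptions, about four such tasks would be expected to run concurrently (8 CPUs / 2 CPUs per task), even though the memory limit alone would allow eight (32 GB / 4 GB per task). The requests only limit scheduling; they are not enforced on the running task unless a container runtime does so.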
+
+The local executor supports two types of tasks:
+
+- Script tasks (processes with a `script` or `shell` block), which are executed via a Bash wrapper.
+- Native tasks (processes with an `exec` block), which are executed directly in the JVM.
+
+(lsf-executor)=
+
+## LSF
+
+The `lsf` executor allows you to run your pipeline script using a [Platform LSF](http://en.wikipedia.org/wiki/Platform_LSF) cluster.
+
+Nextflow manages each process as a separate job that is submitted to the cluster using the `bsub` command.
+
+The pipeline must be launched from a node where the `bsub` command is available, which is typically the cluster login node.
+
+To enable the LSF executor, set `process.executor = 'lsf'` in the `nextflow.config` file.
+
+Resource requests and other job characteristics can be controlled via the following process directives:
+
+- {ref}`process-clusterOptions`
+- {ref}`process-cpus`
+- {ref}`process-memory`
+- {ref}`process-queue`
+- {ref}`process-time`
+
+:::{note}
+LSF supports both *per-core* and *per-job* memory limits. Nextflow assumes that LSF works in the *per-core* mode, thus it divides the requested {ref}`process-memory` by the number of requested {ref}`process-cpus`.
+
+When LSF is configured to work in the *per-job* memory limit mode, you must specify this limit with the `perJobMemLimit` option in the {ref}`config-executor` scope of your Nextflow config file.
+
+See also the [Platform LSF documentation](https://www.ibm.com/support/knowledgecenter/SSETD4_9.1.3/lsf_config_ref/lsf.conf.lsb_job_memlimit.5.dita).
+:::
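+
+For a cluster configured with per-job memory limits, the option described in the note above can be set as in this sketch. The `cpus` and `memory` values are arbitrary examples:
+
+```groovy
+process {
+    executor = 'lsf'
+    cpus = 4
+    memory = '16 GB'
+}
+
+executor {
+    // Treat the memory request as a per-job limit.
+    // Without this option, Nextflow assumes per-core mode and divides
+    // the request by the number of CPUs (16 GB / 4 = 4 GB per core here).
+    perJobMemLimit = true
+}
+```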
+
+(moab-executor)=
+
+## Moab
+
+:::{versionadded} 19.07.0
+:::
+
+:::{warning} *Experimental: may change in a future release.*
+:::
+
+The `moab` executor allows you to run your pipeline script using the [Moab](https://en.wikipedia.org/wiki/Moab_Cluster_Suite) resource manager by [Adaptive Computing](http://www.adaptivecomputing.com/).
+
+Nextflow manages each process as a separate job that is submitted to the cluster using the `msub` command provided by the resource manager.
+
+The pipeline must be launched from a node where the `msub` command is available, which is typically the cluster login node.
+
+To enable the Moab executor, set `process.executor = 'moab'` in the `nextflow.config` file.
+
+Resource requests and other job characteristics can be controlled via the following process directives:
+
+- {ref}`process-clusterOptions`
+- {ref}`process-cpus`
+- {ref}`process-memory`
+- {ref}`process-queue`
+- {ref}`process-time`
+
+(nqsii-executor)=
+
+## NQSII
+
+The `nqsii` executor allows you to run your pipeline script using the [NQSII](https://www.rz.uni-kiel.de/en/our-portfolio/hiperf/nec-linux-cluster) resource manager.
+
+Nextflow manages each process as a separate job that is submitted to the cluster using the `qsub` command provided by the scheduler.
+
+The pipeline must be launched from a node where the `qsub` command is available, which is typically the cluster login node.
+
+To enable the NQSII executor, set `process.executor = 'nqsii'` in the `nextflow.config` file.
+
+Resource requests and other job characteristics can be controlled via the following process directives:
+
+- {ref}`process-clusterOptions`
+- {ref}`process-cpus`
+- {ref}`process-memory`
+- {ref}`process-queue`
+- {ref}`process-time`
+
+(oar-executor)=
+
+## OAR
+
+:::{versionadded} 19.11.0-edge
+:::
+
+The `oar` executor allows you to run your pipeline script using the [OAR](https://oar.imag.fr) resource manager.
+
+Nextflow manages each process as a separate job that is submitted to the cluster using the `oarsub` command.
+
+The pipeline must be launched from a node where the `oarsub` command is available, which is typically the cluster login node.
+
+To enable the OAR executor, set `process.executor = 'oar'` in the `nextflow.config` file.
+
+Resource requests and other job characteristics can be controlled via the following process directives:
+
+- {ref}`process-clusterOptions`
+- {ref}`process-cpus`
+- {ref}`process-memory`
+- {ref}`process-queue`
+- {ref}`process-time`
+
+When specifying `clusterOptions` as a string, multiple options must be separated by semicolons to ensure that the job script is formatted correctly:
+```groovy
+clusterOptions = '-t besteffort;--project myproject'
+```
+
+:::{versionadded} 24.04.0
+:::
+
+The same behavior can now be achieved using a string list:
+```groovy
+clusterOptions = [ '-t besteffort', '--project myproject' ]
+```
+
+See {ref}`process-clusterOptions` for details.
+
+(pbs-executor)=
+
+## PBS/Torque
+
+The `pbs` executor allows you to run your pipeline script using a resource manager from the [PBS/Torque](http://en.wikipedia.org/wiki/Portable_Batch_System) family of batch schedulers.
+
+Nextflow manages each process as a separate job that is submitted to the cluster using the `qsub` command provided by the scheduler.
+
+The pipeline must be launched from a node where the `qsub` command is available, which is typically the cluster login node.
+
+To enable the PBS executor, set `process.executor = 'pbs'` in the `nextflow.config` file.
+
+Resource requests and other job characteristics can be controlled via the following process directives:
+
+- {ref}`process-clusterOptions`
+- {ref}`process-cpus`
+- {ref}`process-memory`
+- {ref}`process-queue`
+- {ref}`process-time`
+
+(pbspro-executor)=
+
+## PBS Pro
+
+The `pbspro` executor allows you to run your pipeline script using the [PBS Pro](https://www.pbspro.org/) resource manager.
+
+Nextflow manages each process as a separate job that is submitted to the cluster using the `qsub` command provided by the scheduler.
+
+The pipeline must be launched from a node where the `qsub` command is available, which is typically the cluster login node.
+
+To enable the PBS Pro executor, set `process.executor = 'pbspro'` in the `nextflow.config` file.
+
+Resource requests and other job characteristics can be controlled via the following process directives:
+
+- {ref}`process-clusterOptions`
+- {ref}`process-cpus`
+- {ref}`process-memory`
+- {ref}`process-queue`
+- {ref}`process-time`
+
+(sge-executor)=
+
+## SGE
+
+The `sge` executor allows you to run your pipeline script using a [Sun Grid Engine](http://en.wikipedia.org/wiki/Oracle_Grid_Engine) cluster or a compatible platform ([Open Grid Engine](http://gridscheduler.sourceforge.net/), [Univa Grid Engine](http://www.univa.com/products/grid-engine.php), etc.).
+
+Nextflow manages each process as a separate grid job that is submitted to the cluster using the `qsub` command.
+
+The pipeline must be launched from a node where the `qsub` command is available, which is typically the cluster login node.
+
+To enable the SGE executor, set `process.executor = 'sge'` in the `nextflow.config` file.
+
+Resource requests and other job characteristics can be controlled via the following process directives:
+
+- {ref}`process-clusterOptions`
+- {ref}`process-cpus`
+- {ref}`process-memory`
+- {ref}`process-penv`
+- {ref}`process-queue`
+- {ref}`process-time`
+
+(seqera-executor)=
+
+## Seqera
+
+:::{versionadded} 26.04.0
+:::
+
+:::{warning}
+*Preview feature: may change in a future release.*
+:::
+
+The `seqera` executor allows you to run your pipeline using the [Seqera](https://seqera.io) cloud infrastructure. It enables the seamless execution of Nextflow pipelines by offloading process executions to the Seqera scheduler service.
+
+The pipeline processes must specify the Docker image to use by defining the `container` directive, either in the pipeline script or the `nextflow.config` file. Additionally, an S3 bucket must be used as the pipeline work directory.
+
+To enable this executor, set `process.executor = 'seqera'` in the `nextflow.config` file.
+
+Resource requests and other job characteristics can be controlled via the following process directives:
+
+- {ref}`process-arch`
+- {ref}`process-container`
+- {ref}`process-containerOptions`
+- {ref}`process-cpus`
+- {ref}`process-disk`
+- {ref}`process-memory`
+- {ref}`process-time`
+
+The following {ref}`hints <process-hints>` are supported:
+
+- `machineRequirement.capacityMode`
+- `machineRequirement.diskAllocation`
+- `machineRequirement.diskEncrypted`
+- `machineRequirement.diskIops`
+- `machineRequirement.diskMountPath`
+- `machineRequirement.diskSize`
+- `machineRequirement.diskThroughputMiBps`
+- `machineRequirement.diskType`
+- `machineRequirement.machineTypes`
+- `machineRequirement.maxSpotAttempts`
+- `machineRequirement.provisioning`
+
+Each hint overrides the corresponding field of the `seqera.executor.machineRequirement` config scope on a per-process basis. Keys may be used as-is or with the `seqera/` prefix to restrict them to this executor.
+
+For example, to override the provisioning mode for a single process:
+
+```nextflow
+process hello {
+    hints 'seqera/machineRequirement.provisioning': 'spotFirst'
+
+    script:
+    """
+    your_command --here
+    """
+}
+```
+
+See {ref}`config-seqera` for the full config reference.
+
+### Disk support
+
+When the {ref}`process-disk` directive is specified, the Seqera executor provisions storage for the task container. There are two disk allocation strategies:
+
+- **task** (default): A dedicated EBS volume is created for each task at launch time. This provides isolated, high-performance storage with configurable volume type, IOPS, throughput, and encryption.
+
+- **node**: Uses the instance storage attached at the cluster level. This is shared across tasks running on the same node and does not support EBS-specific options.
+
+#### Task allocation (EBS volumes)
+
+By default, a gp3 volume with 325 MiB/s throughput is used (Fusion recommended settings). You can customize the EBS volume configuration:
+
+```groovy
+seqera {
+    executor {
+        machineRequirement {
+            diskAllocation = 'task'    // Per-task EBS volume (default)
+            diskType = 'ebs/io1'       // Use provisioned IOPS SSD
+            diskIops = 10000           // Required for io1/io2
+            diskThroughputMiBps = 500  // Throughput for gp3 volumes
+            diskEncrypted = true       // Enable KMS encryption
+            diskMountPath = '/data'    // Container mount path (default: /tmp)
+        }
+    }
+}
+```
+
+Supported volume types: `ebs/gp3` (default), `ebs/gp2`, `ebs/io1`, `ebs/io2`, `ebs/st1`, `ebs/sc1`.
+
+#### Node allocation (instance storage)
+
+To use instance storage instead of per-task EBS volumes:
+
+```groovy
+seqera {
+    executor {
+        machineRequirement {
+            diskAllocation = 'node'    // Use instance storage
+        }
+    }
+}
+```
+
+:::{note}
+When using `node` allocation, the EBS-specific options (`diskType`, `diskIops`, `diskThroughputMiBps`, `diskEncrypted`) are not applicable and will cause an error if specified.
+:::
+
+See the {ref}`seqera scope <config-seqera>` for the available configuration options.
+
+(slurm-executor)=
+
+## SLURM
+
+The `slurm` executor allows you to run your pipeline script using the [SLURM](https://slurm.schedmd.com/documentation.html) resource manager.
+
+Nextflow manages each process as a separate job that is submitted to the cluster using the `sbatch` command.
+
+The pipeline must be launched from a node where the `sbatch` command is available, which is typically the cluster login node.
+
+To enable the SLURM executor, set `process.executor = 'slurm'` in the `nextflow.config` file.
+
+Resource requests and other job characteristics can be controlled via the following process directives:
+
+- {ref}`process-clusterOptions`
+- {ref}`process-cpus`
+- {ref}`process-memory`
+- {ref}`process-queue`
+- {ref}`process-time`
+
+:::{note}
+SLURM partitions can be specified with the `queue` directive.
+:::
+
+:::{note}
+Nextflow does not provide direct support for SLURM multi-clusters. If you need to submit workflow executions to a cluster other than the current one, specify it with the `SLURM_CLUSTERS` variable in the launch environment.
+:::
+
+:::{versionadded} 23.07.0-edge
+Some SLURM clusters require memory allocations to be specified with `--mem-per-cpu` instead of `--mem`. You can specify `executor.perCpuMemAllocation = true` in the Nextflow configuration to enable this behavior. Nextflow will automatically compute the memory per CPU for each task (by default 1 CPU is used).
+:::
+
+:::{versionadded} 25.12.0-edge
+Since SLURM 24, `squeue` supports an `--only-job-state` option that ignores the partition (`-p`) or user (`-u`) filters. To enable this behavior, specify `executor.$slurm.onlyJobState = true` in your Nextflow configuration. If `SchedulerParameters=enable_job_state_cache` is enabled, you can expect improved Nextflow performance and reduced load on the SLURM controller. See [`enable_job_state_cache`](https://slurm.schedmd.com/slurm.conf.html#OPT_enable_job_state_cache) and [`--only-job-state`](https://slurm.schedmd.com/squeue.html#OPT_only-job-state) for more information.
+:::
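+
+As an example of the notes above, the following sketch selects a partition through the `queue` directive and enables per-CPU memory allocation. The partition name and resource values are hypothetical:
+
+```groovy
+process {
+    executor = 'slurm'
+    queue = 'short'     // hypothetical SLURM partition name
+    cpus = 4
+    memory = '16 GB'
+}
+
+executor {
+    // Request memory with --mem-per-cpu instead of --mem.
+    // With the values above, the per-CPU amount would be 16 GB / 4 = 4 GB.
+    perCpuMemAllocation = true
+}
+```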
+
+(tcs-executor)=
+
+## TCS
+
+The `tcs` executor allows you to run your pipeline script using the [Fujitsu Technical Computing Suite (TCS)](https://software.fujitsu.com/jp/manual/manualindex/p21000155e.html).
+
+Nextflow manages each process as a separate job that is submitted to the cluster using the `pjsub` command.
+
+The pipeline must be launched from a node where the `pjsub` command is available, which is typically the cluster login node.
+
+To enable the TCS executor, set `process.executor = 'tcs'` in the `nextflow.config` file.
+
+Resource requests and other job characteristics can be controlled via the following process directives:
+
+- {ref}`process-clusterOptions`
+- {ref}`process-time`
+
+:::{note}
+Use `clusterOptions` to specify system-dependent options such as queue (resource group), CPU, and node. These options vary across target systems and are not standardized. They correspond to `-L` options in the arguments of the `pjsub` command and should be configured according to the requirements of the specific cluster environment.
+
+For example:
+
+```groovy
+process {
+  executor = 'tcs'
+  time = '00:30:00'
+  clusterOptions = '-L rscgrp=a-batch -L vnode-core=4'
+}
+```
+:::
+