add nextflow d30e48d

2026-04-29 23:01:54 +02:00
parent d0b12d668d
commit 97cc9058d3
2840 changed files with 730250 additions and 0 deletions
--- a/nextflow/docs/workflow.md
+++ b/nextflow/docs/workflow.md
@@ -0,0 +1,757 @@
+(workflow-page)=
+
+# Workflows
+
+In Nextflow, a **workflow** is a specialized function for composing {ref}`processes <process-page>` and dataflow logic:
+
+- An [entry workflow](#entry-workflow) is the entrypoint of a pipeline. It can take [parameters](#parameters) as inputs using the `params` block, and it can publish [outputs](#outputs) using the `output` block.
+
+- A [named workflow](#named-workflows) is a workflow that can be called by other workflows. It can define its own inputs and outputs, which are called *takes* and *emits*.
+
+- Both entry workflows and named workflows can contain [dataflow logic](#dataflow) such as calling processes, workflows, and channel operators.
+
+## Entry workflow
+
+A script can define at most one *entry workflow*, which has no name and serves as the entrypoint of the script:
+
+```nextflow
+workflow {
+    channel.of('Bonjour', 'Ciao', 'Hello', 'Hola')
+        .map { v -> "$v world!" }
+        .view()
+}
+```
+
+(workflow-params-legacy)=
+
+## Parameters
+
+Parameters can be declared by assigning a default value to a `params` property:
+
+```nextflow
+params.input = '/some/data/file'
+params.save_intermeds = false
+
+workflow {
+    if( params.input )
+        analyze(params.input, params.save_intermeds)
+    else
+        analyze(fake_input(), params.save_intermeds)
+}
+```
+
+The default value can be overridden by the command line, params file, or config file. Parameters from multiple sources are resolved in the order described in {ref}`cli-params`.
+
+(workflow-output-def)=
+
+## Outputs
+
+:::{versionadded} 25.10.0
+Workflow outputs are available as a preview in Nextflow {ref}`24.04 <workflow-outputs-first-preview>`, {ref}`24.10 <workflow-outputs-second-preview>`, and {ref}`25.04 <workflow-outputs-third-preview>`.
+:::
+
+:::{note}
+Workflow outputs are intended to replace the {ref}`publishDir <process-publishdir>` directive. See {ref}`migrating-workflow-outputs` for guidance on migrating from `publishDir` to workflow outputs.
+:::
+
+A script can define an *output block* to declare the top-level workflow outputs. Each output should be assigned in the `publish` section of the entry workflow. Any channel in the workflow can be assigned to an output, including process and subworkflow outputs.
+
+**Example:**
+
+```nextflow
+process fetch {
+    // ...
+
+    output:
+    path 'sample.txt'
+
+    // ...
+}
+
+workflow {
+    main:
+    ch_samples = fetch(params.input)
+
+    publish:
+    samples = ch_samples
+}
+
+output {
+    samples {
+        path '.'
+    }
+}
+```
+
+In the above example, the output of process `fetch` is assigned to the `samples` workflow output. How this output is published to a directory structure is described in the next section.
+
+(workflow-publishing-files)=
+
+### Publishing files
+
+Each workflow output can define how files are *published* from the work directory to a designated *output directory*.
+
+**Output directory**
+
+You can set the top-level output directory for a run using the `-output-dir` command-line option or the `outputDir` config option:
+
+```bash
+nextflow run main.nf -output-dir 'my-results'
+```
+
+```groovy
+// nextflow.config
+outputDir = 'my-results'
+```
+
+The default output directory is `results` in the launch directory.
+
+**Publish path**
+
+By default, Nextflow publishes all output files to the root of the output directory. Each workflow output can define where to publish files within the output directory using the `path` directive:
+
+```nextflow
+workflow {
+    main:
+    ch_step1 = step1()
+    ch_step2 = step2(ch_step1)
+
+    publish:
+    step1 = ch_step1
+    step2 = ch_step2
+}
+
+output {
+    step1 {
+        path 'step1'
+    }
+    step2 {
+        path 'step2'
+    }
+}
+```
+
+The following directory structure is created:
+
+```
+results/
+├── step1/
+│   └── ...
+└── step2/
+    └── ...
+```
+
+Nextflow publishes all files received by an output into the specified directory. Nextflow recursively scans lists, maps, and tuples for nested files:
+
+```nextflow
+workflow {
+    main:
+    ch_samples = channel.of(
+        tuple( [id: 'SAMP1'], [ file('1.txt'), file('2.txt') ] )
+    )
+
+    publish:
+    samples = ch_samples // 1.txt and 2.txt are published
+}
+```
+
+:::{note}
+Files that do not originate from the work directory are not published.
+:::
+
+**Dynamic publish path**
+
+The `path` directive can also be a closure which defines a custom publish path for each channel value:
+
+```nextflow
+workflow {
+    main:
+    ch_samples = channel.of(
+        [id: 'SAMP1', fastq_1: file('1.fastq'), fastq_2: file('2.fastq')]
+    )
+
+    publish:
+    samples = ch_samples
+}
+
+output {
+    samples {
+        path { sample -> "fastq/${sample.id}/" }
+    }
+}
+```
+
+The above example publishes each channel value to a different subdirectory. In this case, each pair of FASTQ files is published into a subdirectory based on the sample ID.
+
+Alternatively, you can define a different path for each individual file using the `>>` operator:
+
+```nextflow
+output {
+    samples {
+        path { sample ->
+            sample.fastq_1 >> "fastq/${sample.id}/"
+            sample.fastq_2 >> "fastq/${sample.id}/"
+        }
+    }
+}
+```
+
+Each `>>` specifies a *source file* and *publish target*. The source file should be a file or collection of files, and the publish target should be a directory or file name. If the publish target ends with a slash, Nextflow treats it as the directory in which to publish source files.
+
+When using this syntax, only files captured with the `>>` operator are saved to the output directory.
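+
+As a sketch of the file-name form of the publish target, the following variant renames each file as it is published. It assumes the same `samples` output and record fields (`id`, `fastq_1`, `fastq_2`) as the previous example. The target file names are illustrative and not part of any convention:
+
+```nextflow
+output {
+    samples {
+        path { sample ->
+            // no trailing slash: the target is treated as a file name
+            sample.fastq_1 >> "fastq/${sample.id}_1.fastq"
+            sample.fastq_2 >> "fastq/${sample.id}_2.fastq"
+        }
+    }
+}
+```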
+
+**Conditional publishing**
+
+Outputs can be conditionally published using pipeline parameters:
+
+```nextflow
+output {
+    samples {
+        path { sample ->
+            sample.fastqc >> "fastqc/"
+            sample.bam >> (params.save_bams ? "align/" : null)
+        }
+    }
+}
+```
+
+In the above example, the BAM files specified by `sample.bam` are published only when `params.save_bams` is `true`.
+
+### Index files
+
+Index files are structured metadata files that catalog published outputs and their associated metadata. An index file preserves the structure of channel values, including metadata, which is more robust than encoding this information into file paths. The index file can be a CSV (`.csv`), JSON (`.json`), or YAML (`.yml`, `.yaml`) file. The channel values should be files, lists, maps, or tuples.
+
+Each output can create an index file of its published values:
+
+```nextflow
+workflow {
+    main:
+    ch_samples = channel.of(
+        [id: 1, name: 'sample 1', fastq_1: '1a.fastq', fastq_2: '1b.fastq'],
+        [id: 2, name: 'sample 2', fastq_1: '2a.fastq', fastq_2: '2b.fastq'],
+        [id: 3, name: 'sample 3', fastq_1: '3a.fastq', fastq_2: null]
+    )
+
+    publish:
+    samples = ch_samples
+}
+
+output {
+    samples {
+        path 'fastq'
+        index {
+            path 'samples.csv'
+        }
+    }
+}
+```
+
+The above example writes the following CSV file to `results/samples.csv`:
+
+```
+"1","sample 1","results/fastq/1a.fastq","results/fastq/1b.fastq"
+"2","sample 2","results/fastq/2a.fastq","results/fastq/2b.fastq"
+"3","sample 3","results/fastq/3a.fastq",""
+```
+
+You can customize the index file with additional directives, for example:
+
+```nextflow
+index {
+    path 'samples.csv'
+    header true
+    sep '|'
+}
+```
+
+This example produces the following index file:
+
+```
+"id"|"name"|"fastq_1"|"fastq_2"
+"1"|"sample 1"|"results/fastq/1a.fastq"|"results/fastq/1b.fastq"
+"2"|"sample 2"|"results/fastq/2a.fastq"|"results/fastq/2b.fastq"
+"3"|"sample 3"|"results/fastq/3a.fastq"|""
+```
+
+:::{note}
+Files that do not originate from the work directory are not published, but are included in the index file.
+:::
+
+See [Output directives](#output-directives) for the list of available index directives.
+
+(workflow-output-labels)=
+
+### Labels
+
+You can apply labels to each workflow output using the `label` directive:
+
+```nextflow
+output {
+    multiqc_report {
+        label 'qc'
+        label 'summary'
+    }
+}
+```
+
+Labels can be used to find and filter output files across workflow runs with data lineage. See {ref}`data-lineage-workflow-outputs` for details on how to query output files by label.
+
+### Output directives
+
+The following directives are available for each output in the output block:
+
+`index`
+: Create an index file containing a record of each published value.
+
+  The following directives are available in an index definition:
+
+  `header`
+  : When `true`, the keys of the first record are used as the column names (default: `false`). Can also be a list of column names. Only used for CSV files.
+
+  `path`
+  : The name of the index file relative to the base output directory (required). Can be a CSV, JSON, or YAML file.
+
+  `sep`
+  : The character used to separate values (default: `','`). Only used for CSV files.
+
+`label`
+: Attach a label to every file published by this output. Can be specified multiple times to attach multiple labels.
+: Labels are stored in the `labels` field of `FileOutput` records in the {ref}`lineage store <data-lineage-page>`.
+
+`path`
+: Specify the publish path relative to the output directory (default: `'.'`). Can be a path, a closure that defines a custom directory for each published value, or a closure that publishes individual files using the `>>` operator.
+
+Additionally, the following options from the {ref}`workflow <config-workflow>` config scope can be specified as directives:
+
+- `contentType`
+- `enabled`
+- `ignoreErrors`
+- `mode`
+- `overwrite`
+- `storageClass`
+- `tags`
+
+For example:
+
+```nextflow
+output {
+    samples {
+        mode 'copy'
+    }
+}
+```
+
+## Named workflows
+
+A *named workflow* is a workflow that can be called by other workflows:
+
+```nextflow
+workflow my_workflow {
+    ch_hello = hello()
+    bye( ch_hello.collect() )
+}
+
+workflow {
+    my_workflow()
+}
+```
+
+The above example defines a workflow named `my_workflow` which is called by the entry workflow. Both `hello` and `bye` could be any other process or workflow.
+
+### Takes and emits
+
+The `take:` section declares the inputs of a named workflow:
+
+```nextflow
+workflow my_workflow {
+    take:
+    data1
+    data2
+
+    main:
+    ch_hello = hello(data1, data2)
+    bye(ch_hello)
+}
+```
+
+Inputs can be specified like arguments when calling the workflow:
+
+```nextflow
+workflow {
+    my_workflow( channel.of('/some/data1'), channel.of('/some/data2') )
+}
+```
+
+The `emit:` section declares the outputs of a named workflow:
+
+```nextflow
+workflow my_workflow {
+    take:
+    data
+
+    main:
+    ch_bye = bye(hello(data))
+
+    emit:
+    ch_bye
+}
+```
+
+If an output is assigned to a name, the name can be used to reference the output from the calling workflow. For example:
+
+```nextflow
+workflow my_workflow {
+    take:
+    data
+
+    main:
+    ch_hello = hello(data)
+    ch_bye = bye(ch_hello)
+
+    emit:
+    my_data = ch_bye
+}
+
+workflow {
+    result = my_workflow( channel.of('/some/data') )
+    result.my_data.view()
+}
+```
+
+:::{note}
+Every output must be assigned to a name when multiple outputs are declared.
+:::
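+
+A minimal sketch of a workflow with several named outputs might look like the following. The `hello` and `bye` processes here are hypothetical stand-ins written with `exec:` blocks so that the script is self-contained, and the emit names `greetings` and `farewells` are chosen only for illustration:
+
+```nextflow
+// hypothetical process: builds a greeting from a name
+process hello {
+    input:
+    val name
+
+    output:
+    val greeting
+
+    exec:
+    greeting = "Hello, ${name}"
+}
+
+// hypothetical process: extends a greeting into a farewell
+process bye {
+    input:
+    val greeting
+
+    output:
+    val farewell
+
+    exec:
+    farewell = "${greeting} and goodbye"
+}
+
+workflow my_workflow {
+    take:
+    names
+
+    main:
+    ch_hello = hello(names)
+    ch_bye = bye(ch_hello)
+
+    emit:
+    // two outputs are declared, so each one is given a name
+    greetings = ch_hello
+    farewells = ch_bye
+}
+
+workflow {
+    result = my_workflow( channel.of('Alice', 'Bob') )
+    result.greetings.view { v -> "greeting: ${v}" }
+    result.farewells.view { v -> "farewell: ${v}" }
+}
+```
+
+Each named output is read as a property of the return value. The relative order of the printed lines is not guaranteed, because the two channels are consumed asynchronously.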
+
+(dataflow-page)=
+
+## Dataflow
+
+Workflows consist of *dataflow* logic, in which processes are connected to each other through *dataflow channels* and *dataflow values*.
+
+### Channels and values
+
+A *dataflow channel* (or simply *channel*) is an asynchronous sequence of values.
+
+The values in a channel cannot be accessed directly, but only through an operator or process. For example:
+
+```nextflow
+channel.of(1, 2, 3).view { v -> "channel emits ${v}" }
+```
+
+```console
+channel emits 1
+channel emits 2
+channel emits 3
+```
+
+A *dataflow value* is an asynchronous value.
+
+Dataflow values can be created using the {ref}`channel.value <channel-value>` factory. Processes also produce them under {ref}`certain conditions <process-out-singleton>`.
+
+A dataflow value cannot be accessed directly, but only through an operator or process. For example:
+
+```nextflow
+channel.value(1).view { v -> "dataflow value is ${v}" }
+```
+
+```console
+dataflow value is 1
+```
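+
+One practical difference between the two is how they behave as process inputs. In the sketch below, a channel and a dataflow value are passed to the same process, and the single dataflow value is reused for every value in the channel. The `make_greeting` process and the variable names are hypothetical and exist only for this illustration:
+
+```nextflow
+// hypothetical process: combines a greeting with a name
+process make_greeting {
+    input:
+    val name
+    val greeting
+
+    output:
+    val result
+
+    exec:
+    result = "${greeting}, ${name}!"
+}
+
+workflow {
+    // channel: a sequence of three values
+    ch_names = channel.of('Alice', 'Bob', 'Carol')
+
+    // dataflow value: a single value
+    greeting = channel.value('Hello')
+
+    // the process runs once per name, reusing the same greeting each time
+    make_greeting(ch_names, greeting).view()
+}
+```
+
+The process is invoked three times, once for each name. The order in which the results are printed is not guaranteed.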
+
+### Factories
+
+A channel can be created by factories in the `channel` namespace. For example, the `channel.fromPath()` factory creates a channel from a file name or glob pattern, similar to the `files()` function:
+
+```nextflow
+channel.fromPath('input/*.txt').view()
+```
+
+See {ref}`channel-factory` for the full list of channel factories.
+
+### Operators
+
+Channel operators, or *operators* for short, are functions that consume and produce channels. Because channels are asynchronous, operators are necessary to manipulate the values in a channel. Operators are particularly useful for implementing glue logic between processes.
+
+Commonly used operators include:
+
+- {ref}`operator-collect`: collect the channel values into a collection
+
+- {ref}`operator-combine`: emit the combinations of two channels
+
+- {ref}`operator-filter`: emit only the channel values that satisfy a condition
+
+- {ref}`operator-flatMap`: emit multiple values for each channel value with a closure
+
+- {ref}`operator-grouptuple`: group the channel values based on a grouping key
+
+- {ref}`operator-join`: join the values from two channels based on a matching key
+
+- {ref}`operator-map`: transform each channel value with a mapping function
+
+- {ref}`operator-mix`: emit the values from multiple channels
+
+- {ref}`operator-view`: print each channel value to standard output
+
+See {ref}`operator-page` for the full set of operators. See {ref}`stdlib-types-value` for the set of available methods for dataflow values.
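+
+As a small illustration of how operators compose, the following sketch chains three of the operators listed above. The input values are arbitrary:
+
+```nextflow
+workflow {
+    channel.of(1, 2, 3, 4, 5, 6)
+        .filter { v -> v % 2 == 0 }   // keep the even values: 2, 4, 6
+        .map { v -> v * 10 }          // transform each value: 20, 40, 60
+        .collect()                    // gather all values into a single list
+        .view()
+}
+```
+
+```console
+[20, 40, 60]
+```
+
+Each operator returns a new channel (or, for `collect`, a dataflow value), so the calls can be chained without intermediate variables.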
+
+(workflow-process-invocation)=
+
+### Calling processes and workflows
+
+Processes and workflows are called like functions, passing their inputs as arguments:
+
+```nextflow
+process hello {
+    output:
+    path 'hello.txt', emit: txt
+
+    script:
+    """
+    your_command > hello.txt
+    """
+}
+
+process bye {
+    input:
+    path 'hello.txt'
+
+    output:
+    path 'bye.txt', emit: txt
+
+    script:
+    """
+    another_command hello.txt > bye.txt
+    """
+}
+
+workflow hello_bye {
+    take:
+    data
+
+    main:
+    hello()
+    bye(data)
+}
+
+workflow {
+    data = channel.fromPath('/some/path/*.txt')
+    hello_bye(data)
+}
+```
+
+Processes and workflows can only be called by workflows. A given process or workflow can only be called once in a given workflow. To use a process or workflow multiple times in the same workflow, {ref}`include <syntax-include>` it from another script with multiple aliases:
+
+```nextflow
+include { hello_bye as hello_bye1 } from './modules/hello_bye'
+include { hello_bye as hello_bye2 } from './modules/hello_bye'
+
+workflow {
+    data1 = channel.fromPath('data1/*.txt')
+    data2 = channel.fromPath('data2/*.txt')
+    hello_bye1(data1)
+    hello_bye2(data2)
+}
+```
+
+The "return value" of a process or workflow call is the process outputs or workflow emits, respectively. The return value can be assigned to a variable or passed into another call:
+
+```nextflow
+workflow hello_bye {
+    take:
+    data
+
+    main:
+    bye_out = bye(hello(data))
+
+    emit:
+    bye_out
+}
+
+workflow {
+    data = channel.fromPath('/some/path/*.txt')
+    bye_out = hello_bye(data)
+}
+```
+
+Named outputs can be accessed as properties of the return value:
+
+```nextflow
+workflow hello_bye {
+    take:
+    data
+
+    main:
+    hello_out = hello(data)
+    bye_out = bye(hello_out.txt)
+
+    emit:
+    bye = bye_out.txt
+}
+
+workflow {
+    data = channel.fromPath('/some/path/*.txt')
+    flow_out = hello_bye(data)
+    bye_out = flow_out.bye
+}
+```
+
+As a convenience, process and workflow outputs can also be accessed without first assigning to a variable, by using the `.out` property of the process or workflow name:
+
+```nextflow
+workflow hello_bye {
+    take:
+    data
+
+    main:
+    hello(data)
+    bye(hello.out)
+
+    emit:
+    bye = bye.out
+}
+
+workflow {
+    data = channel.fromPath('/some/path/*.txt')
+    hello_bye(data)
+    hello_bye.out.bye.view()
+}
+```
+
+:::{note}
+Process named outputs are defined using the `emit` option on a process output. See {ref}`naming process outputs <process-naming-outputs>` for more information.
+:::
+
+Workflows can be composed in the same way:
+
+```nextflow
+workflow flow1 {
+    take:
+    data
+
+    emit:
+    tack(tick(data))
+}
+
+workflow flow2 {
+    take:
+    data
+
+    emit:
+    tock(tick(data))
+}
+
+workflow {
+    data = channel.fromPath('/some/path/*.txt')
+    flow2(flow1(data))
+}
+```
+
+The same process can be called in different workflows without using an alias, like `tick` in the above example, which is used in both `flow1` and `flow2`. The workflow call stack determines the *fully qualified process name*, which is used to distinguish the different process calls, i.e. `flow1:tick` and `flow2:tick` in the above example.
+
+:::{tip}
+The fully qualified process name can be used as a {ref}`process selector <config-process-selectors>` in a Nextflow configuration file, and it takes priority over the simple process name.
+:::
+
+(workflow-special-operators)=
+
+### Special operators (`|` and `&`)
+
+:::{deprecated} 26.04.0
+These operators are not supported when {ref}`static typing <preparing-static-types>` is enabled. Use standard method calls and assignments instead.
+:::
+
+The following operators have a special meaning when used with process and workflow calls in a workflow:
+
+- The `|` *pipe* operator can be used to chain processes, operators, and workflows.
+- The `&` *and* operator can be used to call multiple processes in parallel with the same channel(s).
+
+For example:
+
+```nextflow
+process greet {
+    input:
+    val data
+
+    output:
+    val result
+
+    exec:
+    result = "$data world"
+}
+
+process to_upper {
+    input:
+    val data
+
+    output:
+    val result
+
+    exec:
+    result = data.toUpperCase()
+}
+
+workflow {
+    channel.of('Hello')
+        | map { v -> v.reverse() }
+        | (greet & to_upper)
+        | mix
+        | view
+}
+```
+
+In the above snippet, the initial channel is piped to the {ref}`operator-map` operator, which reverses the string value. Then, the result is passed to the processes `greet` and `to_upper`, which are executed in parallel. Each process outputs a channel, and the two channels are combined using the {ref}`operator-mix` operator. Finally, the result is printed using the {ref}`operator-view` operator.
+
+The same code can also be written as:
+
+```nextflow
+workflow {
+    ch = channel.of('Hello').map { v -> v.reverse() }
+    ch_greet = greet(ch)
+    ch_upper = to_upper(ch)
+    ch_greet.mix(ch_upper).view()
+}
+```
+
+(workflow-recursion)=
+
+### Process and workflow recursion
+
+:::{versionadded} 22.04.0
+:::
+
+:::{note}
+This is a preview feature and requires the `nextflow.preview.recursion` feature flag to be enabled. The syntax and behavior may change in future releases.
+:::
+
+Processes can be invoked recursively using the `recurse` method:
+
+```{literalinclude} snippets/recurse-process.nf
+:language: nextflow
+```
+
+```{literalinclude} snippets/recurse-process.out
+:language: console
+```
+
+In the above example, the `count_down` process is first invoked with the value `params.start`. On each subsequent iteration, the process is invoked again using the output from the previous iteration. The recursion continues until the specified condition is satisfied, as defined by the `until` method, which terminates the recursion.
+
+The recursive output can also be limited using the `times` method:
+
+```nextflow
+count_down
+    .recurse(params.start)
+    .times(3)
+    .view { v -> "${v}..." }
+```
+
+Workflows can also be invoked recursively:
+
+```{literalinclude} snippets/recurse-workflow.nf
+:language: nextflow
+```
+
+```{literalinclude} snippets/recurse-workflow.out
+:language: console
+```
+
+**Limitations**
+
+- A recursive process or workflow must have matching inputs and outputs, such that the outputs for each iteration can be supplied as the inputs for the next iteration.
+
+- Recursive workflows cannot use *reduction* operators such as `collect`, `reduce`, and `toList`, because these operators cause the recursion to hang indefinitely after the initial iteration.