(channel-factory)=

# Channel factories

This page describes the channel factories that are available in the `channel` namespace.

(channel-empty)=

## empty

**`empty() -> Channel`**

The `channel.empty` factory creates a channel that emits nothing:

```nextflow
channel.empty().view() // prints nothing
```

(channel-filepairs)=

## fromFilePairs

:::{note}
As a best practice, use a samplesheet instead of matching file pairs directly with glob patterns. See {ref}`static-types-samplesheet` for more information.
:::

The `channel.fromFilePairs` factory creates a channel that emits file pairs matching a [glob][glob] pattern:

```nextflow
ch = channel.fromFilePairs('/my/data/SRR*_{1,2}.fastq')
ch.view()
```

Each file pair is emitted as a 2-tuple containing the grouping key (from the `*` wildcard) and the list of files (sorted lexicographically):

```
[SRR493366, [/my/data/SRR493366_1.fastq, /my/data/SRR493366_2.fastq]]
[SRR493367, [/my/data/SRR493367_1.fastq, /my/data/SRR493367_2.fastq]]
[SRR493368, [/my/data/SRR493368_1.fastq, /my/data/SRR493368_2.fastq]]
[SRR493369, [/my/data/SRR493369_1.fastq, /my/data/SRR493369_2.fastq]]
[SRR493370, [/my/data/SRR493370_1.fastq, /my/data/SRR493370_2.fastq]]
[SRR493371, [/my/data/SRR493371_1.fastq, /my/data/SRR493371_2.fastq]]
```

The glob pattern must contain at least one `*` wildcard character.

Available options:

`checkIfExists`
: When `true`, throws an error if the file path does not exist in the file system (default: `false`).

`flat`
: When `true`, tuples are emitted with the matching files flattened instead of as a nested list (default: `false`).

`followLinks`
: When `true`, follows symbolic links when traversing a directory tree, otherwise treats them as files (default: `true`).

`hidden`
: When `true`, matches hidden files when using a glob pattern (default: `false`).

`maxDepth`
: Maximum number of directory levels to visit with the `**` wildcard (default: no limit).

`size`
: The number of expected files for each file pair (default: `2`). Set to `-1` to allow any size.

`type`
: Whether to return only files (`'file'`), only directories (`'dir'`), or both (`'any'`) when using a glob pattern. By default, only files are returned (`'file'`).
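
As a minimal sketch of how some of these options change the emitted items, the calls below reuse the example pattern from earlier in this section. The `/my/data` directory and its contents are illustrative only, and the comments describe the expected shape of each item rather than verified output.

```nextflow
// emit flat tuples of the form [key, file1, file2]
// instead of the default nested form [key, [file1, file2]]
channel.fromFilePairs('/my/data/SRR*_{1,2}.fastq', flat: true).view()

// accept groups of any size, e.g. a sample with only one of the two files
channel.fromFilePairs('/my/data/SRR*_{1,2}.fastq', size: -1).view()

// report an error if the path does not exist in the file system
channel.fromFilePairs('/my/data/SRR*_{1,2}.fastq', checkIfExists: true).view()
```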

(channel-from-lineage)=

## fromLineage

:::{versionadded} 25.04.0
:::

:::{warning} *Experimental: may change in a future release.*
:::

**`fromLineage( [opts] ) -> Channel<Path>`**

The `channel.fromLineage` factory creates a channel that emits files from the {ref}`cli-lineage` store that match the given key-value params:

```nextflow
ch = channel.fromLineage(
    workflowRun: 'lid://0d1d1622ced3e4edc690bec768919b45',
    label: ['alpha', 'beta']
)
ch.view()
```

The above snippet emits files published by the given workflow run that are labeled as `alpha` and `beta`.

Available options:

`label`
: List of labels associated with the desired files.

`taskRun`
: LID of the task run that produced the desired files.

`workflowRun`
: LID of the workflow run that produced the desired files.

(channel-fromlist)=

## fromList

**`fromList( values: Iterable<E> ) -> Channel<E>`**

The `channel.fromList` factory creates a channel that emits each element in a collection:

```nextflow
ch = channel.fromList( ['a', 'b', 'c', 'd'] )
ch.view { v -> "value: $v" }
```

Prints:

```
value: a
value: b
value: c
value: d
```
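
To make the difference from `channel.of` concrete, the sketch below passes the same list to both factories. The variable name `letters` and the `fromList:` / `of:` prefixes are chosen for illustration. `channel.fromList` emits each element separately, whereas `channel.of` treats the list as a single argument and emits it as one item.

```nextflow
letters = ['a', 'b', 'c', 'd']

// emits four items: a, b, c, d
channel.fromList(letters).view { v -> "fromList: $v" }

// emits one item: the whole list
channel.of(letters).view { v -> "of: $v" }
```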

See also: [channel.of](#of)

(channel-path)=

## fromPath

**`fromPath( pattern: String, [opts] ) -> Channel<Path>`**

The `channel.fromPath` factory creates a channel that emits file paths matching a name or [glob][glob] pattern:

```nextflow
// match single file
channel.fromPath('data/some/bigfile.txt')

// match `txt` files in `data/big`
channel.fromPath('data/big/*.txt')

// match `fa` files in `data` and its subdirectories
channel.fromPath('data/**.fa')

// match `fa` files in any subdirectory of `data`
channel.fromPath('data/**/*.fa')

// match file pair (`file_1.fq` and `file_2.fq`)
channel.fromPath('data/file_{1,2}.fq')
```

By default, glob patterns do not match hidden files (i.e. files with names that start with `.`). Use a glob pattern that explicitly starts with `.` or set `hidden: true` to match hidden files:

```nextflow
// match hidden files in `data`
channel.fromPath('data/.*')
channel.fromPath('data/*', hidden: true)

// match hidden files in `data` with `fa` extension
channel.fromPath('data/.*.fa')
```

By default, glob patterns only match regular files, not directories. Use the `type` option to control whether to match files, directories, or both:

```nextflow
// match only directories
channel.fromPath('data/*', type: 'dir')

// match files and directories
channel.fromPath('data/*', type: 'any')
```

Available options:

`checkIfExists`
: When `true`, throws an error if the file path does not exist in the file system (default: `false`).

`followLinks`
: When `true`, follows symbolic links when traversing a directory tree, otherwise treats them as files (default: `true`).

`glob`
: When `true`, interprets the characters `*`, `?`, `[]`, and `{}` as glob wildcards, otherwise treats them as normal characters (default: `true`).

`hidden`
: When `true`, matches hidden files when using a glob pattern (default: `false`).

`maxDepth`
: Maximum number of directory levels to visit with the `**` wildcard (default: no limit).

`relative`
: When `true`, returns file paths as relative to the top-most common directory (default: `false`).

`type`
: Whether to return only files (`'file'`), only directories (`'dir'`), or both (`'any'`) when using a glob pattern. By default, only files are returned (`'file'`).
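
The following sketch shows how a few of these options might be passed alongside the pattern. The file name `data/sample[1].txt` is hypothetical and is used only to show a name containing glob characters.

```nextflow
// report an error if the file does not exist
channel.fromPath('data/some/bigfile.txt', checkIfExists: true)

// treat `[` and `]` as normal characters instead of glob wildcards
channel.fromPath('data/sample[1].txt', glob: false)

// visit at most two directory levels when expanding `**`
channel.fromPath('data/**.fa', maxDepth: 2)
```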

(channel-fromsra)=

## fromSRA

:::{deprecated} 26.04.0
Use the [Entrez Direct](https://www.ncbi.nlm.nih.gov/books/NBK179288/) command-line tool to query the SRA database.
:::

The `channel.fromSRA` factory queries the [NCBI SRA](https://www.ncbi.nlm.nih.gov/sra) database and returns a channel that emits the FASTQ files matching the specified criteria (i.e. a project or one or more accession numbers). For example:

```nextflow
channel.fromSRA('SRP043510').view()
```

Prints:

```
[SRR1448794, ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/004/SRR1448794/SRR1448794.fastq.gz]
[SRR1448795, ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/005/SRR1448795/SRR1448795.fastq.gz]
[SRR1448792, ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/002/SRR1448792/SRR1448792.fastq.gz]
[SRR1448793, ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/003/SRR1448793/SRR1448793.fastq.gz]
[SRR1910483, ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR191/003/SRR1910483/SRR1910483.fastq.gz]
[SRR1910482, ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR191/002/SRR1910482/SRR1910482.fastq.gz]
(remaining omitted)
```

Multiple accession IDs can be specified as a list:

```nextflow
ids = ['ERR908507', 'ERR908506', 'ERR908505']
channel.fromSRA(ids).view()
```

Prints:

```
[ERR908507, [ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR908/ERR908507/ERR908507_1.fastq.gz, ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR908/ERR908507/ERR908507_2.fastq.gz]]
[ERR908506, [ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR908/ERR908506/ERR908506_1.fastq.gz, ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR908/ERR908506/ERR908506_2.fastq.gz]]
[ERR908505, [ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR908/ERR908505/ERR908505_1.fastq.gz, ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR908/ERR908505/ERR908505_2.fastq.gz]]
```

Read pairs are handled automatically and returned as a list of files.

This factory uses the NCBI [ESearch](https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.ESearch) API, so it accepts any query term supported by that API.

To access the ESearch API, you must provide your [NCBI API key](https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities) in one of the following ways:

- The `apiKey` option:
  ```nextflow
  channel.fromSRA(ids, apiKey: '0123456789abcdef')
  ```

- The `NCBI_API_KEY` variable in your environment:
  ```bash
  export NCBI_API_KEY=0123456789abcdef
  ```

Available options:

`apiKey`
: NCBI user API key.

`cache`
: Enable or disable caching of API requests (default: `true`).

`max`
: Maximum number of entries that can be retrieved (default: unlimited).

`protocol`
: Protocol to use for the resulting remote URLs. Available choices: `ftp`, `http`, `https` (default: `ftp`).

`retryPolicy`
: Set a retry policy in case the SRA request fails with a retriable error.

: Available properties:

  - `delay`: Delay between attempts (default: `500ms`)
  - `jitter`: Jitter value (default: `0.25`)
  - `maxAttempts`: Max attempts (default: `3`)
  - `maxDelay`: Max delay (default: `30s`)

: For example:

  ```nextflow
  channel.fromSRA(ids, retryPolicy: [delay: '250ms', maxAttempts: 5])
  ```
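
As a rough sketch, options can be combined in the same way as `apiKey` and `retryPolicy` above. The example below assumes that an API key is already available through the `NCBI_API_KEY` environment variable, and reuses the project ID from the first example in this section.

```nextflow
// limit the query to five entries and request `https` URLs instead of `ftp`
channel.fromSRA('SRP043510', max: 5, protocol: 'https').view()
```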

(channel-interval)=

## interval

**`interval( interval: String ) -> Channel<Integer>`**

The `channel.interval` factory emits an incrementing index (starting from zero) at a periodic interval. For example:

```nextflow
channel.interval('1s').view()
```

The above snippet will emit 0, 1, 2, and so on, every second, forever. You can use an operator such as {ref}`operator-take` or {ref}`operator-until` to close the channel based on a stopping condition.
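
A minimal sketch of both stopping conditions follows. The limit of five items and the closure parameter name `i` are arbitrary choices for illustration.

```nextflow
// emit the first five indices (0 to 4), one per second
channel.interval('1s')
    .take(5)
    .view()

// emit indices until the condition is satisfied;
// the value that satisfies the condition is not emitted
channel.interval('1s')
    .until { i -> i >= 5 }
    .view()
```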

(channel-of)=

## of

**`of( values... ) -> Channel`**

The `channel.of` factory creates a channel that emits each argument:

```nextflow
ch = channel.of( 1, 3, 5, 7 ).view()
```

Prints:

```
1
3
5
7
```

Ranges of values are expanded accordingly:

```nextflow
channel.of(1..23, 'X', 'Y').view()
```

Prints:

```
1
2
3
4
:
23
X
Y
```

See also: [channel.fromList](#fromlist)

(channel-topic)=

## topic

:::{versionadded} 25.04.0
:::

:::{note}
This feature was previewed in versions 24.04 and 24.10 with the `nextflow.preview.topic` feature flag.
:::

**`topic( name: String ) -> Channel`**

A *topic channel* is a channel that can receive values from many sources *implicitly* based on a matching *topic name*.

A typed process can emit values to a topic using the `topic:` section:

```nextflow
nextflow.enable.types = true

process hello {
    topic:
    file('hello.txt') >> 'my-topic'

    // ...
}

process bye {
    topic:
    file('bye.txt') >> 'my-topic'

    // ...
}
```

A legacy process can assign outputs in the `output:` section to a topic using the `topic` option:

```nextflow
process hello {
    output:
    path('hello.txt'), topic: 'my-topic'

    // ...
}
```

The `channel.topic` factory returns the topic channel for the given name:

```nextflow
channel.topic('my-topic').view()
```

The above example emits all values sent to the `my-topic` topic from processes such as `hello` and `bye`.

This approach is a convenient way to collect related items from many different sources without explicitly connecting them (e.g. using the `mix` operator).

:::{warning}
Any process that consumes a topic channel (directly or indirectly) should not send any outputs to that topic, or else the pipeline will hang forever.
:::
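
Putting the pieces together, a complete pipeline might look like the sketch below. It uses the legacy process syntax. The `script:` bodies, the file contents, and the `received:` prefix are illustrative assumptions. Neither process consumes the topic, so the warning above does not apply.

```nextflow
process hello {
    output:
    path('hello.txt'), topic: 'my-topic'

    script:
    """
    echo 'Hello' > hello.txt
    """
}

process bye {
    output:
    path('bye.txt'), topic: 'my-topic'

    script:
    """
    echo 'Bye' > bye.txt
    """
}

workflow {
    hello()
    bye()

    // receives the outputs of both processes without an explicit `mix`
    channel.topic('my-topic').view { f -> "received: ${f.name}" }
}
```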

See also: {ref}`process-typed-topics` for process outputs

(channel-value)=

## value

**`value( value: V ) -> Value<V>`**

The `channel.value` factory creates a dataflow value bound to the given argument:

```nextflow
channel.value( 'Hello there' )
channel.value( [1,2,3,4,5] )
```

The first line creates a dataflow value bound to the string `'Hello there'`. The second line creates a dataflow value bound to the list `[1,2,3,4,5]`, which is treated as a single value in dataflow logic.
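
Because a dataflow value can be read any number of times, it is commonly paired with a regular channel as a process input. The sketch below illustrates this with a hypothetical `greet` process written in the legacy syntax. The process, its inputs, and the names are assumptions made for the example.

```nextflow
process greet {
    input:
    val greeting
    val name

    output:
    stdout

    script:
    """
    echo '${greeting}, ${name}'
    """
}

workflow {
    greeting = channel.value('Hello there')
    names = channel.of('Alice', 'Bob', 'Carol')

    // the process runs once per name, reusing the same greeting each time
    greet(greeting, names).view()
}
```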

(channel-watchpath)=

## watchPath

**`watchPath( pattern: String, events: String = 'create' ) -> Channel<Path>`**

The `channel.watchPath` factory creates a channel that watches a [glob][glob] pattern and emits matching files as they appear.

For example:

```nextflow
ch = channel.watchPath('/path/*.fa')
ch.view { fa -> "Fasta file: $fa" }
```

The second argument specifies which filesystem events to watch as a comma-separated string:

```nextflow
ch = channel.watchPath('/path/*.fa', 'create,modify')
ch.view { fa -> "File created or modified: $fa" }
```

By default, only new files are watched. The following events are supported:

- `create`: A new file is created
- `modify`: A file is modified
- `delete`: A file is deleted

:::{warning}
The `channel.watchPath` factory waits endlessly for matching files, which means that it will cause your pipeline to run forever. Consider using the `take` or `until` operator to apply a stopping condition (e.g. receiving 10 files, receiving a file named `DONE`).
:::
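
The two stopping conditions mentioned in the warning could be sketched as follows. The `/path` directory is a placeholder, and the second pattern is widened to `/path/*` so that a file named `DONE` can match it.

```nextflow
// stop after receiving ten matching files
channel.watchPath('/path/*.fa')
    .take(10)
    .view { fa -> "Fasta file: $fa" }

// stop when a file named `DONE` appears;
// the `DONE` file itself is not emitted
channel.watchPath('/path/*')
    .until { f -> f.name == 'DONE' }
    .view { f -> "New file: $f" }
```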

:::{note}
The `channel.watchPath` factory only works with local and shared filesystems. It does not support object storage such as S3.
:::

See also: [channel.fromPath](#frompath)

[glob]: http://docs.oracle.com/javase/tutorial/essential/io/fileOps.html#glob