(your-first-script)=

# Your first script

This guide details fundamental skills to run a basic Nextflow pipeline. It includes:

- Running a pipeline
- Modifying and resuming a pipeline
- Configuring a pipeline parameter

<h3>Prerequisites</h3>

You will need the following to get started:

- Nextflow version 25.10 or later. See {ref}`install-page` for installation instructions. If you have an older version, see {ref}`updating-nextflow-page` to update.

## Run a pipeline

You will run a basic Nextflow pipeline that splits a string of text into two files and then converts lowercase letters to uppercase letters. You can see the pipeline here:

```{code-block} groovy
:class: copyable
// Default parameter input
params.str = "Hello world!"

// split process
process split {
    input:
    val x

    output:
    path 'chunk_*'

    script:
    """
    printf '${x}' | split -b 6 - chunk_
    """
}

// convert_to_upper process
process convert_to_upper {
    tag "$y"

    input:
    path y

    output:
    path 'upper_*'

    script:
    """
    cat $y | tr '[a-z]' '[A-Z]' > upper_${y}
    """
}

// Workflow block
workflow {
    main:
    ch_str = channel.of(params.str) // Create a channel using parameter input
    ch_chunks = split(ch_str) // Split string into chunks and create a named channel
    ch_upper = convert_to_upper(ch_chunks.flatten()) // Convert lowercase letters to uppercase letters

    publish:
    lower = ch_chunks.flatten()
    upper = ch_upper
}

output {
    lower {
        path 'lower'
    }
    upper {
        path 'upper'
    }
}
```

This script defines two processes:

- `split`: takes a string input, splits it into 6-byte chunks (six characters each for plain ASCII text), and writes the chunks to files with the prefix `chunk_`
- `convert_to_upper`: takes files as input, transforms their contents to uppercase letters, and writes the uppercase strings to files with the prefix `upper_`

The `split` process emits all of its output files together as a single channel element. The `flatten` operator splits this combined element so that each file is treated as a separate element.

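To see what the two process scripts do on their own, you can try the same commands by hand outside Nextflow. The following is a minimal sketch, assuming a POSIX shell with `split` and `tr` available. The temporary directory and the `demo_dir` variable are illustrative and not part of the pipeline:

```shell
# Work in a throwaway directory so the demo does not clutter your project
demo_dir="$(mktemp -d)"
cd "$demo_dir" || exit 1

# Same command as the split process: 6-byte chunks named chunk_aa, chunk_ab, ...
printf 'Hello world!' | split -b 6 - chunk_

# Same command as the convert_to_upper process, applied to each chunk in turn
for y in chunk_*; do
    cat "$y" | tr '[a-z]' '[A-Z]' > "upper_${y}"
done

# Show the files that were created
ls
```

The listing should show `chunk_aa`, `chunk_ab`, `upper_chunk_aa`, and `upper_chunk_ab`, which are the same file names that Nextflow publishes for the default input.
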
The workflow block is organized into two sections:

- `main:`: defines the workflow logic and how processes are connected via channels
- `publish:`: declares which channels should be published as workflow outputs

The `output` block (outside the workflow) defines where and how each output should be published. In this example, the outputs from both processes are published in subdirectories (`lower` and `upper`) of the default `results` output directory.

To run your pipeline:

1. Create a new file named `main.nf` in your current directory
2. Copy and save the above pipeline to your new file
3. Run your pipeline using the following command:

    ```{code-block} bash
    :class: copyable
    nextflow run main.nf
    ```

You will see output similar to the following:

```console
N E X T F L O W ~ version 25.10.0

Launching `main.nf` [big_wegener] DSL2 - revision: 13a41a8946

executor > local (3)
[82/457482] split (1) | 1 of 1 ✔
[2f/056a98] convert_to_upper (chunk_aa) | 2 of 2 ✔
```

Nextflow creates a `work` directory to store files used during a pipeline run. Each execution of a process is run as a separate task. The `split` process is run as one task and the `convert_to_upper` process is run as two tasks. The hexadecimal string shown for each process, for example, `82/457482`, is the beginning of a unique task hash. It is also the prefix of the task directory, inside `work`, where the script was executed.

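Because the run log shows only the start of each task hash, it can be handy to let the shell complete the directory name. The helper below is a sketch and not part of Nextflow: `task_dir` is a hypothetical function name, and it assumes the default `work` directory in the directory where you launched the pipeline:

```shell
# Resolve a task directory from the short hash printed in the run log.
# Assumption: the default work directory, `work`, relative to the current directory.
task_dir() {
    # The log shows only a prefix of the hash, so a glob completes the rest
    for dir in work/"$1"*/; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
        fi
    done
}
```

For example, `ls -a "$(task_dir 82/457482)"` lists the files in that task directory, including hidden ones such as `.command.sh`, which holds the script that was run.
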
When the pipeline completes, you can view the output files in the `results` directory:

```{code-block} bash
:class: copyable
ls -R results/
```

You will see the published output files organized in the `lower` and `upper` subdirectories.

:::{tip}
Run your pipeline with `-ansi-log false` to see each task printed on a separate line:

```{code-block} bash
:class: copyable
nextflow run main.nf -ansi-log false
```

You will see output similar to the following:

```console
N E X T F L O W ~ version 25.10.0
Launching `main.nf` [peaceful_watson] DSL2 - revision: 13a41a8946
[43/f1f8b5] Submitted process > split (1)
[a2/5aa4b1] Submitted process > convert_to_upper (chunk_ab)
[30/ba7de0] Submitted process > convert_to_upper (chunk_aa)
```

:::

(getstarted-resume)=

## Modify and resume

Nextflow tracks task executions in a task cache, a key-value store of previously executed tasks. The task cache is used in conjunction with the work directory to recover cached tasks. If you modify and resume your pipeline, only the processes that have changed are re-executed. Cached results are reused for tasks that have not changed.

You can enable resumability using the `-resume` flag when running a pipeline. To modify and resume your pipeline:

1. Open `main.nf`
2. Replace the `convert_to_upper` process with the following:

    ```{code-block} groovy
    :class: copyable
    process convert_to_upper {
        tag "$y"

        input:
        path y

        output:
        path 'upper_*'

        script:
        """
        rev $y > upper_${y}
        """
    }
    ```

3. Save your changes
4. Run your updated pipeline using the following command:

    ```{code-block} bash
    :class: copyable
    nextflow run main.nf -resume
    ```

You will see output similar to the following:

```console
N E X T F L O W ~ version 25.10.0

Launching `main.nf` [furious_curie] DSL2 - revision: 5490f13c43

executor > local (2)
[82/457482] split (1) | 1 of 1, cached: 1 ✔
[02/9db40b] convert_to_upper (chunk_aa) | 2 of 2 ✔
```

Nextflow skips the execution of the `split` process and retrieves its results from the cache. The modified `convert_to_upper` process is executed again, once for each of the two chunks.

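To preview what the rewritten process writes, you can run `rev` by hand on the contents of the two default chunks. This is only a sketch, assuming `rev` is available on your system. A trailing newline is added here so that the output is readable, whereas the chunk files themselves contain none:

```shell
# Reverse the text of each default chunk, as the modified process script does
for chunk in 'Hello ' 'world!'; do
    printf '%s\n' "$chunk" | rev
done
```

The two lines printed are the reversed chunks: ` olleH` (with a leading space) and `!dlrow`.
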
See {ref}`cache-resume-page` for more information about Nextflow cache and resume functionality.

(getstarted-params)=

## Pipeline parameters

Parameters are used to control the inputs to a pipeline. They are declared by prefixing a variable name with `params`, separated by a dot character, for example, `params.str`. Parameters can be specified on the command line by prefixing the parameter name with a double dash (`--`), for example, `--paramName`. Parameters specified on the command line override parameters specified in the main script.

You can configure the `str` parameter in your pipeline. To modify your `str` parameter:

1. Run your pipeline using the following command:

    ```{code-block} bash
    :class: copyable
    nextflow run main.nf --str 'Bonjour le monde'
    ```

You will see output similar to the following:

```console
N E X T F L O W ~ version 25.10.0

Launching `main.nf` [distracted_kalam] DSL2 - revision: 082867d4d6

executor > local (4)
[55/a3a700] process > split (1) [100%] 1 of 1 ✔
[f4/af5ddd] process > convert_to_upper (chunk_ac) [100%] 3 of 3 ✔
```

The input string is now 16 characters long, so the `split` process splits it into three chunks and the `convert_to_upper` process is run three times.

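The chunk count follows from the 6-byte chunk size used by `split -b 6`: it is the byte length of the string divided by six, rounded up. A small sketch of that arithmetic, where `chunk_count` is a hypothetical helper written for illustration and not a Nextflow command:

```shell
# Number of 6-byte chunks that `split -b 6` produces for a given string
chunk_count() {
    bytes=$(printf '%s' "$1" | wc -c)
    echo $(( (bytes + 5) / 6 ))   # integer ceiling of bytes / 6
}

chunk_count 'Hello world!'       # default params.str, 12 bytes: prints 2
chunk_count 'Bonjour le monde'   # value passed with --str, 16 bytes: prints 3
```
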
See {ref}`cli-params` for more information about modifying pipeline parameters.

<h2>Next steps</h2>

Your first script is a brief introduction to running pipelines, modifying and resuming pipelines, and pipeline parameters. See [training.nextflow.io](https://training.nextflow.io/) for further Nextflow training modules.