Workflow params
- Authors: Ben Sherman
- Status: accepted
- Date: 2025-08-25
- Tags: lang, static-types, params
Summary
Introduce a unified, statically typed way to declare the top-level inputs (i.e. parameters) of a workflow.
Problem Statement
Pipeline parameters in Nextflow are currently declared using property assignments:
params.reads = "$baseDir/data/ggal/ggal_gut_{1,2}.fq"
params.transcriptome = "$baseDir/data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa"
params.multiqc = "$baseDir/multiqc"
This approach has several limitations:
- No type annotations: Parameter types cannot be expressed in the script. The type of a parameter can only be inferred from its default value, which may be ambiguous (e.g., a default value of null, a String that should be treated as a Path).
- Heuristic type coercion: When a parameter is supplied on the command line, Nextflow attempts to coerce the string value to the appropriate type using heuristics (e.g., 'true' → boolean true, '42' → integer 42). These heuristics are not always correct and can lead to unexpected behavior.
- No built-in validation: There is no built-in way to validate that a parameter is required, or that a parameter value has the correct type. Validation must be done manually in the script, or through an external JSON Schema file (nextflow_schema.json).
- Scattered declarations and usage: Parameters may be declared anywhere in the script or across multiple scripts, making it difficult to get a single view of the pipeline parameters. Parameters can be used anywhere in the pipeline, even outside the script where they are declared, making it impossible to validate params usage at compile-time.
Goals
- Declare all parameters in one place in the script, with documentation.
- Provide explicit type annotations for parameters, enabling compile-time validation and IDE support.
- Clearly distinguish between required and optional parameters.
- Coerce CLI parameter values based on declared types, rather than relying on heuristics.
Non-goals
- Removing the legacy params.foo = bar syntax -- legacy parameters must continue to work without modification.
- Changing the params config scope -- params can still be declared in the config file, although some best practices apply.
- Replacing nextflow_schema.json -- while the params block addresses many of the same needs, existing pipelines that use a JSON Schema should not be required to migrate. A native integration with nextflow_schema.json can be explored in the future.
- Supporting nested params -- the params block only supports a flat list of params. Nested params can still be used in the config, but they do not have first-class support at this time.
Decision
Introduce the params block for declaring pipeline parameters. Each parameter is declared with a name, a type, and an optional default value:
params {
    // Path to the input samplesheet
    input: Path
    // Whether to save intermediate files
    save_intermeds: Boolean = false
}
Typed parameters are used to validate parameter usage in the script, and to coerce CLI parameter values at runtime.
Core Capabilities
Parameter declaration
The params block consists of parameter declarations. Each parameter is declared as name: Type (required) or name: Type = default (optional with default):
params {
    input: Path                 // required
    extra_file: Path?           // optional (defaults to null)
    db_file: Path = 'db.json'   // optional with default
    flag: Boolean               // boolean params default to false
}
All standard Nextflow types except Channel and Value can be used for parameter type annotations.
Required and optional parameters
A parameter without a default value is required. If a required parameter is not supplied at runtime (via the command line, a params file, or the config), the run fails immediately with an informative error.
A parameter with the ? suffix on its type is optional and will be null if not supplied. Boolean parameters without a default value implicitly default to false.
Type-based CLI coercion
When a parameter is supplied on the command line, Nextflow converts the string value to the declared type:
| Declared type | String input | Resolved value |
|---|---|---|
| Boolean | 'true' | true |
| Integer | '42' | 42 |
| Float | '3.14' | 3.14 |
| Duration | '1h' | Duration.of('1h') |
| MemoryUnit | '8 GB' | MemoryUnit.of('8 GB') |
| Path | '/data' | Path.of('/data') |
This replaces the heuristic type detection used for legacy parameters.
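The conversions in the table above can be sketched in Java, the language whose reflection APIs the appendix refers to. This is a minimal illustration under stated assumptions, not Nextflow's implementation: the class ParamCoercion and its coerce method are hypothetical names, Float is modeled as a Java Double, and Duration and MemoryUnit are left out because they are Nextflow-specific classes.

```java
import java.nio.file.Path;

// Hypothetical helper (not a Nextflow API): converts a CLI string to a declared type.
class ParamCoercion {

    static Object coerce(Class<?> declaredType, String raw) {
        if (declaredType == String.class) {
            // A declared String is never reinterpreted, even if it looks like a number.
            return raw;
        }
        if (declaredType == Boolean.class) {
            // Accept only the two literal spellings instead of guessing.
            if (raw.equals("true")) return Boolean.TRUE;
            if (raw.equals("false")) return Boolean.FALSE;
            throw new IllegalArgumentException("Not a Boolean: '" + raw + "'");
        }
        if (declaredType == Integer.class) {
            return Integer.valueOf(raw); // throws NumberFormatException on bad input
        }
        if (declaredType == Double.class) {
            return Double.valueOf(raw);  // assumption: Float is modeled as a Double
        }
        if (declaredType == Path.class) {
            return Path.of(raw);
        }
        throw new IllegalArgumentException("Unsupported type: " + declaredType.getName());
    }

    public static void main(String[] args) {
        System.out.println(coerce(Boolean.class, "true"));
        System.out.println(coerce(Integer.class, "42"));
        System.out.println(coerce(Double.class, "3.14"));
        System.out.println(coerce(Path.class, "/data"));
        // With a declared type of String, '42' stays a string.
        System.out.println(coerce(String.class, "42").getClass().getSimpleName());
    }
}
```

The point of the sketch is that the declared type, not the shape of the input string, selects the conversion, so a value such as '42' is only turned into an integer when the parameter is declared as one.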
Compile-time validation
Legacy parameters can be accessed globally by all scripts in the pipeline. While this approach is flexible, it prevents compile-time validation and breaks modularity.
When a module references a param, it implicitly assumes that the param will always be defined by the workflow that uses it. This assumption cannot be validated at compile-time, so if the param is missing, an error will occur only at runtime.
The params block solves this problem by defining all params in one place. It serves as the inputs for the entry workflow, similar to the take: section in named workflows. Parameters should be passed to processes and workflows as explicit inputs, so that every variable reference can be validated against local declarations.
For example, the following workflow:
// main.nf
params.input = '...'
workflow {
    HELLO()
}
// hello.nf
workflow HELLO {
    println "input = ${params.input}"
}
can be rewritten as follows:
// main.nf
params {
    input: String
}
workflow {
    HELLO(params.input)
}
// hello.nf
workflow HELLO {
    take:
    input: String
    main:
    println "input = ${input}"
}
Typed parameters can still be used globally by all scripts for backwards compatibility. However, the type checker will only validate params used in the entry workflow and output block. Users should eventually migrate their pipelines as shown above for effective type checking.
Script and config params
Parameters can also be defined in config files:
params {
    outdir = 'results'
    publish_dir_mode = 'copy'
}
Config params continue to work as before. As a best practice, they should be used only to "configure the configuration."
Some config params can be replaced with native functionality, e.g., outputDir and workflow.output.mode for the above. The nf-core institutional configs, which enable users to run a pipeline with their institutional config entirely from the command line, cannot be easily replaced and provide a clear use case for config params.
Config params are also propagated to the script, since the config file can override script params (e.g., in a profile). However, because the script params block only allows params that were explicitly declared, it must be able to distinguish config params from invalid params (e.g., a command-line param with a typo).
To prevent a circular dependency between the script execution and config resolution, parameters are resolved as follows:
- Load CLI params from command line, params file
- Load config files
    - Params declared in the params scope are config params
    - If a config setting references an undeclared param, report an error
    - Params assigned in a profile are also marked as config params -- they should be used to overwrite existing params or potential script params
    - CLI params override config params
- Execute script, resolve params block
    - CLI params and config params override default values in params block
    - If a required script param is undefined, report an error
    - If a CLI param is not declared in the params block and is not a config param, report an error
In other words, params are applied in the following order (lowest to highest precedence):
- Default value in the params block
- Config file (params { param = value })
- Params file (-params-file params.json)
- Command-line arguments (--param value)
Any parameter supplied via command line or params file must be declared in the script or config. Supplying an undeclared parameter is an error.
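The precedence order and the two error conditions can be modeled with plain maps. The following Java sketch is only an illustration of the rules stated above: ParamResolution, resolve, and all argument names are hypothetical, and the real resolution is interleaved with config loading and script execution rather than done in a single call.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the resolution rules; none of these names are Nextflow APIs.
class ParamResolution {

    /**
     * defaults:     script params declared with a default value
     * required:     script params declared without a default value
     * configParams: params set in config files
     * paramsFile:   params loaded from a params file
     * cliParams:    params given as command-line arguments
     */
    static Map<String, Object> resolve(Map<String, Object> defaults,
                                       Set<String> required,
                                       Map<String, Object> configParams,
                                       Map<String, Object> paramsFile,
                                       Map<String, Object> cliParams) {
        // Command-line arguments take precedence over the params file.
        Map<String, Object> supplied = new LinkedHashMap<>(paramsFile);
        supplied.putAll(cliParams);

        // Anything supplied from outside must be declared in the script or the config.
        for (String name : supplied.keySet()) {
            boolean declared = defaults.containsKey(name)
                    || required.contains(name)
                    || configParams.containsKey(name);
            if (!declared) {
                throw new IllegalArgumentException("Undeclared parameter: " + name);
            }
        }

        // Later putAll calls win: defaults < config < params file < command line.
        Map<String, Object> resolved = new LinkedHashMap<>(defaults);
        resolved.putAll(configParams);
        resolved.putAll(supplied);

        // A required param that no source supplied is an error.
        for (String name : required) {
            if (!resolved.containsKey(name)) {
                throw new IllegalArgumentException("Missing required parameter: " + name);
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, Object> resolved = resolve(
                Map.of("save_intermeds", false),
                Set.of("input"),
                Map.of("outdir", "results", "save_intermeds", true),
                Map.of("input", "from-file.csv"),
                Map.of("input", "from-cli.csv"));
        System.out.println(resolved);
    }
}
```

In this model a misspelled command-line param fails the "declared" check, while a config param such as outdir passes it even though the script never declares it.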
Links
- Community issue: #4669
- Workflow outputs ADR
- Record types ADR
Appendix
Runtime type analysis via reflection
Validating and converting params against declared types requires the type annotations to be fully available at runtime. Parameterized types such as List<String> must provide both the type (List) and the generic type arguments ([String]).
During compilation, type annotations are modeled using ClassNode, which provides the "raw" type and type arguments via getTypeClass() -> Class and getGenericsTypes() -> GenericsType[].
At runtime, type annotations are modeled using Type, for which there are two primary cases:
- If the type is parameterized, it is a ParameterizedType, which provides the "raw" type and type arguments via getRawType() -> Class and getActualTypeArguments() -> Type[].
- Otherwise, the type is a Class corresponding to the raw type.
This type information can be obtained at runtime from the following entities:
- Class fields via Field::getGenericType() -> Type
- Method parameters via Parameter::getParameterizedType() -> Type
For this reason, the params block is compiled as a class, so that each parameter declaration is a field which can model a parameterized type.
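The following Java sketch shows why a field is enough to recover a parameterized type at runtime. It uses only standard java.lang.reflect calls. The ExampleParams class is a hypothetical stand-in for a compiled params block, and ParamTypeInspector is an illustrative helper rather than anything in Nextflow.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ParamTypeInspector {

    // Hypothetical shape of a compiled params block: one field per parameter.
    static class ExampleParams {
        Path input;
        List<String> samples;
    }

    static String simpleName(Type type) {
        return type instanceof Class ? ((Class<?>) type).getSimpleName() : type.getTypeName();
    }

    // Renders the declared type of a field, including generic type arguments.
    static String describe(Class<?> owner, String fieldName) {
        Field field;
        try {
            field = owner.getDeclaredField(fieldName);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("No such parameter: " + fieldName, e);
        }
        Type type = field.getGenericType();
        if (type instanceof ParameterizedType) {
            // Parameterized case: raw type plus actual type arguments.
            ParameterizedType pt = (ParameterizedType) type;
            String args = Arrays.stream(pt.getActualTypeArguments())
                    .map(ParamTypeInspector::simpleName)
                    .collect(Collectors.joining(", "));
            return simpleName(pt.getRawType()) + "<" + args + ">";
        }
        // Otherwise the Type is the raw Class itself.
        return simpleName(type);
    }

    public static void main(String[] args) {
        System.out.println(describe(ExampleParams.class, "input"));   // prints "Path"
        System.out.println(describe(ExampleParams.class, "samples")); // prints "List<String>"
    }
}
```

Calling Field::getType instead of Field::getGenericType would return only List for the second field, which is the information loss the class-based compilation avoids.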
Type annotations can be marked as nullable using the ? suffix. This marker is compiled as a custom @Nullable annotation on the corresponding field, so that the runtime can use this information.
For example, when loading a JSON file as a collection of records, Nextflow uses the given record type to validate each JSON object in the collection:
- String values that map to a record field with type Path are converted to Path values
- If a JSON object is missing a record field that is marked as nullable, it is considered valid
While type annotations are used only at compile-time in all other contexts, they are needed at runtime for pipeline parameters in order to validate and convert external input data to the expected type.
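The record validation described above can be sketched in Java as follows. This is a minimal sketch under assumptions: the Nullable annotation here is a hypothetical stand-in for the custom annotation (it needs runtime retention to be visible through reflection), Sample is an invented record type, the JSON object is represented as an already-parsed Map, and rejecting a missing non-nullable field is inferred from the rules above rather than stated by them.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

class RecordValidator {

    // Hypothetical stand-in for the custom annotation emitted for the ? suffix.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Nullable {}

    // Hypothetical record type with one nullable field.
    static class Sample {
        String id;
        Path fastq;
        @Nullable Path index;
    }

    // Validates one parsed JSON object against the fields of recordType.
    static Map<String, Object> validate(Class<?> recordType, Map<String, Object> json) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Field field : recordType.getDeclaredFields()) {
            if (field.isSynthetic()) continue; // skip compiler-generated fields
            String name = field.getName();
            Object value = json.get(name);
            if (value == null) {
                // A missing field is valid only if it is marked as nullable.
                if (!field.isAnnotationPresent(Nullable.class)) {
                    throw new IllegalArgumentException("Missing field: " + name);
                }
                result.put(name, null);
            } else if (field.getType() == Path.class && value instanceof String) {
                // String values for Path fields are converted to Path values.
                result.put(name, Path.of((String) value));
            } else if (field.getType().isInstance(value)) {
                result.put(name, value);
            } else {
                throw new IllegalArgumentException("Wrong type for field: " + name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> json = Map.of("id", "s1", "fastq", "/data/s1.fq");
        System.out.println(validate(Sample.class, json));
    }
}
```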