add nextflow d30e48d

2026-04-29 23:01:54 +02:00
parent d0b12d668d
commit 97cc9058d3
2840 changed files with 730250 additions and 0 deletions


@@ -0,0 +1,176 @@
(config-scopes-page)=
# Configuration scopes
This page provides guidance on defining configuration scopes in the Nextflow runtime.
## Overview
The Nextflow configuration is defined as a collection of *scope classes*. Each scope class defines the set of available options, including their name, type, and an optional description for a specific configuration scope.
Scope classes are used to generate a configuration spec, which is in turn used for several purposes:
- Validating config options at runtime (`nextflow run` and `nextflow config`)
- Providing code intelligence in the language server (validation, hover hints, code completion)
- Generating reference documentation (in progress)
Scope classes are also used by the runtime itself as type-safe domain objects. This way, the construction of domain objects from the configuration map is isolated from the rest of the runtime.
## Definition
### Config scopes
A *config scope* is defined as a class that implements the `ConfigScope` interface. Top-level scope classes must have the `@ScopeName` annotation, which defines the name of the config scope.
For example:
```groovy
package nextflow.hello

import nextflow.config.spec.ConfigScope
import nextflow.config.spec.ScopeName

@ScopeName('hello')
class HelloConfig implements ConfigScope {
}
```
A scope class must provide a default constructor, so that it can be instantiated as an extension point. If no such constructor is defined, the config scope will not be detected by Nextflow. In the above example, this constructor is implicitly defined because no constructors were declared.
The fully-qualified class name (in this case, `nextflow.hello.HelloConfig`) must be included in the list of extension points.
### Config options
A *config option* is defined as a field with the `@ConfigOption` annotation. The field name determines the name of the config option.
For example:
```groovy
@ConfigOption
String createMessage
```
The `@ConfigOption` annotation can specify an optional set of types that are valid in addition to the field type. For example, the `fusion.tags` option, which accepts either a String or Boolean, is declared as follows:
```groovy
@ConfigOption(types=[Boolean])
String tags
```
The field type and any additional types are included in the config spec, allowing them to be used for validation.
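As a rough illustration of how the field type and the additional types could be combined during validation, consider the sketch below. The `isValidOption` helper is hypothetical and is not part of the Nextflow API; the actual validation logic in the runtime may differ.

```groovy
// Hypothetical helper (not a Nextflow API): a value is accepted when it is
// an instance of the field type or of any additional type.
boolean isValidOption(Object value, Class fieldType, List<Class> extraTypes = []) {
    if( value == null )
        return true   // assumption: unset options are handled by defaults
    return ([fieldType] + extraTypes).any { type -> type.isInstance(value) }
}

// `fusion.tags` is declared as a String field with types=[Boolean]
assert isValidOption('my-tag', String, [Boolean])
assert isValidOption(false, String, [Boolean])
assert !isValidOption(42, String, [Boolean])
```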
The field type can be any Java or Groovy class, but in practice it should be a class that can be constructed from primitive values (numbers, booleans, strings). For example, `Duration` and `MemoryUnit` are standard Nextflow types that can each be constructed from an integer or string.
### Nested scopes
A *nested scope* is defined as a field whose type is an implementation of `ConfigScope`. The field name determines the name of the nested scope.
The scope class referenced by the field type defines config options and scopes in the same manner as top-level scope classes. Unlike top-level scopes, nested scope classes do not need to use the `@ScopeName` annotation or provide a default constructor.
See `ExecutorConfig` and `ExecutorRetryConfig` for an example of how a nested scope is defined and constructed.
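The general shape of a nested scope can be sketched as follows. This is a minimal, standalone sketch: `ConfigScope` and `ConfigOption` are local stand-ins for the Nextflow types so that the script runs on its own, and `JobConfig`, `RetryConfig`, and their options are hypothetical names rather than real Nextflow scopes.

```groovy
import java.lang.annotation.Retention
import java.lang.annotation.RetentionPolicy

// Local stand-ins for the Nextflow types (assumption: used only for illustration)
interface ConfigScope {}
@Retention(RetentionPolicy.RUNTIME) @interface ConfigOption {}

// Nested scope class: no @ScopeName annotation, no default constructor
class RetryConfig implements ConfigScope {
    @ConfigOption final int maxAttempts
    @ConfigOption final String delay

    RetryConfig(Map opts) {
        this.maxAttempts = opts.maxAttempts != null ? (opts.maxAttempts as int) : 3
        this.delay = opts.delay ?: '500ms'
    }
}

// Enclosing scope: the field name `retry` is the name of the nested scope
class JobConfig implements ConfigScope {
    @ConfigOption final String queue
    final RetryConfig retry

    JobConfig(Map opts) {
        this.queue = opts.queue
        this.retry = new RetryConfig(opts.retry as Map ?: Collections.emptyMap())
    }
}

def config = new JobConfig([queue: 'long', retry: [maxAttempts: 5]])
assert config.retry.maxAttempts == 5
assert config.retry.delay == '500ms'
```

Here the enclosing scope constructs the nested scope from the corresponding sub-map, so clients only ever see typed objects.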
### Placeholder scopes
A *placeholder scope* is a config scope that applies to a collection of user-defined names.
For example, the `azure.batch.pools` scope allows the user to define a set of named pools, where each pool is configured with a standard set of options such as `autoScale`, `lowPriority`, `maxVmCount`, etc. These options are defined in a placeholder scope with a placeholder name of `<name>`. Thus, the generic name for the `autoScale` option is `azure.batch.pools.<name>.autoScale`.
A placeholder scope is defined as a field with type `Map<String, P>`, where `P` is a nested scope class which defines the scope options. The field should have the `@PlaceholderName` annotation which defines the placeholder name (e.g. `<name>`).
See `AzBatchOpts` and `AzPoolOpts` for an example of how placeholder scopes are defined and constructed.
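A placeholder scope might look roughly like the sketch below. The interface and annotations are local stand-ins for the Nextflow types so that the script is self-contained, and `BatchOpts` and `PoolOpts` are simplified, hypothetical classes rather than the actual `AzBatchOpts` and `AzPoolOpts` implementations.

```groovy
import java.lang.annotation.Retention
import java.lang.annotation.RetentionPolicy

// Local stand-ins for the Nextflow types (assumption: used only for illustration)
interface ConfigScope {}
@Retention(RetentionPolicy.RUNTIME) @interface ConfigOption {}
@Retention(RetentionPolicy.RUNTIME) @interface PlaceholderName { String value() }

// Options that every named pool supports
class PoolOpts implements ConfigScope {
    @ConfigOption final boolean autoScale
    @ConfigOption final Integer maxVmCount

    PoolOpts(Map opts) {
        this.autoScale = opts.autoScale as boolean
        this.maxVmCount = opts.maxVmCount as Integer
    }
}

class BatchOpts implements ConfigScope {
    // Generic option name: `pools.<name>.autoScale`
    @PlaceholderName('<name>')
    final Map<String, PoolOpts> pools

    BatchOpts(Map opts) {
        final Map<String, PoolOpts> result = [:]
        (opts.pools as Map ?: [:]).each { name, poolOpts ->
            result[name as String] = new PoolOpts(poolOpts as Map)
        }
        this.pools = result
    }
}

def batch = new BatchOpts([pools: [small: [autoScale: true, maxVmCount: 4], large: [maxVmCount: 32]]])
assert batch.pools.small.autoScale
assert batch.pools.large.maxVmCount == 32
```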
### Descriptions
Top-level scope classes and config options should use the `@Description` annotation to provide a description of the scope or option. This description is included in the config spec, which is in turn used by the language server to provide hover hints.
For example:
```groovy
@ScopeName('hello')
@Description('''
    The `hello` scope controls the behavior of the `nf-hello` plugin.
''')
class HelloConfig implements ConfigScope {

    @ConfigOption
    @Description('''
        Message to print to standard output when a run is initialized.
    ''')
    String createMessage
}
```
Nested scopes and placeholder scopes may also use this annotation, but will inherit the description of the top-level scope by default.
### Best practices
The Nextflow runtime adheres to the following best practices where appropriate:
- Config options should be declared as public and final, so that the scope class can be used as an immutable domain object.
- Scope classes should define a constructor that initializes each field from a map, casting each map property to the required type and providing default values as needed.
- In cases where an option defaults to an environment variable, the environment map should be provided as an additional constructor argument rather than accessing the system environment directly.
- In cases where an option with a primitive type (e.g. `int`, `float`, `boolean`) can be unspecified without a default value, it should be declared with the equivalent reference type (e.g. `Integer`, `Float`, `Boolean`). Otherwise, it should use the primitive type.
- In cases where an option represents a path, it should be declared as a `String` and allow clients to construct paths as needed, since path construction may depend on plugins which aren't yet loaded.
For example:
```groovy
import nextflow.config.spec.ConfigOption
import nextflow.config.spec.ConfigScope
import nextflow.config.spec.ScopeName

@ScopeName('hello')
class HelloConfig implements ConfigScope {

    @ConfigOption
    final String createMessage

    @ConfigOption
    final boolean verbose

    HelloConfig() {}

    HelloConfig(Map opts, Map env) {
        this.createMessage = opts.createMessage ?: env.get('NXF_HELLO_CREATE_MESSAGE')
        this.verbose = opts.verbose as boolean
    }
}
```
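The following standalone sketch shows how these practices play out when a scope object is constructed. `GreetingConfig`, its options, and the `GREETING_MESSAGE` variable are hypothetical names chosen for illustration; the Nextflow annotations are omitted so that the script runs without the Nextflow classpath.

```groovy
class GreetingConfig {
    final String message
    final Integer repeat      // may be left unspecified, so a reference type
    final boolean verbose     // always resolves to true or false, so a primitive

    GreetingConfig(Map opts, Map<String,String> env) {
        // Fall back to the injected environment map, not System.getenv()
        this.message = opts.message ?: env.get('GREETING_MESSAGE')
        this.repeat = opts.repeat as Integer
        this.verbose = opts.verbose as boolean
    }
}

// A caller in the runtime would typically pass System.getenv() here;
// a test can pass any map it likes.
def fromEnv = new GreetingConfig([:], [GREETING_MESSAGE: 'hello from env'])
assert fromEnv.message == 'hello from env'
assert fromEnv.repeat == null
assert !fromEnv.verbose

def explicit = new GreetingConfig([message: 'hi', repeat: 2, verbose: true], [:])
assert explicit.message == 'hi'
assert explicit.repeat == 2
```

Passing the environment as a map keeps the constructor deterministic, which makes the default-from-environment behavior easy to unit test.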
## Usage
### Runtime
Nextflow validates the config map after it is loaded. Top-level config scopes are loaded by the plugin system as extension points and converted into a config spec, which is used to validate the config map.
Plugins are loaded after the config is loaded and before it is validated, since plugins can also define config scopes. If a third-party plugin declares a config scope, it must be explicitly enabled in order to validate config options from the plugin. Otherwise, Nextflow will report these options as unrecognized.
Core plugins are loaded automatically based on other config options. Therefore, Nextflow only validates config from a core plugin when that plugin is loaded. Otherwise, any config options from the plugin are ignored -- they are neither validated nor reported as unrecognized.
For example, when the `process.executor` config option is set to `'awsbatch'`, the `nf-amazon` plugin is automatically loaded. In this case, all options in the `aws` config scope will be validated. If the executor is not set to `'awsbatch'`, all `aws` options will be ignored. This way, config files can be validated appropriately without loading additional core plugins that won't be used by the run.
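The three possible outcomes for a config scope (validated, ignored, unrecognized) can be sketched as follows. The `classifyScopes` function and the plugin-to-scope mapping are hypothetical and only model the behavior described above; they do not reflect how the runtime is actually implemented.

```groovy
// Hypothetical sketch: classify each top-level scope in a config map.
// `knownScopes` are scopes declared by the runtime and by loaded plugins;
// `corePluginScopes` maps each core plugin id to the scopes it owns.
Map<String,String> classifyScopes(Map config, Set<String> knownScopes,
                                  Set<String> loadedPlugins,
                                  Map<String,List<String>> corePluginScopes) {
    // Scopes owned by core plugins that were not loaded for this run
    def unloaded = corePluginScopes.findAll { plugin, scopes -> !(plugin in loadedPlugins) }
    def ignorable = unloaded.values().flatten() as Set

    return config.keySet().collectEntries { scope ->
        if( scope in knownScopes ) return [(scope): 'validated']
        if( scope in ignorable ) return [(scope): 'ignored']
        return [(scope): 'unrecognized']
    }
}

def corePlugins = ['nf-amazon': ['aws'], 'nf-azure': ['azure']]
def config = [process: [executor: 'local'], aws: [region: 'eu-west-1'], helo: [:]]

// No core plugin loaded: `aws` is ignored, the misspelled `helo` is reported
def result = classifyScopes(config, ['process'] as Set, [] as Set, corePlugins)
assert result == [process: 'validated', aws: 'ignored', helo: 'unrecognized']
```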
The scope classes themselves can be used to construct domain objects on-demand from the config map. For example, an `ExecutorConfig` can be constructed from the `executor` config scope as follows:
```groovy
new ExecutorConfig( Global.session.config.executor as Map ?: Collections.emptyMap() )
```
:::{note}
In practice, it is better to avoid the use of `Global` and provide an instance of `Session` to the client class instead.
:::
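One way to follow this advice is to pass the session through the constructor of the client class, as in the sketch below. `Session` and `ExecutorConfig` are simplified stand-ins for the real Nextflow classes, and `QueueReporter` is a hypothetical client class.

```groovy
// Simplified stand-ins for the real Nextflow classes
class Session {
    Map config = [:]
}

class ExecutorConfig {
    final Integer queueSize
    ExecutorConfig(Map opts) {
        this.queueSize = opts.queueSize as Integer
    }
}

// Hypothetical client class: receives the session instead of reading Global
class QueueReporter {
    private final ExecutorConfig config

    QueueReporter(Session session) {
        this.config = new ExecutorConfig(session.config.executor as Map ?: Collections.emptyMap())
    }

    String describe() {
        "queue size: ${config.queueSize ?: 'default'}"
    }
}

def session = new Session(config: [executor: [queueSize: 50]])
assert new QueueReporter(session).describe() == 'queue size: 50'
assert new QueueReporter(new Session()).describe() == 'queue size: default'
```

Because the dependency is explicit, the client class can be tested with any session-like object without touching global state.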
### Config spec
Config scope classes can be converted into a config spec with the `SpecNode` class, which uses reflection to extract metadata such as scope names, option names, types, and descriptions. This spec is rendered to JSON and used by the language server at build-time to provide code intelligence such as code completion and hover hints.
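The reflection step can be illustrated with a small standalone sketch. This is not the `SpecNode` implementation: `describeOptions` is a hypothetical helper, the interface and annotations are local stand-ins for the Nextflow types, and the JSON layout is an assumption made for the example.

```groovy
import groovy.json.JsonOutput
import java.lang.annotation.Retention
import java.lang.annotation.RetentionPolicy

// Local stand-ins for the Nextflow types (assumption: used only for illustration)
interface ConfigScope {}
@Retention(RetentionPolicy.RUNTIME) @interface ConfigOption {}
@Retention(RetentionPolicy.RUNTIME) @interface Description { String value() }

class HelloConfig implements ConfigScope {
    @ConfigOption
    @Description('Message to print when a run is initialized.')
    String createMessage

    @ConfigOption
    @Description('Enable verbose output.')
    boolean verbose

    String internalState   // not annotated, so not a config option
}

// Hypothetical helper: collect the type and description of each annotated field
Map<String,Map> describeOptions(Class<? extends ConfigScope> scopeClass) {
    def result = new TreeMap<String,Map>()
    for( field in scopeClass.declaredFields ) {
        if( !field.isAnnotationPresent(ConfigOption) )
            continue
        result[field.name] = [
            type: field.type.simpleName,
            description: field.getAnnotation(Description)?.value()
        ]
    }
    return result
}

def spec = describeOptions(HelloConfig)
println JsonOutput.prettyPrint(JsonOutput.toJson(spec))

assert spec.keySet().toList() == ['createMessage', 'verbose']
assert spec.verbose.type == 'boolean'
```

The annotations need runtime retention for this to work, which is why the stand-ins declare `RetentionPolicy.RUNTIME` explicitly.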
### Documentation
The config spec described above can be rendered to Markdown using the `MarkdownRenderer` class. It produces a Markdown document approximating the {ref}`config-options` page.
This approach to docs generation is not yet complete and has not been incorporated into the build process. However, it can be used to check for discrepancies between the source code and docs when making changes. The documentation should match the `@Description` annotations as closely as possible, but may contain additional details such as version notes and extra paragraphs.


@@ -0,0 +1,10 @@
(diagram-page)=
# Workflow Diagram
The following diagram is a high-level overview of the Nextflow source code, in a style similar to the {ref}`workflow diagram <workflow-diagram>` visualization for Nextflow pipelines. Each node and subgraph is a class. Arrows depict the flow of data and/or communication between classes.
In general, nodes with sharp corners are "record" classes that simply hold information, while nodes with rounded edges are "function" classes that transform some input into an output. Subgraphs are either long-running classes, i.e. "places where things happen", or one of the other two types for which it was useful to expand and show internal details.
```{mermaid} diagrams/overview.mmd
```


@@ -0,0 +1 @@
nextflow-merged.mmd


@@ -0,0 +1,17 @@
# Class Diagrams
This directory contains class diagrams of the Nextflow source code, abridged and annotated for relevance and ease of use.
Each node is a class. Fields are selectively documented in order to show only core data structures and the classes that "own" them. Methods are not explicitly documented, but they are mentioned in certain links where appropriate.
Links between classes denote one of the following relationships:
- Inheritance (`A <|-- B`): `B` is a subclass of `A`
- Composition (`A --* B`): `A` contains `B`
- Instantiation (`A --> B : f`): `A` creates instance(s) of `B` at runtime via `A::f()`
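To make the three link types concrete, the sketch below pairs each one with the kind of Groovy code it typically represents. The classes `Store`, `LocalStore`, `Cache`, and `CacheBuilder` are hypothetical and do not exist in the Nextflow source code.

```groovy
// Inheritance: `Store <|-- LocalStore`
abstract class Store {}
class LocalStore extends Store {}

// Composition: `Cache --* Store` (Cache holds a Store as a field)
class Cache {
    final Store store
    Cache(Store store) { this.store = store }
}

// Instantiation: `CacheBuilder --> Cache : build`
// (CacheBuilder creates Cache instances at runtime via build())
class CacheBuilder {
    Cache build() { new Cache(new LocalStore()) }
}

def cache = new CacheBuilder().build()
assert cache.store instanceof LocalStore
```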
Some links are commented out or not included at all, in order to focus on the most important classes and relationships. You can view these "hidden" links by simply uncommenting them, but I have found that their significance is sufficiently clear from the description files.
A separate diagram description is provided for each package. These files are interoperable, which means that you can combine any subset of files into a larger diagram description. The `merge-diagrams.sh` script can create a merged file for you automatically, and it includes a sensible default set of packages.
You can use the [Mermaid Live Editor](https://mermaid-js.github.io/mermaid-live-editor/edit) or the [Mermaid CLI](https://github.com/mermaid-js/mermaid-cli) to render the diagram in a variety of image formats.


@@ -0,0 +1,35 @@
#!/bin/bash
packages=()
packages+=("nextflow")
# packages+=("nextflow.ast")
packages+=("nextflow.cache")
packages+=("nextflow.cli")
# packages+=("nextflow.cloud.aws")
# packages+=("nextflow.cloud.aws.nio")
# packages+=("nextflow.cloud.azure")
# packages+=("nextflow.cloud.google")
packages+=("nextflow.config")
# packages+=("nextflow.container")
packages+=("nextflow.dag")
# packages+=("nextflow.executor")
# packages+=("nextflow.extension")
# packages+=("nextflow.ga4gh")
# packages+=("nextflow.k8s")
# packages+=("nextflow.plugin")
packages+=("nextflow.processor")
# packages+=("nextflow.scm")
packages+=("nextflow.script")
# packages+=("nextflow.secret")
# packages+=("nextflow.trace")
outfile="nextflow-merged.mmd"
echo "classDiagram" > "${outfile}"
for package in "${packages[@]}"; do
    echo "${package}"
    # skip the first line (the `classDiagram` header) of each package file
    tail -n +2 "${package}.mmd" >> "${outfile}"
    echo >> "${outfile}"
done


@@ -0,0 +1,7 @@
classDiagram
%%
%% nextflow.ast
%%
ScriptParser --> NextflowDSLImpl : parse
ScriptParser --> NextflowXformImpl : parse
ScriptParser --> OpXformImpl : parse


@@ -0,0 +1,22 @@
classDiagram
%%
%% nextflow.cache
%%
Session --* CacheDB
CacheDB --* CacheStore
CacheStore <|-- DefaultCacheStore
CacheStore <|-- CloudCacheStore
class DefaultCacheStore {
uniqueId : UUID
runName : String
baseDir : Path
}
class CloudCacheStore {
uniqueId : UUID
runName : String
basePath : Path
}


@@ -0,0 +1,29 @@
classDiagram
%%
%% nextflow.cli
%%
class Launcher {
cliOptions : CliOptions
command : CmdBase
}
Launcher --* CliOptions
Launcher --* CmdBase
%% CmdBase <|-- CmdClean
%% CmdBase <|-- CmdClone
%% CmdBase <|-- CmdConfig
CmdBase <|-- CmdConsole
%% CmdBase <|-- CmdDrop
%% CmdBase <|-- CmdFs
CmdBase <|-- CmdHelp
CmdBase <|-- CmdInfo
%% CmdBase <|-- CmdKubeRun
%% CmdBase <|-- CmdList
%% CmdBase <|-- CmdLog
%% CmdBase <|-- CmdNode
%% CmdBase <|-- CmdPlugin
%% CmdBase <|-- CmdPull
CmdBase <|-- CmdRun
%% CmdBase <|-- CmdSecret
%% CmdBase <|-- CmdSelfUpdate
%% CmdBase <|-- CmdView


@@ -0,0 +1,21 @@
classDiagram
%%
%% nextflow.cloud.aws
%%
Executor <|-- AwsBatchExecutor
TaskHandler <|-- AwsBatchTaskHandler
BashWrapperBuilder <|-- AwsBatchScriptLauncher
AwsBatchExecutor --* AwsOptions
AwsOptions --* AwsConfig
AwsConfig --* AwsBatchConfig
AwsConfig --* AwsS3Config
AwsBatchExecutor --> ParallelPollingMonitor : init
AwsBatchExecutor --> AwsBatchTaskHandler : submit
AwsBatchTaskHandler --> AwsBatchScriptLauncher : submit
%% TaskPollingMonitor <|-- ParallelPollingMonitor
SimpleFileCopyStrategy <|-- AwsBatchFileCopyStrategy
AwsBatchScriptLauncher --* AwsBatchFileCopyStrategy


@@ -0,0 +1,33 @@
classDiagram
%%
%% nextflow.cloud.aws.nio
%%
FileSystemProvider <|-- S3FileSystemProvider
S3FileSystemProvider --> S3FileSystem : newFileSystem
class S3FileSystem {
client : S3Client
endpoint : String
}
S3FileSystem --* S3Client
class S3Client {
client : AmazonS3
cannedAcl : CannedAccessControlList
kmsKeyId : String
storageEncryption : SSEAlgorithm
transferManager : TransferManager
transferPool : ExecutorService
uploadChunkSize : Long
uploadMaxThreads : Integer
}
Path <|-- S3Path
class S3Path {
bucket : String
parts : List~String~
fileSystem : S3FileSystem
}
S3Path --* S3FileSystem


@@ -0,0 +1,17 @@
classDiagram
%%
%% nextflow.cloud.azure
%%
Executor <|-- AzBatchExecutor
TaskHandler <|-- AzBatchTaskHandler
BashWrapperBuilder <|-- AzBatchScriptLauncher
AzBatchExecutor --* AzConfig
AzBatchExecutor --> AzBatchService : register
AzBatchExecutor --> TaskPollingMonitor : init
AzBatchExecutor --> AzBatchTaskHandler : submit
AzBatchTaskHandler --> AzBatchScriptLauncher : submit
SimpleFileCopyStrategy <|-- AzFileCopyStrategy
AzBatchScriptLauncher --* AzFileCopyStrategy


@@ -0,0 +1,13 @@
classDiagram
%%
%% nextflow.cloud.google
%%
Executor <|-- GoogleBatchExecutor
TaskHandler <|-- GoogleBatchTaskHandler
BashWrapperBuilder <|-- GoogleBatchScriptLauncher
GoogleBatchExecutor --* BatchConfig
GoogleBatchExecutor --> TaskPollingMonitor : init
GoogleBatchExecutor --> GoogleBatchTaskHandler : submit
GoogleBatchTaskHandler --> GoogleBatchScriptLauncher : submit


@@ -0,0 +1,7 @@
classDiagram
%%
%% nextflow.config
%%
Session --* ConfigMap
CmdRun --> ConfigBuilder : run
ConfigBuilder --> ConfigMap : build


@@ -0,0 +1,16 @@
classDiagram
%%
%% nextflow.container
%%
direction LR
BashWrapperBuilder --> ContainerBuilder : build
ContainerBuilder <|-- CharliecloudBuilder
ContainerBuilder <|-- DockerBuilder
ContainerBuilder <|-- PodmanBuilder
ContainerBuilder <|-- ShifterBuilder
ContainerBuilder <|-- SingularityBuilder
ContainerBuilder <|-- UdockerBuilder
SingularityBuilder <|-- ApptainerBuilder


@@ -0,0 +1,33 @@
classDiagram
%%
%% nextflow.dag
%%
Session --* DAG
class DAG {
vertices : List~Vertex~
edges : List~Edge~
}
DAG "1" --* "*" Vertex
DAG "1" --* "*" Edge
class Vertex {
label : String
type : Type
operators : List~DataflowProcessor~
process : TaskProcessor
}
class Edge {
channel : Object
from : Vertex
to : Vertex
label : String
}
%% DagRenderer <|-- CytoscapeHtmlRenderer
%% DagRenderer <|-- CytoscapeJsRenderer
%% DagRenderer <|-- DotRenderer
%% DagRenderer <|-- GexfRenderer
%% DagRenderer <|-- GraphvizRenderer
%% DagRenderer <|-- MermaidRenderer


@@ -0,0 +1,66 @@
classDiagram
%%
%% nextflow.executor
%%
ExecutorFactory --> Executor : getExecutor
class Executor {
name : String
monitor : TaskMonitor
}
Executor --* TaskMonitor
Executor --> TaskHandler : submit
TaskMonitor <|-- TaskPollingMonitor
class TaskPollingMonitor {
capacity : int
submitRateLimit : RateLimiter
pollIntervalMillis : long
dumpInterval : Duration
}
TaskPollingMonitor <|-- LocalPollingMonitor
class LocalPollingMonitor {
maxCpus : int
maxMemory : long
}
Executor <|-- AbstractGridExecutor
Executor <|-- LocalExecutor
%% Executor <|-- NopeExecutor
%% AbstractGridExecutor <|-- CondorExecutor
%% AbstractGridExecutor <|-- HyperQueueExecutor
%% AbstractGridExecutor <|-- LsfExecutor
%% AbstractGridExecutor <|-- MoabExecutor
%% AbstractGridExecutor <|-- NqsiiExecutor
%% AbstractGridExecutor <|-- OarExecutor
%% AbstractGridExecutor <|-- PbsExecutor
%% AbstractGridExecutor <|-- SgeExecutor
%% AbstractGridExecutor <|-- SlurmExecutor
%% PbsExecutor <|-- PbsProExecutor
%% SgeExecutor <|-- CrgExecutor
%% TaskHandler <|-- CachedTaskHandler
TaskHandler <|-- GridTaskHandler
TaskHandler <|-- LocalTaskHandler
TaskHandler <|-- NativeTaskHandler
%% TaskHandler <|-- NopeTaskHandler
%% TaskHandler <|-- StoredTaskHandler
class BashWrapperBuilder {
bean : TaskBean
copyStrategy : ScriptFileCopyStrategy
}
BashWrapperBuilder --* TaskBean
BashWrapperBuilder --* ScriptFileCopyStrategy
ScriptFileCopyStrategy <|-- SimpleFileCopyStrategy
class SimpleFileCopyStrategy {
stageinMode : String
stageoutMode : String
targetDir : Path
workDir : Path
}


@@ -0,0 +1,30 @@
classDiagram
%%
%% nextflow.extension
%%
direction LR
ChannelEx --> DumpOp : dump
Nextflow --> GroupKey : groupKey
OperatorImpl --> BranchOp : branch
OperatorImpl --> BufferOp : buffer
OperatorImpl --> CollectFileOp : collectFile
OperatorImpl --> CollectOp : collect
OperatorImpl --> CombineOp : combine
OperatorImpl --> ConcatOp : concat
OperatorImpl --> CrossOp : cross
OperatorImpl --> GroupTupleOp : groupTuple
OperatorImpl --> JoinOp : join
OperatorImpl --> MapOp : map
OperatorImpl --> MergeOp : merge
OperatorImpl --> MixOp : mix
OperatorImpl --> MultiMapOp : multiMap
OperatorImpl --> RandomSampleOp : randomSample
OperatorImpl --> SplitOp : splitCsv, splitFasta, splitFastq, splitText
OperatorImpl --> TakeOp : take
OperatorImpl --> ToListOp : toList, toSortedList
OperatorImpl --> TransposeOp : transpose
OperatorImpl --> UntilOp : until
WorkflowBinding --> OpCall : invokeMethod


@@ -0,0 +1,14 @@
classDiagram
%%
%% nextflow.ga4gh
%%
Executor <|-- TesExecutor
%% TaskHandler <|-- TesTaskHandler
%% BashWrapperBuilder <|-- TesBashBuilder
TesExecutor --> TaskPollingMonitor : init
TesExecutor --> TesTaskHandler : submit
TesTaskHandler --> TesBashBuilder : submit
%% ScriptFileCopyStrategy <|-- TesFileCopyStrategy
TesBashBuilder --* TesFileCopyStrategy


@@ -0,0 +1,57 @@
classDiagram
%%
%% nextflow.k8s
%%
Executor <|-- K8sExecutor
TaskHandler <|-- K8sTaskHandler
BashWrapperBuilder <|-- K8sWrapperBuilder
K8sExecutor --> TaskPollingMonitor : init
K8sExecutor --> K8sTaskHandler : submit
K8sExecutor --* K8sClient
K8sTaskHandler --> K8sWrapperBuilder : submit
CmdKubeRun --> K8sDriverLauncher : run
class K8sDriverLauncher {
args : List~String~
cmd : CmdKubeRun
config : ConfigObject
configMapName : String
headCpus : int
headImage : String
headMemory : String
headPreScript : String
paramsFile : String
pipelineName : String
runName : String
}
K8sDriverLauncher --* K8sClient
K8sDriverLauncher --* K8sConfig
K8sClient --* ClientConfig
%% ConfigDiscovery --> ClientConfig : discover
class K8sConfig {
target : Map
podOptions : PodOptions
}
K8sConfig --* PodOptions
class PodOptions {
affinity : Map
annotations : Map
automountServiceAccountToken : boolean
configMaps : Collection~PodMountConfig~
envVars : Collection~PodEnv~
imagePullPolicy : String
imagePullSecret : String
labels : Map
nodeSelector : PodNodeSelector
priorityClassName : String
privileged : Boolean
secrets : Collection~PodMountSecret~
securityContext : PodSecurityContext
tolerations : List~Map~
volumeClaims : Collection~PodVolumeClaim~
}


@@ -0,0 +1,21 @@
classDiagram
%%
%% nextflow
%%
class Nextflow
class Channel
class Session {
baseDir : Path
binding : ScriptBinding
cache : CacheDB
commandLine : String
commitId : String
config : Map
configFiles : List~Path~
dag : DAG
profile : String
runName : String
script : BaseScript
uniqueId : UUID
workDir : Path
}


@@ -0,0 +1,14 @@
classDiagram
%%
%% nextflow.plugin
%%
CmdRun --> Plugins : run
Plugins --> PluginsFacade : init
PluginsFacade "1" --> "*" PluginRef : load
class PluginRef {
id : String
version : String
}


@@ -0,0 +1,44 @@
classDiagram
%%
%% nextflow.processor
%%
%% ProcessDef --> TaskProcessor : run
class TaskProcessor {
config : ProcessConfig
executor : Executor
id : int
name : String
operator : DataflowProcessor
taskBody : BodyDef
}
TaskProcessor --> TaskRun : invokeTask
TaskProcessor --> PublishDir : finalizeTask
class TaskRun {
config : TaskConfig
context : TaskContext
hash : HashCode
id : TaskId
index : int
inputs : Map
name : String
outputs : Map
runType : RunType
type : ScriptType
workDir : Path
}
TaskRun --* TaskConfig
TaskRun --* TaskContext
TaskRun --> TaskBean : toTaskBean
class TaskConfig {
target : Map
binding : Map
}
class TaskContext {
holder : Map
script : Script
name : String
}


@@ -0,0 +1,65 @@
classDiagram
%%
%% nextflow.scm
%%
direction LR
CmdRun --> AssetManager : run
class AssetManager {
project : String
mainScript : String
provider : RepositoryProvider
strategy : RepositoryStrategy
hub : String
providerConfigs : List~ProviderConfig~
}
class RepositoryStrategyType {
<<enumeration>>
LEGACY
MULTI_REVISION
}
AssetManager --> RepositoryStrategyType
AssetManager "1" --o "1" RepositoryStrategy
AssetManager "1" --o "1" RepositoryProvider
AssetManager "1" --* "*" ProviderConfig
class RepositoryStrategy {
<<interface>>
}
class AbstractRepositoryStrategy {
<<abstract>>
project : String
provider : RepositoryProvider
root : File
}
class LegacyRepositoryStrategy {
localPath : File
}
class MultiRevisionRepositoryStrategy {
revision : String
bareRepo : File
commitPath : File
revisionSubdir : File
}
RepositoryStrategy <|-- AbstractRepositoryStrategy
AbstractRepositoryStrategy <|-- LegacyRepositoryStrategy
AbstractRepositoryStrategy <|-- MultiRevisionRepositoryStrategy
class RepositoryProvider {
<<abstract>>
}
RepositoryStrategy --> RepositoryProvider
RepositoryProvider <|-- AzureRepositoryProvider
RepositoryProvider <|-- BitbucketRepositoryProvider
RepositoryProvider <|-- BitbucketServerRepositoryProvider
RepositoryProvider <|-- GiteaRepositoryProvider
RepositoryProvider <|-- GithubRepositoryProvider
RepositoryProvider <|-- GitlabRepositoryProvider
RepositoryProvider <|-- LocalRepositoryProvider


@@ -0,0 +1,130 @@
classDiagram
%%
%% nextflow.script
%%
CmdRun --> ScriptRunner : run
class ScriptRunner {
scriptFile : ScriptFile
session : Session
}
ScriptRunner --* ScriptFile
ScriptRunner --> ScriptParser : execute
ScriptParser --> BaseScript : parse
class ScriptFile {
source : Path
main : Path
repository : String
revisionInfo : AssetManager.RevisionInfo
localPath : Path
projectName : String
}
class BaseScript {
meta : ScriptMeta
entryFlow : WorkflowDef
}
BaseScript --* ScriptBinding
BaseScript --* ScriptMeta
BaseScript --> IncludeDef : include
IncludeDef --> ScriptParser : load0
class ScriptBinding {
scriptPath : Path
args : List~String~
params : ParamsMap
configEnv : Map
entryName : String
}
class ScriptMeta {
scriptPath : Path
definitions : Map
imports : Map
module : boolean
}
ScriptMeta "1" --* "*" ComponentDef : definitions
ScriptMeta "1" --* "*" ComponentDef : imports
ComponentDef <|-- FunctionDef
ComponentDef <|-- ProcessDef
ComponentDef <|-- WorkflowDef
class FunctionDef {
target : Object
name : String
alias : String
}
class ProcessDef {
processName : String
simpleName : String
baseName : String
rawBody : Closure~BodyDef~
}
ProcessDef --* ProcessConfig
ProcessDef --* BodyDef
ProcessDef --* ChannelOut
class WorkflowDef {
name : String
body : BodyDef
declaredInputs : List~String~
declaredOutputs : List~String~
variableNames : Set~String~
}
WorkflowDef --* BodyDef
WorkflowDef --* WorkflowBinding
WorkflowDef --* ChannelOut
class ProcessConfig {
configProperties : Map
inputs : InputsList
outputs : OutputsList
}
ProcessConfig --* InputsList
ProcessConfig --* OutputsList
class BodyDef {
closure : Closure
source : String
type : ScriptType
isShell : boolean
}
class ChannelOut {
target : List~DataflowWriteChannel~
channels : Map
}
class WorkflowBinding {
vars : Map
}
class InputsList {
target : List~InParam~
}
InputsList "1" --* "*" InParam
class OutputsList {
target : List~OutParam~
}
OutputsList "1" --* "*" OutParam
%% InParam <|-- BaseInParam
%% BaseInParam <|-- EachInParam
%% BaseInParam <|-- EnvInParam
%% BaseInParam <|-- FileInParam
%% BaseInParam <|-- StdInParam
%% BaseInParam <|-- TupleInParam
%% BaseInParam <|-- ValueInParam
%% OutParam <|-- BaseOutParam
%% BaseOutParam <|-- EachOutParam
%% BaseOutParam <|-- EnvOutParam
%% BaseOutParam <|-- FileOutParam
%% BaseOutParam <|-- StdOutParam
%% BaseOutParam <|-- TupleOutParam
%% BaseOutParam <|-- ValueOutParam


@@ -0,0 +1,12 @@
classDiagram
%%
%% nextflow.secret
%%
ConfigBuilder --> SecretsLoader : build
BaseScript --> SecretsLoader : run
BashWrapperBuilder --> SecretsLoader : build
SecretsLoader --> SecretsProvider : load
SecretsProvider --> Secret : getSecret
SecretsProvider <|-- LocalSecretsProvider
Secret <|-- SecretImpl


@@ -0,0 +1,13 @@
classDiagram
%%
%% nextflow.trace
%%
Session --> TraceObserverFactory : init
TraceObserverFactory "1" --> "*" TraceObserver : create
TraceObserver <|-- AnsiLogObserver
TraceObserver <|-- GraphObserver
TraceObserver <|-- ReportObserver
TraceObserver <|-- TimelineObserver
TraceObserver <|-- TraceFileObserver
TraceObserver <|-- WorkflowStatsObserver

View File

@@ -0,0 +1,69 @@
flowchart TB
subgraph Launcher
subgraph CmdRun
subgraph AssetManager
ScriptFile
end
subgraph ConfigBuilder
ConfigParser([ConfigParser])
ConfigBase([ConfigBase])
end
subgraph ScriptRunner
subgraph Session
ConfigMap
DAG
ExecutorFactory([ExecutorFactory])
subgraph TaskProcessor
TaskRun
end
subgraph Executor
subgraph TaskMonitor
TaskHandler
end
TaskBean
BashWrapperBuilder([BashWrapperBuilder])
end
TraceRecord
CacheFactory([CacheFactory])
CacheDB
TraceObserver([TraceObserver])
end
ScriptParser([ScriptParser])
BaseScript([BaseScript])
subgraph ScriptMeta
WorkflowDef([WorkflowDef])
ProcessDef([ProcessDef])
FunctionDef([FunctionDef])
end
IncludeDef([IncludeDef])
OpCall([OpCall])
end
ConfigParser --> ConfigBase
ConfigBase --> ConfigMap
ScriptFile --> ScriptParser
ScriptParser --> BaseScript
BaseScript --> WorkflowDef
BaseScript --> ProcessDef
BaseScript --> FunctionDef
BaseScript --> IncludeDef
IncludeDef --> ScriptParser
WorkflowDef --> OpCall
OpCall --> DAG
ProcessDef --> DAG
DAG --> TaskRun
TaskRun --> DAG
ExecutorFactory --> Executor
ConfigMap --> Executor
ProcessDef --> TaskProcessor
ConfigMap --> TaskProcessor
TaskRun --> TaskHandler
TaskRun --> TaskBean
TaskBean --> BashWrapperBuilder
BashWrapperBuilder --> TaskHandler
CacheFactory --> CacheDB
TaskHandler --> CacheDB
TaskHandler --> TraceRecord
TraceRecord --> CacheDB
TaskHandler --> TraceObserver
end
end


@@ -0,0 +1,181 @@
(contributing-page)=
# Overview
This section provides a high-level overview of the Nextflow source code for users who want to understand or contribute to it. Rather than comprehensive API documentation, these docs provide a conceptual map to help you understand the key concepts of the Nextflow implementation and quickly find code sections of interest for further investigation.
Before you dive into the code, be sure to read [CONTRIBUTING.md](https://github.com/nextflow-io/nextflow/blob/master/CONTRIBUTING.md) to learn about the many ways to contribute to the project.
## IntelliJ IDEA
The suggested development environment is [IntelliJ IDEA](https://www.jetbrains.com/idea/download/). Nextflow development with IntelliJ IDEA requires a recent version of the IDE (2019.1.2 or later).
After installing IntelliJ IDEA, use the following steps to use it with Nextflow:
1. Clone the Nextflow repository to a directory on your computer.
2. Open IntelliJ IDEA and go to **File > New > Project from Existing Sources...**.
3. Select the Nextflow project root directory on your computer and click **OK**.
4. Select **Import project from external model > Gradle** and click **Finish**.
5. After the import process completes, select **File > Project Structure...**.
6. Select **Project**, and make sure that the **SDK** field contains Java 11 (or later).
7. Go to **File > Settings > Editor > Code Style > Groovy > Imports** and apply the following settings:
* Use single class import
* Class count to use import with '*': `99`
* Names count to use static import with '*': `99`
* Imports layout:
* `import java.*`
* `import javax.*`
* *blank line*
* all other imports
* all other static imports
New files must include the appropriate license header boilerplate and the author name(s) and contact email(s) ([see for example](https://github.com/nextflow-io/nextflow/blob/e8945e8b6fc355d3f2eec793d8f288515db2f409/modules/nextflow/src/main/groovy/nextflow/Const.groovy#L1-L15)).
## Groovy
Nextflow is written in [Groovy](http://groovy-lang.org/), which is itself a programming language based on [Java](https://www.java.com/). Groovy is designed to be highly interoperable with Java -- Groovy programs compile to Java bytecode, and nearly any Java program is also a valid Groovy program. However, Groovy adds several language features (e.g. closures, list and map literals, optional typing, optional semicolons, meta-programming) and standard libraries (e.g. JSON and XML parsing) that greatly improve the overall experience of developing for the Java virtual machine.
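As a brief taste of these features, the standalone script below uses list and map literals, optional typing, closures, and the JSON parser from the Groovy standard library. The data is made up for illustration.

```groovy
import groovy.json.JsonSlurper

// List and map literals, with optional typing (`def` vs. an explicit type)
def samples = [[id: 'a', reads: 120], [id: 'b', reads: 45]]
List<String> ids = samples.collect { it.id }

// Closures are first-class values that capture surrounding variables
def atLeast = { int min -> samples.findAll { it.reads >= min } }

// JSON parsing from the standard library; semicolons are optional throughout
def parsed = new JsonSlurper().parseText('{"name": "hello", "tags": ["x", "y"]}')

assert ids == ['a', 'b']
assert atLeast(100)*.id == ['a']
assert parsed.name == 'hello'
assert parsed.tags.size() == 2
```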
Recommended resources for Groovy, from most reference-complete to most user-friendly, are listed below:
- [Groovy documentation](http://groovy-lang.org/documentation.html)
- [Groovy in Action](https://www.manning.com/books/groovy-in-action-second-edition)
- [Groovy: The Awesome Parts](https://www.slideshare.net/paulk_asert/awesome-groovy)
- [Groovy cheat sheet](http://www.cheat-sheets.org/saved-copy/rc015-groovy_online.pdf)
## Software Dependencies
Nextflow depends on a variety of libraries and frameworks, the most prominent of which are listed below:
- [AWS SDK for Java 1.x](https://aws.amazon.com/sdk-for-java/): AWS integration
- [Azure SDK for Java](https://learn.microsoft.com/en-us/azure/developer/java/sdk/): Azure integration
- [Google Cloud Client Libraries for Java](https://cloud.google.com/java/docs/reference): Google Cloud integration
- [GPars](http://gpars.org/1.2.1/guide/guide/dataflow.html): dataflow concurrency
- [Gradle](https://gradle.org/): build automation
- [JCommander](https://jcommander.org/): command line interface
- [JGit](https://www.eclipse.org/jgit/): Git integration
- [Kryo](https://github.com/EsotericSoftware/kryo): serialization
- [LevelDB](https://mvnrepository.com/artifact/org.iq80.leveldb/leveldb): key-value store for the cache database
- [Logback](https://logback.qos.ch/): application logging
- [PF4J](https://pf4j.org/): plugin extensions
- [Spock](https://spockframework.org/): unit testing framework
Any other integrations are likely implemented using a CLI (e.g. Conda, Docker, HPC schedulers) or REST API (e.g. Kubernetes).
## Class Diagrams
Each package has a class diagram, abridged and annotated for relevance and ease of use.
Each node is a class. Fields are selectively documented in order to show only the core data structures and the classes that "own" them. Methods are not explicitly documented, but they are mentioned in certain links where appropriate. Links are selectively documented in order to show only the most important classes and relationships.
Links between classes denote one of the following relationships:
- Inheritance (`A <|-- B`): `B` is a subclass of `A`
- Composition (`A --* B`): `A` contains `B`
- Instantiation (`A --> B : f`): `A` creates instance(s) of `B` at runtime via `A::f()`
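
For readers less familiar with the notation, the three relationships roughly correspond to the following Groovy constructs. All class names here are hypothetical and exist only to mirror the notation.

```groovy
// Inheritance (A <|-- B): B is a subclass of A
class A {}
class B extends A {}

// Composition (Whole --* Part): Whole contains a Part
class Part {}
class Whole {
    Part part = new Part()
}

// Instantiation (Factory --> Product : create):
// Factory creates instances of Product at runtime via Factory::create()
class Product {}
class Factory {
    Product create() { new Product() }
}

println(new B() instanceof A)
println(new Whole().part instanceof Part)
println(new Factory().create() instanceof Product)
```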
See {ref}`packages-page` for the list of Nextflow packages.
```{warning}
Class diagrams are manually curated, so they might not always reflect the latest version of the source code.
```
## Building from source
If you are interested in modifying the source code, you only need Java 11 or later to build Nextflow from source. Nextflow uses the [Gradle](http://www.gradle.org/) build automation system, but you do not need to install Gradle to build Nextflow. In other words, if you can run Nextflow, then you can probably build it too!
To build locally from a branch (useful for testing PRs):
```bash
git clone -b <branch> git@github.com:nextflow-io/nextflow.git
cd nextflow
make compile
```
The build system will automatically download all of the necessary dependencies on the first run, which may take several minutes.
Once complete, you can run your local build of Nextflow using the `launch.sh` script in place of the `nextflow` command:
```bash
./launch.sh run <script> ...
```
Alternatively, you can build a self-contained executable with the following command:
```bash
make pack
```
It will create a binary in the `build/releases` directory which can be used in place of `nextflow`. This approach is useful when testing a pipeline that uses third-party plugins, which is not supported by `launch.sh`.
## Testing
To run the unit tests:
```bash
# run all tests
make test
# run individual test
make test module=<nextflow|plugins:nf-amazon|...> class=<package>.<class>.<method>
# refer to the Makefile for all build rules
```
When a test fails, it will give you a report that you can open in your browser to view the reason for each failed test. The **Standard output** tab is particularly useful as it shows the console output of each test.
To run the integration tests:
```bash
cd tests/checks
./qrun.sh
```
To run a specific integration test:
```bash
cd tests/checks
./qrun.sh <FOLDER>
```
To test the documentation snippets:
```bash
cd docs/snippets
./test.sh
```
Refer to the [build.yml](https://github.com/nextflow-io/nextflow/tree/master/.github/workflows/build.yml) configuration to see how to run other end-to-end tests locally.
## Debugging
### Groovy REPL
The `groovysh` command provides a command-line REPL that you can use to play around with Groovy code independently of Nextflow. The `groovyConsole` command provides a graphical REPL similar to `nextflow console`. These commands require a standalone Groovy distribution, which can be installed as described for Java on the {ref}`Installation <install-requirements>` page.
:::{note}
If you are using WSL, you must also install an X server for Windows, such as [VcXsrv](https://sourceforge.net/projects/vcxsrv/) or [Xming](http://www.straightrunning.com/XmingNotes/), in order to use these commands.
:::
### IntelliJ IDEA
:::{versionadded} 23.09.0-edge
:::
You can perform limited breakpoint debugging on a Nextflow script using IntelliJ IDEA.
1. Set a breakpoint in your Nextflow script by clicking on a line number.
2. Run `nextflow -remote-debug run <script>`
3. Select the **Run / Debug Configurations** dropdown, select **Edit Configurations...**, and create a new configuration of type **Remote JVM Debug**. Set the port that appeared in the terminal when you launched your Nextflow script. Click **OK**.
4. Select the green bug icon to begin the remote debug session. The Debug window will appear and allow you to step through and inspect your script as it runs.
Note that this approach can only be used to debug the *script* execution, which does not include the *pipeline* execution.


@@ -0,0 +1,172 @@
# `nextflow.ast`
The `nextflow.ast` package implements the Nextflow language extensions as AST transforms.
## Class Diagram
```{mermaid} diagrams/nextflow.ast.mmd
```
```{note}
Some classes may be excluded from the above diagram for brevity.
```
## Notes
The Nextflow scripting language is essentially Groovy with some extensions, implemented as transformations to the abstract syntax tree (AST). Every Nextflow script is syntactically (but not semantically) valid Groovy.
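
To see why a declaration such as `process split_letters { ... }` is already valid Groovy syntax, consider the following plain Groovy sketch. It is not how Nextflow is implemented (Nextflow uses AST transforms, as described here); the `MiniDsl` class and its members are hypothetical and only demonstrate how Groovy parses the statement as nested method calls.

```groovy
// Hypothetical mini-DSL, for illustration only
class MiniDsl {
    List<String> processNames = []

    // Invoked for `split_letters { ... }`, because no such method is declared
    def methodMissing(String name, args) {
        [name: name, body: args[0]]
    }

    // Invoked with the result of the inner call
    void process(Map definition) {
        processNames << (definition.name as String)
    }
}

def dsl = new MiniDsl()
dsl.with {
    // Groovy parses this statement as: process(split_letters({ ... }))
    process split_letters {
        'echo hello'
    }
}

println dsl.processNames
```

Without something like `methodMissing`, the same script would parse but fail at runtime, which is the sense in which a Nextflow script is syntactically but not semantically valid Groovy.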
You can see the effect of Nextflow's AST transforms by using the Nextflow console:
1. Run `nextflow console` to open the console
2. Enter a Nextflow script
3. Execute the script
4. Go to **Script** > **Inspect AST**
Here is the example from {ref}`your-first-script`:
```nextflow
params.str = 'Hello world!'
process split_letters {
output:
path 'chunk_*'
script:
"""
printf '${params.str}' | split -b 6 - chunk_
"""
}
process convert_to_upper {
input:
path x
output:
stdout
script:
"""
cat $x | tr '[a-z]' '[A-Z]'
"""
}
workflow {
split_letters | flatten | convert_to_upper | view { it.trim() }
}
```
Here it is after being parsed and de-sugared by Groovy:
```groovy
params.str = 'Hello world!'
process( split_letters( {
output:
path('chunk_*')
script:
"""
printf '${params.str}' | split -b 6 - chunk_
"""
} ))
process( convert_to_upper( {
input:
path(x)
output:
stdout
script:
"""
cat $x | tr '[a-z]' '[A-Z]'
"""
} ))
workflow({
split_letters | flatten | convert_to_upper | view { it.trim() }
})
```
Here it is after being transformed by Nextflow (whitespace edited for readability):
```groovy
import static nextflow.Nextflow.*
import org.apache.commons.lang.StringUtils as StringUtils
import groovy.transform.Field as Field
import java.nio.file.Path as Path
import nextflow.Channel as Channel
import nextflow.util.Duration as Duration
import nextflow.util.MemoryUnit as MemoryUnit
import nextflow.io.ValueObject as ValueObject
import nextflow.Channel as channel
@groovy.transform.BaseScript
public class script1677225313239 extends nextflow.script.BaseScript {
public script1677225313239() {
nextflow.script.ScriptMeta.get(this).setDsl1ProcessNames(['split_letters', 'convert_to_upper'])
}
public script1677225313239(final groovy.lang.Binding context) {
super.setBinding(context)
nextflow.script.ScriptMeta.get(this).setDsl1ProcessNames(['split_letters', 'convert_to_upper'])
}
public static void main(final java.lang.String[] args) {
org.codehaus.groovy.runtime.InvokerHelper.runScript(script1677225313239, args)
}
@groovy.transform.Generated
protected java.lang.Object runScript() {
params.str = 'Hello world!'
this.process('split_letters', {
this._out_path('chunk_*')
new nextflow.script.BodyDef(
{
"printf '$params.str' | split -b 6 - chunk_"
},
'"""\n printf \'${params.str}\' | split -b 6 - chunk_\n """\n',
'script',
[
new nextflow.script.TokenValRef('params.str', 8, 13)
]
)
})
this.process('convert_to_upper', {
this._in_path(new nextflow.script.TokenVar('x'))
this._out_stdout()
new nextflow.script.BodyDef(
{
"cat $x | tr '[a-z]' '[A-Z]'"
},
'"""\n cat $x | tr \'[a-z]\' \'[A-Z]\'\n """\n',
'script',
[
new nextflow.script.TokenValRef('x', 19, 8)
]
)
})
this.workflow({
new nextflow.script.BodyDef(
{
split_letters | flatten | convert_to_upper | this.view({
it.trim()
})
},
' split_letters | flatten | convert_to_upper | view { it.trim() }\n',
'workflow',
[
new nextflow.script.TokenValRef('flatten', 24, 18),
new nextflow.script.TokenValRef('split_letters', 24, 3),
new nextflow.script.TokenValRef('convert_to_upper', 24, 28)
]
)
})
}
}
```


@@ -0,0 +1,21 @@
# `nextflow.cache`
The `nextflow.cache` package implements the cache database of previously executed tasks.
## Class Diagram
```{mermaid} diagrams/nextflow.cache.mmd
```
```{note}
Some classes may be excluded from the above diagram for brevity.
```
## Notes
The cache database uses [Kryo](https://github.com/EsotericSoftware/kryo) to serialize and deserialize task data. Each key-value pair in the cache database corresponds to a task. The key is the task hash, and the value consists of (1) the task `TraceRecord`, (2) the `TaskContext`, and (3) the task reference count.
The default cache store is backed by [LevelDB](https://mvnrepository.com/artifact/org.iq80.leveldb/leveldb) and is stored in `.nextflow/cache/<session-id>` relative to the launch directory.
The cloud cache store is backed by remote object storage such as Amazon S3, Azure Blob Storage, and Google Cloud Storage. It stores each task entry as a separate object.


@@ -0,0 +1,17 @@
# `nextflow.cli`
The `nextflow.cli` package implements the command line interface.
## Class Diagram
```{mermaid} diagrams/nextflow.cli.mmd
```
```{note}
Some classes may be excluded from the above diagram for brevity.
```
## Notes
The `Launcher` class is the entrypoint for Nextflow. It uses [JCommander](https://jcommander.org/) to parse the command-line arguments. Additionally, there is a class for each subcommand which implements the application logic of that command. By far the most complex command is `CmdRun`.


@@ -0,0 +1,13 @@
# `nextflow.cloud.aws`
The `nextflow.cloud.aws` package implements the AWS Batch executor.
## Class Diagram
```{mermaid} diagrams/nextflow.cloud.aws.mmd
```
```{note}
Some classes may be excluded from the above diagram for brevity.
```


@@ -0,0 +1,17 @@
# `nextflow.cloud.aws.nio`
The `nextflow.cloud.aws.nio` package implements the S3 filesystem.
## Class Diagram
```{mermaid} diagrams/nextflow.cloud.aws.nio.mmd
```
```{note}
Some classes may be excluded from the above diagram for brevity.
```
## Notes
The S3 filesystem translates Java Path API calls into S3 API calls, which allows Nextflow to interact with S3 objects through the same interface for local files.
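
The benefit of this design is that code written against the Java Path API does not need to know which filesystem backs a given path. The sketch below uses only the standard `java.nio.file` API with a local temporary file; the `countLines` helper is hypothetical. The same helper could, in principle, be given a path from any installed filesystem provider.

```groovy
import java.nio.file.Files
import java.nio.file.Path

// Works against any Path, regardless of the backing filesystem provider
def countLines(Path path) {
    Files.readAllLines(path).size()
}

// Demonstrated here with a local temporary file
Path local = Files.createTempFile('example', '.txt')
Files.write(local, ['a', 'b', 'c'])
println countLines(local)
Files.delete(local)
```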


@@ -0,0 +1,13 @@
# `nextflow.cloud.azure`
The `nextflow.cloud.azure` package implements the Azure Batch executor.
## Class Diagram
```{mermaid} diagrams/nextflow.cloud.azure.mmd
```
```{note}
Some classes may be excluded from the above diagram for brevity.
```


@@ -0,0 +1,13 @@
# `nextflow.cloud.google`
The `nextflow.cloud.google` package implements the Google Batch executor.
## Class Diagram
```{mermaid} diagrams/nextflow.cloud.google.mmd
```
```{note}
Some classes may be excluded from the above diagram for brevity.
```


@@ -0,0 +1,19 @@
# `nextflow.config`
The `nextflow.config` package contains the implementation of the Nextflow configuration.
## Class Diagram
```{mermaid} diagrams/nextflow.config.mmd
```
```{note}
Some classes may be excluded from the above diagram for brevity.
```
## Notes
Any command that parses Nextflow config files (`config`, `run`, etc.) uses the `ConfigBuilder` to build a `ConfigMap` from a set of config files. The `ConfigBuilder` itself uses a `ConfigParser` to parse the config files.
The Nextflow configuration language is essentially Groovy with some extensions. These extensions are implemented in `ConfigBase` and `ConfigTransformImpl`.
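
As a rough analogy for how a Groovy-based config syntax can be evaluated into a nested map, the sketch below uses `ConfigSlurper` from the Groovy standard library. This is not the Nextflow `ConfigParser`, and it does not support the Nextflow-specific extensions; the config text is made up for the example.

```groovy
// Example config text using block and dot notation
def text = '''
process {
    cpus = 2
    memory = '4 GB'
}
docker.enabled = true
'''

// ConfigSlurper is part of the Groovy standard library (groovy.util)
def config = new ConfigSlurper().parse(text)

println config.process.cpus
println config.process.memory
println config.docker.enabled
```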


@@ -0,0 +1,19 @@
# `nextflow.container`
The `nextflow.container` package implements the integration with container runtimes.
## Class Diagram
```{mermaid} diagrams/nextflow.container.mmd
```
```{note}
Some classes may be excluded from the above diagram for brevity.
```
## Notes
The `ContainerBuilder` class is the base class for all container runtimes supported by Nextflow. It produces the container wrapper command for a given task run.
Executors that support containerized tasks insert this wrapper command into the task wrapper script (`.command.run`). Executors that are *container-native*, i.e. that launch the task wrapper itself inside a container, don't need to generate a container wrapper command.


@@ -0,0 +1,19 @@
# `nextflow.dag`
The `nextflow.dag` package implements the workflow DAG and renderers for several diagram formats.
## Class Diagram
```{mermaid} diagrams/nextflow.dag.mmd
```
```{note}
Some classes may be excluded from the above diagram for brevity.
```
## Notes
The workflow DAG defines the network of processes, channels, and operators that comprise a workflow. It is produced by the execution of the Nextflow script. See [nextflow.script](nextflow.script.md) for more details.
Implementations of the `DagRenderer` interface define how to render the workflow DAG to a particular diagram format. See {ref}`workflow-diagram` for more details.


@@ -0,0 +1,21 @@
# `nextflow.executor`
The `nextflow.executor` package defines the executor interface and implements several built-in executors.
## Class Diagram
```{mermaid} diagrams/nextflow.executor.mmd
```
```{note}
Some classes may be excluded from the above diagram for brevity.
```
## Notes
The `Executor` class is the base class for all Nextflow executors. The main purpose of an `Executor` is to submit tasks to an underlying compute environment, such as an HPC scheduler or cloud batch executor. It uses a `TaskMonitor` to manage the lifecycle of all tasks and a `TaskHandler` to manage each individual task. Most executors use the same polling monitor, but each executor implements its own task handler to customize it for a particular compute environment. See [nextflow.processor](nextflow.processor.md) for more details about these classes.
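
The division of labor between a shared polling monitor and executor-specific task handlers can be sketched as follows. All classes here are hypothetical and heavily simplified stand-ins; they do not reproduce the actual `TaskMonitor` or `TaskHandler` APIs.

```groovy
// Hypothetical stand-in for a task handler: one instance per task
abstract class SketchHandler {
    String name
    abstract void submit()
    abstract boolean checkIfCompleted()
}

// An executor-specific handler; this one "completes" after a fixed number of polls
class CountdownHandler extends SketchHandler {
    int pollsLeft
    void submit() { println "submitted ${name}" }
    boolean checkIfCompleted() { --pollsLeft <= 0 }
}

// Hypothetical stand-in for a polling monitor, shared by all tasks
class SketchPollingMonitor {
    List<SketchHandler> running = []
    List<String> finished = []

    void schedule(SketchHandler handler) {
        handler.submit()
        running << handler
    }

    // One polling cycle: ask each handler whether its task is done
    void pollOnce() {
        def done = running.findAll { it.checkIfCompleted() }
        done.each { finished << it.name }
        running.removeAll(done)
    }
}

def monitor = new SketchPollingMonitor()
monitor.schedule(new CountdownHandler(name: 'task-a', pollsLeft: 1))
monitor.schedule(new CountdownHandler(name: 'task-b', pollsLeft: 2))
while (monitor.running) {
    monitor.pollOnce()
}
println monitor.finished
```

In this sketch, supporting a new compute environment would only require a new handler subclass, while the monitor stays unchanged.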
The built-in executors include the local executor (`LocalExecutor`) and the various grid executors (SLURM, PBS, LSF, etc.), the latter of which extend `AbstractGridExecutor`. The `LocalExecutor` implements both "script" tasks (processes with a `script` or `shell` block) and "native" tasks (processes with an `exec` block).
The `BashWrapperBuilder` is used by executors to generate the wrapper script (`.command.run`) for a task, from a template script called `command-run.txt`, as well as the task configuration and the execution environment.


@@ -0,0 +1,19 @@
# `nextflow.extension`
The `nextflow.extension` package implements the channel operators and other extension methods.
## Class Diagram
```{mermaid} diagrams/nextflow.extension.mmd
```
```{note}
Some classes may be excluded from the above diagram for brevity.
```
## Notes
Operators are implemented using the [GPars](http://gpars.org/1.2.1/guide/guide/dataflow.html) dataflow library. In general, an operator consumes one or more `DataflowReadChannel`s and produces one or more `DataflowWriteChannel`s. See {ref}`operator-page` for details about each operator.
Other notable classes include `Bolts` and `FilesEx`, which implement various extension methods used throughout the Nextflow codebase. If you see a method that doesn't appear to be implemented by the calling object, it may be implemented in one of these extension classes.


@@ -0,0 +1,17 @@
# `nextflow.k8s`
The `nextflow.k8s` package implements the Kubernetes executor and the `kuberun` command.
## Class Diagram
```{mermaid} diagrams/nextflow.k8s.mmd
```
```{note}
Some classes may be excluded from the above diagram for brevity.
```
## Notes
The Kubernetes integration uses the K8s HTTP API to interact with K8s clusters, and relies on the `kubectl` command and `~/.kube/config` file for authentication.


@@ -0,0 +1,21 @@
# `nextflow`
The `nextflow` package contains various top-level classes.
## Class Diagram
```{mermaid} diagrams/nextflow.mmd
```
```{note}
Some classes may be excluded from the above diagram for brevity.
```
## Notes
The `Nextflow` class implements several methods that are exposed to Nextflow scripts. See {ref}`stdlib-namespaces-global` for details.
The `Channel` class implements the channel factory methods, and it is exposed directly to Nextflow scripts. See {ref}`channel-factory` for details.
The `Session` class is the top-level representation of a Nextflow run, or "session". See [nextflow.script](nextflow.script.md) for more details about how a `Session` is created.


@@ -0,0 +1,17 @@
# `nextflow.plugin`
The `nextflow.plugin` package implements the plugin manager.
## Class Diagram
```{mermaid} diagrams/nextflow.plugin.mmd
```
```{note}
Some classes may be excluded from the above diagram for brevity.
```
## Notes
The plugin system uses the [PF4J](https://pf4j.org/) library, which allows for extension classes to be loaded at runtime. Each plugin includes a manifest of extension classes, all of which extend or implement some base class in Nextflow. The `Plugins` class can be used to query the available extensions for a given base class. Extensions can be assigned a priority using the `@Priority` annotation, to ensure that certain extensions are used over others when available.


@@ -0,0 +1,23 @@
# `nextflow.processor`
The `nextflow.processor` package implements the execution and monitoring of tasks.
## Class Diagram
```{mermaid} diagrams/nextflow.processor.mmd
```
```{note}
Some classes may be excluded from the above diagram for brevity.
```
## Notes
While the [`executor`](nextflow.executor.md) package defines how tasks are submitted to a particular execution backend (such as an HPC scheduler), the `processor` package defines how tasks are created and executed. As such, these packages work closely together, and in fact several components of the `Executor` interface, specifically the `TaskHandler` and `TaskMonitor`, are defined in this package.
The `TaskProcessor` is by far the largest and most complex class in this package. It implements both the dataflow operator for a given process as well as the task execution logic. In other words, it defines the mapping from an abstract process definition with input and output channels into concrete task executions.
A `TaskRun` represents a particular task execution. There is also `TaskBean`, which is a serializable representation of a task. Legends say that `TaskBean` was originally created to support a "daemon" mode in which Nextflow would run on both the head node and the worker nodes, so the Nextflow "head" would need to send tasks to the Nextflow "workers". This daemon mode was never completed, but echoes of it remain (see `CmdNode`, `DaemonLauncher`, and the `nf-ignite` plugin).
When a `TaskProcessor` receives a set of input values, it creates a `TaskRun` and submits it to an `Executor`, which in turn submits the task to an underlying execution backend. The executor's `TaskMonitor` then monitors the status of the task, and when it is completed, returns it to the task processor for finalization. If the task completed successfully, the task processor collects the task outputs and emits them on the corresponding output channels. If the task failed, the task processor will retry it if possible, or else return a task error to the workflow run.


@@ -0,0 +1,63 @@
# `nextflow.scm`
The `nextflow.scm` package defines the Git provider interface and implements several built-in Git providers. It also manages local pipeline repositories using a Strategy pattern to support different repository management approaches.
## Class Diagram
```{mermaid} diagrams/nextflow.scm.mmd
```
```{note}
Some classes may be excluded from the above diagram for brevity.
```
## Architecture overview
### Repository strategies
The `AssetManager` uses the **Strategy pattern** to support different ways of managing local pipeline installations:
- **`LegacyRepositoryStrategy`**: Traditional approach where each project gets a full git clone in the `$NXF_HOME/{project}` directory. Only one revision can exist at a time per project.
- **`MultiRevisionRepositoryStrategy`**: Modern approach that allows multiple revisions to coexist efficiently by:
- Maintaining a shared bare repository in `$NXF_HOME/.repos/{project}/bare/`
- Creating lightweight clones for each commit in `$NXF_HOME/.repos/{project}/clones/{commitId}/`
- Sharing git objects between revisions to minimize disk space
### Strategy selection
The `AssetManager` automatically selects the appropriate strategy based on:
1. **Environment variable**: `NXF_SCM_LEGACY=true` forces legacy mode
2. **Repository status**: Detected by checking existing repository structure:
- `UNINITIALIZED`: No repository exists, use Multi-Revision (default)
- `LEGACY_ONLY`: Only legacy `.git` directory exists, use Legacy
- `BARE_ONLY`: Only bare repository exists, use Multi-Revision
- `HYBRID`: Both exist, prefer Multi-Revision
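
The selection rules above can be summarized as a small decision function. This is a sketch rather than the actual `AssetManager` code: the enum, the function name, and the returned labels are hypothetical, and the exact comparison against the environment variable value is an assumption.

```groovy
// Hypothetical enum; the values mirror the repository statuses listed above
enum RepoStatus { UNINITIALIZED, LEGACY_ONLY, BARE_ONLY, HYBRID }

String selectStrategy(boolean legacyEnv, RepoStatus status) {
    // 1. The environment variable forces legacy mode
    if (legacyEnv) return 'legacy'
    // 2. Otherwise, decide from the existing repository structure
    switch (status) {
        case RepoStatus.LEGACY_ONLY:
            return 'legacy'
        default:
            // UNINITIALIZED, BARE_ONLY, and HYBRID all use multi-revision
            return 'multi-revision'
    }
}

// Assumption: the variable is compared against the literal string 'true'
boolean legacyEnv = System.getenv('NXF_SCM_LEGACY') == 'true'
println selectStrategy(legacyEnv, RepoStatus.UNINITIALIZED)
```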
### Repository provider
The `RepositoryProvider` class is the base class for all Git providers. It defines how to authenticate with the provider, clone a Git repository, inspect branches and tags, etc. The provider is used by repository strategies to interact with remote Git services.
## Key components
### AssetManager
Central class that manages pipeline assets. Key responsibilities include:
- Project name resolution and validation
- Strategy selection and initialization
- Provider configuration and authentication
- Repository download and checkout operations
- Coordination between strategy and provider
### RepositoryStrategy
Interface defining repository management operations:
- `download()`: Download or update a revision
- `checkout()`: Switch to a specific revision
- `drop()`: Delete local copies
- `getLocalPath()`: Get path to working directory
- `getGit()`: Access JGit repository instance


@@ -0,0 +1,21 @@
# `nextflow.script`
The `nextflow.script` package implements the parsing and execution of Nextflow scripts.
## Class Diagram
```{mermaid} diagrams/nextflow.script.mmd
```
```{note}
Some classes may be excluded from the above diagram for brevity.
```
## Notes
The execution of a Nextflow pipeline occurs in two phases. In the first phase, Nextflow parses and runs the script (using the language extensions in [nextflow.ast](nextflow.ast.md) and [nextflow.extension](nextflow.extension.md)), which produces the workflow DAG. In the second phase, Nextflow executes the workflow.
```{note}
In DSL1, there was no separation between workflow construction and execution -- dataflow operators were executed as soon as they were constructed. DSL2 introduced lazy execution in order to separate process definition from execution, and thereby facilitate subworkflows and modules.
```
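
The separation between construction and execution can be illustrated with a toy example. The `SketchWorkflow` class below is hypothetical and far simpler than the real runtime: the first phase only records steps, and nothing runs until the second phase is triggered explicitly.

```groovy
// Hypothetical sketch: phase 1 records steps, phase 2 executes them
class SketchWorkflow {
    List<String> dag = []
    List<Closure> steps = []

    // Construction: remember the step, but do not run it
    void step(String name, Closure body) {
        dag << name
        steps << body
    }

    // Execution: run the recorded steps in order
    List execute() {
        steps.collect { it.call() }
    }
}

def wf = new SketchWorkflow()
def events = []

// Phase 1: build the workflow
wf.step('split') { events << 'ran split'; 'chunks' }
wf.step('upper') { events << 'ran upper'; 'CHUNKS' }
def eventsBeforeExecution = events.size()

// Phase 2: execute the workflow
def results = wf.execute()

println wf.dag
println eventsBeforeExecution
println results
```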


@@ -0,0 +1,17 @@
# `nextflow.secret`
The `nextflow.secret` package defines the secrets provider interface and implements the built-in local secrets store.
## Class Diagram
```{mermaid} diagrams/nextflow.secret.mmd
```
```{note}
Some classes may be excluded from the above diagram for brevity.
```
## Notes
The default secrets provider simply stores key-value pairs in a local JSON file.


@@ -0,0 +1,17 @@
# `nextflow.trace`
The `nextflow.trace` package defines the trace observer interface and implements several built-in trace observers.
## Class Diagram
```{mermaid} diagrams/nextflow.trace.mmd
```
```{note}
Some classes may be excluded from the above diagram for brevity.
```
## Notes
The `TraceObserver` interface defines a set of hooks into the workflow execution, such as when a workflow starts and completes, when a task starts and completes, and when an output file is published. The `Session` maintains a list of all observers and triggers each hook when the corresponding event occurs. Implementing classes can use these hooks to perform custom behaviors. In fact, this interface is used to implement several core features, including the various execution reports, DAG renderer, and the integration with Seqera Platform.
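
The hook mechanism described above is an instance of the observer pattern, which can be sketched as follows. The interface, its method names, and the session class are hypothetical simplifications and do not reproduce the actual `TraceObserver` or `Session` APIs.

```groovy
// Hypothetical, simplified stand-in for the observer interface
interface SketchObserver {
    void onFlowBegin()
    void onTaskComplete(String taskName)
    void onFlowComplete()
}

// Example observer that records events, much as a report might
class EventLog implements SketchObserver {
    List<String> events = []
    void onFlowBegin() { events << 'begin' }
    void onTaskComplete(String taskName) { events << "task:${taskName}".toString() }
    void onFlowComplete() { events << 'complete' }
}

// Stand-in for the session: holds the observers and triggers each hook
class SketchSession {
    List<SketchObserver> observers = []

    void run(List<String> tasks) {
        observers.each { it.onFlowBegin() }
        tasks.each { task ->
            observers.each { it.onTaskComplete(task) }
        }
        observers.each { it.onFlowComplete() }
    }
}

def eventLog = new EventLog()
new SketchSession(observers: [eventLog]).run(['foo', 'bar'])
println eventLog.events
```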


@@ -0,0 +1,12 @@
(packages-page)=
# Packages
The following subpages correspond to packages in the Nextflow source code:
```{toctree}
:glob:
:maxdepth: 1
nextflow*
```