Files
2026-04-29 23:01:54 +02:00

352 lines
13 KiB
Markdown

# Research: Nextflow Module System Client
**Date**: 2026-01-19
**Feature**: 251117-module-system
## Overview
This document captures technical research and decisions for implementing the Nextflow module system client. All NEEDS CLARIFICATION items from Technical Context have been resolved through codebase exploration.
---
## 1. CLI Command Structure
**Research Question**: How should `nextflow module` CLI commands be implemented?
**Decision**: JCommander native subcommands — each subcommand extends `CmdBase` directly; no trait needed
**Rationale**:
- JCommander's subcommand support handles parameter parsing automatically per subcommand
- Each subcommand (install, run, list, remove, search, info, publish) is a separate class extending CmdBase
- `ModuleRun` extends `CmdRun` to reuse pipeline execution logic (PR #6381)
- No custom `ModuleSubCmd` trait needed; cleaner architecture
- `CmdModule` is registered in `Launcher` alongside all other top-level commands
**Implemented Pattern**:
```groovy
@Parameters(commandDescription = "Manage Nextflow modules")
class CmdModule extends CmdBase implements UsageAware {
static final List<CmdBase> commands = []
static {
commands << new ModuleInstall() // extends CmdBase
commands << new ModuleRun() // extends CmdRun
commands << new ModuleList() // extends CmdBase
commands << new ModuleRemove() // extends CmdBase
commands << new ModuleSearch() // extends CmdBase
commands << new ModuleInfo() // extends CmdBase
commands << new ModulePublish() // extends CmdBase
}
void run() {
final jc = commander() // JCommander with all subcommands registered
jc.parse(args as String[])
final subcommand = jc.getCommands().get(jc.getParsedCommand()).getObjects()[0]
subcommand.run()
}
}
```
**Alternatives Considered**:
- CmdFs trait pattern: Considered initially; replaced by JCommander native subcommands — simpler and avoids custom parsing
- Separate top-level Cmd classes (CmdModuleInstall, etc.): Rejected — too many entry points
- Plugin-based CLI extension: Rejected — module system is core functionality, not optional
---
## 2. DSL Parser Extension for @scope/name
**Research Question**: How to extend `include` statement parsing for registry modules?
**Decision**: Extend `ResolveIncludeVisitor` to detect `@` prefix and delegate to a `RemoteModuleResolver` SPI loaded via Java `ServiceLoader`
**Rationale**:
- Keeps `nf-lang` decoupled from runtime module resolution (`nf-lang` has no dependency on `nextflow` module)
- SPI pattern allows plugins or custom implementations to override the default resolver
- Detection: `source.startsWith('@')` distinguishes registry vs local paths — preserves existing include behavior
- Resolution at parse time (after plugin resolution) per ADR
**Implemented Architecture**:
```
include { X } from '@scope/name'
ResolveIncludeVisitor.visitInclude() [nf-lang]
source.startsWith("@") → RemoteModuleResolverProvider.getInstance().resolve(source, baseDir)
RemoteModuleResolverProvider [nf-lang]
Java ServiceLoader discovers implementations; picks highest priority
DefaultRemoteModuleResolver [nextflow module]
Calls ModuleResolver.installModule(reference, version, autoInstall=true)
Returns Path to modules/@scope/name/main.nf
```
**Key Files**:
- `modules/nf-lang/src/main/java/nextflow/module/spi/RemoteModuleResolver.java` — SPI interface
- `modules/nf-lang/src/main/java/nextflow/module/spi/RemoteModuleResolverProvider.java` — ServiceLoader singleton
- `modules/nf-lang/src/main/java/nextflow/module/spi/FallbackRemoteModuleResolver.java` — error fallback
- `modules/nf-lang/src/main/java/nextflow/script/control/ResolveIncludeVisitor.java` — MODIFIED
- `modules/nextflow/src/main/groovy/nextflow/module/DefaultRemoteModuleResolver.groovy` — default impl
**Alternatives Considered**:
- New ANTLR grammar token for `@`: Rejected — unnecessary parser complexity
- Direct dependency from nf-lang to nextflow module: Rejected — circular dependency risk; SPI decouples cleanly
- Dot file marker for local modules: Deferred in ADR; current impl uses `@` for registry, `.`/`/` for local
---
## 3. Config Parsing for modules{} and registry{} Blocks
**Research Question**: How to add new config DSL blocks?
**Decision**: Create ModulesConfig and RegistryConfig classes implementing ConfigScope interface
**Rationale**:
- ConfigScope is an ExtensionPoint (pf4j) that ConfigBuilder automatically discovers
- Classes implementing ConfigScope and annotated with @ScopeName are automatically parsed
- No need to modify ConfigBuilder or create custom DSL parsers
- Pattern used throughout Nextflow: FusionConfig, CondaConfig, DockerConfig, etc.
- Provides type safety via @CompileStatic and validation via @ConfigOption
**Reference Implementation**:
```
Location: modules/nextflow/src/main/groovy/nextflow/fusion/FusionConfig.groovy
Pattern:
@ScopeName("modules")
@Description("Module version declarations")
@CompileStatic
class ModulesConfig implements ConfigScope {
@ConfigOption
@Description("Module version mappings")
final Map<String, String> modules = [:]
ModulesConfig() {}
ModulesConfig(Map opts) {
// Parse from config map
}
}
```
**ConfigScope Interface**:
```
Location: modules/nf-lang/src/main/java/nextflow/config/spec/ConfigScope.java
public interface ConfigScope extends ExtensionPoint {}
```
**RegistryConfig Pattern**:
```groovy
@ScopeName("registry")
@Description("Module registry configuration")
@CompileStatic
class RegistryConfig implements ConfigScope {
static final String DEFAULT_REGISTRY_URL = 'https://registry.nextflow.io/api'
@ConfigOption
final Collection<String> url // One or more URLs in priority order
@ConfigOption
final String apiKey // API key; falls back to NXF_REGISTRY_TOKEN env var
RegistryConfig() {
url = [DEFAULT_REGISTRY_URL]
apiKey = null
}
RegistryConfig(Map opts) {
url = opts.url ?: [DEFAULT_REGISTRY_URL]
apiKey = opts.apiKey as String
}
String getUrl() { url ? url[0] : DEFAULT_REGISTRY_URL }
Collection<String> getAllUrls() { url ?: [DEFAULT_REGISTRY_URL] }
String getApiKey() { apiKey ?: SysEnv.get('NXF_REGISTRY_TOKEN') }
}
```
**Integration Point**: ConfigBuilder automatically discovers and parses ConfigScope implementations via ExtensionPoint mechanism
**Alternatives Considered**:
- Custom DSL parsers (ModulesDsl/RegistryDsl): Rejected - unnecessary complexity, ConfigScope pattern handles this automatically
- JSON/YAML config file: Rejected - inconsistent with Nextflow config style
- Dedicated pipeline.yaml: Deferred per ADR Open Questions
---
## 4. Registry HTTP Communication
**Research Question**: How to communicate with module registry API?
**Decision**: Create HttpModuleRepository following HttpPluginRepository pattern
**Rationale**:
- HttpPluginRepository provides robust HTTP client with retry logic
- Uses HxClient from io.seqera.http (already a dependency)
- Handles authentication headers consistently
- Supports connection pooling and timeout configuration
**Reference Implementation**:
```
Location: modules/nf-commons/src/main/nextflow/plugin/HttpPluginRepository.groovy
Pattern:
class HttpModuleRepository {
private final URI url
private final HxClient httpClient
private final String authToken
ModuleInfo getModule(String name, String version)
List<ModuleInfo> search(String query, int limit)
Path download(String name, String version, Path target)
void publish(String name, Path bundle)
}
```
**API Endpoints** (from ADR):
```
GET /api/modules?query=<text> # Search
GET /api/modules/{name} # Get module + latest release
GET /api/modules/{name}/releases # List all releases
GET /api/modules/{name}/{version} # Get specific release
GET /api/modules/{name}/{version}/download # Download bundle
POST /api/modules/{name} # Publish (authenticated)
```
**Alternatives Considered**:
- Direct HttpClient usage: Rejected - loses retry, pooling benefits
- gRPC protocol: Rejected - registry already uses REST
---
## 5. Authentication Patterns
**Research Question**: How to handle registry authentication?
**Decision**: Support `NXF_REGISTRY_TOKEN` env var + `registry.apiKey` config field
**Rationale**:
- Environment variable provides CI/CD compatibility
- `apiKey` config field allows explicit token configuration
- Authentication is only applied to the primary (first) registry URL
- Bearer token in Authorization header (standard HTTP auth)
**Implementation**:
```
RegistryConfig.getApiKey() returns:
1. registry.apiKey config value if set
2. NXF_REGISTRY_TOKEN environment variable as fallback
3. null if neither is set (unauthenticated requests)
```
**Config Syntax**:
```nextflow
registry {
apiKey = '${NXF_REGISTRY_TOKEN}'
}
```
**Alternatives Considered**:
- Per-registry token map (`auth {}` block): Was in initial design; simplified to single `apiKey` since only the primary registry uses authentication
- Secrets file (~/.nextflow/secrets.json): Possible future enhancement
- OAuth flow: Rejected for CLI — token-based simpler
---
## 6. Checksum Verification
**Research Question**: How to implement module integrity verification?
**Decision**: SHA-256 checksum stored in `.checksum` file, verified on every run
**Rationale**:
- SHA-256 is industry standard, already used for plugin verification
- `.checksum` file stores registry-provided checksum (from X-Checksum header)
- Local checksum computed on-demand and compared
- Mismatch indicates local modification (warn, don't override)
**Implementation Pattern**:
```groovy
class ModuleChecksum {
static final String ALGORITHM = 'SHA-256'
static String compute(Path moduleDir) {
// Hash all files in module directory
// Exclude .checksum itself
// Return hex-encoded SHA-256
}
static boolean verify(Path moduleDir) {
def expected = moduleDir.resolve('.checksum').text.trim()
def actual = compute(moduleDir)
return expected == actual
}
static void save(Path moduleDir, String checksum) {
moduleDir.resolve('.checksum').text = checksum
}
}
```
**Checksum Scope**: Covers all files in module directory (main.nf, meta.yaml, README.md, etc.)
**Alternatives Considered**:
- Per-file checksums: Rejected - adds complexity, single checksum sufficient
- MD5: Rejected - SHA-256 more secure
---
## 7. Version Constraint Syntax
**Research Question**: What version constraint syntax to use for module dependencies?
**Decision**: Reuse existing Nextflow plugin version constraint syntax
**Rationale**:
- Already implemented and tested in plugin system
- Users familiar with existing `nextflowVersion` syntax
- Supports ranges, comparisons, exact versions
- No new parser code needed
**Supported Syntax**:
| Notation | Meaning | Example |
|----------|---------|---------|
| `1.2.3` | Exact version | `@nf-core/fastqc@1.0.0` |
| `>=1.2.3` | Greater or equal | `@nf-core/fastqc@>=1.0.0` |
| `<=1.2.3` | Less or equal | `@nf-core/fastqc@<=2.0.0` |
| `>=1.2.0,<2.0.0` | Range | `@nf-core/samtools@>=1.0.0,<2.0.0` |
**Reference**: Version parsing code exists in plugin system; reuse VersionNumber class
**Alternatives Considered**:
- NPM-style `^` and `~`: Rejected - inconsistent with existing Nextflow patterns
- Always latest: Rejected - breaks reproducibility
---
## 8. Tool Arguments Implementation
> **⚠️ REMOVED FROM ADR** — The tool arguments feature (`tools.<name>.args` in meta.yaml and process config) was removed from the module system ADR. It is not implemented and not planned in the current scope. The `meta.yaml` format used in the actual implementation (`ModuleSpec`) does not include tool/argument definitions.
---
## Summary of Key Decisions
| Area | Decision | Key Reference |
|------|----------|---------------|
| CLI | JCommander subcommands; each extends CmdBase (ModuleRun extends CmdRun) | CmdModule.groovy |
| DSL Parser | SPI pattern — ResolveIncludeVisitor delegates to RemoteModuleResolver; DefaultRemoteModuleResolver bridges to ModuleResolver | ResolveIncludeVisitor.java, RemoteModuleResolver.java |
| Config | ModulesConfig + RegistryConfig (ConfigScope) | FusionConfig.groovy, ConfigScope.java |
| Registry HTTP | ModuleRegistryClient using HxClient + npr-api models | HttpPluginRepository.groovy |
| Authentication | `NXF_REGISTRY_TOKEN` env var or `registry.apiKey` config field (primary registry only) | RegistryConfig.groovy |
| Checksums | SHA-256/SHA-512, `.checksum` file, download integrity via X-Checksum header | ModuleChecksum.groovy |
| Version Storage | `nextflow_spec.json` (auto-managed); `modules {}` in nextflow.config (manual alternative) | PipelineSpec.groovy |
| Version Syntax | Plugin-compatible constraints | VersionNumber class |
| Tool Args | ~~Implicit variable, parse-time validation~~**Removed from ADR** | N/A |
---
## Open Items (Deferred)
1. **Local vs managed module distinction**: Resolved — `@` prefix for registry modules only; local paths start with `.` or `/`
2. **Tool arguments**: Removed from ADR — not in scope
3. **Module version location**: Resolved — `nextflow_spec.json` (auto-managed by `module install`); `modules {}` block in `nextflow.config` supported as alternative
4. **DSL parser `@scope/name` include**: ✅ Resolved — SPI pattern implemented (T017a-d)