Files
ma/nextflow/specs/251117-module-system/spec.md
2026-04-29 23:01:54 +02:00

16 KiB

Feature Specification: Nextflow Module System Client

Feature Branch: 251117-module-system Created: 2026-01-15 Status: Draft Input: User description: "Implement Nextflow module system client based on ADR 20251114-module-system.md. Focus on client-side implementation only - CLI commands, DSL parser extensions, dependency resolution, and local storage. Registry backend is assumed to be already implemented."

Overview

This specification covers the Nextflow client-side implementation of the module system, enabling pipeline developers to:

  • Include remote modules from the Nextflow registry using @scope/name syntax
  • Manage module versions through nextflow.config
  • Use CLI commands to install, search, list, remove, publish, and run modules
  • Configure module parameters through structured meta.yaml definitions

Out of Scope: Registry backend implementation (assumed already available at registry.nextflow.io)

User Scenarios & Testing

User Story 1 - Install and Use Registry Module (Priority: P1)

A pipeline developer wants to use a pre-built module from the Nextflow registry in their workflow without manually downloading or managing module files.

Why this priority: This is the core value proposition - enabling code reuse from the ecosystem. Without this, the module system provides no benefit.

Independent Test: Can be fully tested by running nextflow module install nf-core/fastqc and then executing a workflow that includes the module. Delivers immediate value by enabling module consumption.

Acceptance Scenarios:

  1. Given a new Nextflow project with no modules installed, When user runs nextflow module install nf-core/fastqc, Then the module is downloaded to modules/@nf-core/fastqc/, a .checksum file is created, and nextflow_spec.json is updated with the version
  2. Given a workflow file with include { FASTQC } from '@nf-core/fastqc', When user runs nextflow run main.nf, Then Nextflow resolves the module from local storage and executes the process
  3. Given a module version declared in nextflow.config, When user includes the module, Then the declared version is used (not latest)

User Story 2 - Run Module Directly (Priority: P1)

A user wants to run a module directly from the command line without writing a wrapper workflow.

Why this priority: Enables immediate productivity - users can test and execute modules without boilerplate code, essential for AI agents and quick experimentation.

Independent Test: Can be tested by running nextflow module run nf-core/fastqc --input 'data/*.fq' and verifying the process executes.

Acceptance Scenarios:

  1. Given a module is available (locally or in registry), When user runs nextflow module run nf-core/fastqc --input 'data/*.fastq', Then the module is executed with the provided inputs mapped to process parameters
  2. Given a module with parameters defined in meta.yaml, When user runs nextflow module run nf-core/bwa-align --batch_size 100000, Then the parameter is validated and passed to the process
  3. Given a module is not installed locally, When user runs nextflow module run nf-core/salmon, Then the module is automatically downloaded before execution

User Story 3 - Module Parameters (Priority: P1)

A module author wants to define typed, documented parameters that provide a clear interface for module customization.

Why this priority: Critical for module usability - provides type-safe, documented parameters that enable IDE autocompletion and validation, replacing the opaque ext.args pattern.

Independent Test: Can be tested by configuring params.batch_size = 100000 in config and verifying the parameter is applied in the script.

Acceptance Scenarios:

  1. Given a module with params defined in meta.yaml, When user configures params.batch_size = 100000 in config, Then the parameter is accessible in scripts via params.batch_size
  2. Given a parameter with type validation, When user provides an invalid value type, Then a validation error is displayed
  3. Given a module with documented parameters, When user runs nextflow module run --help, Then available parameters with descriptions are listed

User Story 4 - Module Version Management (Priority: P2)

A pipeline developer wants to pin and manage module versions to ensure reproducible workflow executions.

Why this priority: Reproducibility is important for scientific workflows - version pinning ensures consistent results.

Independent Test: Can be tested by modifying nextflow.config module versions and verifying the correct version is used on workflow run.

Acceptance Scenarios:

  1. Given a module is installed at version 1.0.0, When user changes nextflow_spec.json to specify version 1.1.0 and runs the workflow, Then version 1.1.0 is automatically downloaded and replaces the local copy
  2. Given modules installed locally, When user runs nextflow module list, Then configured version, installed version, latest available version, and status are displayed for each module

User Story 5 - Module Integrity Protection (Priority: P2)

A pipeline developer who has locally modified a module (for debugging or customization) wants to be protected from accidentally losing those changes.

Why this priority: Protects user work - important for developer experience but not blocking core functionality.

Independent Test: Can be tested by modifying a module's main.nf locally, then attempting to install a different version and verifying the warning appears.

Acceptance Scenarios:

  1. Given a locally modified module (checksum mismatch with .checksum), When user tries to install a different version, Then Nextflow warns about local modifications and does NOT override
  2. Given a locally modified module, When user runs nextflow module install -force, Then the local module is replaced with the registry version
  3. Given a locally modified module, When user runs the workflow, Then a warning is displayed about checksum mismatch but execution continues

User Story 6 - Remove Module (Priority: P3)

A pipeline developer wants to remove a module they no longer need.

Why this priority: Housekeeping feature - useful but not blocking core workflows.

Independent Test: Can be tested by running nextflow module remove nf-core/fastqc and verifying files are deleted and config is updated.

Acceptance Scenarios:

  1. Given a module is installed, When user runs nextflow module remove nf-core/fastqc, Then the module directory is deleted and the entry is removed from nextflow_spec.json
  2. Given a module is referenced in workflow files, When user runs nextflow module remove, Then a warning is displayed about the reference but removal proceeds

User Story 7 - Search and Discover Modules (Priority: P3)

A pipeline developer wants to find available modules in the registry that match their analysis needs.

Why this priority: Discovery feature - useful but users can find modules through documentation or registry web UI.

Independent Test: Can be tested by running nextflow module search bwa and verifying results are displayed with name, version, and description.

Acceptance Scenarios:

  1. Given modules exist in the registry, When user runs nextflow module search alignment, Then matching modules are displayed with name, latest version, description, and download count
  2. Given user wants JSON output for scripting, When user runs nextflow module search fastqc -json, Then results are returned in parseable JSON format
  3. Given many results exist, When user runs nextflow module search quality -limit 5, Then only 5 results are returned

User Story 8 - Publish Module to Registry (Priority: P3)

A module author wants to publish their module to the Nextflow registry for others to use.

Why this priority: Ecosystem contribution feature - important for growth but users can consume modules without publishing capability.

Independent Test: Can be tested by creating a valid module structure and running nextflow module publish -dry-run to validate.

Acceptance Scenarios:

  1. Given a valid module with main.nf, meta.yaml, and README.md, When user runs nextflow module publish myorg/my-module, Then the module is uploaded to the registry and becomes available for installation
  2. Given an invalid module (missing required fields), When user runs nextflow module publish, Then validation errors are displayed listing the missing requirements
  3. Given no authentication configured, When user runs nextflow module publish, Then a clear error message indicates authentication is required

Edge Cases

  • What happens when the registry is unreachable during module resolution?
    • Nextflow uses locally cached modules if available, otherwise fails with a clear network error
  • How does the system handle circular module dependencies?
    • Dependency resolver detects cycles and fails with an error listing the cycle
  • What happens when two modules require incompatible versions of the same dependency?
    • System automatically selects the highest compatible version; if no compatible version exists, fails with error listing conflicting requirements
  • How are modules resolved when multiple registries are configured?
    • Registries are tried in order; first match wins
  • What happens when meta.yaml is missing from a module?
    • Module is treated as having no dependencies; basic functionality works
  • What happens when local module directory is corrupted or incomplete?
    • Checksum mismatch triggers warning; -force allows re-download

Requirements

Functional Requirements

DSL Parser Extension

  • FR-001: System MUST recognize @scope/name syntax in include statements as registry module references
  • FR-002: System MUST distinguish between local file paths (starting with . or /) and registry modules (starting with @)
  • FR-003: System MUST resolve module versions from nextflow_spec.json before downloading
  • FR-004: System MUST parse and validate meta.yaml files for module metadata and dependencies

Module Resolution

  • FR-005: System MUST resolve modules at workflow parse time (after plugin resolution)
  • FR-006: System MUST check local modules/@scope/name/ directory before querying registry
  • FR-007: System MUST verify module integrity using .checksum file on every run
  • FR-008: System MUST download modules from registry when not present locally or when version differs
  • FR-009: System MUST NOT override locally modified modules (checksum mismatch) unless -force is used
  • FR-010: System MUST resolve version conflicts by selecting the highest compatible version; if no compatible version exists, MUST fail with error listing conflicting requirements

Local Storage

  • FR-011: System MUST store modules in modules/@scope/name/ directory structure (single version per module)
  • FR-012: System MUST create .checksum file from registry's X-Checksum header on download
  • FR-013: System MUST store module's main.nf, meta.yaml, and supporting files in the module directory

CLI Commands

  • FR-014: System MUST provide nextflow module install [scope/name] command to download modules
  • FR-015: System MUST provide nextflow module search <query> command to search the registry
  • FR-016: System MUST provide nextflow module list command to show installed vs configured modules
  • FR-017: System MUST provide nextflow module remove scope/name command to delete modules
  • FR-018: System MUST provide nextflow module publish scope/name command to upload modules to registry
  • FR-019: System MUST provide nextflow module run scope/name command to execute modules directly
  • FR-019b: System MUST provide nextflow module info scope/name command to display module metadata and a usage template

Configuration

  • FR-020: System MUST persist module versions in nextflow_spec.json; MUST also read versions from modules {} block in nextflow.config as an alternative
  • FR-021: System MUST support registry {} block with url and apiKey fields for configuring registry URL and authentication
  • FR-022: System MUST support NXF_REGISTRY_TOKEN environment variable as fallback for registry.apiKey
  • FR-023: System MUST support multiple registry URLs with fallback ordering

Module Parameters

  • FR-024: System MUST parse module parameters from params section in meta.yaml
  • FR-025: System MUST validate module parameters against meta.yaml schema (type) at workflow parse time
  • FR-026: System MUST support boolean, integer, float, string, file, and path parameter types
  • FR-027: System MUST make module parameters accessible via standard params variable in scripts

Registry Communication

  • FR-028: System MUST communicate with registry via documented Module API endpoints
  • FR-029: System MUST handle authentication using Bearer token in Authorization header
  • FR-030: System MUST verify SHA-256 checksum on module download

Key Entities

  • Module: A reusable Nextflow process definition with main.nf entry point, optional meta.yaml manifest, and README documentation
  • Module Reference: A scoped identifier (@scope/name) pointing to a registry module
  • Module Manifest (meta.yaml): YAML file containing module metadata, version, dependencies, and parameter definitions
  • Module Parameter: A configurable parameter defined in meta.yaml with name, optional type, description, and example
  • Checksum File (.checksum): Local cache of registry checksum for integrity verification
  • Registry Configuration: Settings for registry URL, authentication, and fallback ordering

Success Criteria

Measurable Outcomes

  • SC-001: Pipeline developers can install and use a registry module within 5 minutes of starting a new project
  • SC-002: Module resolution adds less than 2 seconds to workflow startup time when modules are cached locally
  • SC-003: Users can successfully search, install, and run any module from the registry without reading documentation
  • SC-004: 100% of module version changes in nextflow.config result in automatic module updates without manual intervention
  • SC-005: Users receive clear, actionable error messages for all failure scenarios (network, validation, authentication)
  • SC-006: Module authors can publish a new module version within 3 minutes using the CLI
  • SC-007: Locally modified modules are never accidentally overwritten during normal operations

Assumptions

  • Registry backend is fully implemented and available at registry.nextflow.io with the Module API as documented in the ADR
  • Existing plugin authentication system can be reused for module registry authentication
  • Module bundle size limit of 1MB (uncompressed) is enforced by the registry
  • Network connectivity is available for initial module downloads; offline operation uses local cache only
  • The modules/ directory is intended to be committed to the pipeline's git repository
  • Version constraints in meta.yaml follow the same syntax as existing Nextflow plugin version constraints
  • SHA-256 is used for all checksum operations
  • Module parameters use standard --<param_name> CLI syntax

Dependencies

  • Registry backend API (Module API endpoints as specified in ADR)
  • Existing Nextflow plugin system (for authentication reuse)
  • Existing DSL parser infrastructure (for include statement extension)
  • Existing config parser (for modules {} and registry {} blocks)

Clarifications

Session 2026-01-19

  • Q: What should happen when incompatible dependency versions are detected? → A: Use highest compatible version automatically, warn if none exists
  • Q: When should module parameter validation occur? → A: At workflow parse time (early, before any execution)