(google-page)=

# Google Cloud

(google-credentials)=

## Credentials

Credentials for submitting requests to the Google Cloud Batch API are picked up from your environment using [Application Default Credentials](https://github.com/googleapis/google-auth-library-java#google-auth-library-oauth2-http). Application Default Credentials are designed to use the credentials most natural to the environment in which a tool runs.

The most common case will be to pick up your end-user Google credentials from your workstation. You can create these by running the command:

```bash
gcloud auth application-default login
```

and running through the authentication flow. This will write a credential file to your gcloud configuration directory that will be used for any tool you run on your workstation that picks up default credentials.

The next most common case would be when running on a Compute Engine VM. In this case, Application Default Credentials will pick up the Compute Engine Service Account credentials for that VM.

See the [Application Default Credentials](https://github.com/googleapis/google-auth-library-java#google-auth-library-oauth2-http) documentation for how to enable other use cases.

Finally, the `GOOGLE_APPLICATION_CREDENTIALS` environment variable can be used to specify the location of the Google credentials file. If you don't have one, the credentials file can be downloaded from the Google Cloud Console by following these steps:

- Open the [Google Cloud Console](https://console.cloud.google.com)
- Go to APIs & Services → Credentials
- Click on the *Create credentials* (blue) drop-down and choose *Service account key*, then on the following page:
- Select an existing *Service account* or create a new one if needed
- Select JSON as *Key type*
- Click the *Create* button and download the JSON file, giving it a name of your choice, e.g. `creds.json`.

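
Before pointing tools at a downloaded key, it can help to confirm that the file is readable and looks like a service account key. The sketch below is only an illustration: `check_service_account_key` is a hypothetical helper, not part of Nextflow or gcloud, and it relies on the assumption that the key is a JSON document containing `"type": "service_account"`. It does not validate the key against Google Cloud.

```bash
# Hypothetical helper (not part of Nextflow or gcloud): a light sanity check
# for a downloaded service account key file. It only inspects the file
# contents locally and never contacts Google Cloud.
check_service_account_key() {
  key_file="$1"

  # The file must exist and be readable by the current user.
  if [ ! -r "$key_file" ]; then
    echo "missing"
    return 1
  fi

  # Assumption: service account keys carry a "type": "service_account" field.
  if grep -Eq '"type"[[:space:]]*:[[:space:]]*"service_account"' "$key_file"; then
    echo "ok"
    return 0
  fi

  echo "not-a-service-account-key"
  return 1
}

# Example usage (path is a placeholder):
#   check_service_account_key /path/your/file/creds.json
```

The function prints a short status word and returns a non-zero exit code on failure, so it can be used in a script before the variable is exported.
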
Then, define the following variable, replacing the path in the example with the path to the credentials file you just downloaded:

```bash
export GOOGLE_APPLICATION_CREDENTIALS="/path/your/file/creds.json"
```

See [Get started with Nextflow on Google Cloud Batch](https://www.nextflow.io/blog/2023/nextflow-with-gbatch.html) for more information on how to use Google Cloud Batch, including how to set the required roles for your service account.

(google-batch)=

## Cloud Batch

:::{versionadded} 22.07.1-edge
:::

[Google Cloud Batch](https://cloud.google.com/batch) is a managed computing service that allows the execution of containerized workloads in the Google Cloud Platform infrastructure.

Nextflow provides built-in support for Google Cloud Batch, allowing the seamless deployment of Nextflow pipelines in the cloud, in which tasks are offloaded to the Cloud Batch service.

Read the {ref}`Google Cloud Batch executor ` section to learn more about the `google-batch` executor in Nextflow.

(google-batch-config)=

### Configuration

Make sure the `GOOGLE_APPLICATION_CREDENTIALS` variable is defined in your environment. See the [Credentials](#credentials) section for details.

:::{note}
Make sure your Google account is allowed to access the Google Cloud Batch service by checking the [APIs & Services](https://console.cloud.google.com/apis/dashboard) dashboard.
:::

Create or edit the file `nextflow.config` in your project root directory. The config must specify the following parameters:

- Google Cloud Batch as Nextflow executor
- The Docker container image(s) for pipeline tasks
- The Google Cloud project ID and location

Example:

```groovy
process {
    executor = 'google-batch'
    container = 'your/container:latest'
}

google {
    project = 'your-project-id'
    location = 'us-central1'
}
```

Notes:

- A container image must be specified to execute processes. You can use a different Docker image for each process using one or more {ref}`config-process-selectors`.
- Make sure to specify the project ID, not the project name.
- Make sure to specify a location where Google Batch is available. Refer to the [Google Batch documentation](https://cloud.google.com/batch/docs/get-started#locations) for region availability.

Read the {ref}`Google configuration` section to learn more about advanced configuration options.

(google-batch-process)=

### Process definition

By default, the `cpus` and `memory` directives are used to find the cheapest machine type that is available at the current location and that fits the requested resources. If `memory` is not specified, 1 GB of memory is allocated per CPU.

The `machineType` directive can be used to request a specific VM instance type. It can be any predefined Google Compute Engine [machine type](https://cloud.google.com/compute/docs/machine-types) or [custom machine type](https://cloud.google.com/compute/docs/instances/creating-instance-with-custom-machine-type).

```nextflow
process my_task {
    cpus 8
    memory '40 GB'

    script:
    """
    your_command --here
    """
}

process other_task {
    machineType 'n1-highmem-8'

    script:
    """
    your_command --here
    """
}
```

:::{versionadded} 23.02.0-edge
:::

The `machineType` directive can also be a comma-separated list of patterns. A pattern can contain `*` to match any number of characters and `?` to match any single character. Examples of valid patterns: `c2-*`, `m?-standard*`, `n*`.

```nextflow
process my_task {
    cpus 8
    memory '20 GB'
    machineType 'n2-*,c2-*,m3-*'

    script:
    """
    your_command --here
    """
}
```

:::{versionadded} 23.12.0-edge
:::

The `machineType` directive can also be an [Instance Template](https://cloud.google.com/compute/docs/instance-templates), specified as `template://`.
For example:

```nextflow
process my_task {
    cpus 8
    memory '20 GB'
    machineType 'template://my-template'

    script:
    """
    your_command --here
    """
}
```

:::{note}
Using an instance template will override the `accelerator` and `disk` directives, as well as the following Google Batch config options: `bootDiskImage`, `cpuPlatform`, `preemptible`, and `spot`.
:::

To use an instance template with GPUs, you must also set the `google.batch.installGpuDrivers` config option to `true`.

To use an instance template with Fusion, the instance template must include a `local-ssd` disk named `fusion` with 375 GB. See the [Google Batch documentation](https://cloud.google.com/compute/docs/disks/local-ssd) for more details about local SSDs.

:::{versionadded} 23.06.0-edge
:::

The `disk` directive can be used to set the boot disk size or provision a disk for scratch storage. If the disk type is specified with the `type` option, a new disk will be mounted to the task VM at `/tmp` with the requested size and type. Otherwise, it will set the boot disk size, overriding the `google.batch.bootDiskSize` config option. See the [Google Batch documentation](https://cloud.google.com/compute/docs/disks) for more information about the available disk types.

Examples:

```nextflow
// set the boot disk size
disk 100.GB

// mount a persistent disk at '/tmp'
disk 100.GB, type: 'pd-standard'

// mount a local SSD disk at '/tmp' (should be a multiple of 375 GB)
disk 375.GB, type: 'local-ssd'
```

### Pipeline execution

The pipeline can be launched either on a local computer or a cloud instance. Pipeline input data can be stored either locally or in a Google Storage bucket.

The pipeline execution must specify a Google Storage bucket where the workflow's intermediate results are stored, using the `-work-dir` command line option. For example:

```bash
nextflow run