gcp-variant-transforms: Network parameter not being read

I’m using this script:

#!/bin/bash
# Parameters to replace:
# The GOOGLE_CLOUD_PROJECT is the project that contains your BigQuery dataset.
GOOGLE_CLOUD_PROJECT=psjh-eacri-data
INPUT_PATTERN=https://storage.googleapis.com/gcp-public-data--gnomad/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.*.vcf.bgz
# INPUT_PATTERN=gs://gcp-public-data--gnomad/release/2.1.1/vcf/exomes/*.vcf.bgz
OUTPUT_TABLE=eacri-genomics:gnomad.gnomad_hg19_2_1_1
TEMP_LOCATION=gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp

COMMAND="vcf_to_bq \
    --input_pattern ${INPUT_PATTERN} \
    --output_table ${OUTPUT_TABLE} \
    --temp_location ${TEMP_LOCATION} \
    --job_name vcf-to-bigquery \
    --runner DataflowRunner \
    --zones us-east1-b \
    --network projects/phs-205720/global/networks/psjh-shared01 \
    --subnet projects/phs-205720/regions/us-east1/subnetworks/subnet01"
    
docker run -v ~/.config:/root/.config \
    gcr.io/cloud-lifesciences/gcp-variant-transforms \
    --project "${GOOGLE_CLOUD_PROJECT}" \
    --temp_location ${TEMP_LOCATION} \
    "${COMMAND}"

Yet the error says that no network was specified, and the "network" field is empty in the JSON output.

What change do I need to make to my script? Or is some other format needed to specify the network?

The script template doesn’t include a network or subnet parameter at all.

(base) jupyter@balter-genomics:~$ ./script.sh
 --project 'psjh-eacri-data' --temp_location 'gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp' -- 'vcf_to_bq     --input_pattern gs://gcp-public-data--gnomad/release/2.1.1/vcf/exomes/*.vcf.bgz     --output_table eacri-genomics:gnomad.gnomad_hg19_2_1_1     --temp_location gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp     --job_name vcf-to-bigquery     --runner DataflowRunner     --zones us-east1-b     --subnet subnet03'
Your active configuration is: [variant]
{
  "pipeline": {
    "actions": [
      {
        "commands": [
          "-c",
          "mkdir -p /mnt/google/.google/tmp"
        ],
        "entrypoint": "bash",
        "imageUri": "gcr.io/cloud-genomics-pipelines/io",
        "mounts": [
          {
            "disk": "google",
            "path": "/mnt/google"
          }
        ]
      },
      {
        "commands": [
          "-c",
          "/opt/gcp_variant_transforms/bin/vcf_to_bq --input_pattern gs://gcp-public-data--gnomad/release/2.1.1/vcf/exomes/*.vcf.bgz --output_table eacri-genomics:gnomad.gnomad_hg19_2_1_1 --temp_location gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp --job_name vcf-to-bigquery --runner DataflowRunner --zones us-east1-b --subnet subnet03 --project psjh-eacri-data --region us-east1 --temp_location gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp"
        ],
        "entrypoint": "bash",
        "imageUri": "gcr.io/cloud-lifesciences/gcp-variant-transforms",
        "mounts": [
          {
            "disk": "google",
            "path": "/mnt/google"
          }
        ]
      },
      {
        "alwaysRun": true,
        "commands": [
          "-c",
          "gsutil -q cp /google/logs/output gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp/runner_logs_20210510_230717.log"
        ],
        "entrypoint": "bash",
        "imageUri": "gcr.io/cloud-genomics-pipelines/io",
        "mounts": [
          {
            "disk": "google",
            "path": "/mnt/google"
          }
        ]
      }
    ],
    "environment": {
      "TMPDIR": "/mnt/google/.google/tmp"
    },
    "resources": {
      "regions": [
        "us-east1"
      ],
      "virtualMachine": {
        "disks": [
          {
            "name": "google",
            "sizeGb": 10
          }
        ],
        "machineType": "g1-small",
        "network": {},
        "serviceAccount": {
          "scopes": [
            "https://www.googleapis.com/auth/cloud-platform",
            "https://www.googleapis.com/auth/devstorage.read_write"
          ]
        }
      }
    }
  }
}
Pipeline running as "projects/447346450878/locations/us-central1/operations/13027962545459232820" (attempt: 1, preemptible: false)
Output will be written to "gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp/runner_logs_20210510_230717.log"
23:07:26 Worker "google-pipelines-worker-ab367d994b1cd7881ebf66950fec6c17" assigned in "us-east1-b" on a "g1-small" machine
23:07:26 Execution failed: allocating: creating instance: inserting instance: Invalid value for field 'resource.networkInterfaces[0].network': ''. The referenced network resource cannot be found.
23:07:27 Worker released
"run": operation "projects/447346450878/locations/us-central1/operations/13027962545459232820" failed: executing pipeline: Execution failed: allocating: creating instance: inserting instance: Invalid value for field 'resource.networkInterfaces[0].network': ''. The referenced network resource cannot be found. (reason: INVALID_ARGUMENT)
(base) jupyter@balter-genomics:~$ ./script.sh
 --project 'psjh-eacri-data' --temp_location 'gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp' -- 'vcf_to_bq     --input_pattern https://storage.googleapis.com/gcp-public-data--gnomad/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.*.vcf.bgz     --output_table eacri-genomics:gnomad.gnomad_hg19_2_1_1     --temp_location gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp     --job_name vcf-to-bigquery     --runner DataflowRunner     --zones us-east1-b     --subnet subnet03'
Your active configuration is: [variant]
{
  "pipeline": {
    "actions": [
      {
        "commands": [
          "-c",
          "mkdir -p /mnt/google/.google/tmp"
        ],
        "entrypoint": "bash",
        "imageUri": "gcr.io/cloud-genomics-pipelines/io",
        "mounts": [
          {
            "disk": "google",
            "path": "/mnt/google"
          }
        ]
      },
      {
        "commands": [
          "-c",
          "/opt/gcp_variant_transforms/bin/vcf_to_bq --input_pattern https://storage.googleapis.com/gcp-public-data--gnomad/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.*.vcf.bgz --output_table eacri-genomics:gnomad.gnomad_hg19_2_1_1 --temp_location gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp --job_name vcf-to-bigquery --runner DataflowRunner --zones us-east1-b --subnet subnet03 --project psjh-eacri-data --region us-east1 --temp_location gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp"
        ],
        "entrypoint": "bash",
        "imageUri": "gcr.io/cloud-lifesciences/gcp-variant-transforms",
        "mounts": [
          {
            "disk": "google",
            "path": "/mnt/google"
          }
        ]
      },
      {
        "alwaysRun": true,
        "commands": [
          "-c",
          "gsutil -q cp /google/logs/output gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp/runner_logs_20210511_000846.log"
        ],
        "entrypoint": "bash",
        "imageUri": "gcr.io/cloud-genomics-pipelines/io",
        "mounts": [
          {
            "disk": "google",
            "path": "/mnt/google"
          }
        ]
      }
    ],
    "environment": {
      "TMPDIR": "/mnt/google/.google/tmp"
    },
    "resources": {
      "regions": [
        "us-east1"
      ],
      "virtualMachine": {
        "disks": [
          {
            "name": "google",
            "sizeGb": 10
          }
        ],
        "machineType": "g1-small",
        "network": {},
        "serviceAccount": {
          "scopes": [
            "https://www.googleapis.com/auth/cloud-platform",
            "https://www.googleapis.com/auth/devstorage.read_write"
          ]
        }
      }
    }
  }
}
Pipeline running as "projects/447346450878/locations/us-central1/operations/3293803574088782620" (attempt: 1, preemptible: false)
Output will be written to "gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp/runner_logs_20210511_000846.log"
00:08:56 Worker "google-pipelines-worker-e05c2864661a5ba9f1b29012de1ac56d" assigned in "us-east1-d" on a "g1-small" machine
00:08:56 Execution failed: allocating: creating instance: inserting instance: Invalid value for field 'resource.networkInterfaces[0].network': ''. The referenced network resource cannot be found.
00:08:57 Worker released
"run": operation "projects/447346450878/locations/us-central1/operations/3293803574088782620" failed: executing pipeline: Execution failed: allocating: creating instance: inserting instance: Invalid value for field 'resource.networkInterfaces[0].network': ''. The referenced network resource cannot be found. (reason: INVALID_ARGUMENT)

About this issue

  • State: open
  • Created 3 years ago
  • Comments: 27 (10 by maintainers)

Most upvoted comments

@abalter I think @slagelwa is pointing out that the command parser in the code is missing the network parameter. If you look at the line linked below, it does not parse a network option, which is probably why the network field is always empty:

https://github.com/googlegenomics/gcp-variant-transforms/blob/master/docker/pipelines_runner.sh#L25

getopt -o '' -l project:,temp_location:,docker_image:,region:,subnetwork:,use_public_ips:,service_account:,location: -- "$@"
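
To illustrate the point, here is a minimal sketch of how a getopt call like the one above treats a --network option. It assumes GNU getopt (util-linux) is installed. The function parse_runner_opts and the example network path are hypothetical and are not part of gcp-variant-transforms; only the long-option list is copied from the line quoted above. This is not a tested patch for pipelines_runner.sh, just a demonstration that an option missing from the -l list is rejected rather than parsed, and that appending "network:" would let it through.

```shell
#!/bin/bash
# Hypothetical sketch: parse_runner_opts is NOT part of gcp-variant-transforms.
# It mimics the getopt call quoted above, taking the long-option list as its
# first argument so the effect of adding "network:" can be compared.

parse_runner_opts() {
  local longopts="$1"
  shift
  local parsed
  # GNU getopt exits non-zero when it meets an option that is not in the list.
  if ! parsed=$(getopt -o '' -l "${longopts}" -- "$@" 2>/dev/null); then
    echo "unrecognized option"
    return 1
  fi
  eval set -- "${parsed}"
  local network=""
  while true; do
    case "$1" in
      --network) network="$2"; shift 2 ;;
      --) shift; break ;;
      *) shift 2 ;;  # every other option in these lists takes a value
    esac
  done
  echo "network=${network}"
}

# Long-option list copied from the getopt line quoted above.
CURRENT_OPTS='project:,temp_location:,docker_image:,region:,subnetwork:,use_public_ips:,service_account:,location:'

# With the current list, --network is not recognized.
parse_runner_opts "${CURRENT_OPTS}" \
  --project my-project --network projects/my-host/global/networks/my-net || true

# With "network:" appended (an assumed change), the value is captured.
parse_runner_opts "${CURRENT_OPTS},network:" \
  --project my-project --network projects/my-host/global/networks/my-net
```

Parsing the option is only half of the job: the script would still need to forward the captured value to whatever launches the worker VM, and this sketch does not cover that step.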