pipelines: Problem running BigQuery example from AI Hub

What steps did you take:

Hi, I am trying to execute a BigQuery to Google Cloud Storage example, which can be found in the Google AI Hub here:

https://aihub.cloud.google.com/p/products%2F4700cd7e-2826-4ce9-a1ad-33f4a5bf7433/v/1/downloadpage

What happened:

I downloaded the example from the AI Hub and imported the zip file using the Kubeflow UI. When creating a run with this pipeline, I get the following error message:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/local/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/ml/kfp_component/launcher/__main__.py", line 34, in <module>
    main()
  File "/ml/kfp_component/launcher/__main__.py", line 31, in main
    launch(args.file_or_module, args.args)
  File "kfp_component/launcher/launcher.py", line 45, in launch
    return fire.Fire(module, command=args, name=module.__name__)
  File "/usr/local/lib/python2.7/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/usr/local/lib/python2.7/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/usr/local/lib/python2.7/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "kfp_component/google/bigquery/_query.py", line 45, in query
    client = bigquery.Client(project=project_id)
  File "/usr/local/lib/python2.7/site-packages/google/cloud/bigquery/client.py", line 142, in __init__
    project=project, credentials=credentials, _http=_http
  File "/usr/local/lib/python2.7/site-packages/google/cloud/client.py", line 224, in __init__
    Client.__init__(self, credentials=credentials, _http=_http)
  File "/usr/local/lib/python2.7/site-packages/google/cloud/client.py", line 130, in __init__
    credentials, _ = google.auth.default()
  File "/usr/local/lib/python2.7/site-packages/google/auth/_default.py", line 305, in default
    credentials, project_id = checker()
  File "/usr/local/lib/python2.7/site-packages/google/auth/_default.py", line 165, in _get_explicit_environ_credentials
    os.environ[environment_vars.CREDENTIALS])
  File "/usr/local/lib/python2.7/site-packages/google/auth/_default.py", line 98, in _load_credentials_from_file
    six.raise_from(new_exc, caught_exc)
  File "/usr/local/lib/python2.7/site-packages/six.py", line 737, in raise_from
    raise value
google.auth.exceptions.DefaultCredentialsError: ('File /secret/gcp-credentials/user-gcp-sa.json is not a valid json file.', ValueError('No JSON object could be decoded',))

What did you expect to happen:

I expected the pipeline to run successfully and generate a file in my Google Cloud Storage bucket, and that the generated secret would contain valid data.

How did you deploy Kubeflow Pipelines (KFP)?

Kubeflow Pipelines is deployed on GCP using the AI Platform Pipelines UI, which uses the installer from here:

https://console.cloud.google.com/marketplace/details/google-cloud-ai-platform/kubeflow-pipelines?filter=solution-type%3Ak8s&filter=price%3Afree&filter=category%3Adeveloper-tools

Anything else you would like to add:

Looking into the Kubernetes secret, it seems that its definition is not correct, as the Data fields are empty:

 kubectl describe secrets user-gcp-sa
Name:         user-gcp-sa
Namespace:    default
Labels:       app=gcp-sa
              app.kubernetes.io/name=kubeflow-pipelines
Annotations:
Type:         Opaque

Data
====
application_default_credentials.json:  0 bytes
user-gcp-sa.json:                      0 bytes
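
For reference, the raw secret values can also be decoded directly (a minimal sketch; the secret name and default namespace are taken from the output above, and the jsonpath key escaping is the form I would expect kubectl to accept):

kubectl get secret user-gcp-sa -n default \
  -o jsonpath='{.data.user-gcp-sa\.json}' | base64 --decode
# should print the service account key JSON; empty output matches the 0 bytes shown above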

Any hints on this would be greatly appreciated.

/kind bug

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 17 (12 by maintainers)

Most upvoted comments

@Bobgy thank you very much for your help. Finally I was able to run this example successfully.

For anyone coming to this, here is what I did:

import kfp.compiler as compiler
import kfp.components as comp
import kfp.dsl as dsl

bigquery_query_op = comp.load_component_from_url(
    'https://raw.githubusercontent.com/kubeflow/pipelines/01a23ae8672d3b18e88adf3036071496aca3552d/components/gcp/bigquery/query/component.yaml')
help(bigquery_query_op)

QUERY = 'SELECT * FROM `bigquery-public-data.stackoverflow.posts_questions` LIMIT 10'
PROJECT_ID = 'PROJECT_ID'
DATASET_ID = 'DATASET_ID'
TABLE_ID = 'TABLE_ID'
GCS_WORKING_DIR = 'gs://your/bucket/path'  # No ending slash
EXPERIMENT_NAME = 'Bigquery -Query'
OUTPUT_PATH = '{}/bigquery/query/questions.csv'.format(GCS_WORKING_DIR)


@dsl.pipeline(
    name='Bigquery query pipeline',
    description='Bigquery query pipeline'
)
def pipeline(
        query=QUERY,
        project_id=PROJECT_ID,
        dataset_id=DATASET_ID,
        table_id=TABLE_ID,
        output_gcs_path=OUTPUT_PATH,
        dataset_location='US',
        job_config=''
):
    bigquery_query_op(
        query=query,
        project_id=project_id,
        dataset_id=dataset_id,
        table_id=table_id,
        output_gcs_path=output_gcs_path,
        dataset_location=dataset_location,
        job_config=job_config)


pipeline_func = pipeline
pipeline_filename = pipeline_func.__name__ + '.zip'

compiler.Compiler().compile(pipeline_func, pipeline_filename)
  • save the code above as component.py
  • execute this command: dsl-compile --py component.py --output ./output/pipeline.zip
  • upload this zip file in the Kubeflow web UI and create a run
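
As a side note, running the script directly should produce an equivalent archive, since compiler.Compiler().compile() already writes pipeline.zip next to the script (assuming the code above is saved as component.py):

python component.py  # writes pipeline.zip in the current directory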

I also needed to make sure my Compute Engine default service account has the following roles:

gcloud projects add-iam-policy-binding YOUR_PROJECT --member=serviceAccount:YOUR_SA@developer.gserviceaccount.com --role=roles/storage.admin
gcloud projects add-iam-policy-binding YOUR_PROJECT --member=serviceAccount:YOUR_SA@developer.gserviceaccount.com --role=roles/bigquery.admin
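
To double-check that the bindings took effect, the project IAM policy can be filtered for that service account (a sketch using the same placeholders as above):

gcloud projects get-iam-policy YOUR_PROJECT \
    --flatten="bindings[].members" \
    --filter="bindings.members:YOUR_SA@developer.gserviceaccount.com" \
    --format="table(bindings.role)"
# roles/storage.admin and roles/bigquery.admin should appear in the output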