airflow: BigQuery with impersonation_chain does not accept custom scopes

Apache Airflow version

main (development)

What happened

I consistently get the following error when I run a BigQuery query that accesses connected sheets while using impersonation_chain.

  File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/hooks/bigquery.py", line 2203, in run_query
    job = self.insert_job(configuration=configuration, project_id=self.project_id)
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/common/hooks/base_google.py", line 439, in inner_wrapper
    return func(self, *args, **kwargs)
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/hooks/bigquery.py", line 1571, in insert_job
    job.result(timeout=timeout, retry=retry)
  File "/opt/python3.8/lib/python3.8/site-packages/google/cloud/bigquery/job/query.py", line 1499, in result
    do_get_result()
  File "/opt/python3.8/lib/python3.8/site-packages/google/cloud/bigquery/job/query.py", line 1489, in do_get_result
    super(QueryJob, self).result(retry=retry, timeout=timeout)
  File "/opt/python3.8/lib/python3.8/site-packages/google/cloud/bigquery/job/base.py", line 728, in result
    return super(_AsyncJob, self).result(timeout=timeout, **kwargs)
  File "/opt/python3.8/lib/python3.8/site-packages/google/api_core/future/polling.py", line 137, in result
    raise self._exception
google.api_core.exceptions.Forbidden: 403 Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials.

I think this happens because a default scope is always used: https://www.googleapis.com/auth/cloud-platform. Scopes can be set through Airflow connections, but they cannot be set when using impersonation_chain.
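
The connection-side fallback described above can be sketched roughly as follows. This is a simplified, standalone stand-in rather than the provider's actual code: `resolve_scopes` and `DEFAULT_SCOPES` are hypothetical names, and the comma-separated format of the connection's scope field is an assumption.

```python
from typing import Optional, Sequence

# Assumption: the default scope named in the report above.
DEFAULT_SCOPES: Sequence[str] = ("https://www.googleapis.com/auth/cloud-platform",)


def resolve_scopes(connection_scope_field: Optional[str]) -> Sequence[str]:
    """Return scopes from a comma-separated connection field, or the default."""
    if not connection_scope_field:
        return DEFAULT_SCOPES
    return [s.strip() for s in connection_scope_field.split(",") if s.strip()]


# No scope configured: only cloud-platform would be requested.
print(resolve_scopes(None))
# Scope configured on the connection: Drive is requested as well.
print(
    resolve_scopes(
        "https://www.googleapis.com/auth/cloud-platform, "
        "https://www.googleapis.com/auth/drive"
    )
)
```

If the credentials end up with only the default scope, a query that reads Drive-backed data would be expected to fail with the 403 shown in the traceback.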

What you think should happen instead

I would like the operators and hooks to accept custom scopes, in this case https://www.googleapis.com/auth/drive.

How to reproduce

  1. Prepare a connected sheet.
  2. Run a task with BigQueryInsertJobOperator (or a similar operator) that queries the connected sheet through BigQuery, using impersonation_chain.
  3. The task fails with the error:
    403 Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials.
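
The reproduction steps above could look roughly like the sketch below. The table name, service account, and task ID are placeholders and not values from the report. The helper `build_repro_task_kwargs` is hypothetical and only assembles keyword arguments, so the snippet runs without Airflow installed.

```python
from typing import Any, Dict

# All identifiers below are placeholders, not values from the report.
SHEET_BACKED_TABLE = "my-project.my_dataset.connected_sheet_table"
TARGET_SERVICE_ACCOUNT = "runner@my-project.iam.gserviceaccount.com"


def build_repro_task_kwargs() -> Dict[str, Any]:
    """Build keyword arguments for a BigQueryInsertJobOperator task."""
    return {
        "task_id": "query_connected_sheet",
        # Job configuration in the shape of a BigQuery query job.
        "configuration": {
            "query": {
                "query": f"SELECT * FROM `{SHEET_BACKED_TABLE}` LIMIT 10",
                "useLegacySql": False,
            }
        },
        # Using impersonation is the condition under which the 403 was reported.
        "impersonation_chain": TARGET_SERVICE_ACCOUNT,
    }


# In a DAG file (requires apache-airflow-providers-google):
#   from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
#   task = BigQueryInsertJobOperator(**build_repro_task_kwargs())
print(build_repro_task_kwargs()["configuration"]["query"]["query"])
```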
    

Operating System

Linux

Versions of Apache Airflow Providers

No response

Deployment

Google Cloud Composer

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

About this issue

  • State: open
  • Created a year ago
  • Reactions: 1
  • Comments: 19 (6 by maintainers)

Most upvoted comments

Possible workaround:

  • Step 1: Extend the BigQueryHook class and override the GoogleBaseHook.scopes property as follows:
    from typing import Sequence

    from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook

    class BigQueryHookWithScopes(BigQueryHook):
        def __init__(self, scopes: Sequence[str], *args, **kwargs):
            super().__init__(*args, **kwargs)
            # Scopes to use instead of the default ones
            self._scopes = scopes

        @property
        def scopes(self) -> Sequence[str]:
            return self._scopes
    
  • Step 2: Extend the BigQuery operators you use so that they create the above hook, for example:
    from airflow.providers.google.cloud.operators.bigquery import BigQueryExecuteQueryOperator

    class BigQueryExecuteQueryOperatorWithScope(BigQueryExecuteQueryOperator):
        def __init__(self, scopes, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.scopes = scopes

        def execute(self, context):
            # Pre-populate self.hook so that the parent execute() is expected to reuse it
            self.hook = BigQueryHookWithScopes(
                scopes=self.scopes,
                gcp_conn_id=self.gcp_conn_id,
                use_legacy_sql=self.use_legacy_sql,
                delegate_to=self.delegate_to,
                location=self.location,
                impersonation_chain=self.impersonation_chain,
            )
            # Return the parent's result so it is still pushed to XCom
            return super().execute(context)
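
Why overriding the property helps can be illustrated with a small standalone sketch. It assumes the base hook looks up `self.scopes` at the moment it builds credentials. `FakeBaseHook`, `FakeHookWithScopes`, and `requested_scopes` are hypothetical stand-ins and not Airflow classes or methods.

```python
from typing import Sequence

CLOUD_PLATFORM_SCOPE = "https://www.googleapis.com/auth/cloud-platform"
DRIVE_SCOPE = "https://www.googleapis.com/auth/drive"


class FakeBaseHook:
    """Hypothetical stand-in for a hook that reads self.scopes when building credentials."""

    @property
    def scopes(self) -> Sequence[str]:
        return (CLOUD_PLATFORM_SCOPE,)

    def requested_scopes(self) -> Sequence[str]:
        # Stand-in for credential creation: the property is looked up at call time,
        # so a subclass override is picked up without changing this method.
        return tuple(self.scopes)


class FakeHookWithScopes(FakeBaseHook):
    """Same pattern as the workaround: store the scopes and override the property."""

    def __init__(self, scopes: Sequence[str]) -> None:
        self._scopes = scopes

    @property
    def scopes(self) -> Sequence[str]:
        return self._scopes


hook = FakeHookWithScopes(scopes=[CLOUD_PLATFORM_SCOPE, DRIVE_SCOPE])
print(hook.requested_scopes())
```

Whether the real hook passes these scopes through to the impersonated credentials depends on the provider version, so this only shows the override pattern itself.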
    

No problem. I’ll ask around to see if other people have some thoughts on this too.