airflow: BigQueryInsertJobOperator is broken on any type of job except `query`

Apache Airflow Provider(s)

google

Versions of Apache Airflow Providers

apache-airflow-providers-google==7.0.0

Apache Airflow version

2.2.5

Operating System

MacOS 12.2.1

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

What happened

We are using BigQueryInsertJobOperator to load data from parquet files in Google Cloud Storage with this kind of configuration:

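from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

# destination_table and source_files are defined elsewhere in the DAG file.
BigQueryInsertJobOperator(
    task_id="load_to_bq",
    configuration={
        "load": {
            "writeDisposition": "WRITE_APPEND",
            "createDisposition": "CREATE_IF_NEEDED",
            "destinationTable": destination_table,
            "sourceUris": source_files,
            "sourceFormat": "PARQUET",
        }
    },
)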

After upgrading to apache-airflow-providers-google==7.0.0, all load jobs are broken. I believe the problem lies in this line: https://github.com/apache/airflow/blob/5bfacf81c63668ea63e7cb48f4a708a67d0ac0a2/airflow/providers/google/cloud/operators/bigquery.py#L2170

The operator always reads the destination table from the `query` section of the job configuration. A job of any other type has no `query` section, so the lookup fails and the job cannot be used.
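To illustrate the failure mode, here is a minimal sketch of a lookup that tolerates job types other than `query`. It is not the fix that was merged into the provider. The helper name `find_destination_table` and the sample dictionaries are hypothetical, and the list of job types that carry a `destinationTable` is taken from the BigQuery REST job configuration as I understand it.

```python
from typing import Any, Dict, Optional

# Job types whose configuration can carry a "destinationTable" field.
# Assumption: "extract" jobs have destination URIs instead, so they are not listed.
_JOB_TYPES_WITH_DESTINATION = ("query", "load", "copy")


def find_destination_table(job_repr: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Return the destinationTable from a job representation, or None if absent.

    job_repr is expected to have the shape returned by job.to_api_repr().
    """
    configuration = job_repr.get("configuration", {})
    for job_type in _JOB_TYPES_WITH_DESTINATION:
        # .get() with a default avoids the KeyError raised by direct indexing.
        table = configuration.get(job_type, {}).get("destinationTable")
        if table:
            return table
    return None


# Hypothetical representations of a load job and an extract job.
load_job = {
    "configuration": {
        "load": {
            "destinationTable": {"projectId": "p", "datasetId": "d", "tableId": "t"},
            "sourceFormat": "PARQUET",
        }
    }
}
extract_job = {"configuration": {"extract": {"destinationUris": ["gs://bucket/file"]}}}

print(find_destination_table(load_job))
print(find_destination_table(extract_job))
```

Direct indexing with `["configuration"]["query"]` on `load_job` would raise the same `KeyError: 'query'` shown in the traceback, while the chained `.get()` calls return `None` when the section is missing.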

What you think should happen instead

No response

How to reproduce

Use BigQueryInsertJobOperator to submit a job of any type other than query, for example a load job.

Anything else

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/google/cloud/operators/bigquery.py", line 2170, in execute
    table = job.to_api_repr()["configuration"]["query"]["destinationTable"]
KeyError: 'query'

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 9
  • Comments: 24 (12 by maintainers)

Most upvoted comments

I installed the new 8.0.0rc1 on google composer and it seems to have fixed the problem.

Thx for your help @raphaelauv

@potiuk I’ve tested it and it’s working as expected, the test details are in test status https://github.com/apache/airflow/issues/24289#issuecomment-1148963358

Thank you @raphaelauv, and everyone 👍

@DrStriky I have no idea what I’ve been doing before, I thought I tried exactly that but it didn’t work, now it worked 😃 Thank you for your patience with a gcp beginner.

@MazrimT yeah sure

In Google Composer you can define PyPI packages for the environment. In that tab you need to add

apache-airflow-providers-google ==8.0.0rc2

This should install this specific version and override the version of that package bundled with Composer.
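After changing the package list, it may be worth confirming which provider version the environment actually picked up. The snippet below is a small sketch using only the standard library. The helper name `installed_version` is hypothetical, and it assumes the code runs in the same Python environment as Airflow (for example, inside a task).

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Prints the provider version, or None when the package is not installed here.
print(installed_version("apache-airflow-providers-google"))
```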

The RC is out: https://github.com/apache/airflow/issues/24289 @takuma11248250 @gilangardya I’d love your “test status” in #24289, since that’s where everyone else will be posting theirs.

Hi @potiuk I’m having the same problem with my Composer, I can help test the RC version once it’s released.

Hi. I have exactly the same issue. When do you expect it to be resolved?

Have you read the comments above, @takuma11248250?

This is an open source project. The fix will arrive when someone decides to spend the time on it. If you volunteer, I’d be happy to assign this issue to you and assist with code review.

This will be solved and released when someone solves it. You will not get an answer on when it will be fixed, but you can help speed it up by contributing, providing fixes, and testing.

Let me reverse the question. Do you expect to help by providing all the details we need to solve it?

Can we count on you to watch the issue and help with testing when we release an RC, @takuma11248250?
