airflow: BigQueryInsertJobOperator is broken on any type of job except `query`
Apache Airflow Provider(s)
Versions of Apache Airflow Providers
apache-airflow-providers-google==7.0.0
Apache Airflow version
2.2.5
Operating System
MacOS 12.2.1
Deployment
Official Apache Airflow Helm Chart
Deployment details
No response
What happened
We are using BigQueryInsertJobOperator to load data from parquet files in Google Cloud Storage with this kind of configuration:
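from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

# destination_table and source_files are defined elsewhere in the DAG file.
BigQueryInsertJobOperator(
    task_id="load_to_bq",
    configuration={
        "load": {
            "writeDisposition": "WRITE_APPEND",
            "createDisposition": "CREATE_IF_NEEDED",
            "destinationTable": destination_table,
            "sourceUris": source_files,
            "sourceFormat": "PARQUET",
        }
    },
)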
After upgrading to apache-airflow-providers-google==7.0.0, all load jobs are broken. I believe the problem lies in this line: https://github.com/apache/airflow/blob/5bfacf81c63668ea63e7cb48f4a708a67d0ac0a2/airflow/providers/google/cloud/operators/bigquery.py#L2170
The operator tries to read the destination table from the query job config, which fails with a KeyError for every other job type and makes the operator unusable for them.
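To illustrate the failure mode, here is a minimal sketch of a lookup that does not assume a `query` job. It is not the fix that was merged into the provider; the helper name `find_destination_table`, the tuple of job types, and the sample job dictionary are all hypothetical and only mirror the shape of the `load` configuration shown above.

```python
from typing import Optional

# Job types whose configuration can carry a "destinationTable" entry.
# (Assumption for this sketch: extract jobs have no destination table.)
JOB_TYPES_WITH_DESTINATION = ("query", "load", "copy")


def find_destination_table(job_api_repr: dict) -> Optional[dict]:
    """Return the job's destination table, or None if it has none."""
    configuration = job_api_repr.get("configuration", {})
    for job_type in JOB_TYPES_WITH_DESTINATION:
        # .get() avoids the KeyError raised by direct indexing on "query".
        table = configuration.get(job_type, {}).get("destinationTable")
        if table is not None:
            return table
    return None


# Hypothetical API representation of a load job (illustrative values only).
load_job_repr = {
    "configuration": {
        "load": {
            "destinationTable": {
                "projectId": "my-project",
                "datasetId": "my_dataset",
                "tableId": "my_table",
            },
            "sourceUris": ["gs://my-bucket/data/*.parquet"],
            "sourceFormat": "PARQUET",
        }
    }
}

# Direct indexing, as in the failing line, raises KeyError: 'query' here:
#   load_job_repr["configuration"]["query"]["destinationTable"]
print(find_destination_table(load_job_repr))
```

The point of the sketch is only that the job type has to be checked before indexing into the configuration; a job without any destination table simply yields `None`.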
What you think should happen instead
No response
How to reproduce
Use BigQueryInsertJobOperator to submit any type of job except query
Anything else
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/google/cloud/operators/bigquery.py", line 2170, in execute
    table = job.to_api_repr()["configuration"]["query"]["destinationTable"]
KeyError: 'query'
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project’s Code of Conduct
About this issue
- State: closed
- Created 2 years ago
- Reactions: 9
- Comments: 24 (12 by maintainers)
I installed the new 8.0.0rc1 on google composer and it seems to have fixed the problem.
Thx for your help @raphaelauv
@potiuk I’ve tested it and it’s working as expected, the test details are in test status https://github.com/apache/airflow/issues/24289#issuecomment-1148963358
Thank you @raphaelauv, and everyone 👍
@DrStriky I have no idea what I’ve been doing before, I thought I tried exactly that but it didn’t work, now it worked 😃 Thank you for your patience with a gcp beginner.
@MazrimT yeah sure
In Google Cloud Composer you can define PyPI packages. In that tab, add:
apache-airflow-providers-google ==8.0.0rc2
This should install this specific version and override the version of the package that ships with Composer.
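To confirm that the override took effect, one option is to print the installed provider version from inside the environment, for example from a task. This is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the provider version, or None where the package is not installed.
print(installed_version("apache-airflow-providers-google"))
```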
The RC is out https://github.com/apache/airflow/issues/24289 @takuma11248250 @gilangardya I’d love your “test status” in the #24289 - that’s where everyone else will be posting theirs.
Hi @potiuk I’m having the same problem with my Composer, I can help test the RC version once it’s released.
Have you read the comments above, @takuma11248250?
This will be solved and released when someone solves it. You will not get answers on when it will be fixed, but by contributing, providing fixes, and testing you can help speed it up.
Let me reverse the question. Do you expect to help by providing all the details we need to solve it?
Can we count on you to watch the issue and help with testing when we release an RC, @takuma11248250?
This is an open source project, so the fix will arrive when someone decides to spend the time on fixing it. If you volunteer, I’d be happy to assign this issue to you and assist with code review.