great_expectations: not able to connect to BigQuery instance
Describe the bug
print(validator.head()) gives
for (metric_name, metric_configuration) in metrics.items()
KeyError: ('table.head', 'batch_id=7d60dd00fbc6ebeac60da5ea671dccb0', '04166707abe073177c1dd922d3584468')
when connecting to a BigQuery instance
To Reproduce I am trying to configure the great_expectations tool. Here is the documentation I followed:
https://docs.greatexpectations.io/docs/guides/connecting_to_your_data/database/bigquery
Following it, I was able to put together the script below:
import os
from ruamel import yaml

import great_expectations as ge
from great_expectations.core.batch import BatchRequest, RuntimeBatchRequest

gcp_project = 'my-gcp-project'
bigquery_dataset = 'my-dataset'

CONNECTION_STRING = f"bigquery://{gcp_project}/{bigquery_dataset}"

context = ge.get_context()

datasource_config = {
    "name": "my_bigquery_datasource",
    "class_name": "Datasource",
    "execution_engine": {
        "class_name": "SqlAlchemyExecutionEngine",
        "connection_string": "bigquery://my-gcp-project/my-dataset",
    },
    "data_connectors": {
        "default_runtime_data_connector_name": {
            "class_name": "RuntimeDataConnector",
            "batch_identifiers": ["default_identifier_name"],
        },
        "default_inferred_data_connector_name": {
            "class_name": "InferredAssetSqlDataConnector",
            "name": "whole_table",
        },
    },
}

datasource_config["execution_engine"]["connection_string"] = CONNECTION_STRING

context.test_yaml_config(yaml.dump(datasource_config))
context.add_datasource(**datasource_config)

batch_request = RuntimeBatchRequest(
    datasource_name="my_bigquery_datasource",
    data_connector_name="default_runtime_data_connector_name",
    data_asset_name="json_data",  # this can be anything that identifies this data
    runtime_parameters={"query": "SELECT * from mytable where colA='someVal' LIMIT 10"},
    batch_identifiers={"default_identifier_name": "default_identifier"},
    batch_spec_passthrough={
        "bigquery_temp_table": "ge_temp"
    },  # this is the name of the table you would like to use as a 'temp_table'
)

context.create_expectation_suite(
    expectation_suite_name="test_suite", overwrite_existing=True
)

validator = context.get_validator(
    batch_request=batch_request, expectation_suite_name="test_suite"
)
print(validator.head())

assert isinstance(validator, ge.validator.validator.Validator)
However, when I run this I get:
Traceback (most recent call last):
File "geBq.py", line 55, in <module>
print(validator.head())
File "/some/path/python3.7/site-packages/great_expectations/validator/validator.py", line 1620, in head
"fetch_all": fetch_all,
File "/some/path/python3.7/site-packages/great_expectations/validator/validator.py", line 364, in get_metric
return self.get_metrics({"_": metric})["_"]
File "/some/path/python3.7/site-packages/great_expectations/validator/validator.py", line 359, in get_metrics
for (metric_name, metric_configuration) in metrics.items()
File "/some/path/python3.7/site-packages/great_expectations/validator/validator.py", line 359, in <dictcomp>
for (metric_name, metric_configuration) in metrics.items()
KeyError: ('table.head', 'batch_id=7d60dd00fbc6ebeac60da5ea671dccb0', '04166707abe073177c1dd922d3584468')
What does this error mean? How can I resolve it? I just want to connect to my BigQuery table and then run some validations on it. By the way, I am able to query this table in the BigQuery console.
Expected behavior I should be able to successfully connect Great Expectations to my data in the BigQuery instance.
Environment (please complete the following information):
- Operating System: Linux
- Great Expectations Version: 0.13.37
Additional context Please let me know if I can add any other details
About this issue
- State: closed
- Created 3 years ago
- Reactions: 4
- Comments: 18 (7 by maintainers)
@pmariac With BatchRequest it works, but with RuntimeBatchRequest it returns that error as well. I think it is because of this (this is my code, which returns the same error as yours; I debugged it a little and came across that query error).
I already tried different types of queries, with and without parentheses, etc., but it does not seem to work: it expects a table after the FROM, and it does not work for subqueries either.
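For reference, here is a minimal sketch of what I mean by "with BatchRequest it works", assuming the default_inferred_data_connector_name from the original post; the asset name "mytable" is hypothetical and should match a table in the dataset, and it assumes the "test_suite" suite from the original script already exists:

import great_expectations as ge
from great_expectations.core.batch import BatchRequest

context = ge.get_context()

# Plain BatchRequest against the inferred SQL data connector.
batch_request = BatchRequest(
    datasource_name="my_bigquery_datasource",
    data_connector_name="default_inferred_data_connector_name",
    data_asset_name="mytable",  # hypothetical table name in the dataset
)

validator = context.get_validator(
    batch_request=batch_request, expectation_suite_name="test_suite"
)
print(validator.head())  # this path works; the RuntimeBatchRequest path raises the KeyError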
@talagluck I think it’s worth reopening this issue.
Got the exact same error here.
I have the BigQuery Read Session User permission. I'm authenticating using
At first, everything works as expected:
But on calling the validator.head() method I get the same error as abtpst:
Hi @abtpst! We've recently done some work around RuntimeBatchRequest. Would you please confirm whether this is still an issue?
Gotcha - thanks for the update, @FredRodrigues! We currently have a few other RuntimeBatchRequest issues in progress, so I was wondering if this was related, but this feels specific to BigQuery. I'm going to re-open this and take a look over the next week.
I currently have the exact same issue. At first, it was a problem of permissions, because I have read-only access to that dataset. I solved it by setting create_temp_table to False. Now I am having the same issue as you. I even reproduced the exact same code, only changing the project id, and this error happens on the dataset that has read-only access. In the project where I have full access, the code works. Hope this info helps.
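A minimal sketch of what I mean by setting create_temp_table to False, based on the RuntimeBatchRequest from the original post. Passing it through batch_spec_passthrough is an assumption on my side; depending on the GE version it may instead belong in the execution_engine config:

from great_expectations.core.batch import RuntimeBatchRequest

# Assumption: create_temp_table=False is accepted via batch_spec_passthrough,
# so no temp table is created in the read-only dataset.
batch_request = RuntimeBatchRequest(
    datasource_name="my_bigquery_datasource",
    data_connector_name="default_runtime_data_connector_name",
    data_asset_name="json_data",
    runtime_parameters={"query": "SELECT * from mytable where colA='someVal' LIMIT 10"},
    batch_identifiers={"default_identifier_name": "default_identifier"},
    batch_spec_passthrough={"create_temp_table": False},
)

With that change the permission problem went away for me, but the KeyError on validator.head() remained.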