great_expectations: "catch_exception" not functioning with v3 API
Describe the bug
I am using the v3 API with SparkDFExecutionEngine, together with ExpectationConfiguration, RuntimeBatchRequest, and context.get_validator().validate().
When I include "catch_exceptions" in the expectation kwargs, or pass catch_exceptions=True to context.get_validator().validate(), an exception is still raised instead of being caught and recorded in the validation result.
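For clarity, these are the two placements that were tried (names refer to objects defined in the reproduction script below); neither suppresses the error:

# Placement 1: catch_exceptions inside the expectation's kwargs
expectation_configuration = ExpectationConfiguration(
    expectation_type="expect_column_values_to_not_be_null",
    kwargs={"column": "unknown_column", "catch_exceptions": True},
)

# Placement 2: catch_exceptions passed to validate() at run time
validation_result = context.get_validator(
    batch_request=runtime_batch_request,
    expectation_suite=suite,
).validate(catch_exceptions=True)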
Calculating Metrics: 50%|█████ | 4/8 [00:00<00:00, 5.30it/s]Traceback (most recent call last):
File "/opt/project/src/main/glue/reproduce.py", line 92, in <module>
expectation_suite=suite,
File "/usr/local/lib/python3.6/site-packages/great_expectations/validator/validator.py", line 1209, in validate
"result_format": result_format,
File "/usr/local/lib/python3.6/site-packages/great_expectations/validator/validator.py", line 474, in graph_validate
metrics = self.resolve_validation_graph(graph, metrics, runtime_configuration)
File "/usr/local/lib/python3.6/site-packages/great_expectations/validator/validator.py", line 519, in resolve_validation_graph
runtime_configuration=runtime_configuration,
File "/usr/local/lib/python3.6/site-packages/great_expectations/validator/validator.py", line 560, in _resolve_metrics
metrics_to_resolve, metrics, runtime_configuration
File "/usr/local/lib/python3.6/site-packages/great_expectations/execution_engine/execution_engine.py", line 282, in resolve_metrics
**metric_provider_kwargs
File "/usr/local/lib/python3.6/site-packages/great_expectations/expectations/metrics/metric_provider.py", line 58, in inner_func
return metric_fn(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/great_expectations/expectations/metrics/map_metric_provider.py", line 465, in inner_func
message=f'Error: The column "{column_name}" in BatchData does not exist.'
great_expectations.exceptions.exceptions.ExecutionEngineError: Error: The column "unknown_column" in BatchData does not exist.
Calculating Metrics: 50%|█████ | 4/8 [00:00<00:00, 4.45it/s]
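Expected behavior: validation should complete and the error should be recorded on the failed expectation's result rather than being raised. A sketch of the per-expectation result shape I would expect (field names follow great_expectations' validation result schema; values are illustrative):

# Expected per-expectation result when catch_exceptions is honored:
expected_result = {
    "success": False,
    "exception_info": {
        "raised_exception": True,
        "exception_message": 'Error: The column "unknown_column" in BatchData does not exist.',
        "exception_traceback": "Traceback (most recent call last): ...",
    },
}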
To Reproduce
Code to reproduce the behavior:
from great_expectations.data_context import BaseDataContext
from great_expectations.data_context.types.base import InMemoryStoreBackendDefaults
from great_expectations.data_context.types.base import DataContextConfig
from great_expectations.core import ExpectationSuite, ExpectationConfiguration
from great_expectations.core.batch import RuntimeBatchRequest
from pyspark.sql import SparkSession
import pandas as pd
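# In-memory DataContext with a Spark datasource and a RuntimeDataConnector,
# so a DataFrame can be passed in at runtime (batch_identifiers are arbitrary
# labels attached to the runtime batch).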
data_context_config = DataContextConfig(
datasources={
"spark_datasource": {
"execution_engine": {
"class_name": "SparkDFExecutionEngine",
"module_name": "great_expectations.execution_engine",
},
"class_name": "Datasource",
"module_name": "great_expectations.datasource",
"data_connectors": {
"runtime_data_connector": {
"class_name": "RuntimeDataConnector",
"batch_identifiers": [
"domain_id",
"component_name"
]
}
}
}
},
validation_operators={
"action_list_operator": {
"class_name": "ActionListValidationOperator",
"action_list": [
{
"name": "store_validation_result",
"action": {"class_name": "StoreValidationResultAction"},
},
{
"name": "store_evaluation_params",
"action": {"class_name": "StoreEvaluationParametersAction"},
},
{
"name": "update_data_docs",
"action": {"class_name": "UpdateDataDocsAction"},
},
],
}
},
expectations_store_name="expectations_store",
validations_store_name="validations_store",
evaluation_parameter_store_name="evaluation_parameter_store",
checkpoint_store_name="checkpoint_store",
store_backend_defaults=InMemoryStoreBackendDefaults(),
)
context = BaseDataContext(project_config=data_context_config)
suite: ExpectationSuite = context.create_expectation_suite("suite", overwrite_existing=True)
expectation_configuration = ExpectationConfiguration(
expectation_type='expect_column_values_to_not_be_null',
kwargs={
'catch_exceptions': True, # expect exceptions to be caught
'result_format': 'SUMMARY',
'include_config': False,
'column': 'unknown_column' # intentionally incorrect column to force error
},
meta={
"Notes": "Some notes"
}
)
suite.add_expectation(expectation_configuration=expectation_configuration)
pandasDF = pd.DataFrame(data=[['Scott'], ['Jeff'], ['Thomas'], ['Ann']], columns=['Name'])
spark = SparkSession.builder.appName("local").getOrCreate()
sparkDF = spark.createDataFrame(pandasDF)
sparkDF.show()
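# Hand the Spark DataFrame directly to the RuntimeDataConnector as batch_data.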
runtime_batch_request = RuntimeBatchRequest(
datasource_name="spark_datasource",
data_connector_name="runtime_data_connector",
data_asset_name="insert_your_data_asset_name_here",
runtime_parameters={
"batch_data": sparkDF
},
batch_identifiers={
"domain_id": "ininfsgi283",
"component_name": "some_component",
}
)
validation_result = context.get_validator(
    batch_request=runtime_batch_request,
    expectation_suite=suite,
).validate()
results = validation_result.results
print(results)
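In the meantime, a possible stopgap is to drop expectations whose target column is missing from the batch before validating. This is only a sketch, not a fix, and it assumes the affected expectations all carry a "column" kwarg:

# Stopgap sketch: keep only expectations whose target column exists in the batch.
available_columns = set(sparkDF.columns)
suite.expectations = [
    cfg
    for cfg in suite.expectations
    if "column" not in cfg.kwargs or cfg.kwargs["column"] in available_columns
]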
Environment (please complete the following information):
- Operating System: macOS
- Great Expectations Version: 0.13.26
Thanks for your time.
I see - thank you for the clarification, @jacylnkan! I am hoping to have a better estimate for the resolution of this issue next week.
@talagluck this is what I have as a workaround.
If you could keep me updated on this issue's expected completion date, that would be great. It will allow my team and me to plan accordingly. Thanks for your help.
Thanks for submitting, @KentonParton! We will review internally and respond over the next week or so.