great_expectations: "catch_exceptions" not functioning with v3 API

Describe the bug
I am using the v3 API with SparkDFExecutionEngine and the following classes/calls: ExpectationConfiguration, RuntimeBatchRequest, and context.get_validator().validate().

When I include "catch_exceptions" in the expectation kwargs, or pass context.get_validator().validate(catch_exceptions=True), an exception is still thrown instead of being caught and recorded on the result.
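
For reference, these are the two places I am supplying it (a minimal sketch; names match the full reproduction below):

    # Option 1: per expectation, in the expectation's kwargs
    ExpectationConfiguration(
        expectation_type="expect_column_values_to_not_be_null",
        kwargs={"column": "unknown_column", "catch_exceptions": True},
    )

    # Option 2: for the whole run, when calling validate()
    # validation_result = context.get_validator(
    #     batch_request=runtime_batch_request, expectation_suite=suite
    # ).validate(catch_exceptions=True)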

Calculating Metrics:  50%|█████     | 4/8 [00:00<00:00,  5.30it/s]
Traceback (most recent call last):
  File "/opt/project/src/main/glue/reproduce.py", line 92, in <module>
    expectation_suite=suite,
  File "/usr/local/lib/python3.6/site-packages/great_expectations/validator/validator.py", line 1209, in validate
    "result_format": result_format,
  File "/usr/local/lib/python3.6/site-packages/great_expectations/validator/validator.py", line 474, in graph_validate
    metrics = self.resolve_validation_graph(graph, metrics, runtime_configuration)
  File "/usr/local/lib/python3.6/site-packages/great_expectations/validator/validator.py", line 519, in resolve_validation_graph
    runtime_configuration=runtime_configuration,
  File "/usr/local/lib/python3.6/site-packages/great_expectations/validator/validator.py", line 560, in _resolve_metrics
    metrics_to_resolve, metrics, runtime_configuration
  File "/usr/local/lib/python3.6/site-packages/great_expectations/execution_engine/execution_engine.py", line 282, in resolve_metrics
    **metric_provider_kwargs
  File "/usr/local/lib/python3.6/site-packages/great_expectations/expectations/metrics/metric_provider.py", line 58, in inner_func
    return metric_fn(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/great_expectations/expectations/metrics/map_metric_provider.py", line 465, in inner_func
    message=f'Error: The column "{column_name}" in BatchData does not exist.'
great_expectations.exceptions.exceptions.ExecutionEngineError: Error: The column "unknown_column" in BatchData does not exist.
Calculating Metrics:  50%|█████     | 4/8 [00:00<00:00,  4.45it/s]

To Reproduce
Code to reproduce the behavior:

from great_expectations.data_context import BaseDataContext
from great_expectations.data_context.types.base import InMemoryStoreBackendDefaults
from great_expectations.data_context.types.base import DataContextConfig
from great_expectations.core import ExpectationSuite, ExpectationConfiguration
from great_expectations.core.batch import RuntimeBatchRequest
from pyspark.sql import SparkSession
import pandas as pd

# Minimal in-memory project config: a single Spark datasource with a
# RuntimeDataConnector so an in-memory DataFrame can be validated.
data_context_config = DataContextConfig(
    datasources={
        "spark_datasource": {
            "execution_engine": {
                "class_name": "SparkDFExecutionEngine",
                "module_name": "great_expectations.execution_engine",
            },
            "class_name": "Datasource",
            "module_name": "great_expectations.datasource",
            "data_connectors": {
                "runtime_data_connector": {
                    "class_name": "RuntimeDataConnector",
                    "batch_identifiers": [
                        "domain_id",
                        "component_name"
                    ]
                }
            }
        }
    },
    validation_operators={
        "action_list_operator": {
            "class_name": "ActionListValidationOperator",
            "action_list": [
                {
                    "name": "store_validation_result",
                    "action": {"class_name": "StoreValidationResultAction"},
                },
                {
                    "name": "store_evaluation_params",
                    "action": {"class_name": "StoreEvaluationParametersAction"},
                },
                {
                    "name": "update_data_docs",
                    "action": {"class_name": "UpdateDataDocsAction"},
                },
            ],
        }
    },
    expectations_store_name="expectations_store",
    validations_store_name="validations_store",
    evaluation_parameter_store_name="evaluation_parameter_store",
    checkpoint_store_name="checkpoint_store",
    store_backend_defaults=InMemoryStoreBackendDefaults(),
)

context = BaseDataContext(project_config=data_context_config)
suite: ExpectationSuite = context.create_expectation_suite("suite", overwrite_existing=True)

expectation_configuration = ExpectationConfiguration(
    expectation_type='expect_column_values_to_not_be_null',
    kwargs={
        'catch_exceptions': True,  # expect exceptions to be caught
        'result_format': 'SUMMARY',
        'include_config': False,
        'column': 'unknown_column'  # intentionally incorrect column to force error
    },
    meta={
        "Notes": "Some notes"
    }
)
suite.add_expectation(expectation_configuration=expectation_configuration)

pandasDF = pd.DataFrame(data=[['Scott'], ['Jeff'], ['Thomas'], ['Ann']], columns=['Name'])

spark = SparkSession.builder.appName("local").getOrCreate()
sparkDF = spark.createDataFrame(pandasDF)
sparkDF.show()

# The batch_identifiers here must match those declared on the
# RuntimeDataConnector in the data context config above.
runtime_batch_request = RuntimeBatchRequest(
    datasource_name="spark_datasource",
    data_connector_name="runtime_data_connector",
    data_asset_name="insert_your_data_asset_name_here",
    runtime_parameters={
        "batch_data": sparkDF
    },
    batch_identifiers={
        "domain_id": "ininfsgi283",
        "component_name": "some_component",
    }
)
# Note: .validate() returns a validation result object, not a Validator.
validation_result = context.get_validator(
    batch_request=runtime_batch_request,
    expectation_suite=suite,
).validate()
results = validation_result.results
print(results)
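
For context, this is what I would expect with catch_exceptions=True instead of the raised error above (a sketch; exception_info is the field Great Expectations uses to record caught exceptions on each result):

    # Expected (not observed): the failure is recorded on the result
    # rather than raised during metric resolution.
    for result in results:
        info = result.exception_info or {}
        if info.get("raised_exception"):
            print(info.get("exception_message"))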

Environment (please complete the following information):

  • Operating System: MacOS
  • Great Expectations Version: 0.13.26

Thanks for your time.

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 19 (10 by maintainers)

Most upvoted comments

I see - thank you for the clarification, @jacylnkan! I am hoping to have a better estimate for the resolution of this issue next week.

@talagluck this is what I have as a workaround: each expectation is validated in its own single-expectation suite, so an unhandled error aborts only that expectation rather than the whole run.

    def process(self):
        data_context_config = self.set_data_context_config()
        context = BaseDataContext(project_config=data_context_config)

        batch_df = self.load_batch()
        runtime_batch_request = self.create_runtime_batch_request(batch_df)
        results = []
        failure_results = []
        for expectation in self.rules_list:
            try:
                suite: ExpectationSuite = context.create_expectation_suite("suite", overwrite_existing=True)
                expectation_configuration = ExpectationConfiguration(
                    expectation_type=expectation["method_name"],
                    kwargs=expectation["kwargs"],
                    meta=expectation["meta"]
                )
                suite.add_expectation(expectation_configuration=expectation_configuration)

                validator = context.get_validator(
                    batch_request=runtime_batch_request,
                    expectation_suite=suite,
                ).validate()
                results.extend(self.format_results(validator.results))
            except Exception as ex:
                # Exception objects have no .message attribute in Python 3;
                # note that `traceback` must be imported at module level.
                failure_results.append(
                    self.capture_exception_details(expectation["meta"], str(ex), traceback.format_exc())
                )
                continue

        return results, failure_results
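
The helper methods referenced above (load_batch, format_results, capture_exception_details) are not shown here. For illustration only, capture_exception_details just bundles the details for reporting, something like:

    def capture_exception_details(self, meta, message, trace):
        # Hypothetical shape: pair the expectation's metadata with the
        # error message and traceback for later reporting.
        return {"meta": meta, "exception_message": message, "exception_traceback": trace}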

If you could keep me updated on this issue's expected completion date, that would be great. It will allow my team and me to plan accordingly. Thanks for your help.

Thanks for submitting, @KentonParton ! We will review internally and respond over the next week or so.