great_expectations: batch_request passed in to SimpleCheckpoint errors

Describe the bug When creating a SimpleCheckpoint and passing in a batch_request: BatchRequest, the checkpoint seems to treat the batch_request as a dictionary instead of a BatchRequest and fails with AttributeError: 'RuntimeBatchRequest' object has no attribute 'items'.

To Reproduce Steps to reproduce the behavior:



import great_expectations as ge
from great_expectations.core.batch import RuntimeBatchRequest
from great_expectations.cli.datasource import sanitize_yaml_and_save_datasource
from great_expectations.checkpoint import SimpleCheckpoint

import pandas as pd


context = ge.get_context()
df = pd.DataFrame({"col1": ["a", "a", "b", "c"], "col2": [1, 2, 3, 4]})
config = """
    name: my_pandas_datasource
    class_name: Datasource
    execution_engine:
        class_name: PandasExecutionEngine
    data_connectors:
        my_runtime_data_connector:
          class_name: RuntimeDataConnector
          batch_identifiers:
              - some_batch_identifier_so_this_can_work
"""



context.test_yaml_config(
    yaml_config=config
)
sanitize_yaml_and_save_datasource(context, config, overwrite_existing=True)


runtime_batch_request = RuntimeBatchRequest(
    datasource_name="my_pandas_datasource",
    data_connector_name="my_runtime_data_connector",
    data_asset_name="insert_your_data_asset_name_here",
    runtime_parameters={
      "batch_data": df
    },
    batch_identifiers={
        "some_batch_identifier_so_this_can_work": "blah",
    }
)
my_checkpoint = SimpleCheckpoint(
    name="my_checkpoint",
    data_context=context,
    batch_request = runtime_batch_request
)

>>

AttributeError                            Traceback (most recent call last)
<ipython-input-1-b72466658554> in <module>
     43     name="my_checkpoint",
     44     data_context=context,
---> 45     batch_request = runtime_batch_request
     46 )

~/.pyenv/versions/anaconda3-2020.02/envs/sandbox/lib/python3.7/site-packages/great_expectations/checkpoint/checkpoint.py in __init__(self, name, data_context, config_version, template_name, module_name, class_name, run_name_template, expectation_suite_name, batch_request, action_list, evaluation_parameters, runtime_configuration, validations, profilers, validation_operator_name, batches, site_names, slack_webhook, notify_on, notify_with, **kwargs)
    672             slack_webhook=slack_webhook,
    673             notify_on=notify_on,
--> 674             notify_with=notify_with,
    675         ).build()
    676 

~/.pyenv/versions/anaconda3-2020.02/envs/sandbox/lib/python3.7/site-packages/great_expectations/checkpoint/configurator.py in build(self)
    112         self._validate_slack_configuration()
    113 
--> 114         return self._build_checkpoint_config()
    115 
    116     def _build_checkpoint_config(self) -> CheckpointConfig:

~/.pyenv/versions/anaconda3-2020.02/envs/sandbox/lib/python3.7/site-packages/great_expectations/checkpoint/configurator.py in _build_checkpoint_config(self)
    134                         "config_version": self.other_kwargs.pop("config_version", 1.0)
    135                         or 1.0,
--> 136                         **self.other_kwargs,
    137                     }
    138                 )

~/.pyenv/versions/anaconda3-2020.02/envs/sandbox/lib/python3.7/site-packages/great_expectations/data_context/types/base.py in update(self, other_config, runtime_kwargs)
   1743                 updated_batch_request = nested_update(
   1744                     batch_request,
-> 1745                     other_batch_request,
   1746                 )
   1747                 self._batch_request = updated_batch_request

~/.pyenv/versions/anaconda3-2020.02/envs/sandbox/lib/python3.7/site-packages/great_expectations/core/util.py in nested_update(d, u, dedup)
     69 ):
     70     """update d with items from u, recursively and joining elements"""
---> 71     for k, v in u.items():
     72         if isinstance(v, Mapping):
     73             d[k] = nested_update(d.get(k, {}), v, dedup=dedup)

AttributeError: 'RuntimeBatchRequest' object has no attribute 'items'

Expected behavior I expected the above to work since I’m passing in batch_request: BatchRequest

Environment (please complete the following information):

  • Operating System: MacOS
  • Great Expectations Version: 0.13.17

Additional context None

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 16 (7 by maintainers)

Most upvoted comments

@cdkini I’m very new to Great Expectations and want to run a custom query against BigQuery.

I tried the following code, based on this guide: How to load a database table, view, or query result as a batch

I am facing this error:

Loaded ExpectationSuite "bigquery.data_quality.store_info_missing" containing 0 expectations.
Traceback (most recent call last):
  File "/Users/herry/Project/python-flash-coffee/airflow/dags/include/great_expectations/uncommitted/test.py", line 20, in <module>
    my_validator: Validator = context.get_validator(
  File "/Users/herry/Project/python/airflow/env/lib/python3.9/site-packages/great_expectations/data_context/data_context/abstract_data_context.py", line 932, in get_validator
    self.get_batch_list(
  File "/Users/herry/Project/python/airflow/env/lib/python3.9/site-packages/great_expectations/core/usage_statistics/usage_statistics.py", line 294, in usage_statistics_wrapped_method
    result = func(*args, **kwargs)
  File "/Users/herry/Project/python/airflow/env/lib/python3.9/site-packages/great_expectations/data_context/data_context/base_data_context.py", line 1229, in get_batch_list
    return super().get_batch_list(
  File "/Users/herry/Project/python/airflow/env/lib/python3.9/site-packages/great_expectations/data_context/data_context/abstract_data_context.py", line 1103, in get_batch_list
    return datasource.get_batch_list_from_batch_request(batch_request=batch_request)
  File "/Users/herry/Project/python/airflow/env/lib/python3.9/site-packages/great_expectations/datasource/new_datasource.py", line 160, in get_batch_list_from_batch_request
    raise ValueError(
ValueError: RuntimeBatchRequests must specify exactly one corresponding BatchDefinition

Here is the Python script:

import great_expectations as ge
from great_expectations import DataContext
from great_expectations.core import ExpectationSuite
from great_expectations.core.batch import RuntimeBatchRequest
from great_expectations.exceptions import DataContextError
from great_expectations.validator.validator import Validator

context: DataContext = ge.get_context()  

expectation_suite_name = "bigquery.data_quality.store_info_missing"

try:
    suite = context.get_expectation_suite(expectation_suite_name=expectation_suite_name)
    print(f'Loaded ExpectationSuite "{suite.expectation_suite_name}" containing {len(suite.expectations)} expectations.')
except DataContextError:
    suite = context.create_expectation_suite(expectation_suite_name=expectation_suite_name)
    print(f'Created ExpectationSuite "{suite.expectation_suite_name}".')


my_validator: Validator = context.get_validator(
    datasource_name="production",
    data_connector_name="default_inferred_data_connector_name",
    data_asset_name="test1.store",  # this can be anything that identifies this data_asset for you
    runtime_parameters={
        "query": "SELECT * FROM test1.store LIMIT 10"
    },
    batch_identifiers={
        "some_key_maybe_pipeline_stage": "validation_stage",
        "some_other_key_maybe_run_id": 1234567890
    },
    # Use batch_spec_passthrough to control whether the associated SqlAlchemy Execution Engine will create
    # a temporary table
    batch_spec_passthrough={
        "create_temp_table": False  # if not provided, this defaults to True
    },
    expectation_suite=suite,
)

my_validator.active_batch.head()

I could not find anything about this error on Stack Overflow or in the GitHub issues. Could you help?

Thanks
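
A note on the ValueError above: the thread does not confirm its cause. One assumption worth checking is that a request carrying runtime_parameters needs a data connector of class RuntimeDataConnector whose batch_identifiers list declares the keys being passed, whereas the script names default_inferred_data_connector_name. The sketch below shows what such a datasource configuration might look like as a Python dict. The connector name default_runtime_data_connector_name and the connection string placeholder are illustrative and not taken from the poster's project.

```python
# Hypothetical datasource configuration; the connector name and the
# connection string are placeholders, not values from the original post.
datasource_config = {
    "name": "production",
    "class_name": "Datasource",
    "execution_engine": {
        "class_name": "SqlAlchemyExecutionEngine",
        "connection_string": "bigquery://<gcp_project>/<dataset>",
    },
    "data_connectors": {
        "default_runtime_data_connector_name": {
            "class_name": "RuntimeDataConnector",
            # Every key passed in batch_identifiers is declared here.
            "batch_identifiers": [
                "some_key_maybe_pipeline_stage",
                "some_other_key_maybe_run_id",
            ],
        },
    },
}

# The batch identifiers the script passes to get_validator.
requested_identifiers = {
    "some_key_maybe_pipeline_stage": "validation_stage",
    "some_other_key_maybe_run_id": 1234567890,
}

connector = datasource_config["data_connectors"]["default_runtime_data_connector_name"]
declared = set(connector["batch_identifiers"])

# Any key reported here is passed at runtime but missing from the config.
undeclared = set(requested_identifiers) - declared
print(f"Undeclared batch identifiers: {sorted(undeclared)}")
```

With a configuration along these lines, the get_validator call would reference data_connector_name="default_runtime_data_connector_name". Whether that resolves the error in this particular setup is untested.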

Hey @lyra-victor @geertjan-garvis @rr-chiranjeevi-ds thanks so much for pointing this out!

I apologize for the delay; the core team has been focusing on a few other features so this fell into our backlog. @bhcastleton put this on my radar earlier today and I’ll definitely be making it a priority moving forward.

It looks like there’s a mismatch of types when we get to the nested_update function so I’ll have to do a bit of digging to see why that is. To be fully transparent, I am a bit new to this part of the codebase but I’ll keep this thread updated with any findings.

Thanks again for your patience 🙏🏽
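
The type mismatch described above can be reproduced without Great Expectations. What follows is a minimal sketch under stated assumptions, not the library's code and not necessarily what the eventual fix does: nested_update is a simplified version of the helper shown in the traceback (the dedup argument is omitted), FakeBatchRequest is a hypothetical stand-in for an object that has attributes but no items() method, and as_mapping is a hypothetical guard that converts such an object to a dict before merging. The real class may store its fields differently.

```python
from collections.abc import Mapping


def nested_update(d, u):
    """Simplified stand-in for the helper in the traceback (dedup omitted)."""
    for k, v in u.items():
        if isinstance(v, Mapping):
            d[k] = nested_update(d.get(k, {}), v)
        else:
            d[k] = v
    return d


class FakeBatchRequest:
    """Hypothetical stand-in: exposes attributes but is not a Mapping."""

    def __init__(self, datasource_name, batch_identifiers):
        self.datasource_name = datasource_name
        self.batch_identifiers = batch_identifiers


def as_mapping(obj):
    """Hypothetical guard: pass Mappings through, convert other objects to a dict."""
    return obj if isinstance(obj, Mapping) else dict(vars(obj))


request = FakeBatchRequest(
    datasource_name="my_pandas_datasource",
    batch_identifiers={"some_batch_identifier_so_this_can_work": "blah"},
)

# Passing the object directly fails in the same way as the last traceback frame.
try:
    nested_update({}, request)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
print(error_message)

# Normalizing the object to a dict first lets the recursive merge proceed.
merged = nested_update({"datasource_name": "placeholder"}, as_mapping(request))
print(merged)
```

The guard keeps the merge logic unchanged and moves the type decision to a single place, which is one of several ways a caller could reconcile dict-style and object-style batch requests.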

A PR has been issued and is going through the review process. You can see the changes made at #3152 and make any comments/suggestions there if you wish!

We’ll hopefully have this merged and ready to use shortly.

@lyra-victor Thank you for reporting. We will post here once we look at this issue deeper.