great_expectations: Data docs does not contain the results of my expectations

Describe the bug This is probably user error rather than a bug, but I didn’t see a suitable template. I am trying to run expectations from code (not the CLI) as part of my ETL pipeline, to validate data before it goes to production. I want to save the expectation results as JSON, upload them to S3, and set up S3-hosted data docs that pull from those results and the expectation suite.

To Reproduce Steps to reproduce the behavior:

Run the code attached below.
Note that when data docs opens, it contains only the expectations, not the results of those expectations.

Expected behavior Data docs contains the results of my expectations.

Environment (please complete the following information):

Operating System: Linux & MacOS
Great Expectations Version: [e.g. 0.14.1]

Additional context If there is a better way to do this, one that makes better use of existing great_expectations features, please point me in that direction. Notably, I couldn’t make the CLI configuration of my great_expectations.yml work for me, because I need this to run dynamically in a pipeline and upload to different locations depending on the client.

import great_expectations as ge
from great_expectations.data_context.types.base import DataContextConfig, DatasourceConfig, FilesystemStoreBackendDefaults
from great_expectations.data_context import BaseDataContext
import numpy as np
import pandas as pd
import json
import os
from datetime import datetime
from great_expectations.data_context.types.resource_identifiers import (
    ExpectationSuiteIdentifier,
    ValidationResultIdentifier,
)

df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))

abs_path = os.getcwd() + '/great_expectations'

data_context_config = DataContextConfig(
    datasources={
        "my_pandas_datasource": DatasourceConfig(
            class_name="PandasDatasource",
        )
    },
    store_backend_defaults=FilesystemStoreBackendDefaults(root_directory=abs_path),
)
context = BaseDataContext(project_config=data_context_config)

domain_name = 'test'

suite = context.create_expectation_suite(domain_name, overwrite_existing=True)

batch_kwargs = {
    "datasource": 'my_pandas_datasource',
    "dataset": df,
    "data_asset_name": domain_name,
}

batch = context.get_batch(batch_kwargs, "test")

print(batch.head())

batch.expect_table_row_count_to_be_between(max_value=250, min_value=10)

batch.expect_table_column_count_to_equal(value=4)

batch.expect_table_columns_to_match_ordered_list(
    column_list=[
        "A",
        "B",
        "C",
        "D",
    ]
)

batch.expect_column_values_to_not_be_null(column="A",
    result_format='COMPLETE')

batch.expect_column_values_to_be_null(column="A",
    result_format='COMPLETE')

batch.expect_column_values_to_be_in_set(
    column="A",
    value_set=["A", "B", "C", "D", "E", "F"],
    result_format='COMPLETE'
)

results = batch.validate()


# This step is optional, but useful - evaluate the Expectations against the current batch of data
run_id = {
    "run_name": domain_name,
    "run_time": datetime.now(),
}
results = batch.validate(
    expectation_suite=None,
    run_id=None,
    data_context=context,
    evaluation_parameters=None,
    catch_exceptions=True,
    only_return_failures=False,
    run_name=domain_name,
    run_time=datetime.now(),
)

# save the Expectation Suite (by default to a JSON file in the great_expectations/expectations folder)
# batch.save_expectation_suite(suite, domain_name, discard_failed_expectations=False)
batch.save_expectation_suite(discard_failed_expectations=False)

# Neither details nor meta (I inferred this as expected) seem to contain an expectation_suite_identifier
# expectation_suite_identifier = list(results["details"].keys())[0]
# expectation_suite_identifier = list(results["meta"].keys())[0]
# print('expectation_suite_identifier')
# print(expectation_suite_identifier)

validation_result_identifier = ValidationResultIdentifier(
    expectation_suite_identifier=domain_name,
    # expectation_suite_identifier=expectation_suite_identifier,
    batch_identifier=batch.batch_kwargs.to_id(),
    run_id=run_id
)

# This doesn't work
# context.build_data_docs()

# Neither does this
# context.build_data_docs(domain_name, results)
# context.open_data_docs(domain_name)

# Neither does this
# context.build_data_docs(domain_name, suite_identifier)
# context.open_data_docs(suite_identifier)
# context.open_data_docs(validation_result_identifier)

# Neither does this
suite_identifier = ExpectationSuiteIdentifier(expectation_suite_name=domain_name)
context.build_data_docs(domain_name, suite_identifier)
context.open_data_docs()

with open('validation_results.json', 'w') as f:
    f.write(str(results))
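
One possible explanation, offered as an assumption rather than something confirmed in this thread: `batch.validate()` returns the results but may not write them to the context's validations store, and data docs are rendered from that store. The sketch below stores the result explicitly before building the docs. The helper name `store_results_and_build_docs` is hypothetical, and reading the run ID from `results.meta["run_id"]` assumes a legacy (0.13/0.14-era) batch like the one above. The import is deferred into the function so the snippet can be loaded without Great Expectations installed. Note also that the first positional argument of `build_data_docs` is `site_names`, so passing the suite name there probably does not select a suite.

```python
def store_results_and_build_docs(context, batch, results, suite_name):
    """Store a validation result in the context, then rebuild data docs.

    Hypothetical helper: assumes `context` is a BaseDataContext, `batch` is a
    legacy batch returned by context.get_batch(), and `results` is the object
    returned by batch.validate().
    """
    # Imported lazily so that defining this function does not require
    # great_expectations to be installed.
    from great_expectations.data_context.types.resource_identifiers import (
        ExpectationSuiteIdentifier,
        ValidationResultIdentifier,
    )

    identifier = ValidationResultIdentifier(
        # An identifier object, not the bare suite name string.
        expectation_suite_identifier=ExpectationSuiteIdentifier(
            expectation_suite_name=suite_name
        ),
        # Assumption: validate() recorded the run ID in the result metadata.
        run_id=results.meta["run_id"],
        batch_identifier=batch.batch_kwargs.to_id(),
    )

    # Write the result where the data docs renderer looks for validations.
    context.validations_store.set(identifier, results)

    # No arguments: build every configured site from everything in the stores.
    context.build_data_docs()
    return identifier
```

With the script above, this would be called as `store_results_and_build_docs(context, batch, results, domain_name)` after the second `batch.validate(...)` call, followed by `context.open_data_docs()`.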

About this issue

State: closed
Created 3 years ago
Comments: 18 (18 by maintainers)

Most upvoted comments

Apologies as well - we’ve had a few different issues pertaining to result_format at the same time, and I think the unexpected_index_list issue might have gotten buried in a thread for another, separate but related, issue. But also, our tests for this are passing, so I’m wondering if this piece might just be a configuration issue, and I’m trying to understand where that is coming from.

talagluck on Nov 24, 2021