great_expectations: Data docs does not contain the results of my expectations
Describe the bug This is probably user error rather than a bug, but I didn’t see a suitable template. I am trying to run expectations via code (not the CLI) as part of my ETL pipeline in order to validate data before it goes to production. I want to save the expectation results JSON, upload it to S3, and set up S3-hosted data docs that pull from those results and the expectation suite.
To Reproduce Steps to reproduce the behavior:
- Run the below attached code
- Note that when data docs opens, it only contains the expectations, and not the results of those expectations.
Expected behavior Data docs contains the results of my expectations.
Environment (please complete the following information):
- Operating System: Linux & MacOS
- Great Expectations Version: [e.g. 0.14.1]
Additional context If there is a better way to do this that makes better use of existing great_expectations features, please point me in that direction. Notably, I couldn’t make the CLI configuration of my great_expectations.yml work for me, as I need this to run dynamically in a pipeline, uploading to different locations depending on the client.
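The "different locations depending on client" requirement can be pictured as a small helper that derives per-client S3 locations at run time. This is a sketch only, not a confirmed great_expectations configuration: the function name `client_s3_locations`, the default bucket name, and the prefix layout are all hypothetical, and `TupleS3StoreBackend` with `bucket` and `prefix` keys is assumed to be the relevant store backend for the version in use. The helper only builds plain dictionaries; wiring them into a data context is left out.

```python
from typing import Dict


def client_s3_locations(client: str, bucket: str = "my-ge-bucket") -> Dict[str, dict]:
    """Build per-client S3 store settings.

    The bucket name and the "<client>/<kind>" prefix layout are
    hypothetical; adjust them to the real pipeline.
    """
    if not client or "/" in client:
        raise ValueError("client must be a non-empty name without '/'")

    def backend(kind: str) -> dict:
        # Assumed store backend class name; check it against the
        # great_expectations version actually installed.
        return {
            "class_name": "TupleS3StoreBackend",
            "bucket": bucket,
            "prefix": f"{client}/{kind}",
        }

    return {
        "expectations": backend("expectations"),
        "validations": backend("validations"),
        "data_docs": backend("data_docs"),
    }
```

Calling `client_s3_locations("acme")["validations"]["prefix"]` gives `"acme/validations"`, so each client's suites, results, and rendered docs stay under their own prefix.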
import great_expectations as ge
from great_expectations.data_context.types.base import DataContextConfig, DatasourceConfig, FilesystemStoreBackendDefaults
from great_expectations.data_context import BaseDataContext
import numpy as np
import pandas as pd
import json
import os
from datetime import datetime
from great_expectations.data_context.types.resource_identifiers import (
    ExpectationSuiteIdentifier,
    ValidationResultIdentifier,
)
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
abs_path = os.getcwd() + '/great_expectations'
data_context_config = DataContextConfig(
    datasources={
        "my_pandas_datasource": DatasourceConfig(
            class_name="PandasDatasource",
        )
    },
    store_backend_defaults=FilesystemStoreBackendDefaults(root_directory=abs_path),
)
context = BaseDataContext(project_config=data_context_config)
domain_name = 'test'
suite = context.create_expectation_suite(domain_name, overwrite_existing=True)
batch_kwargs = {
    "datasource": 'my_pandas_datasource',
    "dataset": df,
    "data_asset_name": domain_name,
}
batch = context.get_batch(batch_kwargs, "test")
print(batch.head())
batch.expect_table_row_count_to_be_between(max_value=250, min_value=10)
batch.expect_table_column_count_to_equal(value=4)
batch.expect_table_columns_to_match_ordered_list(
    column_list=[
        "A",
        "B",
        "C",
        "D",
    ]
)
batch.expect_column_values_to_not_be_null(column="A",
                                          result_format='COMPLETE')
batch.expect_column_values_to_be_null(column="A",
                                      result_format='COMPLETE')
batch.expect_column_values_to_be_in_set(
    column="A",
    value_set=["A", "B", "C", "D", "E", "F"],
    result_format='COMPLETE'
)
results = batch.validate()
# This step is optional, but useful - evaluate the Expectations against the current batch of data
run_id = {
    "run_name": domain_name,
    "run_time": datetime.now()
}
results = batch.validate(expectation_suite=None,
                         run_id=None,
                         data_context=context,
                         evaluation_parameters=None,
                         catch_exceptions=True,
                         only_return_failures=False,
                         run_name=domain_name,
                         run_time=datetime.now(),)
# save the Expectation Suite (by default to a JSON file in the great_expectations/expectations folder)
# batch.save_expectation_suite(suite, domain_name, discard_failed_expectations=False)
batch.save_expectation_suite(discard_failed_expectations=False)
# Neither details nor meta (I inferred this as expected) seem to contain an expectation_suite_identifier
# expectation_suite_identifier = list(results["details"].keys())[0]
# expectation_suite_identifier = list(results["meta"].keys())[0]
# print('expectation_suite_identifier')
# print(expectation_suite_identifier)
validation_result_identifier = ValidationResultIdentifier(
    expectation_suite_identifier=domain_name,
    # expectation_suite_identifier=expectation_suite_identifier,
    batch_identifier=batch.batch_kwargs.to_id(),
    run_id=run_id
)
# This doesn't work
# context.build_data_docs()
# Neither does this
# context.build_data_docs(domain_name, results)
# context.open_data_docs(domain_name)
# Neither does this
# context.build_data_docs(domain_name, suite_identifier)
# context.open_data_docs(suite_identifier)
# context.open_data_docs(validation_result_identifier)
# Neither does this
suite_identifier = ExpectationSuiteIdentifier(expectation_suite_name=domain_name)
context.build_data_docs(domain_name, suite_identifier)
context.open_data_docs()
with open('validation_results.json', 'w') as f:
    f.write(str(results))
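The final step of the script writes `str(results)` to a file, which depends on how the result object happens to print itself. A minimal sketch of a more explicit JSON dump follows. It assumes the validation result exposes a `to_json_dict()` method, which should be checked against the installed great_expectations version. The helper name `write_results_json` is hypothetical, and plain dictionaries are accepted as a fallback so the function can be tried without the library.

```python
import json
from pathlib import Path


def write_results_json(results, path):
    """Write a validation result to `path` as JSON and return the path."""
    # Assumption: great_expectations result objects provide to_json_dict();
    # anything without that method is treated as already JSON-like.
    if hasattr(results, "to_json_dict"):
        payload = results.to_json_dict()
    else:
        payload = results
    out = Path(path)
    # default=str keeps values such as datetimes from raising TypeError.
    out.write_text(json.dumps(payload, indent=2, default=str))
    return out
```

In the script above this would be called as `write_results_json(results, 'validation_results.json')`, and the resulting file is what would then be uploaded to S3. It only covers serialisation and does not by itself make the results appear in data docs.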
About this issue
- State: closed
- Created 3 years ago
- Comments: 18 (18 by maintainers)
Apologies as well - we’ve had a few different issues pertaining to result_format at the same time, and I think the unexpected_index_list issue might have gotten buried in a thread for another, separate but related, issue. But also, our tests for this are passing, so I’m wondering if this piece might just be a configuration issue, and I’m trying to understand where that is coming from.