dbt-expectations: --store-failures with query context
Hi!
I have a question / suggestion regarding the behaviour of `--store-failures` when used together with dbt-expectations.
We are looking to adopt dbt-expectations, but we are missing some traceability features.
Let’s look at this meaningless column-level assertion / test (from https://github.com/dbt-labs/jaffle_shop):

```yaml
- name: orders
  columns:
    - name: order_id
      tests:
        - unique
        - not_null
        - dbt_expectations.expect_column_values_to_be_between:
            min_value: 10
            max_value: 40
```
When dbt is run with `--store-failures`, a single boolean column is created in the persistent storage. To make this data a source of truth and actionable, we want to add context or details to the results of the test query. For example:
```yaml
- name: orders
  columns:
    - name: order_id
      tests:
        - unique
        - not_null
        - dbt_expectations.expect_column_values_to_be_between:
            query_context: 'order_id, customer_id, order_date, status'
            min_value: 10
            max_value: 40
```
This can be achieved with simple changes to the macros. You can see an example here:
This way, the result table can have more context and details (row-level, in some cases), which might be useful not just for us.
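A rough sketch of what such a macro change could look like. Everything below is illustrative: the `query_context` parameter and the simplified test body are assumptions, not the actual dbt-expectations implementation (which aggregates expression results rather than selecting rows directly):

```sql
{% test expect_column_values_to_be_between(model, column_name,
                                           min_value, max_value,
                                           query_context=none) %}
select
    {# hypothetical: prepend the caller-supplied context columns to each failing row #}
    {% if query_context %}{{ query_context }},{% endif %}
    {{ column_name }}
from {{ model }}
where {{ column_name }} < {{ min_value }}
   or {{ column_name }} > {{ max_value }}
{% endtest %}
```

With `--store-failures`, the failing rows (now carrying the extra columns) would land in the audit table instead of a bare pass/fail value.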

Would love to get some feedback on the feasibility of this. We are happy to create a PR from our side.
This should be possible to implement without breaking any of the existing tests for current users.
About this issue
- Original URL
- State: open
- Created 2 years ago
- Reactions: 9
- Comments: 18 (5 by maintainers)
Hi guys, just wondering if these changes are going forward. It would be really useful to have this 😃
Hi @artemboiko1 - sorry, haven’t had a ton of time to look into this. However, one thought I had was that rather than adding a `query_context` parameter to every test, I’d probably prefer we follow something similar to the dbt-core model and simply `select *` for non-grouped tests. This would still require going through all tests to make sure this works as intended. Also, I wouldn’t know how to test this in the integration_tests suite. Any ideas?
Wow, I have just spent the last two days trying to figure out how to do just this!
I ended up overriding the dbt_expectations tests with slightly modified versions that accepted a `contextual_column_list` param. I then added a loop in the set expression to write these columns out with a max aggregation.
This allows me to trace much more about test runs.
I’d like to get the start time, end time and invocation_id of the run stored, so that I could see how tests have behaved over multiple runs.
I’d also like to change the audit table naming convention so I don’t have to check all audit tables. Ideally the logs would all go into one table, with the test name as a column.
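For the run-metadata part, dbt’s Jinja context already exposes `invocation_id` and `run_started_at`, so the stored failures could be tagged per run. A minimal sketch, where the `failing_rows_query` placeholder is hypothetical and stands in for whatever failing-rows query the test already produces:

```sql
select
    failures.*,
    '{{ invocation_id }}'  as invocation_id,   -- dbt's per-invocation UUID
    '{{ run_started_at }}' as run_started_at   -- timestamp the run began
from (
    {{ failing_rows_query }}  -- hypothetical: the test's existing failing-rows query
) as failures
```

Writing all tests into one table would then let you group by test name and `invocation_id` to see behaviour across runs.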
I’ll keep an eye on this ticket and may be able to help if needed.
That order of pull requests works for me. Although 3. should also be followed by a check of the integration_tests, right? In any case, I would want to release all of these as version `0.60`, since it’s a potentially breaking change.

@clausherther dbt-core schema assertions already `select *` as the context for the failure; example: for this model
So the issue is dbt-expectations-specific.
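For comparison, the compiled SQL that dbt-core generates for a built-in `not_null` test is just a `select *` over the failing rows, roughly like this (the `analytics` schema name is an assumption for illustration):

```sql
-- illustrative compiled SQL for the `not_null` test on orders.order_id
select *
from analytics.orders
where order_id is null
```

So with `--store-failures`, the audit table keeps every column of each failing row, which is the behaviour this issue asks dbt-expectations to match.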