pandera: Is there a way to generate informative error on custom checks?

Hello, I’m trying to create a custom check such as Check(lambda g: g.mean() > 0.1, error="mean over 0.1"). Is there a way to add more info to the error message, such as f"mean over 0.1 and the current mean is {g.mean()}"?

Thanks 🙏

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 15

Most upvoted comments

Action Items

  • refactor groupby behavior to expect a pandas Groupby object
  • add agg kwarg to Check that simply wraps DataFrame.agg and Series.agg depending on the schema context
  • support callback function for the error kwarg:
from typing import Union
import pandas as pd
from pandas.core.groupby import GroupBy

CheckObj = Union[pd.Series, pd.DataFrame, GroupBy]  # a Series or DataFrame can be the output of an `agg` operation

def error_callback(check_obj: CheckObj) -> str:
    return f"check failed with values {check_obj}"

If the additional agg argument feels too clunky, we can go the direction of specialized GroupbyCheck and AggCheck subclasses.
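To make the error-callback item concrete, here is a minimal sketch of how a check runner might accept either a static string or a callback for the error. The names run_check and ErrorSpec are hypothetical and not part of pandera, and a plain list stands in for a pandas Series so the snippet only needs the standard library.

```python
import statistics
from typing import Any, Callable, Union

# Hypothetical type, not part of pandera: `error` is either a fixed message
# or a callback that receives the checked object and returns a message.
ErrorSpec = Union[str, Callable[[Any], str]]


def run_check(check_fn: Callable[[Any], bool], check_obj: Any, error: ErrorSpec) -> None:
    """Raise ValueError with a static or computed message when the check fails."""
    if check_fn(check_obj):
        return
    # Only build the message on failure, so the callback costs nothing on success.
    message = error(check_obj) if callable(error) else error
    raise ValueError(message)


# A plain list stands in for a pandas Series in this sketch.
values = [0.0, 0.1, 0.05]

try:
    run_check(
        lambda g: statistics.mean(g) > 0.1,
        values,
        error=lambda g: f"mean must be over 0.1, got {statistics.mean(g):.3f}",
    )
except ValueError as exc:
    failure_message = str(exc)
```

With this shape, the original question's f-string works as intended because the callback sees the same object the check function saw.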

I can take a shot at implementing this!

yeah, the implementation for (3) at validation time (Check.__call__) would do something like:

check_output = check_fn(agg_fn(series))

So you can do something like

def agg_fn(series):
    return series.mean(), series.std()

def check_fn(check_obj):
    # unpack the values
    mean, std = check_obj
    ...

Or

def agg_fn(series):
    return series.agg(["mean", "std"])

def check_fn(agg_series):
    agg_series["mean"]
    agg_series["std"]
    ...

Of course, specifying an agg_fn would raise an exception if element_wise == True or groupby is specified.
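Putting the pieces above together, a rough sketch of the validation step might look like the following. The function run_agg_check and its signature are assumptions made for illustration, not pandera's actual Check.__call__, and groupby handling is left out. A plain list again stands in for a Series.

```python
import statistics
from typing import Any, Callable, Optional


def run_agg_check(
    values: Any,
    check_fn: Callable[[Any], bool],
    agg_fn: Optional[Callable[[Any], Any]] = None,
    element_wise: bool = False,
    groupby: Optional[Any] = None,
) -> bool:
    """Hypothetical stand-in for the validation step (groupby logic omitted)."""
    # agg_fn only makes sense on the whole object, so reject conflicting options.
    if agg_fn is not None and (element_wise or groupby is not None):
        raise ValueError("agg_fn cannot be combined with element_wise=True or groupby")
    if element_wise:
        return all(check_fn(v) for v in values)
    # Equivalent to check_fn(agg_fn(series)) when an agg_fn is given.
    check_obj = agg_fn(values) if agg_fn is not None else values
    return bool(check_fn(check_obj))


def agg_fn(values):
    # Return several aggregates at once as a tuple.
    return statistics.mean(values), statistics.stdev(values)


def check_fn(check_obj):
    # Unpack the aggregated values.
    mean, std = check_obj
    return mean > 0.1 and std < 1.0


passed = run_agg_check([0.2, 0.4, 0.6], check_fn, agg_fn=agg_fn)
```

Raising early on the conflicting options keeps the failure close to the misconfiguration rather than surfacing later as a confusing check result.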