ibis: bug: maximum recursion depth exceeded in join operation

What happened?

We’re seeing "RecursionError: maximum recursion depth exceeded while calling a Python object" when running a join:

source_difference = source.join(differences, join_keys, how="outer")

Both source and differences are Ibis tables on the pandas backend with many columns (~120).

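We don’t hit this error with smaller, narrower tables. I’ve provided an abridged version of the stack trace below. It looks like there is a repeating cycle of calls (__eq__, __cached_equals__, __equals__) when testing whether the left and right tables have a common parent expression, i.e. the left.equals(right) check at line 178 of ibis/expr/operations/relations.py in the traceback.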
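
To illustrate why the depth of the expression, rather than the column count alone, may be what matters, here is a minimal sketch that assumes equality is checked recursively over each node's arguments, as the repeated __eq__ / __equals__ frames in the traceback suggest. The Node class and the chain and compares_ok helpers are hypothetical and are not Ibis's implementation. In this sketch every level of nesting adds stack frames, while comparing many sibling arguments happens in a loop inside tuple comparison and does not deepen the stack.

```python
import sys


class Node:
    """Hypothetical expression node (NOT an Ibis class); equality recurses into args."""

    def __init__(self, *args):
        self.args = args

    def __eq__(self, other):
        if self is other:
            return True
        if type(self) is not type(other):
            return NotImplemented
        # Tuple comparison calls __eq__ on each child, so every level of
        # nesting adds another Python stack frame.
        return self.args == other.args


def chain(depth, leaf="t"):
    """Build a left-deep chain of `depth` nested nodes (built iteratively)."""
    node = Node(leaf)
    for _ in range(depth):
        node = Node(node, leaf)
    return node


def compares_ok(depth):
    """Return True if two equal, independently built chains compare without error."""
    try:
        return chain(depth) == chain(depth)
    except RecursionError:
        return False


shallow, deep = 100, 10 * sys.getrecursionlimit()
print(compares_ok(shallow))  # True
print(compares_ok(deep))     # False: RecursionError was raised
```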

We’re trying to understand whether this is a Python limitation caused by how wide the table is, or a bug in Ibis. We’d appreciate the help!
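
On the question of a Python limitation: CPython caps the call stack at sys.getrecursionlimit() frames (1000 by default) and raises RecursionError when a recursive algorithm goes deeper. A common stopgap is to raise the limit temporarily around the offending call, sketched below; recursion_limit and recurse are hypothetical helper names. This only moves the ceiling, and the Python documentation warns that a limit set too high can crash the interpreter, so it does not fix the underlying recursion.

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(limit):
    """Temporarily raise (never lower) the recursion limit, restoring it on exit."""
    old = sys.getrecursionlimit()
    sys.setrecursionlimit(max(old, limit))
    try:
        yield
    finally:
        sys.setrecursionlimit(old)


def recurse(n):
    """Recurse n levels deep; a stand-in for a deep structural comparison."""
    return 0 if n == 0 else 1 + recurse(n - 1)


before = sys.getrecursionlimit()
n = 2 * before  # deeper than the current limit allows
with recursion_limit(4 * before):
    print(recurse(n) == n)  # True
print(sys.getrecursionlimit() == before)  # True: original limit restored
```

In the reporter's code the equivalent would be wrapping the source.join(...) call in with recursion_limit(...), assuming the deep recursion happens inside that call, as the traceback indicates.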

What version of ibis are you using?

5.1.0

What backend(s) are you using, if any?

Pandas

Relevant log output

Traceback (most recent call last):
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/grounds.py", line 210, in __cached_equals__
    result = self.__cache__[key]
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/caching.py", line 46, in __getitem__
    value, _ = self._data[identifiers]
KeyError: (139856786236240, 139856788868000)

During handling of the above exception, another exception occurred:
...
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/grounds.py", line 210, in __cached_equals__
    result = self.__cache__[key]
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/caching.py", line 46, in __getitem__
    value, _ = self._data[identifiers]
KeyError: (139856787188608, 139856789577136)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/x/home/user/new_fl/dvt4/bin/data-validation", line 11, in <module>
    load_entry_point('google-pso-data-validator==4.1.0', 'console_scripts', 'data-validation')()
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/data_validation/__main__.py", line 581, in main
    run_validation_configs(args)
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/data_validation/__main__.py", line 551, in run_validation_configs
    config_runner(args)
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/data_validation/__main__.py", line 304, in config_runner
    run_validations(args, config_managers)
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/data_validation/__main__.py", line 478, in run_validations
    run_validation(
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/data_validation/__main__.py", line 461, in run_validation
    validator.execute()
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/data_validation/data_validation.py", line 96, in execute
    result_df = self._execute_validation(
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/data_validation/data_validation.py", line 314, in _execute_validation
    result_df = combiner.generate_report(
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/data_validation/combiner.py", line 83, in generate_report
    joined = _join_pivots(source_pivot, target_pivot, differences_pivot, join_on_fields)
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/data_validation/combiner.py", line 317, in _join_pivots
    source_difference = source.join(differences, join_keys, how="outer")[
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/expr/types/relations.py", line 2497, in join
    expr = klass(left, right, predicates).to_expr()
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/grounds.py", line 25, in __call__
    return cls.__create__(*args, **kwargs)
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/grounds.py", line 99, in __create__
    return super().__create__(**kwargs)
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/grounds.py", line 33, in __create__
    return type.__call__(cls, *args, **kwargs)
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/expr/operations/relations.py", line 178, in __init__
    if left.equals(right):
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/expr/operations/core.py", line 24, in equals
    return self.__cached_equals__(other)
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/grounds.py", line 212, in __cached_equals__
    result = self.__equals__(other)
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/grounds.py", line 239, in __equals__
    return self.__args__ == other.__args__
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/grounds.py", line 187, in __eq__
    return self.__cached_equals__(other)
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/grounds.py", line 212, in __cached_equals__
    result = self.__equals__(other)
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/grounds.py", line 239, in __equals__
    return self.__args__ == other.__args__
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/grounds.py", line 187, in __eq__
    return self.__cached_equals__(other)
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/grounds.py", line 212, in __cached_equals__
    result = self.__equals__(other)
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/grounds.py", line 239, in __equals__
    return self.__args__ == other.__args__
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/grounds.py", line 187, in __eq__
    return self.__cached_equals__(other)
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/grounds.py", line 210, in __cached_equals__
    result = self.__cache__[key]
  File "/x/home/user/new_fl/dvt4/lib/python3.8/site-packages/ibis/common/caching.py", line 45, in __getitem__
    identifiers = tuple(id(item) for item in key)
RecursionError: maximum recursion depth exceeded while calling a Python object
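
The traceback also accounts for the long chain of "During handling of the above exception, another exception occurred" messages. __cached_equals__ first looks the pair up in a cache keyed by object ids (caching.py, line 45) and, on a KeyError, computes the result with __equals__, which compares the __args__ tuples and therefore calls __eq__ on each child. The sketch below mirrors that pattern with a hypothetical CachedNode class, a simplification rather than Ibis's code: because the recursive comparison runs inside the except KeyError block, the eventual RecursionError is chained to a KeyError, and each level's KeyError is chained to the one above it.

```python
import sys


class CachedNode:
    """Hypothetical node mirroring the try/except-KeyError cache pattern (not Ibis code)."""

    __cache__ = {}  # shared memo: (id(a), id(b)) -> (result, a, b)

    def __init__(self, *args):
        self.__args__ = args

    def __eq__(self, other):
        if not isinstance(other, CachedNode):
            return NotImplemented
        return self.__cached_equals__(other)

    def __cached_equals__(self, other):
        if self is other:
            return True
        key = (id(self), id(other))
        try:
            return self.__cache__[key][0]
        except KeyError:
            # The recursive comparison runs *inside* this except block, so any
            # exception raised deeper down gets this KeyError as its __context__.
            result = self.__equals__(other)
            # Keep strong references so the ids in the key cannot be reused.
            self.__cache__[key] = (result, self, other)
            return result

    def __equals__(self, other):
        # Comparing the argument tuples calls __eq__ on every child node.
        return self.__args__ == other.__args__


def chain(depth):
    """Build a left-deep chain of `depth` nested CachedNodes."""
    node = CachedNode("t")
    for _ in range(depth):
        node = CachedNode(node, "t")
    return node


context_name = None
deep = 2 * sys.getrecursionlimit()  # three Python frames per level overflows easily
try:
    chain(deep) == chain(deep)
except RecursionError as exc:
    context_name = type(exc.__context__).__name__
print(context_name)  # KeyError
```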

Code of Conduct

  • I agree to follow this project’s Code of Conduct

About this issue

  • State: closed
  • Created 10 months ago
  • Comments: 23 (16 by maintainers)

Most upvoted comments

@nehanene15 I think it’s a viable workaround for your case, but I don’t think it’s best practice 😄. I think it’s a bug in Ibis that we will try to address.
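
One general way to avoid the recursion limit altogether, offered here as an assumption and not as the fix Ibis adopted, is to compare expression trees iteratively with an explicit stack. In this sketch the Node class and structurally_equal function are hypothetical; the seen set skips node pairs that have already been expanded, which avoids repeated work when subexpressions are shared.

```python
import sys


class Node:
    """Hypothetical expression node (not an Ibis class)."""

    def __init__(self, *args):
        self.args = args


def structurally_equal(left, right):
    """Compare two node trees iteratively, using an explicit stack instead of recursion."""
    stack = [(left, right)]
    seen = set()  # (id(a), id(b)) pairs whose children are already queued
    while stack:
        a, b = stack.pop()
        if a is b:
            continue
        if isinstance(a, Node) and isinstance(b, Node):
            if type(a) is not type(b) or len(a.args) != len(b.args):
                return False
            key = (id(a), id(b))
            if key in seen:
                continue
            seen.add(key)
            # Queue child pairs instead of recursing into them.
            stack.extend(zip(a.args, b.args))
        elif isinstance(a, Node) or isinstance(b, Node):
            return False  # a node compared against a plain value
        elif a != b:
            return False
    return True


def chain(depth):
    """Build a left-deep chain of `depth` nested nodes."""
    node = Node("t")
    for _ in range(depth):
        node = Node(node, "t")
    return node


deep = 10 * sys.getrecursionlimit()  # far deeper than recursion would allow
print(structurally_equal(chain(deep), chain(deep)))      # True
print(structurally_equal(chain(deep), chain(deep - 1)))  # False
```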

OK, I can reproduce it with this:

import ibis
import pandas as pd


def test_large_join():
    source = pd.read_csv(
        "https://github.com/ibis-project/ibis/files/12580336/source_pivot.csv",
        index_col=0,
    )
    diffs = pd.read_csv(
        "https://github.com/ibis-project/ibis/files/12580340/differences_pivot.csv",
        index_col=0,
    )
    con = ibis.pandas.connect({"source": source, "diffs": diffs})
    # Union n copies of each table so the expressions become much larger.
    n = 200
    source = ibis.union(*[con.tables.source for _ in range(n)])
    diffs = ibis.union(*[con.tables.diffs for _ in range(n)])

    join_keys = set(source.columns) & set(diffs.columns)
    join = source.join(diffs, join_keys, how="outer").select(
        [source[key] for key in join_keys]
        + [
            source["validation_type"],
            source["aggregation_type"],
            source["table_name"],
            source["column_name"],
            source["primary_keys"],
            source["num_random_rows"],
            source["agg_value"],
            diffs["difference"],
            diffs["pct_difference"],
            diffs["pct_threshold"],
            diffs["validation_status"],
        ],
    )
    df = join.execute()
    assert not df.empty
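
The repro makes the expressions deep by unioning n = 200 copies of each table. Assuming ibis.union combines its inputs pairwise as a left fold (an assumption, not confirmed in this thread), the result is a chain roughly n levels deep, and a recursive equality check needs stack frames in proportion to that depth (the traceback shows three repeating frames per level: __eq__, __cached_equals__, __equals__). The Union class and the union_all and left_depth helpers below are hypothetical stand-ins, not Ibis's union operation.

```python
from functools import reduce


class Union:
    """Hypothetical binary union node (not Ibis's union operation)."""

    def __init__(self, left, right):
        self.left = left
        self.right = right


def union_all(tables):
    """Combine tables pairwise as a left fold: Union(Union(t1, t2), t3), ..."""
    return reduce(Union, tables)


def left_depth(node):
    """Count Union nodes along the left spine, iteratively."""
    depth = 0
    while isinstance(node, Union):
        depth += 1
        node = node.left
    return depth


n = 200
tree = union_all(["source"] * n)  # placeholder strings stand in for tables
print(left_depth(tree))  # 199
```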