dask: Error during .compute() with pandas release candidate 1.1.0rc0 installed

What happened: While testing Featuretools against the pandas release candidate 1.1.0rc0, tests that previously passed began to fail. Every failure occurs during a .compute() call on a Dask dataframe.

For reference, I am including a stack trace of the error below. I am not sure whether this is an issue with pandas, Dask, some interaction between the two, or something else entirely, and I was hoping for help tracking down the root cause.

Full Stack Trace
[gw0] node down: Traceback (most recent call last):
  File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/execnet/gateway_base.py", line 1400, in _save
    dispatch = self._dispatch[tp]
KeyError: <enum 'ExitCode'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/execnet/gateway_base.py", line 1084, in executetask
    do_exec(co, loc)  # noqa
  File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/xdist/remote.py", line 287, in <module>
    config.hook.pytest_cmdline_main(config=config)
  File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/pluggy/hooks.py", line 286, in __call__
    return self._hookexec(self, self.get_hookimpls(), kwargs)
  File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/pluggy/manager.py", line 93, in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
  File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/pluggy/manager.py", line 87, in <lambda>
    firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
  File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/pluggy/callers.py", line 208, in _multicall
    return outcome.get_result()
  File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/pluggy/callers.py", line 80, in get_result
    raise ex[1].with_traceback(ex[2])
  File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/pluggy/callers.py", line 187, in _multicall
    res = hook_impl.function(*args)
  File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/_pytest/main.py", line 228, in pytest_cmdline_main
    return wrap_session(config, _main)
  File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/_pytest/main.py", line 221, in wrap_session
    session=session, exitstatus=session.exitstatus
  File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/pluggy/hooks.py", line 286, in __call__
    return self._hookexec(self, self.get_hookimpls(), kwargs)
  File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/pluggy/manager.py", line 93, in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
  File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/pluggy/manager.py", line 87, in <lambda>
    firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
  File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/pluggy/callers.py", line 203, in _multicall
    gen.send(outcome)
  File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/xdist/remote.py", line 45, in pytest_sessionfinish
    self.sendevent("workerfinished", workeroutput=self.config.workeroutput)
  File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/xdist/remote.py", line 30, in sendevent
    self.channel.send((name, kwargs))
  File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/execnet/gateway_base.py", line 729, in send
    self.gateway._send(Message.CHANNEL_DATA, self.id, dumps_internal(item))
  File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/execnet/gateway_base.py", line 1371, in dumps_internal
    return _Serializer().save(obj)
  File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/execnet/gateway_base.py", line 1389, in save
    self._save(obj)
  File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/execnet/gateway_base.py", line 1407, in _save
    dispatch(self, obj)
  File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/execnet/gateway_base.py", line 1494, in save_tuple
    self._save(item)
  File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/execnet/gateway_base.py", line 1407, in _save
    dispatch(self, obj)
  File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/execnet/gateway_base.py", line 1490, in save_dict
    self._write_setitem(key, value)
  File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/execnet/gateway_base.py", line 1484, in _write_setitem
    self._save(value)
  File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/execnet/gateway_base.py", line 1407, in _save
    dispatch(self, obj)
  File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/execnet/gateway_base.py", line 1490, in save_dict
    self._write_setitem(key, value)
  File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/execnet/gateway_base.py", line 1484, in _write_setitem
    self._save(value)
  File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/execnet/gateway_base.py", line 1405, in _save
    raise DumpError("can't serialize {}".format(tp))
execnet.gateway_base.DumpError: can't serialize <enum 'ExitCode'>

Replacing crashed worker gw0
gw2 C / gw1 [1066]     gw2 ok / gw1 [1066]
=================================== FAILURES ===================================
__________ test_make_agg_feat_where_different_identity_feat[dask_es] ___________
[gw0] linux -- Python 3.7.8 /home/circleci/featuretools/venv/bin/python

es = Entityset: ecommerce
  Entities:
    régions [Rows: Delayed('int-d52188a2-f4e9-4241-9c01-8d28ff80daec'), Columns: 2]
 ...régions.id
    sessions.customer_id -> customers.id
    log.session_id -> sessions.id
    log.product_id -> products.id

    def test_make_agg_feat_where_different_identity_feat(es):
        feats = []
        where_cmps = [LessThanScalar, GreaterThanScalar, LessThanEqualToScalar,
                      GreaterThanEqualToScalar, EqualScalar, NotEqualScalar]
        for where_cmp in where_cmps:
            feats.append(ft.Feature(es['log']['id'],
                                    parent_entity=es['sessions'],
                                    where=ft.Feature(es['log']['datetime'], primitive=where_cmp(datetime(2011, 4, 10, 10, 40, 1))),
                                    primitive=Count))
    
        df = ft.calculate_feature_matrix(entityset=es, features=feats, instance_ids=[0, 1, 2, 3])
        if isinstance(df, dd.DataFrame):
>           df = df.compute()

featuretools/tests/computational_backend/test_feature_set_calculator.py:261: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../venv/lib/python3.7/site-packages/dask/base.py:167: in compute
    (result,) = compute(self, traverse=False, **kwargs)
../venv/lib/python3.7/site-packages/dask/base.py:447: in compute
    results = schedule(dsk, keys, **kwargs)
../venv/lib/python3.7/site-packages/dask/threaded.py:84: in get
    **kwargs
../venv/lib/python3.7/site-packages/dask/local.py:486: in get_async
    raise_exception(exc, tb)
../venv/lib/python3.7/site-packages/dask/local.py:316: in reraise
    raise exc
../venv/lib/python3.7/site-packages/dask/local.py:222: in execute_task
    result = _execute_task(task, data)
../venv/lib/python3.7/site-packages/dask/core.py:121: in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
../venv/lib/python3.7/site-packages/dask/core.py:121: in <genexpr>
    return func(*(_execute_task(a, cache) for a in args))
../venv/lib/python3.7/site-packages/dask/core.py:115: in _execute_task
    return [_execute_task(a, cache) for a in arg]
../venv/lib/python3.7/site-packages/dask/core.py:115: in <listcomp>
    return [_execute_task(a, cache) for a in arg]
../venv/lib/python3.7/site-packages/dask/core.py:121: in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
../venv/lib/python3.7/site-packages/dask/optimization.py:1022: in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
../venv/lib/python3.7/site-packages/dask/core.py:151: in get
    result = _execute_task(task, cache)
../venv/lib/python3.7/site-packages/dask/core.py:121: in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
../venv/lib/python3.7/site-packages/dask/core.py:121: in <genexpr>
    return func(*(_execute_task(a, cache) for a in args))
../venv/lib/python3.7/site-packages/dask/core.py:115: in _execute_task
    return [_execute_task(a, cache) for a in arg]
../venv/lib/python3.7/site-packages/dask/core.py:115: in <listcomp>
    return [_execute_task(a, cache) for a in arg]
../venv/lib/python3.7/site-packages/dask/core.py:121: in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
../venv/lib/python3.7/site-packages/pandas/core/frame.py:2875: in __getitem__
    return self._get_item_cache(key)
../venv/lib/python3.7/site-packages/pandas/core/generic.py:3532: in _get_item_cache
    values = self._mgr.iget(loc)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = BlockManager
Items: Index(['datetime', 'session_id', 'id', 'product_id',
       'datetime <= 2011-04-10 10:40:01', 'da..., dtype: datetime64[ns]
IntBlock: slice(1, 3, 1), 2 x 3, dtype: int64
ObjectBlock: slice(3, 4, 1), 1 x 3, dtype: object
i = 6

    def iget(self, i: int) -> "SingleBlockManager":
        """
        Return the data as a SingleBlockManager.
        """
>       block = self.blocks[self.blknos[i]]
E       IndexError: tuple index out of range

../venv/lib/python3.7/site-packages/pandas/core/internals/managers.py:959: IndexError
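
To make the final frame easier to read, here is a toy sketch of how a lookup of the form `self.blocks[self.blknos[i]]` can raise this exact error. `ToyManager` and its data are hypothetical stand-ins, not pandas' actual BlockManager, and the sketch only illustrates the symptom (a column-to-block map that refers to a block the tuple does not contain), not how the real object ended up in that state.

```python
# Toy stand-in for a block manager. This is NOT pandas' implementation;
# the class, the block names and the index map are made up for illustration.
class ToyManager:
    def __init__(self, blocks, blknos):
        self.blocks = tuple(blocks)  # column storage, grouped into blocks
        self.blknos = list(blknos)   # for column i, the index of its block

    def iget(self, i):
        # Same shape as the failing line in the traceback.
        return self.blocks[self.blknos[i]]


# Three blocks exist, but columns 4-6 are mapped to a fourth block (index 3)
# that is not in the tuple.
mgr = ToyManager(
    blocks=["datetime-block", "int-block", "object-block"],
    blknos=[0, 1, 1, 2, 3, 3, 3],
)

try:
    mgr.iget(6)
except IndexError as exc:
    message = str(exc)
    print(message)  # tuple index out of range
```

In other words, the error suggests that the list of column names and the set of blocks were out of sync at the moment the column was read.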

What you expected to happen: Results computed without error.

Minimal Complete Verifiable Example: The failures are part of a larger test suite, and I have not yet been able to create a simple example that reliably produces the same error. Here is a link to one of the failing tests: Featuretools Test

Anything else we need to know?: The test failures have only been observed when running the tests on CircleCI. No failures have occurred locally, even when running the tests inside the same Docker container that CircleCI uses. The failures also do not occur with pandas 1.0.5 installed.
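
Because the traceback passes through dask/threaded.py, one way to narrow this down might be to run the same .compute() call under both the threaded and the single-threaded scheduler and compare the outcomes. The sketch below assumes only that the object accepts Dask's `scheduler=` keyword with the values "threads" and "synchronous"; `compare_schedulers` is a hypothetical helper, not part of Dask or Featuretools, and a difference between the two runs would be a hint rather than proof of a threading problem.

```python
def compare_schedulers(collection, schedulers=("threads", "synchronous")):
    """Call collection.compute() once per scheduler and record the outcome.

    `collection` is expected to be a Dask collection (for example the
    dask DataFrame returned in the failing test). Hypothetical helper.
    """
    outcomes = {}
    for name in schedulers:
        try:
            collection.compute(scheduler=name)
            outcomes[name] = "ok"
        except Exception as exc:  # record the failure instead of stopping
            outcomes[name] = f"{type(exc).__name__}: {exc}"
    return outcomes


# Intended use inside the failing test, where df is the dask DataFrame:
#     print(compare_schedulers(df))
```

Since the failure is intermittent, a single run per scheduler may not be conclusive; repeating the comparison several times on CI would give a clearer picture.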

Environment:

  • Dask version: 2.21.0
  • Python versions: 3.6, 3.7 and 3.8
  • Operating Systems: Windows and Linux
  • Install method (conda, pip, source): pip
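
Since the failures only appear on CI, it may help to print the installed versions in both environments and diff them. This is a minimal sketch using the standard library's `importlib.metadata`, which requires Python 3.8 or later (on 3.6 and 3.7 the `importlib_metadata` backport offers the same functions). `report_versions` and the default package list are assumptions chosen from the packages that appear in the traceback.

```python
import platform
from importlib import metadata


def report_versions(packages=("dask", "pandas", "featuretools",
                              "pytest-xdist", "execnet")):
    """Return a dict mapping each package name to its installed version."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


for name, version in report_versions().items():
    print(f"{name}: {version}")
```

Running this as a step in the CI job and once locally makes it easy to spot a dependency that differs between the two.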

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 17 (7 by maintainers)

Most upvoted comments

Thanks @thehomebrewnerd. I’m taking a close look now.