dask: Error during .compute() with pandas release candidate 1.1.0rc0 installed
What happened:
While testing Featuretools against the pandas release candidate 1.1.0rc0, several tests fail that passed previously. Every failure occurs during a .compute() call on a Dask dataframe.
For reference, I am including a stack trace of the error below. I am not sure whether this is an issue with pandas, Dask, an interaction between the two, or something else entirely, and would appreciate help tracking down the root cause.
Full Stack Trace
[gw0] node down: Traceback (most recent call last):
File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/execnet/gateway_base.py", line 1400, in _save
dispatch = self._dispatch[tp]
KeyError: <enum 'ExitCode'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/execnet/gateway_base.py", line 1084, in executetask
do_exec(co, loc) # noqa
File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/xdist/remote.py", line 287, in <module>
config.hook.pytest_cmdline_main(config=config)
File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/pluggy/hooks.py", line 286, in __call__
return self._hookexec(self, self.get_hookimpls(), kwargs)
File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/pluggy/manager.py", line 93, in _hookexec
return self._inner_hookexec(hook, methods, kwargs)
File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/pluggy/manager.py", line 87, in <lambda>
firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/pluggy/callers.py", line 208, in _multicall
return outcome.get_result()
File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/pluggy/callers.py", line 80, in get_result
raise ex[1].with_traceback(ex[2])
File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/pluggy/callers.py", line 187, in _multicall
res = hook_impl.function(*args)
File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/_pytest/main.py", line 228, in pytest_cmdline_main
return wrap_session(config, _main)
File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/_pytest/main.py", line 221, in wrap_session
session=session, exitstatus=session.exitstatus
File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/pluggy/hooks.py", line 286, in __call__
return self._hookexec(self, self.get_hookimpls(), kwargs)
File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/pluggy/manager.py", line 93, in _hookexec
return self._inner_hookexec(hook, methods, kwargs)
File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/pluggy/manager.py", line 87, in <lambda>
firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/pluggy/callers.py", line 203, in _multicall
gen.send(outcome)
File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/xdist/remote.py", line 45, in pytest_sessionfinish
self.sendevent("workerfinished", workeroutput=self.config.workeroutput)
File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/xdist/remote.py", line 30, in sendevent
self.channel.send((name, kwargs))
File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/execnet/gateway_base.py", line 729, in send
self.gateway._send(Message.CHANNEL_DATA, self.id, dumps_internal(item))
File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/execnet/gateway_base.py", line 1371, in dumps_internal
return _Serializer().save(obj)
File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/execnet/gateway_base.py", line 1389, in save
self._save(obj)
File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/execnet/gateway_base.py", line 1407, in _save
dispatch(self, obj)
File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/execnet/gateway_base.py", line 1494, in save_tuple
self._save(item)
File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/execnet/gateway_base.py", line 1407, in _save
dispatch(self, obj)
File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/execnet/gateway_base.py", line 1490, in save_dict
self._write_setitem(key, value)
File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/execnet/gateway_base.py", line 1484, in _write_setitem
self._save(value)
File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/execnet/gateway_base.py", line 1407, in _save
dispatch(self, obj)
File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/execnet/gateway_base.py", line 1490, in save_dict
self._write_setitem(key, value)
File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/execnet/gateway_base.py", line 1484, in _write_setitem
self._save(value)
File "/home/circleci/featuretools/venv/lib/python3.7/site-packages/execnet/gateway_base.py", line 1405, in _save
raise DumpError("can't serialize {}".format(tp))
execnet.gateway_base.DumpError: can't serialize <enum 'ExitCode'>
Replacing crashed worker gw0
gw2 C / gw1 [1066] gw2 ok / gw1 [1066]
=================================== FAILURES ===================================
__________ test_make_agg_feat_where_different_identity_feat[dask_es] ___________
[gw0] linux -- Python 3.7.8 /home/circleci/featuretools/venv/bin/python
es = Entityset: ecommerce
Entities:
régions [Rows: Delayed('int-d52188a2-f4e9-4241-9c01-8d28ff80daec'), Columns: 2]
...régions.id
sessions.customer_id -> customers.id
log.session_id -> sessions.id
log.product_id -> products.id
def test_make_agg_feat_where_different_identity_feat(es):
feats = []
where_cmps = [LessThanScalar, GreaterThanScalar, LessThanEqualToScalar,
GreaterThanEqualToScalar, EqualScalar, NotEqualScalar]
for where_cmp in where_cmps:
feats.append(ft.Feature(es['log']['id'],
parent_entity=es['sessions'],
where=ft.Feature(es['log']['datetime'], primitive=where_cmp(datetime(2011, 4, 10, 10, 40, 1))),
primitive=Count))
df = ft.calculate_feature_matrix(entityset=es, features=feats, instance_ids=[0, 1, 2, 3])
if isinstance(df, dd.DataFrame):
> df = df.compute()
featuretools/tests/computational_backend/test_feature_set_calculator.py:261:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../venv/lib/python3.7/site-packages/dask/base.py:167: in compute
(result,) = compute(self, traverse=False, **kwargs)
../venv/lib/python3.7/site-packages/dask/base.py:447: in compute
results = schedule(dsk, keys, **kwargs)
../venv/lib/python3.7/site-packages/dask/threaded.py:84: in get
**kwargs
../venv/lib/python3.7/site-packages/dask/local.py:486: in get_async
raise_exception(exc, tb)
../venv/lib/python3.7/site-packages/dask/local.py:316: in reraise
raise exc
../venv/lib/python3.7/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
../venv/lib/python3.7/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
../venv/lib/python3.7/site-packages/dask/core.py:121: in <genexpr>
return func(*(_execute_task(a, cache) for a in args))
../venv/lib/python3.7/site-packages/dask/core.py:115: in _execute_task
return [_execute_task(a, cache) for a in arg]
../venv/lib/python3.7/site-packages/dask/core.py:115: in <listcomp>
return [_execute_task(a, cache) for a in arg]
../venv/lib/python3.7/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
../venv/lib/python3.7/site-packages/dask/optimization.py:1022: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
../venv/lib/python3.7/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
../venv/lib/python3.7/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
../venv/lib/python3.7/site-packages/dask/core.py:121: in <genexpr>
return func(*(_execute_task(a, cache) for a in args))
../venv/lib/python3.7/site-packages/dask/core.py:115: in _execute_task
return [_execute_task(a, cache) for a in arg]
../venv/lib/python3.7/site-packages/dask/core.py:115: in <listcomp>
return [_execute_task(a, cache) for a in arg]
../venv/lib/python3.7/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
../venv/lib/python3.7/site-packages/pandas/core/frame.py:2875: in __getitem__
return self._get_item_cache(key)
../venv/lib/python3.7/site-packages/pandas/core/generic.py:3532: in _get_item_cache
values = self._mgr.iget(loc)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = BlockManager
Items: Index(['datetime', 'session_id', 'id', 'product_id',
'datetime <= 2011-04-10 10:40:01', 'da..., dtype: datetime64[ns]
IntBlock: slice(1, 3, 1), 2 x 3, dtype: int64
ObjectBlock: slice(3, 4, 1), 1 x 3, dtype: object
i = 6
def iget(self, i: int) -> "SingleBlockManager":
"""
Return the data as a SingleBlockManager.
"""
> block = self.blocks[self.blknos[i]]
E IndexError: tuple index out of range
../venv/lib/python3.7/site-packages/pandas/core/internals/managers.py:959: IndexError
What you expected to happen: Results computed without error.
Minimal Complete Verifiable Example: The failures are part of a larger test suite, and I have not yet been able to create a simple example that reliably produces the same error. Here is a link to one of the tests that results in a failure: Featuretools Test
Anything else we need to know?: The test failures have only been observed when running the tests on CircleCI. No failures have occurred when testing locally, even when running the tests inside the same Docker container that CircleCI uses. The failures also do not occur with pandas 1.0.5 installed.
Environment:
- Dask version: 2.21.0
- Python versions: 3.6, 3.7 and 3.8
- Operating Systems: Windows and Linux
- Install method (conda, pip, source): pip
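The environment details above can be gathered with a short script on each machine where the tests run, which may help when comparing the CircleCI environment with a local one. This is a minimal sketch: the function name collect_versions and the default package list are illustrative choices, and importlib.metadata is assumed to be available (standard library on Python 3.8+; older interpreters would need the importlib_metadata backport).

```python
import importlib.metadata as metadata  # standard library on Python 3.8+
import platform


def collect_versions(packages=("dask", "pandas", "featuretools")):
    """Return a dict of interpreter, OS, and package versions (None if not installed)."""
    report = {
        "python": platform.python_version(),
        "os": platform.system(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # package is not installed in this environment
    return report


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

Running it in both the CI job and the local container and diffing the output is one way to spot a dependency that differs between the two.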
About this issue
- State: closed
- Created 4 years ago
- Comments: 17 (7 by maintainers)
Thanks @thehomebrewnerd. I’m taking a close look now.