dask: Dask array with non-highlevelgraph leads to KeyError

I have some older code that assigns a plain dict directly into array.dask, as a workaround for an old dask bug. If that’s no longer supported then feel free to close this bug (although it would be nice to add one or two sanity checks). Since dask 2.8.0 it raises a KeyError inside the optimizer.

Here’s a minimal reproducing example:

#!/usr/bin/env python3
import dask
import dask.array as da
import numpy as np

a = da.from_array(np.ones((4, 4), np.float64), chunks=(2, 4))
a.dask = dict(a.dask)
b = da.from_array(np.zeros((4, 4), np.float64), chunks=(2, 4))

x = a + b
da.compute(x)

Running it produces this traceback:

Traceback (most recent call last):
  File "./bw.py", line 11, in <module>
    da.compute(x)
  File "/home/bmerry/work/sdp/env3/lib/python3.6/site-packages/dask/base.py", line 433, in compute
    dsk = collections_to_dsk(collections, optimize_graph, **kwargs)
  File "/home/bmerry/work/sdp/env3/lib/python3.6/site-packages/dask/base.py", line 219, in collections_to_dsk
    [opt(dsk, keys, **kwargs) for opt, (dsk, keys) in groups.items()],
  File "/home/bmerry/work/sdp/env3/lib/python3.6/site-packages/dask/base.py", line 219, in <listcomp>
    [opt(dsk, keys, **kwargs) for opt, (dsk, keys) in groups.items()],
  File "/home/bmerry/work/sdp/env3/lib/python3.6/site-packages/dask/array/optimization.py", line 43, in optimize
    dsk = fuse_roots(dsk, keys=keys)
  File "/home/bmerry/work/sdp/env3/lib/python3.6/site-packages/dask/blockwise.py", line 822, in fuse_roots
    and not any(dependencies[dep] for dep in deps)  # no need to fuse if 0 or 1
  File "/home/bmerry/work/sdp/env3/lib/python3.6/site-packages/dask/blockwise.py", line 822, in <genexpr>
    and not any(dependencies[dep] for dep in deps)  # no need to fuse if 0 or 1
KeyError: 'array-d6b7f7b33e2b14cd720de87a517147e8'

(run against dask 2.10.0, Python 3.6)

From what I can tell, the trouble starts when HighLevelGraph.from_collections is given a dependency whose graph isn’t a HighLevelGraph: it names the resulting layer using just the id of that graph. However, the Blockwise layer for the addition keys its indices by array name, and optimize_blockwise then uses those index keys to construct the new dependencies. As a result, the new dependencies don’t line up with the layers. The final explosion happens in fuse_roots, but it could presumably happen anywhere that expects the dependencies to be consistent.
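The mismatch described above can be sketched with plain dicts. This is a toy model only, not dask's code: the keys "array-abc" and "add-xyz", the helper has_nested_deps, and the dict layout are all made up for illustration, and naming the layer with id(...) simply follows the description in this report.

```python
# Toy model only: plain dicts standing in for a graph's layers and
# dependencies. Every name and key below is hypothetical.
plain_graph = {("array-abc", 0, 0): "chunk"}  # a collection's non-HLG graph
layer_key = id(plain_graph)  # layer named by id, as described in the report

layers = {"add-xyz": {}, layer_key: plain_graph}
dependencies = {"add-xyz": {layer_key}, layer_key: set()}

# Rebuilding the dependencies from Blockwise-style indices uses the array
# name, which is not a key of `layers` or `dependencies`.
blockwise_indices = [("array-abc", "ij")]
rebuilt = dict(dependencies)
rebuilt["add-xyz"] = {name for name, _ in blockwise_indices}


def has_nested_deps(deps_map, layer):
    # Same shape as the lookup quoted in the traceback.
    return any(deps_map[dep] for dep in deps_map[layer])


# The original mapping is self-consistent, so the lookup succeeds.
consistent = has_nested_deps(dependencies, "add-xyz")

# The rebuilt mapping refers to a name that was never registered.
missing = None
try:
    has_nested_deps(rebuilt, "add-xyz")
except KeyError as exc:
    missing = exc.args[0]
```

In this sketch the lookup fails on the array name because the only registered key for that layer is the id, which mirrors the KeyError on 'array-…' in the traceback.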

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 22 (22 by maintainers)

Most upvoted comments

In principle it shouldn’t be terribly hard to support non-HLGs.

I think that the tricky thing here is having users mutate graphs of any sort. I would rather users create new Array objects using the constructor.

Over time I would also prefer that we remove the .dask attribute, so that people use the .__dask_graph__() method laid out in the protocol instead.
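A minimal sketch of that approach, reusing the arrays from the report: read the graph through __dask_graph__() and pass it to the Array constructor rather than assigning to a.dask. It assumes a dask release whose Array constructor accepts a plain mapping and wraps it in a HighLevelGraph, which this thread does not confirm for any particular version.

```python
import numpy as np
import dask.array as da
from dask.highlevelgraph import HighLevelGraph

a = da.from_array(np.ones((4, 4), np.float64), chunks=(2, 4))
b = da.from_array(np.zeros((4, 4), np.float64), chunks=(2, 4))

# Read the graph through the protocol method and flatten it to a plain dict.
plain_graph = dict(a.__dask_graph__())

# Build a fresh Array from the dict instead of mutating `a` in place.
# Assumption: the constructor wraps non-HighLevelGraph mappings itself.
rebuilt = da.Array(plain_graph, a.name, a.chunks, dtype=a.dtype)

graph_is_hlg = isinstance(rebuilt.__dask_graph__(), HighLevelGraph)
result = (rebuilt + b).compute()
```

Because the original array is never mutated, the layers and dependencies are built together by the constructor and have no opportunity to drift apart.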

So rather than do type checking in a .dask property, how about adding a deprecation warning?
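As a rough sketch of what such a warning could look like, here is a property that warns on both reads and writes. LegacyCollection is a hypothetical stand-in class, not dask's Array, and the message text is invented for illustration.

```python
import warnings


class LegacyCollection:
    """Hypothetical stand-in for a collection; not dask's Array."""

    def __init__(self, graph):
        self._graph = graph

    def __dask_graph__(self):
        # Preferred accessor: no warning.
        return self._graph

    @property
    def dask(self):
        warnings.warn(
            "the .dask attribute is deprecated; use .__dask_graph__()",
            DeprecationWarning,
            stacklevel=2,
        )
        return self._graph

    @dask.setter
    def dask(self, value):
        warnings.warn(
            "assigning to .dask is deprecated; construct a new collection",
            DeprecationWarning,
            stacklevel=2,
        )
        self._graph = value


obj = LegacyCollection({"x": 1})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    obj.dask = {"y": 2}  # warns on assignment
    _ = obj.dask  # warns on read
    graph = obj.__dask_graph__()  # silent
```

Assignment still works in this sketch, so existing code keeps running while users are pointed at the constructor.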

This is currently laid out in the array design docs (https://docs.dask.org/en/latest/array-design.html), but I agree that there is nothing in the docstring.

Sure, I can have a go at it next week.