dask: Dask array with non-highlevelgraph leads to KeyError
I have some older code that assigns a plain dict directly into `array.dask`, as a workaround for an old dask bug. If that’s no longer supported then feel free to close this bug (although it would be nice to add one or two sanity checks). Since dask 2.8.0 it raises a KeyError inside the optimizer.
Here’s a minimum reproducing example:
#!/usr/bin/env python3
import dask
import dask.array as da
import numpy as np
a = da.from_array(np.ones((4, 4), np.float64), chunks=(2, 4))
a.dask = dict(a.dask)
b = da.from_array(np.zeros((4, 4), np.float64), chunks=(2, 4))
x = a + b
da.compute(x)
Traceback (most recent call last):
File "./bw.py", line 11, in <module>
da.compute(x)
File "/home/bmerry/work/sdp/env3/lib/python3.6/site-packages/dask/base.py", line 433, in compute
dsk = collections_to_dsk(collections, optimize_graph, **kwargs)
File "/home/bmerry/work/sdp/env3/lib/python3.6/site-packages/dask/base.py", line 219, in collections_to_dsk
[opt(dsk, keys, **kwargs) for opt, (dsk, keys) in groups.items()],
File "/home/bmerry/work/sdp/env3/lib/python3.6/site-packages/dask/base.py", line 219, in <listcomp>
[opt(dsk, keys, **kwargs) for opt, (dsk, keys) in groups.items()],
File "/home/bmerry/work/sdp/env3/lib/python3.6/site-packages/dask/array/optimization.py", line 43, in optimize
dsk = fuse_roots(dsk, keys=keys)
File "/home/bmerry/work/sdp/env3/lib/python3.6/site-packages/dask/blockwise.py", line 822, in fuse_roots
and not any(dependencies[dep] for dep in deps) # no need to fuse if 0 or 1
File "/home/bmerry/work/sdp/env3/lib/python3.6/site-packages/dask/blockwise.py", line 822, in <genexpr>
and not any(dependencies[dep] for dep in deps) # no need to fuse if 0 or 1
KeyError: 'array-d6b7f7b33e2b14cd720de87a517147e8'
(run against dask 2.10.0, Python 3.6)
From what I can tell, the trouble starts when `HighLevelGraph.from_collections` is given a dependency that doesn’t use a HighLevelGraph, and it names the resulting layer just using `id`. However, the Blockwise for the addition keys its `indices` based on the array names. Then `optimize_blockwise` uses those `indices` keys to construct the new dependencies. That means the new dependencies don’t line up with the layers. The final explosion happens in `fuse_roots`, but it could presumably happen anywhere that expected the dependencies to be consistent.
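To make the mismatch described above concrete, here is a small toy model in plain Python. It does not use or reproduce dask's real code: `build_layers`, `rebuild_dependencies` and `has_no_grandparents` are hypothetical stand-ins for the three steps (layer naming, dependency reconstruction from array names, and the lookup that fails in the traceback), and the key names are made up.

```python
"""Toy model of the layer-name mismatch; none of this is dask's real code."""


def build_layers(new_name, inputs):
    """Plain-dict inputs get a layer named id(graph); others keep the array name."""
    layers = {new_name: {}}
    dependencies = {new_name: set()}
    for array_name, graph, is_high_level in inputs:
        layer_name = array_name if is_high_level else id(graph)
        layers[layer_name] = graph
        dependencies[layer_name] = set()
        dependencies[new_name].add(layer_name)
    return layers, dependencies


def rebuild_dependencies(new_name, dependencies, indices):
    """An optimizer-like pass that rebuilds one entry from array names."""
    rebuilt = dict(dependencies)
    rebuilt[new_name] = {array_name for array_name, _ in indices}
    return rebuilt


def has_no_grandparents(new_name, dependencies):
    """Same shape as the failing check: dependencies[dep] for dep in deps."""
    return not any(dependencies[dep] for dep in dependencies[new_name])


plain_graph = {("array-a", 0, 0): 1}    # stands in for the plain dict
hlg_graph = {("array-b", 0, 0): 0}      # stands in for a proper high-level graph
inputs = [("array-a", plain_graph, False), ("array-b", hlg_graph, True)]
layers, deps = build_layers("add-x", inputs)

# The blockwise-style indices refer to arrays by name, not by id().
indices = [("array-a", "ij"), ("array-b", "ij")]
rebuilt = rebuild_dependencies("add-x", deps, indices)

consistent_before = has_no_grandparents("add-x", deps)
try:
    has_no_grandparents("add-x", rebuilt)
    outcome = "ok"
except KeyError as exc:
    outcome = f"KeyError: {exc.args[0]}"
print(outcome)
```

In this toy, the lookup works while the dependencies still use the `id()`-based name, and fails with a KeyError on the array name once they are rebuilt from the indices, which is the same shape of failure as in the traceback.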
About this issue
- State: closed
- Created 4 years ago
- Comments: 22 (22 by maintainers)
Commits related to this issue
- Make cal work with latest version of dask There are two fixes: 1. A straight-up bug where a dependency set was made a string instead of a set of strings. I'm not sure how this ever worked! 2. A work... — committed to ska-sa/katsdpcal by bmerry 4 years ago
- Better layer name synthesis for non-HLGs When a dependent collection passed to HighLevelGraph.from_collections has a __dask_graph__ that is not a HighLevelGraph, the layer synthesised for it was bein... — committed to bmerry/dask by bmerry 4 years ago
- Update array internal design docs for HighLevelGraph era - Use `.__dask_graph__()` instead of `.dask` (discussion in #5850 indicated that the former is preferred). - Update inspection of an existin... — committed to bmerry/dask by bmerry 4 years ago
- Update array internal design docs for HighLevelGraph era (#5889) - Use `.__dask_graph__()` instead of `.dask` (discussion in #5850 indicated that the former is preferred). - Update inspection of ... — committed to dask/dask by bmerry 4 years ago
- Improve layer name synthesis for non-HLGs (#5888) When a dependent collection passed to HighLevelGraph.from_collections has a __dask_graph__ that is not a HighLevelGraph, the layer synthesised for ... — committed to dask/dask by bmerry 4 years ago
- Make cal work with latest version of dask There are two fixes: 1. A straight-up bug where a dependency set was made a string instead of a set of strings. I'm not sure how this ever worked! 2. A work... — committed to ska-sa/katsdpcalproc by bmerry 4 years ago
In principle it shouldn’t be terribly hard to support non-HLGs.
I think that the tricky thing here is having users mutate graphs of any sort. I would rather users create new `Array` objects using the constructor.
Over time I would also prefer that we remove the `.dask` attribute, preferring that people use the `.__dask_graph__()` method instead, as laid out in the protocol.
So rather than do type checking in a `.dask` property, how about adding a deprecation warning?
This is currently laid out here, but I agree that there is nothing in the docstring.
https://docs.dask.org/en/latest/array-design.html
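As a rough illustration of the deprecation-warning idea suggested above, the sketch below uses a hypothetical `ToyArray` class, not dask's `Array`. The class name, the warning text and the choice of `DeprecationWarning` are all assumptions made for the example.

```python
import warnings


class ToyArray:
    """Hypothetical stand-in for a collection; this is not dask's Array."""

    def __init__(self, graph, name):
        self._graph = graph
        self.name = name

    def __dask_graph__(self):
        # The protocol method is the preferred way to read the graph.
        return self._graph

    @property
    def dask(self):
        # Reading the attribute still works, but warns the caller.
        warnings.warn(
            "the .dask attribute is deprecated; use .__dask_graph__() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self._graph


arr = ToyArray({("x", 0): 1}, "x")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    graph = arr.dask

# The property has no setter, so assigning a new graph is rejected.
try:
    arr.dask = {}
    assignment = "allowed"
except AttributeError:
    assignment = "rejected"
```

A read-only property like this also blocks direct assignment of the kind used in the reproducer, which is a stricter change than a warning alone and may or may not be desirable.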
Sure, I can have a go at it next week.