dask-image: dask-image imread v0.5.0 not working with dask distributed Client & napari
@kpasko made a bug report https://github.com/napari/napari/issues/2304 but it turns out this is a problem caused by dask-image. I’ve copied the contents of the report into this issue (sadly I’m unable to transfer issues between different organisations).
What happened:
TypeError: can not serialize ‘function’ object
raised in distributed/client.py, line 2635: `futures = self._graph_to_futures`
What you expected to happen:
successful image viewing
Minimal Complete Verifiable Example:
(Edited)
from dask.distributed import Client
import napari
from dask_image.imread import imread
client = Client()
data = imread('./*.tif')
napari.view_image(data)
Anything else we need to know?: It works fine when not initializing the client, i.e.
from dask.distributed import Client
import napari
from dask_image.imread import imread
data = imread('./*.tif')
napari.view_image(data)
works as expected
Environment:
- Napari/Dask versions (all from conda-forge): dask 2021.2.0, dask-core 2021.2.0, dask-image 0.5.0, distributed 2021.2.0, napari 0.4.5, napari-console 0.0.3, napari-plugin-engine 0.1.9, napari-svg 0.1.4
- Python version: 3.9.2 (conda-forge)
- Operating System: macOS 11.2.1
- Install method (conda, pip, source): conda
About this issue
- State: closed
- Created 3 years ago
- Comments: 61 (31 by maintainers)
@jakirkham @jrbourbeau I can confirm that https://github.com/dask/dask/pull/7353 fixes all problems discussed in this issue and in particular the original dask-image-napari workflow 🎉

Okay, I narrowed it down to (without napari):

Same error:
I tested the changes in https://github.com/dask/dask/pull/7353 against the example snippet posted in https://github.com/dask/distributed/issues/4574 and the issue was resolved (a corresponding test was also added). It’d be great if someone could double check with the original napari workflow
Yep that was the missing piece 😄
Here’s another repro ( https://github.com/dask/distributed/issues/4574#issuecomment-794773534 ). With fusion on, things work. With fusion off, it fails!
Wow, this is the kind of sleuthing that one loves to wake up to! 😂 👏 👏 Awesome work @m-albert!
I should have remembered and mentioned one more probably-relevant detail, which is that napari turns off dask task fusion when slicing dask arrays. That might account for why it was so hard to reproduce without napari. Sorry, I only just remembered that!
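For reference, here is a minimal sketch (not code from this thread) of how task fusion can be toggled globally, assuming dask's `optimization.fuse.active` configuration key; whether napari uses exactly this mechanism isn't stated above.

```python
# Minimal sketch: toggling dask task fusion via configuration.
# Assumes the "optimization.fuse.active" config key.
import dask
import dask.array as da

x = da.ones((100, 100), chunks=(50, 50))

with dask.config.set({"optimization.fuse.active": False}):
    # graph optimization skips task fusion inside this block
    result_unfused = (x + 1).sum().compute()

with dask.config.set({"optimization.fuse.active": True}):
    result_fused = (x + 1).sum().compute()
```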
Have confirmed this also. As a result, we will:
- … (dask-image==0.4.0 if they prefer)
- move the imread speed discussion over to https://github.com/dask/dask-image/issues/181 and reopen that issue

Really nice work here @m-albert
Does PR ( https://github.com/dask/dask/pull/7353 ) solve the issue? We just merged it fwiw
Nice, so something goes wrong when trying to pack unfused graphs. For some reason, choosing a mapping function `def func(x, block_info=None)` instead of `def func(block_info=None)` circumvents this issue in the latest commit of dask_image. Looking forward to seeing whether https://github.com/dask/dask/pull/7353 will solve the issue!

Well done! 😄 👏
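To make the two signatures concrete, here is a compact, hedged sketch (illustrative names, not dask-image code) of the pattern reported as problematic versus the one reported to circumvent it:

```python
# Illustrative only: the two mapping-function signatures discussed above.
import numpy as np
import dask.array as da

def only_info(block_info=None):
    # relies solely on block_info; no array block is received
    start, stop = block_info[None]["array-location"][0]
    return np.arange(start, stop, dtype=float)

def block_and_info(x, block_info=None):
    # also receives an array block as its first argument
    return x + 1

a = da.map_blocks(only_info, chunks=((4, 4),), dtype=float)            # no dask array passed
b = da.map_blocks(block_and_info, da.zeros(8, chunks=4), dtype=float)  # dask array passed
```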
Can you please file this as a new issue on Distributed ( https://github.com/dask/distributed/issues )?
In this failing line, `dumps_msgpack({"layers": layers})`, it seems the entire `layers` object is packable apart from `layers[1]['state']['indices']`, which contains `func` from the example.

Btw, I found another hook of some type. This reproduces the problem without dask-image, but using napari.

Now, replacing

by

makes the error disappear.

As if dask couldn’t deal with the provided function properly when there is a `block_info` argument without a dask array being passed. Couldn’t reproduce the issue with a different/simpler example than the one reported here yet…
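As a side note, the `TypeError` in the original traceback is msgpack's own message: packing any structure that still contains a plain Python function fails in exactly this way. A tiny sketch (not from the thread):

```python
# Packing a structure that contains a bare function with msgpack fails with
# the same TypeError seen in the traceback (with msgpack's C extension).
import msgpack

def func(block_info=None):
    pass

try:
    msgpack.packb({"state": {"indices": (0, func)}})
except TypeError as err:
    print(err)  # e.g. "can not serialize 'function' object"
```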
Yes it does.
@jni
The breaking commit is 17ec4c25d6b26300bbcdb6bfe781595bd2586a96, which implements `map_blocks` for `imread`:

After the latest commit 91fe6e1cb262058296c182ae9f9bd9b255aaebd9, the problem doesn’t occur anymore. This commit changes the way `map_blocks` is implemented in `imread`:

These code snippets include all relevant changed lines. Interestingly, the problem seems to be related to the differences in the use of `map_blocks`. Namely, the problematic commit calls `dask.array.map_blocks`, while the fixing commit uses `x.map_blocks`, where `x` is a dask array. I’ve tested that adding a dummy array in the problematic commit solves the problem:

However, in my understanding, using `map_blocks` as `dask.array.map_blocks` should be intended behaviour, as an example of this is also included in the docstring of `map_blocks`. Also, using `dask.array.map_blocks` in a minimal example in combination with a distributed client and the napari viewer doesn’t seem to break anything:

Something else to consider is the observation by @GenevieveBuckley that calling `data.compute()` circumvents the problem. So my guess at this point would be that there’s a dask or distributed problem, as @jakirkham suggested, potentially related to new graph functionality, potentially occurring when napari first slices and then computes, as @jni suggested.
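To illustrate the difference being described, here is a hedged sketch with made-up names and shapes (this is not the actual imread source from either commit):

```python
# Illustrative contrast of the two map_blocks styles mentioned above.
import numpy as np
import dask.array as da

nframes, ny, nx = 3, 64, 64

def read_frame(block_info=None):
    # stand-in for reading one image frame, using only block_info metadata
    shape = tuple(hi - lo for lo, hi in block_info[None]["array-location"])
    return np.zeros(shape, dtype=np.uint8)

# style of the breaking commit: module-level dask.array.map_blocks,
# called without any dask array argument
a = da.map_blocks(
    read_frame,
    chunks=((1,) * nframes, (ny,), (nx,)),
    dtype=np.uint8,
)

# the tested workaround: introduce a dummy dask array and use the bound
# x.map_blocks form, so every call receives a block as its first argument
def read_frame_from_block(block, block_info=None):
    shape = tuple(hi - lo for lo, hi in block_info[None]["array-location"])
    return np.zeros(shape, dtype=np.uint8)

dummy = da.zeros((nframes, ny, nx), chunks=(1, ny, nx), dtype=np.uint8)
b = dummy.map_blocks(read_frame_from_block, dtype=np.uint8)
```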
shapeanddtypeinfo and then useskimage.io.imreador similar to read in each chunk on workerstotally agree. let’s change it!
Oh yeah, I’d almost forgotten about that. It turns out it’s even more interesting than I’d thought.
ipython --gui=qt

Error message: