distributed: CuPy (De)serialization error

I’m encountering the following exception when performing a custom tree-reduce on a set of large CuPy objects. I don’t get this exception with smaller objects, so I have not been able to reproduce it with the normal pytests. However, it can be reproduced by increasing n_features in the HashingVectorizer that feeds the Naive Bayes pytest to 8M.

Traceback (most recent call last):
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/site-packages/distributed/protocol/core.py", line 124, in loads
    value = _deserialize(head, fs, deserializers=deserializers)
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 255, in deserialize
    deserializers=deserializers,
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 268, in deserialize
    return loads(header, frames)
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/site-packages/distributed/protocol/cuda.py", line 28, in cuda_loads
    return loads(header, frames)
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 608, in deserialize
    v = deserialize(h, f)
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 268, in deserialize
    return loads(header, frames)
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/site-packages/distributed/protocol/cuda.py", line 28, in cuda_loads
    return loads(header, frames)
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/site-packages/distributed/protocol/cupy.py", line 63, in cuda_deserialize_cupy_ndarray
    frame = PatchedCudaArrayInterface(frame)
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/site-packages/distributed/protocol/cupy.py", line 26, in __init__
    self.__cuda_array_interface__ = ary.__cuda_array_interface__
AttributeError: 'bytes' object has no attribute '__cuda_array_interface__'
distributed.utils - ERROR - 'bytes' object has no attribute '__cuda_array_interface__'
Traceback (most recent call last):
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/site-packages/distributed/utils.py", line 665, in log_errors
    yield
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/site-packages/distributed/comm/ucx.py", line 207, in read
    frames, deserialize=self.deserialize, deserializers=deserializers
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/site-packages/distributed/comm/utils.py", line 73, in from_frames
    res = await offload(_from_frames)
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/site-packages/distributed/utils.py", line 1458, in offload
    return await loop.run_in_executor(_offload_executor, lambda: fn(*args, **kwargs))
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/site-packages/distributed/utils.py", line 1458, in <lambda>
    return await loop.run_in_executor(_offload_executor, lambda: fn(*args, **kwargs))
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/site-packages/distributed/comm/utils.py", line 61, in _from_frames
    frames, deserialize=deserialize, deserializers=deserializers
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/site-packages/distributed/protocol/core.py", line 124, in loads
    value = _deserialize(head, fs, deserializers=deserializers)
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 255, in deserialize
    deserializers=deserializers,
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 268, in deserialize
    return loads(header, frames)
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/site-packages/distributed/protocol/cuda.py", line 28, in cuda_loads
    return loads(header, frames)
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 608, in deserialize
    v = deserialize(h, f)
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 268, in deserialize
    return loads(header, frames)
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/site-packages/distributed/protocol/cuda.py", line 28, in cuda_loads
    return loads(header, frames)
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/site-packages/distributed/protocol/cupy.py", line 63, in cuda_deserialize_cupy_ndarray
    frame = PatchedCudaArrayInterface(frame)
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/site-packages/distributed/protocol/cupy.py", line 26, in __init__
    self.__cuda_array_interface__ = ary.__cuda_array_interface__
AttributeError: 'bytes' object has no attribute '__cuda_array_interface__'
distributed.worker - ERROR - 'bytes' object has no attribute '__cuda_array_interface__'

I believe this is the same as https://github.com/rapidsai/ucx-py/issues/421. Although it only occurs with protocol=ucx, every frame in the stack trace is in dask.distributed, so I’ve opted to start a fresh thread here.

cc @jakirkham

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 23 (14 by maintainers)

Most upvoted comments

@cjnolet, yup I am seeing that as well.

@mrocklin @jakirkham seems like you two both want the same thing 😃


Alternatively, maybe with UCX we shouldn’t be splitting and merging frames at all?

I think I found the issue. With large, irregularly sized frames, merge_frames goes down a code path that calls ensure_bytes:

https://github.com/dask/distributed/blob/f2f82c6c2e8d36731cb3fb82fb1f80ea0323358e/distributed/protocol/utils.py#L94-L95

ensure_bytes calls bytes() not on a CuPy array, but rather on an rmm.DeviceBuffer:

import rmm

bytes(rmm.DeviceBuffer(size=10))
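To make the failure chain concrete without a GPU, here is a minimal sketch. `FakeDeviceBuffer`, `ensure_bytes_sketch`, and `PatchedCudaArrayInterfaceSketch` are hypothetical stand-ins for rmm.DeviceBuffer, distributed's ensure_bytes, and the PatchedCudaArrayInterface wrapper seen in the traceback. They only mimic the behaviour relevant here, under the assumption that bytes() on a device buffer yields a host copy that no longer carries `__cuda_array_interface__`.

```python
class FakeDeviceBuffer:
    """Hypothetical stand-in for rmm.DeviceBuffer; no GPU required.

    Like a real device buffer it advertises __cuda_array_interface__,
    and (by assumption) bytes() on it returns a host copy of its contents.
    """

    def __init__(self, data: bytes):
        self._data = data
        self.__cuda_array_interface__ = {
            "shape": (len(data),),
            "typestr": "|u1",
            "data": (0, False),  # fake device pointer, never dereferenced
            "version": 2,
        }

    def __len__(self) -> int:
        return len(self._data)

    def __bytes__(self) -> bytes:
        return self._data


def ensure_bytes_sketch(s) -> bytes:
    """Hypothetical simplification of distributed's ensure_bytes helper."""
    if isinstance(s, bytes):
        return s
    return bytes(s)  # a device buffer becomes a plain host bytes object here


class PatchedCudaArrayInterfaceSketch:
    """Hypothetical mimic of the wrapper in distributed/protocol/cupy.py."""

    def __init__(self, ary):
        # Raises AttributeError when `ary` is host bytes instead of a device frame
        self.__cuda_array_interface__ = ary.__cuda_array_interface__


frame = ensure_bytes_sketch(FakeDeviceBuffer(b"\x00" * 10))
print(type(frame).__name__)  # bytes

try:
    PatchedCudaArrayInterfaceSketch(frame)
    err_msg = None
except AttributeError as e:
    err_msg = str(e)
print(err_msg)  # 'bytes' object has no attribute '__cuda_array_interface__'
```

The sketch reproduces the same AttributeError as the tracebacks above: once a device frame has been converted to bytes, the CuPy deserializer has nothing to wrap.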

We can avoid this routine by a check at the end of the merge_frames routine:

        else:
            # If any piece is a device buffer, skip the host-bytes join and
            # pass the pieces through unmerged
            if any(hasattr(f, "__cuda_array_interface__") for f in L):
                out.extend(L)
            else:
                out.append(b"".join(map(ensure_bytes, L)))
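The merge step and the proposed guard can be sketched as follows. `merge_frames_sketch` and `FakeDeviceBuffer` are hypothetical simplifications, not distributed's actual merge_frames; in particular, this sketch assumes received pieces never straddle a logical frame boundary, which the real function does handle.

```python
from typing import List, Sequence


class FakeDeviceBuffer:
    """Hypothetical stand-in for a GPU buffer such as rmm.DeviceBuffer."""

    def __init__(self, data: bytes):
        self._data = data
        self.__cuda_array_interface__ = {
            "shape": (len(data),), "typestr": "|u1", "data": (0, False), "version": 2,
        }

    def __len__(self) -> int:
        return len(self._data)

    def __bytes__(self) -> bytes:
        return self._data  # host copy; loses __cuda_array_interface__


def merge_frames_sketch(lengths: Sequence[int], frames: Sequence,
                        guard_device: bool = True) -> List:
    """Regroup received `frames` into logical frames of the given `lengths`."""
    # Fast path: every frame already has its expected length, nothing to merge
    if len(frames) == len(lengths) and all(len(f) == n for f, n in zip(frames, lengths)):
        return list(frames)

    out = []
    pending = list(frames)[::-1]  # reversed so pop() yields frames in order
    for n in lengths:
        group, remaining = [], n
        while remaining > 0:
            piece = pending.pop()
            group.append(piece)
            remaining -= len(piece)
        if remaining != 0:
            raise ValueError("piece crosses a frame boundary (not handled in this sketch)")
        if len(group) == 1:
            out.extend(group)
        elif guard_device and any(hasattr(p, "__cuda_array_interface__") for p in group):
            # Proposed guard: keep device pieces as-is instead of copying to host bytes
            out.extend(group)
        else:
            out.append(b"".join(bytes(p) for p in group))
    return out


host = merge_frames_sketch([6], [b"abc", b"def"])
dev = merge_frames_sketch([6], [FakeDeviceBuffer(b"abc"), FakeDeviceBuffer(b"def")])
unguarded = merge_frames_sketch(
    [6], [FakeDeviceBuffer(b"abc"), FakeDeviceBuffer(b"def")], guard_device=False
)
print(host)  # [b'abcdef']
print([type(f).__name__ for f in dev])  # ['FakeDeviceBuffer', 'FakeDeviceBuffer']
print(type(unguarded[0]).__name__)  # bytes
```

Note the trade-off: with the guard, a logical frame made of several device pieces comes back as several frames, so the caller receives more frames than `lengths` lists, just as with `out.extend(L)` in the snippet above. When each frame already arrives whole, the fast path returns the original objects untouched, which is roughly what not splitting frames for UCX would achieve.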

Thanks for that simple reproducer, Corey! 😄

@quasiben, actually, here’s a simpler reproducible example:

from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import cupy as cp


def make_large_data(x):
    # x is unused; it only gives each submitted task a distinct key
    return cp.random.random((5, 8000000)), cp.random.random((5,))


if __name__ == "__main__":
    cluster = LocalCUDACluster(protocol="ucx")
    client = Client(cluster)
    fut = [client.submit(make_large_data, i) for i in range(5)]
    # Pull the large CuPy results back to the client over UCX
    client.gather(fut)

This is the exception I get:

distributed.protocol.core - CRITICAL - Failed to deserialize
Traceback (most recent call last):
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/site-packages/distributed/protocol/core.py", line 125, in loads
    value = _deserialize(head, fs, deserializers=deserializers)
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 258, in deserialize
    deserializers=deserializers,
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 271, in deserialize
    return loads(header, frames)
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/site-packages/distributed/protocol/cuda.py", line 28, in cuda_loads
    return loads(header, frames)
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/site-packages/distributed/protocol/cupy.py", line 63, in cuda_deserialize_cupy_ndarray
    frame = PatchedCudaArrayInterface(frame)
  File "/raid/cnolet/miniconda3/envs/cuml_dev_013/lib/python3.7/site-packages/distributed/protocol/cupy.py", line 26, in __init__
    self.__cuda_array_interface__ = ary.__cuda_array_interface__
AttributeError: 'bytes' object has no attribute '__cuda_array_interface__'
distributed.utils - ERROR - 'bytes' object has no attribute '__cuda_array_interface__'