distributed: ERROR - unpack requires a bytes object of length 8
I am using the latest GitHub master of distributed (1.16.2).
Workers are dying with the following error, and I really have no idea what it means.
distributed.worker - ERROR - unpack requires a bytes object of length 8
Traceback (most recent call last):
  File "/rigel/home/ra2697/distributed/distributed/worker.py", line 1838, in execute
    args2 = pack_data(args, self.data, key_types=str)
  File "/rigel/home/ra2697/distributed/distributed/utils_comm.py", line 236, in pack_data
    return typ([pack_data(x, d, key_types=key_types) for x in o])
  File "/rigel/home/ra2697/distributed/distributed/utils_comm.py", line 236, in <listcomp>
    return typ([pack_data(x, d, key_types=key_types) for x in o])
  File "/rigel/home/ra2697/distributed/distributed/utils_comm.py", line 231, in pack_data
    return d[o]
  File "/rigel/home/ra2697/miniconda/envs/dask_distributed/lib/python3.5/site-packages/zict/buffer.py", line 70, in __getitem__
    return self.slow_to_fast(key)
  File "/rigel/home/ra2697/miniconda/envs/dask_distributed/lib/python3.5/site-packages/zict/buffer.py", line 57, in slow_to_fast
    value = self.slow[key]
  File "/rigel/home/ra2697/miniconda/envs/dask_distributed/lib/python3.5/site-packages/zict/func.py", line 39, in __getitem__
    return self.load(self.d[key])
  File "/rigel/home/ra2697/distributed/distributed/protocol/serialize.py", line 364, in deserialize_bytes
    frames = unpack_frames(b)
  File "/rigel/home/ra2697/distributed/distributed/protocol/utils.py", line 119, in unpack_frames
    (length,) = struct.unpack('Q', b[(i + 1) * 8: (i + 2) * 8])
struct.error: unpack requires a bytes object of length 8
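The struct.error at the bottom is mechanical: struct.unpack('Q', ...) decodes one unsigned 64-bit integer and demands exactly 8 bytes, so a slice taken past the end of truncated or corrupt on-disk data yields fewer than 8 bytes and raises this exact error. A minimal standalone reproduction, not tied to distributed:

```python
import struct

# 'Q' is an unsigned 64-bit integer: packing always yields 8 bytes.
good = struct.pack('Q', 1024)
(length,) = struct.unpack('Q', good)
print(length)  # 1024

# Slicing past the end of a buffer silently yields a short slice;
# unpack then fails, just like reading a truncated spill file.
try:
    struct.unpack('Q', good[:5])
except struct.error as e:
    print(e)  # e.g. "unpack requires a bytes object of length 8" (message varies by Python version)
```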
About this issue
- State: closed
- Created 7 years ago
- Comments: 40 (20 by maintainers)
Just to give context: the error you’re getting occurs when workers have done some work, have too much data to fit in RAM, and so have stored some of that data on disk. That part is fine. What isn’t fine is that reading the data back from disk fails when it has to be used again.
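For context on what unpack_frames is doing when it fails: serialized values are written as length-prefixed frames, with each length stored as an 8-byte 'Q' integer. The sketch below is a simplified stand-in, not the actual distributed.protocol.utils implementation, but it shows why truncated on-disk bytes surface as a struct.error when the frames are unpacked:

```python
import struct

def pack_frames(frames):
    # Simplified framing: a frame count, then one 8-byte length per
    # frame, then the payloads back to back.
    header = struct.pack('Q', len(frames))
    header += b''.join(struct.pack('Q', len(f)) for f in frames)
    return header + b''.join(frames)

def unpack_frames(b):
    (n,) = struct.unpack('Q', b[:8])
    # Each length slice must be exactly 8 bytes; a truncated blob
    # produces a short slice here and raises struct.error.
    lengths = [struct.unpack('Q', b[(i + 1) * 8:(i + 2) * 8])[0]
               for i in range(n)]
    frames, offset = [], (n + 1) * 8
    for length in lengths:
        frames.append(b[offset:offset + length])
        offset += length
    return frames

blob = pack_frames([b'header', b'payload'])
print(unpack_frames(blob))  # [b'header', b'payload']
# unpack_frames(blob[:20]) would raise struct.error, mirroring the traceback.
```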
The second failure you’re getting here is also related to dumping data to disk, and it may be related to your actual error. Although, if you’re on a huge machine, it might also just be that the test is writing an incredible amount of data to your NFS in order to trigger the situation. I should adjust this test.
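For readers unfamiliar with the spill mechanism visible in the traceback: zict.Buffer keeps a fast in-memory mapping and a slow on-disk mapping, evicting to disk when memory fills and promoting values back on access (the slow_to_fast call). The toy class below mimics that pattern using only the stdlib; the class name, eviction policy, and file layout are my own simplifications, not zict’s actual implementation:

```python
import os
import pickle
import tempfile

class SpillBuffer:
    """Toy fast/slow mapping: at most `n` values stay in memory, the
    rest are pickled to files in `directory` and loaded back on
    access (a crude version of zict's slow_to_fast path)."""

    def __init__(self, directory, n):
        self.fast = {}  # in-memory values
        self.directory = directory
        self.n = n

    def _path(self, key):
        return os.path.join(self.directory, str(key))

    def __setitem__(self, key, value):
        self.fast[key] = value
        while len(self.fast) > self.n:
            # Evict the oldest in-memory item to disk.
            oldest = next(iter(self.fast))
            with open(self._path(oldest), 'wb') as f:
                pickle.dump(self.fast.pop(oldest), f)

    def __getitem__(self, key):
        if key in self.fast:
            return self.fast[key]
        # slow_to_fast: read the spilled value back from disk.  This
        # is the step that blows up in the traceback when the
        # on-disk bytes are truncated or corrupt.
        with open(self._path(key), 'rb') as f:
            value = pickle.load(f)
        os.remove(self._path(key))
        self[key] = value  # promote back into memory (may evict another)
        return value

buf = SpillBuffer(tempfile.mkdtemp(), n=2)
for i in range(4):
    buf[i] = i * 10  # keys 0 and 1 get spilled to disk
print(buf[0])        # 0, transparently loaded back from disk
```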