distributed: ERROR - unpack requires a bytes object of length 8
I am using the latest GitHub master of distributed (1.16.2).
Workers are dying with the following error, and I really have no idea what it means.
distributed.worker - ERROR - unpack requires a bytes object of length 8
Traceback (most recent call last):
  File "/rigel/home/ra2697/distributed/distributed/worker.py", line 1838, in execute
    args2 = pack_data(args, self.data, key_types=str)
  File "/rigel/home/ra2697/distributed/distributed/utils_comm.py", line 236, in pack_data
    return typ([pack_data(x, d, key_types=key_types) for x in o])
  File "/rigel/home/ra2697/distributed/distributed/utils_comm.py", line 236, in <listcomp>
    return typ([pack_data(x, d, key_types=key_types) for x in o])
  File "/rigel/home/ra2697/distributed/distributed/utils_comm.py", line 231, in pack_data
    return d[o]
  File "/rigel/home/ra2697/miniconda/envs/dask_distributed/lib/python3.5/site-packages/zict/buffer.py", line 70, in __getitem__
    return self.slow_to_fast(key)
  File "/rigel/home/ra2697/miniconda/envs/dask_distributed/lib/python3.5/site-packages/zict/buffer.py", line 57, in slow_to_fast
    value = self.slow[key]
  File "/rigel/home/ra2697/miniconda/envs/dask_distributed/lib/python3.5/site-packages/zict/func.py", line 39, in __getitem__
    return self.load(self.d[key])
  File "/rigel/home/ra2697/distributed/distributed/protocol/serialize.py", line 364, in deserialize_bytes
    frames = unpack_frames(b)
  File "/rigel/home/ra2697/distributed/distributed/protocol/utils.py", line 119, in unpack_frames
    (length,) = struct.unpack('Q', b[(i + 1) * 8: (i + 2) * 8])
struct.error: unpack requires a bytes object of length 8
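The struct.error at the bottom is mechanical: struct.unpack('Q', ...) decodes one unsigned 64-bit integer and demands exactly 8 bytes, so a slice taken past the end of truncated or corrupt on-disk data yields fewer than 8 bytes and raises this exact error. A minimal standalone reproduction, not tied to distributed:

```python
import struct

# 'Q' is an unsigned 64-bit integer: packing always yields 8 bytes.
good = struct.pack('Q', 1024)
(length,) = struct.unpack('Q', good)
print(length)  # 1024

# Slicing past the end of a buffer silently yields a short slice;
# unpack then fails, just like reading a truncated spill file.
try:
    struct.unpack('Q', good[:5])
except struct.error as e:
    print(e)  # e.g. "unpack requires a bytes object of length 8" (message varies by Python version)
```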
About this issue
- State: closed
- Created 7 years ago
- Comments: 40 (20 by maintainers)
Just to give context: the error you’re getting occurs when workers have done some work, have too much data to fit in RAM, and so have stored some of that data on disk. That part is fine. What isn’t fine is that reading the data back from disk fails when it has to be used again.
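For context on what unpack_frames is doing when it fails: serialized values are written as length-prefixed frames, with each length stored as an 8-byte 'Q' integer. The sketch below is a simplified stand-in, not the actual distributed.protocol.utils implementation, but it shows why truncated on-disk bytes surface as a struct.error when the frames are unpacked:

```python
import struct

def pack_frames(frames):
    # Simplified framing: a frame count, then one 8-byte length per
    # frame, then the payloads back to back.
    header = struct.pack('Q', len(frames))
    header += b''.join(struct.pack('Q', len(f)) for f in frames)
    return header + b''.join(frames)

def unpack_frames(b):
    (n,) = struct.unpack('Q', b[:8])
    # Each length slice must be exactly 8 bytes; a truncated blob
    # produces a short slice here and raises struct.error.
    lengths = [struct.unpack('Q', b[(i + 1) * 8:(i + 2) * 8])[0]
               for i in range(n)]
    frames, offset = [], (n + 1) * 8
    for length in lengths:
        frames.append(b[offset:offset + length])
        offset += length
    return frames

blob = pack_frames([b'header', b'payload'])
print(unpack_frames(blob))  # [b'header', b'payload']
# unpack_frames(blob[:20]) would raise struct.error, mirroring the traceback.
```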
The second failure you’re getting here is also related to dumping data to disk, and it may be related to your actual error. Although, if you’re on a huge machine, it might also just be that the test is writing an incredible amount of data to your NFS in order to trigger the situation. I should adjust this test.
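For readers unfamiliar with the spill mechanism visible in the traceback: zict.Buffer keeps a fast in-memory mapping and a slow on-disk mapping, evicting to disk when memory fills and promoting values back on access (the slow_to_fast call). The toy class below mimics that pattern using only the stdlib; the class name, eviction policy, and file layout are my own simplifications, not zict’s actual implementation:

```python
import os
import pickle
import tempfile

class SpillBuffer:
    """Toy fast/slow mapping: at most `n` values stay in memory, the
    rest are pickled to files in `directory` and loaded back on
    access (a crude version of zict's slow_to_fast path)."""

    def __init__(self, directory, n):
        self.fast = {}  # in-memory values
        self.directory = directory
        self.n = n

    def _path(self, key):
        return os.path.join(self.directory, str(key))

    def __setitem__(self, key, value):
        self.fast[key] = value
        while len(self.fast) > self.n:
            # Evict the oldest in-memory item to disk.
            oldest = next(iter(self.fast))
            with open(self._path(oldest), 'wb') as f:
                pickle.dump(self.fast.pop(oldest), f)

    def __getitem__(self, key):
        if key in self.fast:
            return self.fast[key]
        # slow_to_fast: read the spilled value back from disk.  This
        # is the step that blows up in the traceback when the
        # on-disk bytes are truncated or corrupt.
        with open(self._path(key), 'rb') as f:
            value = pickle.load(f)
        os.remove(self._path(key))
        self[key] = value  # promote back into memory (may evict another)
        return value

buf = SpillBuffer(tempfile.mkdtemp(), n=2)
for i in range(4):
    buf[i] = i * 10  # keys 0 and 1 get spilled to disk
print(buf[0])        # 0, transparently loaded back from disk
```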