distributed: KeyError: 'lengths'
I'm trying to find an older version of distributed that is not too buggy, but I'm having trouble.
RAPIDS 0.14 pairs with dask/distributed 2.17.
But 2.17 hits https://github.com/dask/distributed/issues/3851
Tried 2.18 as suggested there, but 2.18 hits the error below even for the most basic test when I have 2 GPUs.
Any thoughts? I can’t go to 2.27 used by rapids 0.14 because that causes many rapids tests to fail.
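When juggling version combinations like this, it can help to confirm exactly which versions are actually installed in the environment before running anything. A small stdlib-only diagnostic sketch (the package names below are the usual PyPI distribution names, which is an assumption; `importlib.metadata` needs Python 3.8+, while the traceback below is from a 3.6 environment where `pip list` would serve the same purpose):

```python
from importlib import metadata

# Report the installed version of each package involved in this issue.
# These are assumed PyPI distribution names; adjust if your channel differs.
for pkg in ("dask", "distributed", "dask-cuda", "xgboost"):
    try:
        print(pkg, metadata.version(pkg))
    except metadata.PackageNotFoundError:
        print(pkg, "not installed")
```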
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
from dask import dataframe as dd
import xgboost as xgb


def main(client):
    dask_df = dd.read_csv('creditcard.csv')
    target = 'default payment next month'
    y = dask_df[target]
    X = dask_df[dask_df.columns.difference([target])]
    dtrain = xgb.dask.DaskDMatrix(client, X, y)
    output = xgb.dask.train(client,
                            # Use GPU training algorithm
                            {'tree_method': 'gpu_hist'},
                            dtrain,
                            num_boost_round=100,
                            evals=[(dtrain, 'train')])
    booster = output['booster']  # booster is the trained model
    history = output['history']  # A dictionary containing evaluation results
    # Save the model to file
    booster.save_model('xgboost-model')
    print('Training evaluation history:', history)


if __name__ == '__main__':
    # `LocalCUDACluster` is used for assigning GPUs to XGBoost
    # processes. Here `n_workers` represents the number of GPUs,
    # since we use one GPU per worker process.
    with LocalCUDACluster(n_workers=2) as cluster:
        with Client(cluster) as client:
            main(client)
(base) jon@mr-dl10:/data/jon/h2oai.fullcondatest3$ python dask_cudf_example.py
distributed.nanny - ERROR - Failed to start worker
Traceback (most recent call last):
File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/nanny.py", line 758, in run
await worker
File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/core.py", line 236, in _
await self.start()
File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/worker.py", line 1085, in start
await self._register_with_scheduler()
File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/worker.py", line 811, in _register_with_scheduler
types={k: typename(v) for k, v in self.data.items()},
File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/worker.py", line 811, in <dictcomp>
types={k: typename(v) for k, v in self.data.items()},
File "/home/jon/minicondadai/lib/python3.6/_collections_abc.py", line 744, in __iter__
yield (key, self._mapping[key])
File "/home/jon/minicondadai/lib/python3.6/site-packages/dask_cuda/device_host_file.py", line 150, in __getitem__
return self.host_buffer[key]
File "/home/jon/minicondadai/lib/python3.6/site-packages/zict/buffer.py", line 78, in __getitem__
return self.slow_to_fast(key)
File "/home/jon/minicondadai/lib/python3.6/site-packages/zict/buffer.py", line 65, in slow_to_fast
value = self.slow[key]
File "/home/jon/minicondadai/lib/python3.6/site-packages/zict/func.py", line 38, in __getitem__
return self.load(self.d[key])
File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/protocol/serialize.py", line 505, in deserialize_bytes
frames = merge_frames(header, frames)
File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/protocol/utils.py", line 60, in merge_frames
lengths = list(header["lengths"])
KeyError: 'lengths'
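The failure mode in the last two frames can be illustrated in isolation: `merge_frames` reads `header["lengths"]`, so a value spilled to disk with a serialization header that lacks that key (e.g. written under a different distributed version's protocol) raises exactly this `KeyError` on reload. A minimal sketch of that failing line, not distributed's actual implementation:

```python
def merge_frames_sketch(header, frames):
    # Mimics the failing line in distributed/protocol/utils.py:
    # a header without a "lengths" entry raises KeyError('lengths').
    lengths = list(header["lengths"])
    return frames  # the real code would split/merge frames by these lengths

try:
    # A header produced without a "lengths" key reproduces the error.
    merge_frames_sketch({"compression": [None]}, [b""])
except KeyError as e:
    print("KeyError:", e)  # → KeyError: 'lengths'
```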
About this issue
- State: closed
- Created 4 years ago
- Comments: 40 (12 by maintainers)
I think it would be a large undertaking to patch/work around.
To answer your question:
I would not expect the latest dask/distributed to work that far back, as a lot of changes occurred in the serialization layers between Dask and RAPIDS. The errors you posted are probably a result of those changes. @jakirkham, do you have any thoughts here?