distributed: KeyError: 'lengths'

I'm trying to find an older version of distributed that is not too buggy, but I'm having trouble.

rapids 0.14 pairs with dask/distributed 2.17.

But 2.17 hits https://github.com/dask/distributed/issues/3851

Tried 2.18 as suggested there, but 2.18 hits the error below even for the most basic test when I have 2 GPUs.

Any thoughts? I can’t go to 2.27 used by rapids 0.14 because that causes many rapids tests to fail.
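
For reference, the exact versions that end up installed can be confirmed with a quick check like this (just a diagnostic sketch; it only assumes the three packages are importable):

import dask
import distributed
import dask_cuda

# Print installed versions so the dask/distributed/dask-cuda pairing can be verified
print('dask', dask.__version__)
print('distributed', distributed.__version__)
print('dask_cuda', dask_cuda.__version__)

The reproduction script and full traceback follow.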

from dask.distributed import Client
from dask_cuda import LocalCUDACluster
from dask import dataframe as dd
import xgboost as xgb
def main(client):
    dask_df = dd.read_csv('creditcard.csv')
    target = 'default payment next month'
    y = dask_df[target]
    X = dask_df[dask_df.columns.difference([target])]
    dtrain = xgb.dask.DaskDMatrix(client, X, y)
    output = xgb.dask.train(client,
                            # Use GPU training algorithm
                            {'tree_method': 'gpu_hist'},
                            dtrain,
                            num_boost_round=100,
                            evals=[(dtrain, 'train')])
    booster = output['booster']  # booster is the trained model
    history = output['history']  # A dictionary containing evaluation results
    # Save the model to file
    booster.save_model('xgboost-model')
    print('Training evaluation history:', history)

    
if __name__ == '__main__':
    # `LocalCUDACluster` is used for assigning GPU to XGBoost 
    # processes. Here `n_workers` represents the number of GPUs 
    # since we use one GPU per worker process.
    with LocalCUDACluster(n_workers=2) as cluster:
        with Client(cluster) as client:
            main(client)
            

(base) jon@mr-dl10:/data/jon/h2oai.fullcondatest3$ python dask_cudf_example.py 
distributed.nanny - ERROR - Failed to start worker
Traceback (most recent call last):
  File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/nanny.py", line 758, in run
    await worker
  File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/core.py", line 236, in _
    await self.start()
  File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/worker.py", line 1085, in start
    await self._register_with_scheduler()
  File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/worker.py", line 811, in _register_with_scheduler
    types={k: typename(v) for k, v in self.data.items()},
  File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/worker.py", line 811, in <dictcomp>
    types={k: typename(v) for k, v in self.data.items()},
  File "/home/jon/minicondadai/lib/python3.6/_collections_abc.py", line 744, in __iter__
    yield (key, self._mapping[key])
  File "/home/jon/minicondadai/lib/python3.6/site-packages/dask_cuda/device_host_file.py", line 150, in __getitem__
    return self.host_buffer[key]
  File "/home/jon/minicondadai/lib/python3.6/site-packages/zict/buffer.py", line 78, in __getitem__
    return self.slow_to_fast(key)
  File "/home/jon/minicondadai/lib/python3.6/site-packages/zict/buffer.py", line 65, in slow_to_fast
    value = self.slow[key]
  File "/home/jon/minicondadai/lib/python3.6/site-packages/zict/func.py", line 38, in __getitem__
    return self.load(self.d[key])
  File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/protocol/serialize.py", line 505, in deserialize_bytes
    frames = merge_frames(header, frames)
  File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/protocol/utils.py", line 60, in merge_frames
    lengths = list(header["lengths"])
KeyError: 'lengths'
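
The failing frame is merge_frames inside distributed's deserialize_bytes path, which expects a "lengths" entry in the header that was written when the data was spilled. For comparison, a same-version round trip through those same functions works; this is a minimal, purely illustrative sketch, nothing GPU-specific:

from distributed.protocol.serialize import serialize_bytes, deserialize_bytes
import numpy as np

# serialize_bytes packs an object's header and frames into one blob;
# deserialize_bytes unpacks it and calls merge_frames, which (per the
# traceback above) requires header["lengths"] to be present.
blob = serialize_bytes(np.arange(10))
restored = deserialize_bytes(blob)
assert (restored == np.arange(10)).all()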

Most upvoted comments

I think it would be a large undertaking to patch/work around this.

To answer your question:

should I be able to use latest dask/distributed with old rapids 0.14?

I would not expect the latest dask/distributed to work that far back, as a lot of changes occurred in the serialization layers between Dask and RAPIDS. The errors you posted are probably a result of those changes. @jakirkham, do you have any thoughts here?
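
One way to sanity-check that the client, scheduler, and workers all ended up on the same package versions is Client.get_versions(check=True). A minimal sketch (it only helps once workers actually manage to start, which is not the case in the failure above):

from dask.distributed import Client
from dask_cuda import LocalCUDACluster

with LocalCUDACluster(n_workers=2) as cluster:
    with Client(cluster) as client:
        # Raises an error if the client, scheduler, and workers report
        # mismatched package versions (dask, distributed, etc.)
        client.get_versions(check=True)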