dask-cuda: Starting the cluster with memory_limit=None causes failures on the latest nightly

Starting a LocalCUDACluster with memory_limit=None causes submitted tasks to fail with a TypeError on the latest nightly.

Minimal Repro:

from dask_cuda import LocalCUDACluster
from dask.distributed import Client, wait
import dask 
 
def test_func():
    return "abc"
        

if __name__ == "__main__":
    cluster = LocalCUDACluster(memory_limit=None)
    client = Client(cluster)
    
    test_val = client.submit(test_func)
    print(test_val.result())

Trace:

Traceback (most recent call last):
  File "test_bug.py", line 14, in <module>
    print(test_val.result())
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_march_30/lib/python3.7/site-packages/distributed/client.py", line 220, in result
    raise exc.with_traceback(tb)
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_march_30/lib/python3.7/site-packages/dask_cuda/device_host_file.py", line 139, in __setitem__
    self.host_buffer[key] = value
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_march_30/lib/python3.7/site-packages/zict/buffer.py", line 84, in __setitem__
    if self.weight(key, value) <= self.n:
TypeError: '<=' not supported between instances of 'int' and 'NoneType'
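
The failing comparison can be reproduced without a GPU. The sketch below is a simplified stand-in for the check shown in the last frame of the traceback, not zict's actual implementation. `TinyBuffer`, `try_store`, and the constant weight of 1 are hypothetical names and assumptions made only for illustration.

```python
class TinyBuffer:
    """Hypothetical stand-in for a size-limited buffer; not zict's real class."""

    def __init__(self, n):
        self.n = n      # size limit; None mirrors memory_limit=None
        self.fast = {}  # items that fit under the limit
        self.slow = {}  # items that do not fit

    def weight(self, key, value):
        # Assumption for illustration: every item weighs 1.
        return 1

    def __setitem__(self, key, value):
        # Same shape as the comparison in the traceback:
        # an int weight compared against a limit that may be None.
        if self.weight(key, value) <= self.n:
            self.fast[key] = value
        else:
            self.slow[key] = value


def try_store(limit):
    """Store one item and report either 'ok' or the TypeError message."""
    buf = TinyBuffer(limit)
    try:
        buf["x"] = "abc"
        return "ok"
    except TypeError as exc:
        return str(exc)


print(try_store(10))    # integer limit: the comparison succeeds
print(try_store(None))  # None limit: int <= None raises TypeError
```

With an integer limit the store succeeds. With None the comparison itself raises, which suggests the None value reaches the buffer without being converted to a number first.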

Env:

dask-cuda                 0.14.0a200330           py37_35    rapidsai-nightly

Workaround:

Setting memory_limit to 'auto' works:

    cluster = LocalCUDACluster(memory_limit='auto')
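
Until this is fixed, calling code that may receive None can guard against it before building the cluster. This is only a user-side sketch: `normalize_memory_limit` is a hypothetical helper, not part of dask-cuda. Note that it changes semantics, since 'auto' applies a computed host memory limit rather than no limit at all.

```python
def normalize_memory_limit(memory_limit):
    """Hypothetical helper: replace None with 'auto', pass other values through.

    Assumption: 'auto' is an acceptable substitute for None in the caller's
    use case. It is not equivalent, because 'auto' still enforces a limit.
    """
    return "auto" if memory_limit is None else memory_limit


print(normalize_memory_limit(None))
print(normalize_memory_limit("4GB"))

# Usage (needs dask-cuda and a GPU, so it is left commented out here):
# from dask_cuda import LocalCUDACluster
# cluster = LocalCUDACluster(memory_limit=normalize_memory_limit(None))
```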

Other details:

This used to work with an earlier build:

dask-cuda                 0.13.0b200329           py37_86    rapidsai-nightly

CC: @ayushdg , who triaged this.

About this issue

State: closed
Created 4 years ago
Comments: 17 (17 by maintainers)

Most upvoted comments

Just to be clear, memory_limit refers to host memory. There is a separate device_memory_limit for device memory, which we have discussed extending the same functionality to ( #270 ).

Thanks for the clarification. Agreed that a discussion for #270 around device_limits would also be useful. To clarify my discussion w.r.t. auto, I am referring to defaults (auto) with host memory.

ayushdg on Mar 31, 2020