dask-cuda: Spill over after libcudf++ merge is causing CUDA_ERROR_OUT_OF_MEMORY issues
After the libcudf++ merge, the spill-over mechanism appears to be failing.
The current hypothesis: in dask-cuda, once data has been spilled to disk, moving it back to the GPU allocates memory via Numba instead of RMM, so the reload competes with RMM's memory pool.
Relevant code lines are:
From Dask-cuda:
From distributed (example of how it should be handled):
CC: @jakirkham @pentschev @kkraus14.
Code to recreate the issue:
https://gist.github.com/VibhuJawa/dbf2573954db86fb193b687022a20f46
Note:
I have not rerun the cleaned-up code on exp01 (the machine was busy), but the issue should still be reproducible.
Stack Trace
```
ERROR Call to cuMemAlloc results in CUDA_ERROR_OUT_OF_MEMORY
ERROR Call to cuMemAlloc results in CUDA_ERROR_OUT_OF_MEMORY
distributed.worker - ERROR - [2] Call to cuMemAlloc results in CUDA_ERROR_OUT_OF_MEMORY
Traceback (most recent call last):
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/cudadrv/driver.py", line 744, in _attempt_allocation
    allocator()
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/cudadrv/driver.py", line 759, in allocator
    driver.cuMemAlloc(byref(ptr), bytesize)
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/cudadrv/driver.py", line 294, in safe_cuda_api_call
    self._check_error(fname, retcode)
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/cudadrv/driver.py", line 329, in _check_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [2] Call to cuMemAlloc results in CUDA_ERROR_OUT_OF_MEMORY

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/distributed/worker.py", line 2455, in execute
    data[k] = self.data[k]
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/dask_cuda/device_host_file.py", line 152, in __getitem__
    return self.device_buffer[key]
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/zict/buffer.py", line 70, in __getitem__
    return self.slow_to_fast(key)
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/zict/buffer.py", line 57, in slow_to_fast
    value = self.slow[key]
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/zict/func.py", line 39, in __getitem__
    return self.load(self.d[key])
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/dask_cuda/device_host_file.py", line 90, in host_to_device
    frames = [cuda.to_device(f) if ic else f for ic, f in zip(s.is_cuda, s.parts)]
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/dask_cuda/device_host_file.py", line 90, in <listcomp>
    frames = [cuda.to_device(f) if ic else f for ic, f in zip(s.is_cuda, s.parts)]
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/cudadrv/devices.py", line 225, in _require_cuda_context
    return fn(*args, **kws)
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/api.py", line 111, in to_device
    to, new = devicearray.auto_device(obj, stream=stream, copy=copy)
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/cudadrv/devicearray.py", line 704, in auto_device
    devobj = from_array_like(obj, stream=stream)
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/cudadrv/devicearray.py", line 642, in from_array_like
    writeback=ary, stream=stream, gpu_data=gpu_data)
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/cudadrv/devicearray.py", line 103, in __init__
    gpu_data = devices.get_context().memalloc(self.alloc_size)
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/cudadrv/driver.py", line 761, in memalloc
    self._attempt_allocation(allocator)
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/cudadrv/driver.py", line 751, in _attempt_allocation
    allocator()
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/cudadrv/driver.py", line 759, in allocator
    driver.cuMemAlloc(byref(ptr), bytesize)
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/cudadrv/driver.py", line 294, in safe_cuda_api_call
    self._check_error(fname, retcode)
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/cudadrv/driver.py", line 329, in _check_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [2] Call to cuMemAlloc results in CUDA_ERROR_OUT_OF_MEMORY
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x7fb3dd96c410>>, <Task finished coro=<Worker.execute() done, defined at /raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/distributed/worker.py:2438> exception=CudaAPIError(2, 'Call to cuMemAlloc results in CUDA_ERROR_OUT_OF_MEMORY')>)
Traceback (most recent call last):
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/cudadrv/driver.py", line 744, in _attempt_allocation
    allocator()
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/cudadrv/driver.py", line 759, in allocator
    driver.cuMemAlloc(byref(ptr), bytesize)
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/cudadrv/driver.py", line 294, in safe_cuda_api_call
    self._check_error(fname, retcode)
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/cudadrv/driver.py", line 329, in _check_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [2] Call to cuMemAlloc results in CUDA_ERROR_OUT_OF_MEMORY

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/tornado/ioloop.py", line 743, in _run_callback
    ret = callback()
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/tornado/ioloop.py", line 767, in _discard_future_result
    future.result()
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/distributed/worker.py", line 2455, in execute
    data[k] = self.data[k]
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/dask_cuda/device_host_file.py", line 152, in __getitem__
    return self.device_buffer[key]
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/zict/buffer.py", line 70, in __getitem__
    return self.slow_to_fast(key)
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/zict/buffer.py", line 57, in slow_to_fast
    value = self.slow[key]
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/zict/func.py", line 39, in __getitem__
    return self.load(self.d[key])
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/dask_cuda/device_host_file.py", line 90, in host_to_device
    frames = [cuda.to_device(f) if ic else f for ic, f in zip(s.is_cuda, s.parts)]
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/dask_cuda/device_host_file.py", line 90, in <listcomp>
    frames = [cuda.to_device(f) if ic else f for ic, f in zip(s.is_cuda, s.parts)]
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/cudadrv/devices.py", line 225, in _require_cuda_context
    return fn(*args, **kws)
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/api.py", line 111, in to_device
    to, new = devicearray.auto_device(obj, stream=stream, copy=copy)
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/cudadrv/devicearray.py", line 704, in auto_device
    devobj = from_array_like(obj, stream=stream)
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/cudadrv/devicearray.py", line 642, in from_array_like
    writeback=ary, stream=stream, gpu_data=gpu_data)
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/cudadrv/devicearray.py", line 103, in __init__
    gpu_data = devices.get_context().memalloc(self.alloc_size)
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/cudadrv/driver.py", line 761, in memalloc
    self._attempt_allocation(allocator)
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/cudadrv/driver.py", line 751, in _attempt_allocation
    allocator()
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/cudadrv/driver.py", line 759, in allocator
    driver.cuMemAlloc(byref(ptr), bytesize)
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/cudadrv/driver.py", line 294, in safe_cuda_api_call
    self._check_error(fname, retcode)
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/cudadrv/driver.py", line 329, in _check_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [2] Call to cuMemAlloc results in CUDA_ERROR_OUT_OF_MEMORY
ERROR Call to cuMemAlloc results in CUDA_ERROR_OUT_OF_MEMORY
ERROR Call to cuMemAlloc results in CUDA_ERROR_OUT_OF_MEMORY
distributed.worker - ERROR - [2] Call to cuMemAlloc results in CUDA_ERROR_OUT_OF_MEMORY
Traceback (most recent call last):
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/cudadrv/driver.py", line 744, in _attempt_allocation
    allocator()
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/cudadrv/driver.py", line 759, in allocator
    driver.cuMemAlloc(byref(ptr), bytesize)
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/cudadrv/driver.py", line 294, in safe_cuda_api_call
    self._check_error(fname, retcode)
  File "/raid/vjawa/conda_install/conda_env/envs/cudf_12_8_jan/lib/python3.7/site-packages/numba/cuda/cudadrv/driver.py", line 329, in _check_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [2] Call to cuMemAlloc results in CUDA_ERROR_OUT_OF_MEMORY
```
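The traceback shows the failure on zict's `slow_to_fast` read path: a spilled value is loaded back and `host_to_device` allocates device memory. A minimal pure-Python sketch of that two-level buffer mechanism (a hypothetical simplification of `zict.Buffer`, not the real dask-cuda code; `load` stands in for `host_to_device` and `dump` for `device_to_host`):

```python
from collections.abc import MutableMapping


class SimpleBuffer(MutableMapping):
    """Toy two-level mapping: 'fast' (device-like) and 'slow' (host/disk-like).

    Hypothetical simplification of zict.Buffer, to illustrate where the
    host_to_device-style reload happens; not the real implementation.
    """

    def __init__(self, capacity, load=lambda v: v, dump=lambda v: v):
        self.fast = {}          # e.g. data living on the GPU
        self.slow = {}          # e.g. data spilled to host memory / disk
        self.capacity = capacity
        self.load = load        # like host_to_device: may allocate GPU memory
        self.dump = dump        # like device_to_host: serialize off the device

    def __setitem__(self, key, value):
        self.fast[key] = value
        # Spill oldest entries once 'fast' exceeds capacity.
        while len(self.fast) > self.capacity:
            k, v = next(iter(self.fast.items()))
            del self.fast[k]
            self.slow[k] = self.dump(v)

    def __getitem__(self, key):
        if key in self.fast:
            return self.fast[key]
        # slow_to_fast: this is where the OOM in the traceback surfaces,
        # because load() must allocate device memory for the reloaded value.
        value = self.load(self.slow.pop(key))
        self[key] = value  # may spill something else to make room
        return value

    def __delitem__(self, key):
        self.fast.pop(key, None)
        self.slow.pop(key, None)

    def __iter__(self):
        return iter({**self.fast, **self.slow})

    def __len__(self):
        return len(self.fast) + len(self.slow)
```

The point of the sketch: the allocation on the reload path happens inside `load()`, so whichever allocator `load()` uses (Numba's `cuda.to_device` versus an RMM-backed copy) determines whether the reload draws from RMM's pool or fights it for raw device memory.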
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 18 (18 by maintainers)
Thanks a lot @VibhuJawa for testing this. I'll make sure this is merged for 0.12, and will leave this issue open until we merge it there.
Yup, I believe so.
I tested it on the same environment by doing a source install of dask-cuda (branch 277), i.e., it works on the below:
And fails on the below:
@jakirkham, yup, the issue no longer seems to be present, as the workflow now works. Thanks for closing.
@jakirkham, sure, I will update here once I get the time.
Just for clarification, @VibhuJawa does that mean it did not work in pure cuDF with the same version (i.e., this PR definitively caused the fix)?
@pentschev, I tested #227 and it now works successfully. Thanks a lot for your work on this, and sorry for the delay in testing.
Tested on the below cuDF versions (for record keeping):