dask: CUDA_ERROR_ILLEGAL_ADDRESS in a custom CuPy kernel, depending on whether a print statement is executed in da.map_blocks()
Issue
I am trying to run the example “Parallelizing Custom CuPy Kernels with Dask”, located here. It has been lightly modified; my code is shown below.
Note the print statement in the function dispatch_add_broadcast(x, y). Obviously the print is not meant to stay, but:
- With the print statement, the code does not crash.
- Without the print statement, I get a CUDA_ERROR_ILLEGAL_ADDRESS error:
File "test_dask.py", line 58, in <module>
result = res.compute()
File "C:\Dev\Anaconda3\envs\pyt13_37\lib\site-packages\dask\base.py", line 166, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "C:\Dev\Anaconda3\envs\pyt13_37\lib\site-packages\dask\base.py", line 438, in compute
return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
File "C:\Dev\Anaconda3\envs\pyt13_37\lib\site-packages\dask\base.py", line 438, in <listcomp>
return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
File "C:\Dev\Anaconda3\envs\pyt13_37\lib\site-packages\dask\array\core.py", line 955, in finalize
return concatenate3(results)
File "C:\Dev\Anaconda3\envs\pyt13_37\lib\site-packages\dask\array\core.py", line 4310, in concatenate3
return _concatenate2(arrays, axes=list(range(x.ndim)))
File "C:\Dev\Anaconda3\envs\pyt13_37\lib\site-packages\dask\array\core.py", line 304, in _concatenate2
arrays = [_concatenate2(a, axes=axes[1:]) for a in arrays]
File "C:\Dev\Anaconda3\envs\pyt13_37\lib\site-packages\dask\array\core.py", line 304, in <listcomp>
arrays = [_concatenate2(a, axes=axes[1:]) for a in arrays]
File "C:\Dev\Anaconda3\envs\pyt13_37\lib\site-packages\dask\array\core.py", line 308, in _concatenate2
return concatenate(arrays, axis=axes[0])
File "C:\Dev\Anaconda3\envs\pyt13_37\lib\site-packages\cupy\manipulation\join.py", line 54, in concatenate
return core.concatenate_method(tup, axis)
File "cupy\core\_routines_manipulation.pyx", line 577, in cupy.core._routines_manipulation.concatenate_method
File "cupy\core\_routines_manipulation.pyx", line 623, in cupy.core._routines_manipulation.concatenate_method
File "cupy\core\_routines_manipulation.pyx", line 658, in cupy.core._routines_manipulation._concatenate
File "cupy\core\_kernel.pyx", line 951, in cupy.core._kernel.ufunc.__call__
File "cupy\core\_kernel.pyx", line 974, in cupy.core._kernel.ufunc._get_ufunc_kernel
File "cupy\core\_kernel.pyx", line 714, in cupy.core._kernel._get_ufunc_kernel
File "cupy\core\_kernel.pyx", line 61, in cupy.core._kernel._get_simple_elementwise_kernel
File "cupy\core\carray.pxi", line 192, in cupy.core.core.compile_with_cache
File "C:\Dev\Anaconda3\envs\pyt13_37\lib\site-packages\cupy\cuda\compiler.py", line 287, in compile_with_cache
extra_source, backend)
File "C:\Dev\Anaconda3\envs\pyt13_37\lib\site-packages\cupy\cuda\compiler.py", line 335, in _compile_with_cache_cuda
mod.load(cubin)
File "cupy\cuda\function.pyx", line 197, in cupy.cuda.function.Module.load
File "cupy\cuda\function.pyx", line 199, in cupy.cuda.function.Module.load
File "cupy\cuda\driver.pyx", line 240, in cupy.cuda.driver.moduleLoadData
File "cupy\cuda\driver.pyx", line 118, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
File "cupy\cuda\driver.pyx", line 247, in cupy.cuda.driver.moduleUnload
File "cupy\cuda\driver.pyx", line 118, in cupy.cuda.driver.check_status
TypeError: 'NoneType' object is not callable
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
File "cupy\cuda\driver.pyx", line 247, in cupy.cuda.driver.moduleUnload
File "cupy\cuda\driver.pyx", line 118, in cupy.cuda.driver.check_status
TypeError: 'NoneType' object is not callable
Error in sys.excepthook:
Original exception was:
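One hypothesis I cannot rule out: the print may only help because evaluating x[0][0] copies a single element to the host, which synchronizes the device before the kernel launch. If that is the case, an explicit synchronization should have the same effect as the print. A minimal sketch (dispatch_add_broadcast_sync is my own variant, not from the blog post, and whether it also prevents the crash is untested):

def dispatch_add_broadcast_sync(x, y):
    block_size = (32, 32)
    grid_size = (x.shape[1] // block_size[1], x.shape[0] // block_size[0])
    z = cupy.empty(x.shape, x.dtype)
    xdim0 = x.strides[0] // x.strides[1]
    zdim0 = z.strides[0] // z.strides[1]
    # Stand-in for the print's implicit device-to-host synchronization.
    cupy.cuda.Device().synchronize()
    add_broadcast_kernel(grid_size, block_size, (x, y, z, xdim0, zdim0))
    return z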
Any ideas on how to debug this or what could be going wrong? Copying @pentschev, the author of the blog post.
Code
from dask.distributed import Client
# from dask_cuda import LocalCUDACluster
from dask.array.utils import assert_eq
import dask
import dask.array as da
import cupy

print('Cupy:', cupy.__version__)  # version 7.2.0
print('Dask:', dask.__version__)  # version 2.11.0
add_broadcast_kernel = cupy.RawKernel(
    r'''
    extern "C" __global__
    void add_broadcast_kernel(
        const float* x, const float* y, float* z,
        const int xdim0, const int zdim0)
    {
        int idx0 = blockIdx.x * blockDim.x + threadIdx.x;
        int idx1 = blockIdx.y * blockDim.y + threadIdx.y;
        z[idx1 * zdim0 + idx0] = x[idx1 * xdim0 + idx0] + y[idx0];
    }
    ''',
    'add_broadcast_kernel'
)
def dispatch_add_broadcast(x, y):
    block_size = (32, 32)
    grid_size = (x.shape[1] // block_size[1], x.shape[0] // block_size[0])
    z = cupy.empty(x.shape, x.dtype)
    # Leading dimension (row pitch in elements) of x and z.
    xdim0 = x.strides[0] // x.strides[1]
    zdim0 = z.strides[0] // z.strides[1]
    # ********** THE FOLLOWING LINE PREVENTS THE ERROR **********
    print(xdim0, zdim0, x[0][0])
    add_broadcast_kernel(grid_size, block_size, (x, y, z, xdim0, zdim0))
    return z
if __name__ == "__main__":
    # cluster = LocalCUDACluster()
    # client = Client(cluster)
    x = cupy.arange(4096 * 1024, dtype=cupy.float32).reshape((4096, 1024))
    y = cupy.arange(1024, dtype=cupy.float32).reshape(1, 1024)

    # Sanity check: the raw kernel matches CuPy's broadcast add on the full arrays.
    res_cupy = x + y
    res_add_broadcast = dispatch_add_broadcast(x, y)
    assert_eq(res_cupy, res_add_broadcast)

    dx = da.from_array(x, chunks=(1024, 512), asarray=False)
    dy = da.from_array(y, chunks=(1, 512), asarray=False)
    res = da.map_blocks(dispatch_add_broadcast, dx, dy, dtype=dx.dtype, chunks=(8, 1))
    print('----- Calling res.compute() ----- ')
    result = res.compute()
    assert_eq(result, res_cupy)
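For debugging, a per-block validation wrapper (my addition, not part of the original example) could confirm that each chunk map_blocks hands to the dispatcher satisfies the assumptions the stride arithmetic and grid computation rely on; a non-contiguous or unexpectedly shaped block would make the kernel index out of bounds:

# Hypothetical instrumented dispatcher: checks each block's layout before
# launching, so a bad chunk fails loudly instead of corrupting device memory.
def dispatch_add_broadcast_checked(x, y):
    assert x.flags.c_contiguous and y.flags.c_contiguous
    assert x.shape[0] % 32 == 0 and x.shape[1] % 32 == 0  # grid must cover x exactly
    assert y.shape[-1] == x.shape[1]  # y is broadcast along the rows of x
    return dispatch_add_broadcast(x, y)

It can be passed to map_blocks in place of dispatch_add_broadcast.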
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 24 (8 by maintainers)
Confirmed; with CuPy 8.0 I no longer get the crash.
I’m sorry @drcdr for leaving this hanging. Unfortunately, there’s nothing obvious I can see in your code right away, so this will take some more time to debug. I might be able to take a look at this, but it probably won’t happen before mid-May. As @leofang said in https://github.com/dask/dask/issues/5992#issuecomment-611692131, usually that’s due to lack of context or accessing invalid memory – I think it’s the latter in particular.
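For reference, two standard ways to localize an illegal access like this (generic CUDA debugging practice, not specific to this issue): make kernel launches synchronous with CUDA_LAUNCH_BLOCKING=1, so the error is raised at the offending launch rather than at a later unrelated call (here, module loading), and run the script under cuda-memcheck. A sketch:

# Must be set before CUDA is initialized, i.e. before the first cupy import.
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

import cupy  # subsequent kernel launches now block until completion

# From a shell, cuda-memcheck reports the exact out-of-bounds access:
#   cuda-memcheck python test_dask.py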