dask: Kernel CUDA_ERROR_ILLEGAL_ADDRESS, depending on whether a print statement is executed in da.map_blocks()

Issue

I am trying to run the example from the blog post “Parallelizing Custom CuPy Kernels with Dask”. It has been lightly modified; my code is shown below.

Notice the print statement in the function dispatch_add_broadcast(x,y). Obviously, this is not desired, but:

  • With the print statement, it doesn’t crash.
  • Without the print statement, I get a CUDA_ERROR_ILLEGAL_ADDRESS error:
  File "test_dask.py", line 58, in <module>                                                                            
    result = res.compute()                                                                                             
  File "C:\Dev\Anaconda3\envs\pyt13_37\lib\site-packages\dask\base.py", line 166, in compute                           
    (result,) = compute(self, traverse=False, **kwargs)                                                                
  File "C:\Dev\Anaconda3\envs\pyt13_37\lib\site-packages\dask\base.py", line 438, in compute                           
    return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])                                              
  File "C:\Dev\Anaconda3\envs\pyt13_37\lib\site-packages\dask\base.py", line 438, in <listcomp>                        
    return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])                                              
  File "C:\Dev\Anaconda3\envs\pyt13_37\lib\site-packages\dask\array\core.py", line 955, in finalize                    
    return concatenate3(results)                                                                                       
  File "C:\Dev\Anaconda3\envs\pyt13_37\lib\site-packages\dask\array\core.py", line 4310, in concatenate3               
    return _concatenate2(arrays, axes=list(range(x.ndim)))                                                             
  File "C:\Dev\Anaconda3\envs\pyt13_37\lib\site-packages\dask\array\core.py", line 304, in _concatenate2               
    arrays = [_concatenate2(a, axes=axes[1:]) for a in arrays]                                                         
  File "C:\Dev\Anaconda3\envs\pyt13_37\lib\site-packages\dask\array\core.py", line 304, in <listcomp>                  
    arrays = [_concatenate2(a, axes=axes[1:]) for a in arrays]                                                         
  File "C:\Dev\Anaconda3\envs\pyt13_37\lib\site-packages\dask\array\core.py", line 308, in _concatenate2               
    return concatenate(arrays, axis=axes[0])                                                                           
  File "C:\Dev\Anaconda3\envs\pyt13_37\lib\site-packages\cupy\manipulation\join.py", line 54, in concatenate           
    return core.concatenate_method(tup, axis)                                                                          
  File "cupy\core\_routines_manipulation.pyx", line 577, in cupy.core._routines_manipulation.concatenate_method        
  File "cupy\core\_routines_manipulation.pyx", line 623, in cupy.core._routines_manipulation.concatenate_method        
  File "cupy\core\_routines_manipulation.pyx", line 658, in cupy.core._routines_manipulation._concatenate              
  File "cupy\core\_kernel.pyx", line 951, in cupy.core._kernel.ufunc.__call__                                          
  File "cupy\core\_kernel.pyx", line 974, in cupy.core._kernel.ufunc._get_ufunc_kernel                                 
  File "cupy\core\_kernel.pyx", line 714, in cupy.core._kernel._get_ufunc_kernel                                       
  File "cupy\core\_kernel.pyx", line 61, in cupy.core._kernel._get_simple_elementwise_kernel                           
  File "cupy\core\carray.pxi", line 192, in cupy.core.core.compile_with_cache                                          
  File "C:\Dev\Anaconda3\envs\pyt13_37\lib\site-packages\cupy\cuda\compiler.py", line 287, in compile_with_cache       
    extra_source, backend)                                                                                             
  File "C:\Dev\Anaconda3\envs\pyt13_37\lib\site-packages\cupy\cuda\compiler.py", line 335, in _compile_with_cache_cuda 
    mod.load(cubin)                                                                                                    
  File "cupy\cuda\function.pyx", line 197, in cupy.cuda.function.Module.load                                           
  File "cupy\cuda\function.pyx", line 199, in cupy.cuda.function.Module.load                                           
  File "cupy\cuda\driver.pyx", line 240, in cupy.cuda.driver.moduleLoadData                                            
  File "cupy\cuda\driver.pyx", line 118, in cupy.cuda.driver.check_status                                              
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered                 
Traceback (most recent call last):                                                                                     
  File "cupy\cuda\driver.pyx", line 247, in cupy.cuda.driver.moduleUnload                                              
  File "cupy\cuda\driver.pyx", line 118, in cupy.cuda.driver.check_status                                              
TypeError: 'NoneType' object is not callable                                                                           
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'                                                          
Traceback (most recent call last):                                                                                     
  File "cupy\cuda\driver.pyx", line 247, in cupy.cuda.driver.moduleUnload                                              
  File "cupy\cuda\driver.pyx", line 118, in cupy.cuda.driver.check_status                                              
TypeError: 'NoneType' object is not callable                                                                           
Error in sys.excepthook:                                                                                               
                                                                                                                       
Original exception was:                                                                                                

Any ideas on how to debug this or what could be going wrong? Copying @pentschev, the author of the blog post.

Code

from dask.distributed import Client
#from dask_cuda import LocalCUDACluster
from dask.array.utils import assert_eq
import dask
import dask.array as da
import cupy

print('Cupy:', cupy.__version__)  # version 7.2.0
print('Dask:', dask.__version__)  # version 2.11.0

add_broadcast_kernel = cupy.RawKernel(
    r'''
    extern "C" __global__
    void add_broadcast_kernel(
        const float* x, const float* y, float* z,
        const int xdim0, const int zdim0)
    {
        int idx0 = blockIdx.x * blockDim.x + threadIdx.x;
        int idx1 = blockIdx.y * blockDim.y + threadIdx.y;
        z[idx1 * zdim0 + idx0] = x[idx1 * xdim0 + idx0] + y[idx0];
    }
    ''',
    'add_broadcast_kernel'
)

def dispatch_add_broadcast(x, y):
    block_size = (32, 32)
    grid_size = (x.shape[1] // block_size[1], x.shape[0] // block_size[0])

    z = cupy.empty(x.shape, x.dtype)

    xdim0 = x.strides[0] // x.strides[1]
    zdim0 = z.strides[0] // z.strides[1]
    
    # ********** THE FOLLOWING LINE PREVENTS THE ERROR **********
    print(xdim0, zdim0, x[0][0])  
    add_broadcast_kernel(grid_size, block_size, (x, y, z, xdim0, zdim0))
    return z

if __name__ == "__main__":
    #cluster = LocalCUDACluster()
    #client = Client(cluster)

    x = cupy.arange(4096 * 1024, dtype=cupy.float32).reshape((4096, 1024))
    y = cupy.arange(1024, dtype=cupy.float32).reshape(1, 1024)

    res_cupy = x + y
    res_add_broadcast = dispatch_add_broadcast(x, y)

    assert_eq(res_cupy, res_add_broadcast)

    dx = da.from_array(x, chunks=(1024, 512), asarray=False)
    dy = da.from_array(y, chunks=(1, 512), asarray=False)

    res = da.map_blocks(dispatch_add_broadcast, dx, dy, dtype=dx.dtype, chunks=(8,1))
    print('----- Calling res.compute() ----- ')
    result = res.compute()

    assert_eq(result, res_cupy)

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 24 (8 by maintainers)

Most upvoted comments

Confirmed; with CuPy 8.0 I now don’t get a crash.

I’m sorry, @drcdr, for leaving this hanging. Unfortunately, I can’t see anything obviously wrong in your code, so this will take more time to debug. I might be able to take a look, but it probably won’t happen before mid-May. As @leofang said in https://github.com/dask/dask/issues/5992#issuecomment-611692131, this error is usually due to a lack of context or to accessing invalid memory; I think it’s the latter here.
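
One way to check the invalid-memory hypothesis without touching the GPU is to replay the kernel's index arithmetic on the host. The sketch below is only an illustration: `max_offsets` is a hypothetical helper, not part of CuPy or Dask, and it assumes that each chunk Dask passes to `dispatch_add_broadcast` is a view sliced from the parent array, so `x` keeps the parent's byte strides `(4096, 4)` while the freshly allocated `z` is contiguous. It also assumes each dimension is at least one block wide.

```python
def max_offsets(shape, x_strides, z_strides, block=(32, 32)):
    """Replay the kernel's index arithmetic on the host for one chunk.

    Returns the largest element offsets, relative to each array's data
    pointer, that any thread would touch in x, y and z.
    """
    rows, cols = shape
    # Same launch configuration as dispatch_add_broadcast.
    grid = (cols // block[1], rows // block[0])
    # Same row pitch computation: byte strides -> elements per row.
    xdim0 = x_strides[0] // x_strides[1]
    zdim0 = z_strides[0] // z_strides[1]
    # Largest thread indices: blockIdx * blockDim + threadIdx.
    idx0 = grid[0] * block[0] - 1
    idx1 = grid[1] * block[1] - 1
    return {
        "x": idx1 * xdim0 + idx0,
        "y": idx0,
        "z": idx1 * zdim0 + idx0,
    }


# One (1024, 512) chunk: x is a view into the (4096, 1024) float32 parent
# (assumed), z is a new contiguous (1024, 512) float32 array.
chunk = max_offsets((1024, 512), x_strides=(4096, 4), z_strides=(2048, 4))
print(chunk)

# Worst case: the last chunk starts at row 3072, column 512 of the parent.
last_chunk_base = 3072 * 1024 + 512
print(last_chunk_base + chunk["x"], "vs parent size", 4096 * 1024)
```

Under those assumptions, the highest offset read from `x` in the last chunk is exactly the last element of the parent array, and the highest offsets for `y` and `z` are the last elements of their chunks. So the indexing itself appears to stay in bounds for these shapes, which suggests looking elsewhere, for example at when the failing work actually ran relative to where the error was reported.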