cupy: CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered

Description

---------------------------------------------------------------------------
CUDARuntimeError                          Traceback (most recent call last)
~/miniconda3/envs/cuda_11_8/lib/python3.9/site-packages/IPython/core/formatters.py in __call__(self, obj)
    700                 type_pprinters=self.type_printers,
    701                 deferred_pprinters=self.deferred_printers)
--> 702             printer.pretty(obj)
    703             printer.flush()
    704             return stream.getvalue()

~/miniconda3/envs/cuda_11_8/lib/python3.9/site-packages/IPython/lib/pretty.py in pretty(self, obj)
    392                         if cls is not object \
    393                                 and callable(cls.__dict__.get('__repr__')):
--> 394                             return _repr_pprint(obj, self, cycle)
    395 
    396             return _default_pprint(obj, self, cycle)

~/miniconda3/envs/cuda_11_8/lib/python3.9/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
    698     """A pprint that just redirects to the normal repr function."""
    699     # Find newlines and replace them with p.break_()
--> 700     output = repr(obj)
    701     lines = output.splitlines()
    702     with p.group():

cupy/_core/core.pyx in cupy._core.core._ndarray_base.__repr__()

cupy/_core/core.pyx in cupy._core.core._ndarray_base.get()

cupy/cuda/memory.pyx in cupy.cuda.memory.MemoryPointer.copy_to_host()

cupy_backends/cuda/api/runtime.pyx in cupy_backends.cuda.api.runtime.streamIsCapturing()

cupy_backends/cuda/api/runtime.pyx in cupy_backends.cuda.api.runtime.check_status()

CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered

To Reproduce

# needed for suppressing GPU memory leak after previous run, CAC 220505
#%reset -f

import numpy as np
import matplotlib.pyplot as plt

import sigpy as sp
import sigpy.plot as pl

from tkinter import Tk
from tkinter import filedialog as fd
root = Tk() # root Tk instance, used for the file dialogs
root.withdraw() # hide the small tkinter root window
root.attributes('-topmost', True) # keep dialog windows above all other windows

import cupy as cp
import multiprocessing

from tqdm.auto import trange

#%matplotlib widget
#%matplotlib inline

gpu_av = cp.cuda.is_available()
print( 'gpu available:', gpu_av)
#gpu_av = False # for testing with CPU

if gpu_av:
    device_num = 0 # -1 for CPU, 0 for GPU
else: 
    device_num = -1 # force CPU
    
gpu_device = sp.Device( device_num)
cpu_device = sp.Device( -1)
print( 'device id:', gpu_device.id)

xp = gpu_device.xp  # Returns NumPy if id == -1, otherwise returns CuPy
xp # use in place of cp or np

mempool = cp.get_default_memory_pool()
mempool.free_all_blocks()
mempool.set_limit(size=15*1024**3)  # 15 GiB
print(cp.get_default_memory_pool().get_limit())

Nx = 181; Ny = 217; Nz = 181
Nscale = 4095; Ncat = 9
N = 256 # starting padded size
fov = (N, N, N)
phantom_crisp = xp.ones( fov)
phantom_crisp = sp.to_device( phantom_crisp, device=gpu_device)
phantom_crisp_fft = sp.fft( phantom_crisp, oshape=fov)
phantom_crisp_pad = xp.real( sp.ifft( phantom_crisp_fft))
tissues = ['Nothing', 'CSF', 'GM', 'WM', 'Fat', 'MuSk', 'Skin', 'Skull', 'Glial', 'Conn']
tissue_idxs = []
tissue_fr = xp.array( [])
V = np.size( phantom_crisp_pad)
for idx, item in enumerate( tissues):
        #print( idx, item)
        tmp_idxs = xp.asarray( xp.nonzero( (phantom_crisp_pad > idx - .1) * (phantom_crisp_pad < idx + .1)))
        tissue_idxs.append( tmp_idxs)
        tissue_fr = xp.append( tissue_fr, xp.shape( tmp_idxs)[1]/V)
total_fr = xp.sum( tissue_fr)
assert xp.isclose( total_fr, 1.0)  # isclose: a floating-point sum is rarely exactly 1.0
#tissue_parameters [[PD, R1]]
#Water proton T 1 measurements in brain tissue at 7, 3, and 1.5 T using IR-EPI, IR-TSE, and MPRAGE: results and optimization
#https://pubmed.ncbi.nlm.nih.gov/18259791/
tissue_parameters = xp.array(
    [[0., 9999.],
    [1.0, .25],
    [0.716, 0.515],
    [0.837, 0.885],
    [1., 2.],
    [0.7, 0.5],
    [0.75, 0.55],
    [0.8, 0.6],
    [0.8, 0.8],
    [0.7, 0.6]] )
phantom_pd = xp.zeros_like( phantom_crisp_pad)
#phantom_pd[ tissue_idxs[0]] = tissue_parameters[0, 0]
phantom_pd[ tissue_idxs[1]] = tissue_parameters[1, 0]
#phantom_pd[ tissue_idxs[2]] = tissue_parameters[2, 0]
#phantom_pd[ tissue_idxs[3]] = tissue_parameters[3, 0]
#phantom_pd[ tissue_idxs[4]] = tissue_parameters[4, 0]
#phantom_pd[ tissue_idxs[5]] = tissue_parameters[5, 0]
#phantom_pd[ tissue_idxs[6]] = tissue_parameters[6, 0]
#phantom_pd[ tissue_idxs[7]] = tissue_parameters[7, 0]
#phantom_pd[ tissue_idxs[8]] = tissue_parameters[8, 0]
#phantom_pd[ tissue_idxs[9]] = tissue_parameters[9, 0]
phantom_pd
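As an aside, the per-tissue assignment above can also be written with boolean masks, which sidesteps integer index arrays entirely. A minimal sketch with a toy 1-D stand-in for the phantom (the sizes and values here are illustrative, not taken from the report):

```python
import numpy as np

# Toy stand-in for phantom_crisp_pad: each voxel holds its tissue label.
phantom_crisp_pad = np.array([0., 1., 2., 1., 0.])
tissue_parameters = np.array([[0., 9999.],
                              [1.0, .25],
                              [0.716, 0.515]])

phantom_pd = np.zeros_like(phantom_crisp_pad)
for idx in range(len(tissue_parameters)):
    mask = np.abs(phantom_crisp_pad - idx) < 0.1  # boolean mask, no index arrays
    phantom_pd[mask] = tissue_parameters[idx, 0]  # assign this tissue's PD value
```

Boolean-mask assignment is element-wise by construction, so it cannot be misread as fancy indexing along a single axis.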

Installation

Conda-Forge (conda install ...)

Environment

OS                           : Linux-5.13.0-1012-oem-x86_64-with-glibc2.31
Python Version               : 3.9.0
CuPy Version                 : 11.4.0
CuPy Platform                : NVIDIA CUDA
NumPy Version                : 1.23.5
SciPy Version                : 1.9.3
Cython Build Version         : 0.29.32
Cython Runtime Version       : None
CUDA Root                    : /home/curt/miniconda3/envs/ccdev
nvcc PATH                    : None
CUDA Build Version           : 11020
CUDA Driver Version          : 12010
CUDA Runtime Version         : 11080
cuBLAS Version               : (available)
cuFFT Version                : 10900
cuRAND Version               : 10300
cuSOLVER Version             : (11, 4, 1)
cuSPARSE Version             : (available)
NVRTC Version                : (11, 8)
Thrust Version               : 101000
CUB Build Version            : 101000
Jitify Build Version         : 3bc2849
cuDNN Build Version          : 8401
cuDNN Version                : 8401
NCCL Build Version           : 21403
NCCL Runtime Version         : 21403
cuTENSOR Version             : None
cuSPARSELt Build Version     : None
Device 0 Name                : NVIDIA RTX A5000 Laptop GPU
Device 0 Compute Capability  : 86
Device 0 PCI Bus ID          : 0000:01:00.0

Additional Information

Possibly related to these issues, among others:

  • https://github.com/cupy/cupy/issues/6789
  • https://github.com/cupy/cupy/issues/5668
  • https://github.com/cupy/cupy/issues/4866
  • https://github.com/cupy/cupy/issues/1389

This error also occurs consistently on at least one other platform, with different hardware and several variations of the environment:

Ubuntu 18.04 and 20.04; RTX 8000 and A5000 Laptop GPUs; CUDA 11.4, 11.7, 11.8; NVIDIA driver 470, 525, 530.

Thanks for developing cupy!

About this issue

  • Original URL
  • State: open
  • Created a year ago
  • Reactions: 1
  • Comments: 16 (6 by maintainers)

Most upvoted comments

Without the ‘asarray’ entirely, it works all the way up to the memory limit!

Indexing via an array seems to be the trigger, and it can be worked around; I will check some more.
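For context, a small NumPy sketch (CuPy follows the same indexing semantics) of why wrapping the tuple returned by `nonzero` in `asarray` changes the meaning of the subscript: the tuple selects individual elements, while a single (ndim, K) integer array does fancy indexing along the first axis only. Whether this is what ultimately trips the illegal access is an assumption, not confirmed from the traceback.

```python
import numpy as np

a = np.arange(8, dtype=float).reshape(2, 2, 2)

mask_tuple = np.nonzero(a > 3)       # tuple of three index arrays
mask_array = np.asarray(mask_tuple)  # one (3, 4) integer array

b = np.zeros_like(a)
b[mask_tuple] = 1.0   # element-wise: exactly the 4 masked elements are set

c = np.zeros_like(a)
c[mask_array] = 1.0   # indexes axis 0 only: whole 2x2 slices are set
```

With large cubic volumes the array form therefore touches far more elements than intended, which would also explain why removing the `asarray` makes the problem go away.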

N = 1024-64 # starting padded size
fov = (N, N, N)
print( 'fov:', fov)
phantom_crisp = xp.ones( fov)

tmp_idxs = xp.nonzero( (phantom_crisp > 1 - .1) * (phantom_crisp < 1 + .1))

phantom_pd = xp.zeros_like( phantom_crisp)

phantom_pd[ tmp_idxs] = 0.5
Output with CUDA_LAUNCH_BLOCKING=1 set:

gpu available: True
device id: 0
memory pool: 49392123904
fov: (960, 960, 960)

[screenshot: gpu_memory_n_960]

I meant that my question was the stupid part, just a double check that the values used as indexes are correct 😃; I didn’t mean that the issue was stupid! (Sorry for creating confusion here.)

Could you rerun your code with the CUDA_LAUNCH_BLOCKING=1 environment variable set, to spot which line is causing the problem?
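One way to do that from inside a notebook (a sketch; the variable must be set before CuPy initializes the CUDA context, i.e. before the first cupy import or call):

```python
import os

# Set before importing cupy so every kernel launch runs synchronously
# and the traceback points at the offending line rather than a later sync.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import cupy as cp  # import only after the variable is set
```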