cupy: CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered
Description
---------------------------------------------------------------------------
CUDARuntimeError Traceback (most recent call last)
~/miniconda3/envs/cuda_11_8/lib/python3.9/site-packages/IPython/core/formatters.py in __call__(self, obj)
700 type_pprinters=self.type_printers,
701 deferred_pprinters=self.deferred_printers)
--> 702 printer.pretty(obj)
703 printer.flush()
704 return stream.getvalue()
~/miniconda3/envs/cuda_11_8/lib/python3.9/site-packages/IPython/lib/pretty.py in pretty(self, obj)
392 if cls is not object \
393 and callable(cls.__dict__.get('__repr__')):
--> 394 return _repr_pprint(obj, self, cycle)
395
396 return _default_pprint(obj, self, cycle)
~/miniconda3/envs/cuda_11_8/lib/python3.9/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
698 """A pprint that just redirects to the normal repr function."""
699 # Find newlines and replace them with p.break_()
--> 700 output = repr(obj)
701 lines = output.splitlines()
702 with p.group():
cupy/_core/core.pyx in cupy._core.core._ndarray_base.__repr__()
cupy/_core/core.pyx in cupy._core.core._ndarray_base.get()
cupy/cuda/memory.pyx in cupy.cuda.memory.MemoryPointer.copy_to_host()
cupy_backends/cuda/api/runtime.pyx in cupy_backends.cuda.api.runtime.streamIsCapturing()
cupy_backends/cuda/api/runtime.pyx in cupy_backends.cuda.api.runtime.check_status()
CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered
To Reproduce
# needed for suppressing gpu memory leak after previous run, CAC 220505
#%reset -f
import numpy as np
import matplotlib.pyplot as plt
import sigpy as sp
import sigpy.plot as pl
from tkinter import Tk
from tkinter import filedialog as fd
root = Tk() # create the Tk root so the file dialogs can be used
root.withdraw() # hide the small tkinter root window
root.attributes('-topmost', True) # keep dialog windows above all other windows
import cupy as cp
import multiprocessing
from tqdm.auto import trange
#%matplotlib widget
#%matplotlib inline
gpu_av = cp.cuda.is_available()
print( 'gpu available:', gpu_av)
#gpu_av = False # for testing with CPU
if gpu_av:
    device_num = 0   # -1 for CPU, 0 for GPU
else:
    device_num = -1  # force CPU
gpu_device = sp.Device( device_num)
cpu_device = sp.Device( -1)
print( 'device id:', gpu_device.id)
xp = gpu_device.xp # Returns NumPy if id == -1, otherwise returns CuPy
xp # use in place of cp or np
mempool = cp.get_default_memory_pool()
mempool.free_all_blocks()
mempool.set_limit(size=15*1024**3) # 15 GiB
print(cp.get_default_memory_pool().get_limit())
Nx = 181; Ny = 217; Nz = 181
Nscale = 4095; Ncat = 9
N = 256 # starting padded size
fov = (N, N, N)
phantom_crisp = xp.ones( fov)
phantom_crisp = sp.to_device( phantom_crisp, device=gpu_device)
phantom_crisp_fft = sp.fft( phantom_crisp, oshape=fov)
phantom_crisp_pad = xp.real( sp.ifft( phantom_crisp_fft))
tissues = ['Nothing', 'CSF', 'GM', 'WM', 'Fat', 'MuSk', 'Skin', 'Skull', 'Glial', 'Conn']
tissue_idxs = []
tissue_fr = xp.array( [])
V = np.size( phantom_crisp_pad)
for idx, item in enumerate( tissues):
    #print( idx, item)
    tmp_idxs = xp.asarray( xp.nonzero( (phantom_crisp_pad > idx - .1) * (phantom_crisp_pad < idx + .1)))
    tissue_idxs.append( tmp_idxs)
    tissue_fr = xp.append( tissue_fr, xp.shape( tmp_idxs)[1]/V)
total_fr = xp.sum( tissue_fr)
assert total_fr == 1.0
#tissue_parameters [[PD, R1]]
#Water proton T 1 measurements in brain tissue at 7, 3, and 1.5 T using IR-EPI, IR-TSE, and MPRAGE: results and optimization
#https://pubmed.ncbi.nlm.nih.gov/18259791/
tissue_parameters = xp.array(
    [[0., 9999.],
     [1.0, .25],
     [0.716, 0.515],
     [0.837, 0.885],
     [1., 2.],
     [0.7, 0.5],
     [0.75, 0.55],
     [0.8, 0.6],
     [0.8, 0.8],
     [0.7, 0.6]] )
phantom_pd = xp.zeros_like( phantom_crisp_pad)
#phantom_pd[ tissue_idxs[0]] = tissue_parameters[0, 0]
phantom_pd[ tissue_idxs[1]] = tissue_parameters[1, 0]
#phantom_pd[ tissue_idxs[2]] = tissue_parameters[2, 0]
#phantom_pd[ tissue_idxs[3]] = tissue_parameters[3, 0]
#phantom_pd[ tissue_idxs[4]] = tissue_parameters[4, 0]
#phantom_pd[ tissue_idxs[5]] = tissue_parameters[5, 0]
#phantom_pd[ tissue_idxs[6]] = tissue_parameters[6, 0]
#phantom_pd[ tissue_idxs[7]] = tissue_parameters[7, 0]
#phantom_pd[ tissue_idxs[8]] = tissue_parameters[8, 0]
#phantom_pd[ tissue_idxs[9]] = tissue_parameters[9, 0]
phantom_pd  # evaluating the repr triggers a device-to-host copy, which raises the CUDARuntimeError above
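The failure appears to reduce to scatter-assignment through a stacked integer index array. Below is a hedged sketch of what that trigger might look like in isolation (a hypothetical distillation of the script above, not a confirmed minimal reproduction):

import cupy as cp

a = cp.ones((256, 256, 256))
mask = (a > 0.9) & (a < 1.1)
idxs_tuple = cp.nonzero(mask)        # tuple of three (N,) coordinate arrays
idxs_array = cp.asarray(idxs_tuple)  # stacked into a single (3, N) array

out = cp.zeros_like(a)
out[idxs_array] = 1.0                # indexing with the stacked array: the pattern that appears to fail
#out[idxs_tuple] = 1.0               # indexing with the tuple: reported to work
cp.cuda.runtime.deviceSynchronize()  # force any asynchronous error to surface here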
Installation
Conda-Forge (conda install ...)
Environment
OS : Linux-5.13.0-1012-oem-x86_64-with-glibc2.31
Python Version : 3.9.0
CuPy Version : 11.4.0
CuPy Platform : NVIDIA CUDA
NumPy Version : 1.23.5
SciPy Version : 1.9.3
Cython Build Version : 0.29.32
Cython Runtime Version : None
CUDA Root : /home/curt/miniconda3/envs/ccdev
nvcc PATH : None
CUDA Build Version : 11020
CUDA Driver Version : 12010
CUDA Runtime Version : 11080
cuBLAS Version : (available)
cuFFT Version : 10900
cuRAND Version : 10300
cuSOLVER Version : (11, 4, 1)
cuSPARSE Version : (available)
NVRTC Version : (11, 8)
Thrust Version : 101000
CUB Build Version : 101000
Jitify Build Version : 3bc2849
cuDNN Build Version : 8401
cuDNN Version : 8401
NCCL Build Version : 21403
NCCL Runtime Version : 21403
cuTENSOR Version : None
cuSPARSELt Build Version : None
Device 0 Name : NVIDIA RTX A5000 Laptop GPU
Device 0 Compute Capability : 86
Device 0 PCI Bus ID : 0000:01:00.0
Additional Information
Possibly related to these issues, and others:
- https://github.com/cupy/cupy/issues/6789
- https://github.com/cupy/cupy/issues/5668
- https://github.com/cupy/cupy/issues/4866
- https://github.com/cupy/cupy/issues/1389
This error also happens consistently on at least one other platform and hardware combination, and across several variations of the environment:
Ubuntu 18.04 and 20.04; RTX 8000 and A5000 Laptop; CUDA 11.4, 11.7, 11.8; NVIDIA driver 470, 525, 530.
Thanks for developing cupy!
About this issue
- Original URL
- State: open
- Created a year ago
- Reactions: 1
- Comments: 16 (6 by maintainers)
Without the ‘asarray’ entirely, it works up to the memory limit!
Indexing via an array seems to be the trigger, and it can be worked around; I will check some more.
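For reference, a sketch of that workaround, assuming the asarray-stacked index array is indeed the trigger: keep the tuple returned by nonzero(), or index with the boolean mask directly, instead of wrapping the tuple in asarray().

# Workaround sketch (assumes the stacked index array is the trigger)
mask = (phantom_crisp_pad > idx - .1) & (phantom_crisp_pad < idx + .1)

# Option 1: index with the tuple that nonzero() returns, without the asarray() wrapper
tmp_idxs = xp.nonzero(mask)
phantom_pd[tmp_idxs] = tissue_parameters[idx, 0]

# Option 2: index with the boolean mask itself, avoiding index arrays entirely;
# the per-tissue voxel count is then int(mask.sum()) instead of xp.shape(tmp_idxs)[1]
phantom_pd[mask] = tissue_parameters[idx, 0]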
I meant that my question was stupid, just a double check that the values used as indexes are correct 😃; I didn't mean that the issue was stupid! (Sorry for creating confusion here.)
Could you rerun your code with the CUDA_LAUNCH_BLOCKING=1 environment variable set, to spot which line is causing the problem?
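For example, a minimal way to do that (the variable must be set before the CUDA context is created, so either export it in the shell before launching, or set it at the very top of the script before the first CUDA call):

import os
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'  # must be set before the first CUDA call

import cupy as cp
# With launch blocking enabled, kernels run synchronously, so the traceback
# points at the Python line that actually caused the illegal access rather
# than at a later, unrelated runtime call.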