cupy: v7.4.0 cupy/cuda/driver.pyx error line 118

Hi,

I’m working in conda envs with conda installs. Hit a snag upgrading from cupy 6.0.0 to 7.4.0 with rapidsai.

The MRE runs in cupy 6.0.0 and crashes in 7.4.0 with this error:

Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 247, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 118, in cupy.cuda.driver.check_status
TypeError: 'NoneType' object is not callable

In Jupyter Notebook the MRE errors one line later:

CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

The complete stack trace is in the attached notebook along with additional system and device specs.

Great tool box, thanks.

Tom


cupy740_crash_mre.ipynb.pdf

# MRE
import numpy as np
import cupy as cp

MB = 1024**2
cp.cuda.Device(3).use()

free, total = cp.cuda.Device(3).mem_info
print(f"MB free {free / MB :.0f} total {total / MB :.0f}")

# ok on these ...
# n, p, g = 250, 5, 10
# n, p, g = 2500, 25, 1000

# errors on these
n, p, g = 25000, 250, 10000

yg = np.random.rand(n, g).astype("float32")
X = np.random.rand(n, p).astype("float32")


ygd = cp.asarray(yg)
Xd = cp.asarray(X)
print(f"MB matrices: {(ygd.nbytes + Xd.nbytes) / MB :.0f}")
assert ygd.nbytes + Xd.nbytes < free 

Qd, Rd = cp.linalg.qr(Xd)
bhatsd = cp.linalg.solve(Rd, Qd.T @ ygd)
yhatsd = Xd @ bhatsd  # jupyter gets past this line

# ed = yhatsd - ygd   # jupyter errors on this line

[sandbox]$ conda activate cupy
(cupy) [sandbox]$ python --version; python -c "import cupy; cupy.show_config()"; python cupy740_crash_mre.py
Python 3.7.7
CuPy Version          : 6.0.0
CUDA Root             : /usr/local/cuda-8.0
CUDA Build Version    : 10000
CUDA Driver Version   : 10020
CUDA Runtime Version  : 10000
cuDNN Build Version   : 7301
cuDNN Version         : 7605
NCCL Build Version    : 1000
NCCL Runtime Version  : (unknown)
MB free 12039 total 12196
MB matrices: 978
(cupy) [sandbox]$ conda deactivate

[sandbox]$ conda activate rapidsai37
(rapidsai37) [sandbox]$ python --version; python -c "import cupy; cupy.show_config()"; python cupy740_crash_mre.py
Python 3.7.6
CuPy Version          : 7.4.0
CUDA Root             : /home/turbach/.conda/envs/rapidsai37
CUDA Build Version    : 10020
CUDA Driver Version   : 10020
CUDA Runtime Version  : 10020
cuBLAS Version        : 10202
cuFFT Version         : 10102
cuRAND Version        : 10102
cuSOLVER Version      : (10, 3, 0)
cuSPARSE Version      : 10301
NVRTC Version         : (10, 2)
cuDNN Build Version   : 7605
cuDNN Version         : 7605
NCCL Build Version    : 2406
NCCL Runtime Version  : 2507
MB free 12027 total 12196
MB matrices: 978
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 247, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 118, in cupy.cuda.driver.check_status
TypeError: 'NoneType' object is not callable
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 247, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 118, in cupy.cuda.driver.check_status
TypeError: 'NoneType' object is not callable
(rapidsai37) [sandbox]$

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 16 (14 by maintainers)

Most upvoted comments

#3331 fixes this bug. It was an error in the implementation of linalg.solve that caused memory corruption.
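
Since the fault was in linalg.solve, one way to sanity-check an upgraded install is to compare the MRE's coefficients against a CPU reference at one of the small sizes. What follows is only a sketch under assumptions: it uses NumPy alone (no GPU), the helper name ols_via_qr is hypothetical, and the random data is not the data from the original report.

```python
import numpy as np


def ols_via_qr(X, Y):
    """CPU reference for the MRE: least-squares coefficients of Y on X via reduced QR."""
    Q, R = np.linalg.qr(X)              # reduced QR: Q is (n, p), R is (p, p)
    return np.linalg.solve(R, Q.T @ Y)  # solve R b = Q^T Y for every column of Y


rng = np.random.default_rng(0)
n, p, g = 250, 5, 10                    # the small case that the MRE marks as ok
X = rng.random((n, p)).astype("float32")
Y = rng.random((n, g)).astype("float32")

bhats = ols_via_qr(X, Y)

# Independent float64 reference computed with a different routine.
reference = np.linalg.lstsq(X.astype("float64"), Y.astype("float64"), rcond=None)[0]
max_abs_diff = float(np.max(np.abs(bhats - reference)))
print(f"max abs difference vs lstsq: {max_abs_diff:.2e}")
```

On a GPU machine, the same comparison could presumably be made against cp.asnumpy(bhatsd) from the MRE, with the device arrays built from the same X and Y.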

I can’t reproduce this with CUDA 10.2:

CuPy Version          : 8.0.0b2
CUDA Root             : /usr/local/cuda
CUDA Build Version    : 10020
CUDA Driver Version   : 10020
CUDA Runtime Version  : 10020
cuBLAS Version        : 10202
cuFFT Version         : 10102
cuRAND Version        : 10102
cuSOLVER Version      : (10, 3, 0)
cuSPARSE Version      : 10301
NVRTC Version         : (10, 2)
cuDNN Build Version   : 7500
cuDNN Version         : 7500
NCCL Build Version    : None
NCCL Runtime Version  : None