CuPy: segmentation fault with a cuFFT callback function

pr.py

import numpy as np
import cupy as cp

code = r'''
__device__ cufftComplex CB_ConvertInput(
    void *dataIn,
    size_t offset,
    void *callerInfo,
    void *sharedPtr) {
    return make_float2(0.0, 0.1);
}

__device__ cufftCallbackLoadC d_loadCallbackPtr = CB_ConvertInput;
'''

in_arr = cp.random.rand(12, dtype=np.float32)

with cp.fft.config.set_cufft_callbacks(cb_load=code):
    out_arr = cp.fft.fft(in_arr)

print(out_arr)

Run with

CUDA_VISIBLE_DEVICES=0 python pr.py

Then I got

[1]    410392 segmentation fault (core dumped)  CUDA_VISIBLE_DEVICES=0 python pr.py
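When narrowing down a crash like this across GPUs or CuPy builds, it can help to run the repro in a child process so the segfault does not take down the parent session. Below is a minimal standard-library sketch. `run_isolated` is a hypothetical helper, and setting `CUDA_VISIBLE_DEVICES` in the child's environment is assumed to be equivalent to the shell prefix used in the command above.

```python
import os
import signal
import subprocess
import sys


def run_isolated(argv, visible_devices=None):
    """Run argv in a child process; return (returncode, crashed_with_segfault)."""
    env = dict(os.environ)
    if visible_devices is not None:
        # Same effect as prefixing the shell command with CUDA_VISIBLE_DEVICES=...
        env["CUDA_VISIBLE_DEVICES"] = str(visible_devices)
    p = subprocess.run(argv, env=env)
    # On POSIX, a child killed by signal N reports returncode -N.
    return p.returncode, p.returncode == -signal.SIGSEGV


if __name__ == "__main__" and os.path.exists("pr.py"):
    code, segv = run_isolated([sys.executable, "pr.py"], visible_devices=0)
    print("segmentation fault" if segv else f"exit code {code}")
```

Looping `visible_devices` over each GPU index makes it easy to see which device the crash is tied to.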

My system information:

OS                           : Linux-5.12.8-arch1-1-x86_64-with-glibc2.33
Python Version               : 3.9.5
CuPy Version                 : 9.1.0
CuPy Platform                : NVIDIA CUDA
NumPy Version                : 1.20.0
SciPy Version                : 1.6.3
Cython Build Version         : 0.29.23
Cython Runtime Version       : 0.29.23
CUDA Root                    : /opt/cuda
nvcc PATH                    : /opt/cuda/bin/nvcc
CUDA Build Version           : 11030
CUDA Driver Version          : 11030
CUDA Runtime Version         : 11030
cuBLAS Version               : 11402
cuFFT Version                : 10402
cuRAND Version               : 10204
cuSOLVER Version             : (11, 1, 1)
cuSPARSE Version             : 11500
NVRTC Version                : (11, 3)
Thrust Version               : 101100
CUB Build Version            : 101100
Jitify Build Version         : 60e9e72
cuDNN Build Version          : 8200
cuDNN Version                : 8200
NCCL Build Version           : 20906
NCCL Runtime Version         : 20906
cuTENSOR Version             : None
cuSPARSELt Build Version     : None
Device 0 Name                : NVIDIA GeForce GTX 1080
Device 0 Compute Capability  : 61
Device 0 PCI Bus ID          : 0000:17:00.0
Device 1 Name                : NVIDIA Quadro K600
Device 1 Compute Capability  : 30
Device 1 PCI Bus ID          : 0000:65:00.0
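Device 1 (Quadro K600, compute capability 3.0) is an architecture that CUDA 11 no longer targets, because sm_30 support was removed in CUDA 11.0. Pinning the run to device 0 with `CUDA_VISIBLE_DEVICES=0` keeps it on the GTX 1080. A small sketch that flags such devices in a configuration dump like the one above follows. `unsupported_devices` is a hypothetical helper, and the architecture tuple is copied from the CUDA 11.3 `getSupportedArchs()` output later in this thread.

```python
import re

# Matches lines like "Device 0 Compute Capability  : 61" from the dump above.
CC_LINE = re.compile(r"Device (\d+) Compute Capability\s*:\s*(\d+)")

# Architectures NVRTC 11.3 reports (copied from the output later in this thread).
NVRTC_11_3_ARCHS = (35, 37, 50, 52, 53, 60, 61, 62, 70, 72, 75, 80, 86)


def unsupported_devices(config_text, supported=NVRTC_11_3_ARCHS):
    """Return {device_index: compute_capability} for devices outside `supported`."""
    found = {int(d): int(cc) for d, cc in CC_LINE.findall(config_text)}
    return {d: cc for d, cc in found.items() if cc not in supported}


config = """\
Device 0 Name                : NVIDIA GeForce GTX 1080
Device 0 Compute Capability  : 61
Device 1 Name                : NVIDIA Quadro K600
Device 1 Compute Capability  : 30
"""
print(unsupported_devices(config))  # {1: 30}
```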

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 23 (12 by maintainers)

Most upvoted comments

@leofang My system is Arch Linux, and CuPy is installed from archlinuxcn/python-cupy. After deleting the .cupy folder, I get a warning before the segmentation fault:

nvprune warning : No device code that matched architecture, so stripped out all device code

@leofang Thanks, everything works now on the 980 workstation.

@leofang I tested both of my 1080 machines, and the new patch (cf78fbf86eec6a8a4baef2d5628bb96130c4879) works on both. Cheers. However, I cannot compile CuPy from source on my 980 machine: it complains that cudnn.h is missing. The cuDNN dependency is optional, isn't it?

Sure, here is my output.

# make
>>> GCC Version is greater or equal to  5.3.0 <<<
/opt/cuda/bin/nvcc -ccbin g++ -I../../common/inc -m64 -dc -std=c++11 --threads 0 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_86,code=compute_86 -o simpleCUFFT_callback.o -c simpleCUFFT_callback.cu
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
/opt/cuda/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_86,code=compute_86 -o simpleCUFFT_callback simpleCUFFT_callback.o -L/usr/lib/x86_64-linux-gnu -L/usr/lib64 -lcufft_static -lculibos
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
mkdir -p ../../bin/x86_64/linux/release
cp simpleCUFFT_callback ../../bin/x86_64/linux/release

# ./simpleCUFFT_callback
[simpleCUFFT_callback] is starting...
GPU Device 0: "Pascal" with compute capability 6.1

Transforming signal cufftExecC2C
Transforming signal back cufftExecC2C

Let me clarify: this is another NVIDIA 1080 card. I installed CuPy with pip install cupy; the version is 9.1.0, without your patch. Here is my test code:

import numpy as np
import cupy as cp

code = r'''
__device__ cufftComplex CB_ConvertInput(
    void *dataIn,
    size_t offset,
    void *callerInfo,
    void *sharedPtr) {
    return make_float2(1.0, 0.0);
}

__device__ cufftCallbackLoadC d_loadCallbackPtr = CB_ConvertInput;
'''

in_arr = cp.random.rand(12, dtype=np.float32)

with cp.fft.config.set_cufft_callbacks(cb_load=code):
    out_arr = cp.fft.fft(in_arr)

print(cp.__version__)
print(out_arr)
print(cp.fft.fft(cp.ones(12)))

This is the output.

/home/szsdk/anaconda3/lib/python3.8/site-packages/cupy/fft/_fft.py:149: UserWarning: cuFFT plan cache is disabled on CUDA 11.1 due to a known bug, so performance may be degraded. The bug is fixed on CUDA 11.2+.
  cache = get_plan_cache()
9.1.0
[0.+0.j 0.+0.j 0.+0.j 0.+0.j 0.+0.j 0.+0.j 0.+0.j 0.+0.j 0.+0.j 0.+0.j
 0.+0.j 0.+0.j]
[12.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j
  0.+0.j  0.+0.j  0.+0.j]

I do not think I am getting the correct answer. The load callback replaces every input element with 1+0j, so as far as I understand the result should match the last line (the FFT of 12 ones). I am new to cuFFT callbacks, so please correct me if I am wrong.
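To check the expected answer without a GPU, here is a small CPU reference model. `dft_with_load_callback` is a hypothetical helper: it calls a load-callback-like function for each input index and then computes a plain O(n^2) DFT with the same sign convention as `numpy.fft.fft`/`cupy.fft.fft`. It assumes the callback is applied to every one of the 12 inputs. Under that assumption, the 1+0j callback gives 12 at index 0 and zeros elsewhere, which matches the last printed line. The 0+0.1j callback from pr.py gives 1.2j at index 0. This is not how cuFFT executes callbacks internally.

```python
import cmath


def dft_with_load_callback(data, load=None):
    """Naive forward DFT, X[k] = sum_j x[j] * exp(-2*pi*i*j*k/n).

    `load(data, offset)` loosely mirrors a cuFFT load callback: it is called once
    per input index, and its return value is used in place of data[offset].
    """
    n = len(data)
    x = [load(data, i) if load is not None else complex(data[i]) for i in range(n)]
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


# Stand-in for cp.random.rand(12); both callbacks below ignore the input values.
inputs = [0.5] * 12

# Callback from the test script above: every element becomes 1+0j.
ones = dft_with_load_callback(inputs, load=lambda d, i: complex(1.0, 0.0))
# Callback from pr.py: every element becomes 0+0.1j.
first = dft_with_load_callback(inputs, load=lambda d, i: complex(0.0, 0.1))

print([round(abs(v), 6) for v in ones])  # 12.0 at index 0, 0.0 elsewhere
print(first[0])  # approximately 1.2j
```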

@leofang I have access to another NVIDIA 1080 machine. I get the same warning there (CuPy 9.1.0), but no segfault.

Thanks, @szsdk. I will send a patch to fix this.

On CUDA 11.3:

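import os
import subprocess
import cupy as cp


# Architectures NVRTC can target with this CUDA version
sm_list = cp.cuda.nvrtc.getSupportedArchs()
print(sm_list)
for i in sm_list:
    print(f"on sm_{i}...")
    # Keep only sm_{i} device code from the static cuFFT library
    p = subprocess.run(['nvprune', '-arch', f'sm_{i}', '/usr/local/cuda-11.3/lib64/libcufft_static.a', '-o', 'temp.a'], env=os.environ)
    # stderr is not captured here, so p.stderr is None and "OK" is always printed;
    # nvprune's warnings go straight to the terminal instead
    if p.stderr:
        print(p.stderr)
    else:
        print("OK")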

Output:

(35, 37, 50, 52, 53, 60, 61, 62, 70, 72, 75, 80, 86)
on sm_35...
OK
on sm_37...
nvprune warning : No device code that matched architecture, so stripped out all device code
OK
on sm_50...
OK
on sm_52...
nvprune warning : No device code that matched architecture, so stripped out all device code
OK
on sm_53...
nvprune warning : No device code that matched architecture, so stripped out all device code
OK
on sm_60...
OK
on sm_61...
nvprune warning : No device code that matched architecture, so stripped out all device code
OK
on sm_62...
nvprune warning : No device code that matched architecture, so stripped out all device code
OK
on sm_70...
OK
on sm_72...
nvprune warning : No device code that matched architecture, so stripped out all device code
OK
on sm_75...
OK
on sm_80...
OK
on sm_86...
OK
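These warnings line up with the GTX 1080 report. CUDA 11.3's libcufft_static.a keeps device code only for sm_35, 50, 60, 70, 75, 80 and 86, the same set the simpleCUFFT_callback sample passes to -gencode. Pruning to the 1080's sm_61 therefore strips everything. The loop above never detects this itself: because stderr is not captured, every architecture reports OK.

The sketch below captures stderr. When a prune comes back empty, it suggests a fallback using the CUDA binary-compatibility rule: SASS for compute capability X.y runs on X.z with z >= y. The helpers `prune_strips_everything` and `pick_prune_arch` and the hard-coded arch tuple are assumptions for illustration, not necessarily what the eventual CuPy patch does.

```python
import os
import shutil
import subprocess
import sys

# Library path used in the script above; adjust for your CUDA install.
LIB = "/usr/local/cuda-11.3/lib64/libcufft_static.a"
EMPTY_MSG = "No device code that matched architecture"

# Archs that kept device code in the CUDA 11.3 output above (no nvprune warning).
CUFFT_STATIC_SASS = (35, 50, 60, 70, 75, 80, 86)


def prune_strips_everything(cmd):
    """Run a prune command with stderr captured; True if it emptied the library."""
    p = subprocess.run(cmd, capture_output=True, text=True)
    return EMPTY_MSG in p.stderr


def pick_prune_arch(device_arch, available=CUFFT_STATIC_SASS):
    """Highest available arch with the same major version that does not exceed
    device_arch (SASS for X.y runs on X.z, z >= y); None if none qualifies."""
    major = device_arch // 10
    candidates = [a for a in available if a // 10 == major and a <= device_arch]
    return max(candidates) if candidates else None


if __name__ == "__main__":
    if shutil.which("nvprune") is None or not os.path.exists(LIB):
        print("nvprune or libcufft_static.a not found; skipping", file=sys.stderr)
    else:
        import cupy as cp

        for arch in cp.cuda.nvrtc.getSupportedArchs():
            cmd = ["nvprune", "-arch", f"sm_{arch}", LIB, "-o", "temp.a"]
            if prune_strips_everything(cmd):
                print(f"sm_{arch}: no device code, try sm_{pick_prune_arch(arch)}")
            else:
                print(f"sm_{arch}: OK")
```

With this rule, sm_61 and sm_62 fall back to sm_60, sm_52 and sm_53 to sm_50, sm_37 to sm_35, and sm_72 to sm_70.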