numba: Running only CUDA tests results in strange crashes / failures

Reporting a bug

I am using the latest released version of Numba (most recent is visible in the change log (https://github.com/numba/numba/blob/master/CHANGE_LOG)).
I have included below a minimal working reproducer (if you are unsure how to write one see http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports).

Running only the CUDA testsuite gives some surprising results. For example, on my system with a V100, running:

python -m numba.runtests numba.cuda.tests -v

eventually dies with:

test_broadcast (numba.cuda.tests.cudadrv.test_cuda_array_slicing.CudaArraySetting) ... 
Fatal Python error: Segmentation fault

Current thread 0x00007fb059d49700 (most recent call first):
  File "/home/nfs/gmarkall/numbadev/numba/numba/cuda/cudadrv/driver.py", line 293 in safe_cuda_api_call
  File "/home/nfs/gmarkall/numbadev/numba/numba/cuda/cudadrv/driver.py", line 1622 in launch_kernel
  File "/home/nfs/gmarkall/numbadev/numba/numba/cuda/cudadrv/driver.py", line 1578 in __call__
  File "/home/nfs/gmarkall/numbadev/numba/numba/cuda/compiler.py", line 614 in _kernel_call
  File "/home/nfs/gmarkall/numbadev/numba/numba/cuda/compiler.py", line 540 in __call__
  File "/home/nfs/gmarkall/numbadev/numba/numba/cuda/compiler.py", line 278 in __call__
  File "/home/nfs/gmarkall/numbadev/numba/numba/cuda/cudadrv/devicearray.py", line 581 in _do_setitem
  File "/home/nfs/gmarkall/numbadev/numba/numba/cuda/cudadrv/devicearray.py", line 530 in __setitem__
  File "/home/nfs/gmarkall/numbadev/numba/numba/cuda/cudadrv/devices.py", line 225 in _require_cuda_context
  File "/home/nfs/gmarkall/numbadev/numba/numba/cuda/tests/cudadrv/test_cuda_array_slicing.py", line 209 in test_broadcast
  File "/home/nfs/gmarkall/miniconda3/envs/numbaenv/lib/python3.7/unittest/case.py", line 628 in run
  File "/home/nfs/gmarkall/miniconda3/envs/numbaenv/lib/python3.7/unittest/case.py", line 676 in __call__
  File "/home/nfs/gmarkall/miniconda3/envs/numbaenv/lib/python3.7/unittest/suite.py", line 122 in run
  File "/home/nfs/gmarkall/miniconda3/envs/numbaenv/lib/python3.7/unittest/suite.py", line 84 in __call__
  File "/home/nfs/gmarkall/miniconda3/envs/numbaenv/lib/python3.7/unittest/runner.py", line 176 in run
  File "/home/nfs/gmarkall/numbadev/numba/numba/testing/main.py", line 123 in run
  File "/home/nfs/gmarkall/miniconda3/envs/numbaenv/lib/python3.7/unittest/main.py", line 271 in runTests
  File "/home/nfs/gmarkall/numbadev/numba/numba/testing/main.py", line 354 in run_tests_real
  File "/home/nfs/gmarkall/numbadev/numba/numba/testing/main.py", line 369 in runTests
  File "/home/nfs/gmarkall/miniconda3/envs/numbaenv/lib/python3.7/unittest/main.py", line 101 in __init__
  File "/home/nfs/gmarkall/numbadev/numba/numba/testing/main.py", line 163 in __init__
  File "/home/nfs/gmarkall/numbadev/numba/numba/testing/__init__.py", line 75 in run_tests
  File "/home/nfs/gmarkall/numbadev/numba/numba/_runtests.py", line 28 in _main
  File "/home/nfs/gmarkall/numbadev/numba/numba/runtests.py", line 9 in <module>
  File "/home/nfs/gmarkall/miniconda3/envs/numbaenv/lib/python3.7/runpy.py", line 85 in _run_code
  File "/home/nfs/gmarkall/miniconda3/envs/numbaenv/lib/python3.7/runpy.py", line 193 in _run_module_as_main
Segmentation fault

The same command on @stuartarchibald’s machine with a GTX 750 Ti doesn’t segfault, but produces the following failure:

numba/cuda/tests/cudadrv/test_linker.py", line 93, in test_set_registers_57
    self.assertEquals(57, compiled._func.get().attrs.regs)
AssertionError: 57 != 56

However, the testsuite as a whole has no failures when I run it like:

python -m numba.runtests -m

The test that segfaults for me is getting discovered in this case - it shows up when listing tests with -l - so I don’t think it’s being omitted when running the whole testsuite.

Some initial investigation suggests that some CUDA API calls fail undetected, leading to launch failures later - in one case, computing the threads per block somehow results in a thread count of 0, which a kernel is then configured with and subsequently fails to launch. However, I cannot reliably reproduce this - I’m now getting failures with nonzero threads per block.
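
To make the zero-thread-count failure mode concrete, here is a minimal sketch of the usual ceiling division for a launch configuration, with an explicit guard. The function name `blocks_needed` and its parameters are hypothetical and are not part of Numba; the arithmetic has the same form as the `(ntasks + tpb - 1) // tpb` expression that shows up in the divide-by-zero warning quoted in the comments.

```python
def blocks_needed(ntasks, threads_per_block):
    """Return how many blocks cover ntasks items (ceiling division).

    Hypothetical helper for illustration only, not a Numba function.
    """
    if threads_per_block <= 0:
        # A zero thread count would otherwise surface later as a
        # divide-by-zero or as an invalid kernel launch configuration.
        raise ValueError(
            "threads_per_block must be positive, got %r" % (threads_per_block,)
        )
    return (ntasks + threads_per_block - 1) // threads_per_block


if __name__ == "__main__":
    print(blocks_needed(1000, 128))
```

Failing early like this would not explain why the thread count became 0, but it would turn a later launch failure into an error at the point where the bad value first appears.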

I plan to continue investigating this some more. Some other thoughts / hypotheses:

The CUDATestCase class, which a few of the tests use and which resets the CUDA context on teardown, may be involved: either it resets the context at an inappropriate time, or some test class that needs to be a CUDATestCase is presently only a unittest.TestCase + SerialMixin. I’m not sure of the exact rationale or rules for using CUDATestCase rather than just SerialMixin for CUDA tests - can anyone shed any light on this?
Some memory corruption may be involved - I’m presently looking for a way to run valgrind that still hits the error without being too painful.

About this issue

State: closed
Created 5 years ago
Comments: 20 (20 by maintainers)

Commits related to this issue

Fix OOB write in test_round{f4,f8} The typing of the parameter (int64[:]) vs. the type of the array that was passed in (np.int32) resulted in 8 bytes being written to a 4-byte allocation. This was ... — committed to gmarkall/numba by gmarkall 4 years ago
Merge pull request #5400 from gmarkall/grm-issue-4954 Fix #4954, and some other small CUDA testsuite fixes — committed to numba/numba by sklam 4 years ago
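
The size mismatch described in the commit message above can be sketched on the host with the standard library alone. This is an analogy rather than the actual test code: a 4-byte `bytearray` stands in for the one-element int32 allocation, and `try_store` is a hypothetical helper. Python's `struct.pack_into` bounds-checks the write and raises, whereas a GPU kernel typed as int64 would simply write 8 bytes past the 4-byte allocation.

```python
import struct


def try_store(fmt, alloc_bytes, value=7):
    """Try to store one value of format fmt at offset 0 of a buffer.

    Hypothetical helper for illustration. Returns True if the store
    fits in the allocation, False if it would overrun it.
    """
    buf = bytearray(alloc_bytes)
    try:
        struct.pack_into(fmt, buf, 0, value)
    except struct.error:
        # The store needs more bytes than the allocation provides.
        return False
    return True


if __name__ == "__main__":
    # "<i" is a 4-byte integer, "<q" is an 8-byte integer.
    for fmt in ("<i", "<q"):
        print(fmt, struct.calcsize(fmt), try_store(fmt, alloc_bytes=4))
```

An unchecked overrun like this corrupts whatever sits next to the allocation, which would fit failures that appear later and in unrelated tests.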

Most upvoted comments

More things:

test_broadcast (numba.cuda.tests.cudadrv.test_cuda_array_slicing.CudaArraySetting) ... <path>/numba/numba/cuda/compiler.py:272: RuntimeWarning: divide by zero encountered in long_scalars
  blkct = (self.ntasks + tpbm1) // tpb

Program received signal SIGSEGV, Segmentation fault.
0x00007fffdba8de67 in ?? () from /lib64/libcuda.so
(gdb) bt
#0  0x00007fffdba8de67 in ?? () from /lib64/libcuda.so
#1  0x00007fffdb9a9164 in ?? () from /lib64/libcuda.so
#2  0x00007fffdb9a941e in ?? () from /lib64/libcuda.so
#3  0x00007fffdbb57e22 in cuLaunchKernel () from /lib64/libcuda.so
#4  0x00007ffff0206ec0 in ffi_call_unix64 ()
   from <envpath>/lib/python3.8/lib-dynload/../../libffi.so.6
#5  0x00007ffff020687d in ffi_call ()
   from <envpath>/lib/python3.8/lib-dynload/../../libffi.so.6
#6  0x00007ffff041cd30 in _call_function_pointer (argcount=11, resmem=0x7fffffff7ad0, restype=<optimized out>, 
    atypes=0x7fffffff79f0, avalues=0x7fffffff7a60, pProc=0x7fffdbb57d70 <cuLaunchKernel>, flags=4353)
    at /usr/local/src/conda/python-3.8.1/Modules/_ctypes/callproc.c:871
#7  _ctypes_callproc () at /usr/local/src/conda/python-3.8.1/Modules/_ctypes/callproc.c:1199
#8  0x00007ffff041d765 in PyCFuncPtr_call () at /usr/local/src/conda/python-3.8.1/Modules/_ctypes/_ctypes.c:4181
#9  0x000055555569a3a0 in PyObject_Call () at /tmp/build/80754af9/python_1578519979997/work/Objects/call.c:245
#10 0x0000555555739729 in do_call_core (kwdict=0x0, callargs=0x7fffd8154c40, func=0x7fffd8e9ef40, tstate=<optimized out>)
    at /tmp/build/80754af9/python_1578519979997/work/Python/ceval.c:5034


(gdb) info registers 
rax            0x5bd801c00ff70005       6618041576834138117
rbx            0x555556eb0c70   93825018825840
rcx            0x0      0
rdx            0x0      0
rsi            0x0      0
rdi            0x5555562fe200   93825006559744
rbp            0x0      0x0
rsp            0x7fffffff75c0   0x7fffffff75c0
r8             0x0      0
r9             0x1      1
r10            0x7fffdca9a5c0   140736895493568
r11            0x7fffdbb57d70   140736879492464
r12            0x7fffffff7650   140737488320080
r13            0x1      1
r14            0x0      0
r15            0x1      1
rip            0x7fffdba8de67   0x7fffdba8de67
eflags         0x10202  [ IF RF ]
cs             0x33     51
ss             0x2b     43
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0

stuartarchibald on Mar 12, 2020

I’m seeing this too:

$ python -m numba.runtests numba.cuda.tests -v
test_contigous_2d (numba.cuda.tests.cudadrv.test_array_attr.TestArrayAttr) ... ok
test_contigous_3d (numba.cuda.tests.cudadrv.test_array_attr.TestArrayAttr) ... ok
test_contigous_4d (numba.cuda.tests.cudadrv.test_array_attr.TestArrayAttr) ... ok
test_ravel_c (numba.cuda.tests.cudadrv.test_array_attr.TestArrayAttr) ... ok
test_ravel_f (numba.cuda.tests.cudadrv.test_array_attr.TestArrayAttr) ... ok
test_reshape_c (numba.cuda.tests.cudadrv.test_array_attr.TestArrayAttr) ... ok
test_reshape_f (numba.cuda.tests.cudadrv.test_array_attr.TestArrayAttr) ... ok
test_transpose (numba.cuda.tests.cudapy.test_transpose.Test) ... ok
test_transpose_bool (numba.cuda.tests.cudapy.test_transpose.Test) ... ok
test_transpose_record (numba.cuda.tests.cudapy.test_transpose.Test) ... ok
test_transpose_view (numba.cuda.tests.cudapy.test_transpose.Test) ... ok
test_accepted_context_switch (numba.cuda.tests.cudadrv.test_context_stack.TestContextAPI) ... skipped 'need more than 1 gpus'
test_array_assign_all (numba.cuda.tests.cudadrv.test_cuda_array_slicing.CudaArraySetting) ... ok
test_array_assign_column (numba.cuda.tests.cudadrv.test_cuda_array_slicing.CudaArraySetting) ... ok
test_array_assign_deep_subarray (numba.cuda.tests.cudadrv.test_cuda_array_slicing.CudaArraySetting) ... ok
test_array_assign_row (numba.cuda.tests.cudadrv.test_cuda_array_slicing.CudaArraySetting) ... ok
test_array_assign_subarray (numba.cuda.tests.cudadrv.test_cuda_array_slicing.CudaArraySetting) ... ok
test_attached_non_primary (numba.cuda.tests.cudadrv.test_context_stack.Test3rdPartyContext) ... ok
test_attached_primary (numba.cuda.tests.cudadrv.test_context_stack.Test3rdPartyContext) ... ok
test_auto_context (numba.cuda.tests.cudadrv.test_cuda_auto_context.TestCudaAutoContext)
A problem was revealed by a customer that the use cuda.to_device ... ok
test_auto_device (numba.cuda.tests.cudadrv.test_cuda_devicerecord.TestCudaDeviceRecord) ... ok
test_auto_device (numba.cuda.tests.cudadrv.test_cuda_devicerecord.TestCudaDeviceRecordWithRecord) ... ok
test_broadcast (numba.cuda.tests.cudadrv.test_cuda_array_slicing.CudaArraySetting) ... ERROR
test_context_memory (numba.cuda.tests.cudadrv.test_context_stack.TestContextAPI) ... ok
test_cuda_driver_basic (numba.cuda.tests.cudadrv.test_cuda_driver.TestCudaDriver) ... ok
test_cuda_driver_default_stream (numba.cuda.tests.cudadrv.test_cuda_driver.TestCudaDriver) ... ok
test_cuda_driver_occupancy (numba.cuda.tests.cudadrv.test_cuda_driver.TestCudaDriver) ... active blocks: 16
grid size: 80 , block size: 128
ok
test_cuda_driver_stream (numba.cuda.tests.cudadrv.test_cuda_driver.TestCudaDriver) ... ok
test_cuda_driver_stream_operations (numba.cuda.tests.cudadrv.test_cuda_driver.TestCudaDriver) ... ok
test_cudajit_in_attached_primary_context (numba.cuda.tests.cudadrv.test_context_stack.Test3rdPartyContext) ... ok
test_device_record_copy (numba.cuda.tests.cudadrv.test_cuda_devicerecord.TestCudaDeviceRecord) ... ok
test_device_record_copy (numba.cuda.tests.cudadrv.test_cuda_devicerecord.TestCudaDeviceRecordWithRecord) ... ok
test_device_record_interface (numba.cuda.tests.cudadrv.test_cuda_devicerecord.TestCudaDeviceRecord) ... ok
test_device_record_interface (numba.cuda.tests.cudadrv.test_cuda_devicerecord.TestCudaDeviceRecordWithRecord) ... ok
test_empty_slice_1d (numba.cuda.tests.cudadrv.test_cuda_array_slicing.CudaArraySlicing) ... ok
test_empty_slice_2d (numba.cuda.tests.cudadrv.test_cuda_array_slicing.CudaArraySlicing) ... ok
test_forbidden_context_switch (numba.cuda.tests.cudadrv.test_context_stack.TestContextAPI) ... skipped 'need more than 1 gpus'
test_from_record_like (numba.cuda.tests.cudadrv.test_cuda_devicerecord.TestCudaDeviceRecord) ... ok
test_from_record_like (numba.cuda.tests.cudadrv.test_cuda_devicerecord.TestCudaDeviceRecordWithRecord) ... ok
test_gpus_current (numba.cuda.tests.cudadrv.test_context_stack.TestContextStack) ... ok
test_gpus_iter (numba.cuda.tests.cudadrv.test_context_stack.TestContextStack) ... ok
test_gpus_len (numba.cuda.tests.cudadrv.test_context_stack.TestContextStack) ... ok
test_incompatible_highdim (numba.cuda.tests.cudadrv.test_cuda_array_slicing.CudaArraySetting) ... ok
test_incompatible_shape (numba.cuda.tests.cudadrv.test_cuda_array_slicing.CudaArraySetting) ... ok
test_index_1d (numba.cuda.tests.cudadrv.test_cuda_array_slicing.CudaArrayIndexing) ... ok
test_index_2d (numba.cuda.tests.cudadrv.test_cuda_array_slicing.CudaArrayIndexing) ... ok
test_index_3d (numba.cuda.tests.cudadrv.test_cuda_array_slicing.CudaArrayIndexing) ... ok
test_negative_slicing_1d (numba.cuda.tests.cudadrv.test_cuda_array_slicing.CudaArraySlicing) ... ok
test_negative_slicing_2d (numba.cuda.tests.cudadrv.test_cuda_array_slicing.CudaArraySlicing) ... ok
test_prefix_1d (numba.cuda.tests.cudadrv.test_cuda_array_slicing.CudaArraySlicing) ... ok
test_prefix_2d (numba.cuda.tests.cudadrv.test_cuda_array_slicing.CudaArraySlicing) ... ok
test_prefix_select (numba.cuda.tests.cudadrv.test_cuda_array_slicing.CudaArraySlicing) ... ok
test_rank (numba.cuda.tests.cudadrv.test_cuda_array_slicing.CudaArraySetting) ... Fatal Python error: Segmentation fault

Current thread 0x00007f44ec064740 (most recent call first):
  File "<path>/numba/numba/cuda/cudadrv/driver.py", line 292 in safe_cuda_api_call
  File "<path>/numba/numba/cuda/cudadrv/driver.py", line 1622 in launch_kernel
  File "<path>/numba/numba/cuda/cudadrv/driver.py", line 1583 in __call__
  File "<path>/numba/numba/cuda/compiler.py", line 597 in _kernel_call
  File "<path>/numba/numba/cuda/compiler.py", line 519 in __call__
  File "<path>/numba/numba/cuda/compiler.py", line 274 in __call__
  File "<path>/numba/numba/cuda/cudadrv/devicearray.py", line 625 in _do_setitem
  File "<path>/numba/numba/cuda/cudadrv/devicearray.py", line 574 in __setitem__
  File "<path>/numba/numba/cuda/cudadrv/devices.py", line 224 in _require_cuda_context
  File "<path>/numba/numba/cuda/tests/cudadrv/test_cuda_array_slicing.py", line 225 in test_rank
  File "<env>/envs/numba_latest/lib/python3.8/unittest/case.py", line 633 in _callTestMethod
  File "<env>/envs/numba_latest/lib/python3.8/unittest/case.py", line 676 in run
  File "<env>/envs/numba_latest/lib/python3.8/unittest/case.py", line 736 in __call__
  File "<env>/envs/numba_latest/lib/python3.8/unittest/suite.py", line 122 in run
  File "<env>/envs/numba_latest/lib/python3.8/unittest/suite.py", line 84 in __call__
  File "<env>/envs/numba_latest/lib/python3.8/unittest/runner.py", line 176 in run
  File "<path>/numba/numba/testing/main.py", line 120 in run
  File "<env>/envs/numba_latest/lib/python3.8/unittest/main.py", line 271 in runTests
  File "<path>/numba/numba/testing/main.py", line 306 in run_tests_real
  File "<path>/numba/numba/testing/main.py", line 321 in runTests
  File "<env>/envs/numba_latest/lib/python3.8/unittest/main.py", line 101 in __init__
  File "<path>/numba/numba/testing/main.py", line 160 in __init__
  File "<path>/numba/numba/testing/__init__.py", line 67 in run_tests
  File "<path>/numba/numba/testing/_runtests.py", line 25 in _main
  File "<path>/numba/numba/runtests.py", line 9 in <module>
  File "<env>/envs/numba_latest/lib/python3.8/runpy.py", line 86 in _run_code
  File "<env>/envs/numba_latest/lib/python3.8/runpy.py", line 193 in _run_module_as_main
Segmentation fault

Env things:

| NVIDIA-SMI 440.64       Driver Version: 440.64       CUDA Version: 10.2     |
|   0  GeForce GTX 750 Ti  Off  | 00000000:01:00.0  On |                  N/A |

stuartarchibald on Mar 12, 2020