mpi4py: CUDA tests fail when CUDA is available but not configured

I’m testing the build of the new release 3.1.1.

All tests accessing CUDA are failing. This is not entirely surprising in itself: my system has NVIDIA drivers installed and a switchable NVIDIA card accessible via bumblebee (primusrun), but I have not specifically configured the system to run CUDA, so hitting CUDA_ERROR_NO_DEVICE is to be expected. For me the NVIDIA card is at hand for experimentation, not routine operation; the main video card is Intel.

What’s the best way to handle this situation? How can a non-CUDA build be enforced when CUDA is otherwise “available”?
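
For reference, Numba can report whether a usable CUDA device is present without raising; the minimal standalone probe below is my own check (not something the mpi4py test suite consults) and only illustrates the distinction:

    import os

    # CUDA_VISIBLE_DEVICES is the standard NVIDIA mechanism for hiding GPUs from
    # a process; an empty value makes the driver report no usable devices. On
    # this machine cuInit already returns CUDA_ERROR_NO_DEVICE, so this line is
    # only belt and braces.
    os.environ["CUDA_VISIBLE_DEVICES"] = ""

    from numba import cuda

    # Unlike cuda.device_array(), cuda.is_available() returns False instead of
    # raising CudaSupportError when no device can be initialized.
    print("CUDA usable:", cuda.is_available())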

An example test log is:

ERROR: testAllgather (test_cco_buf.TestCCOBufInplaceSelf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/projects/python/build/mpi4py/test/test_cco_buf.py", line 382, in testAllgather
    buf = array(-1, typecode, (size, count))
  File "/projects/python/build/mpi4py/test/arrayimpl.py", line 459, in __init__
    self.array = numba.cuda.device_array(shape, typecode)
  File "/usr/lib/python3/dist-packages/numba/cuda/cudadrv/devices.py", line 223, in _require_cuda_context
    with _runtime.ensure_context():
  File "/usr/lib/python3.9/contextlib.py", line 117, in __enter__
    return next(self.gen)
  File "/usr/lib/python3/dist-packages/numba/cuda/cudadrv/devices.py", line 121, in ensure_context
    with driver.get_active_context():
  File "/usr/lib/python3/dist-packages/numba/cuda/cudadrv/driver.py", line 393, in __enter__
    driver.cuCtxGetCurrent(byref(hctx))
  File "/usr/lib/python3/dist-packages/numba/cuda/cudadrv/driver.py", line 280, in __getattr__
    self.initialize()
  File "/usr/lib/python3/dist-packages/numba/cuda/cudadrv/driver.py", line 240, in initialize
    raise CudaSupportError("Error at driver init: \n%s:" % e)
numba.cuda.cudadrv.error.CudaSupportError: Error at driver init:
[100] Call to cuInit results in CUDA_ERROR_NO_DEVICE:
-------------------- >> begin captured logging << --------------------
numba.cuda.cudadrv.driver: INFO: init
numba.cuda.cudadrv.driver: DEBUG: call driver api: cuInit
numba.cuda.cudadrv.driver: ERROR: Call to cuInit results in CUDA_ERROR_NO_DEVICE
--------------------- >> end captured logging << ---------------------

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 43 (43 by maintainers)

Most upvoted comments

It seems spawn trouble has been a long-running saga! I’ll deactivate the spawn tests for now and check again later with future Open MPI releases.

@drew-parsons I guess you are using Open MPI, right? Dynamic process management has always been semi-broken there. I would suggest just disabling these tests if they are giving trouble or behaving erratically. Hopefully, things will be much better in the upcoming Open MPI 5.x release, where the mpi4py tests are passing.

@drew-parsons What’s the exact command line you are using to run these tests?

My guess is that you are using pytest or a similar tool, and not our test/runtests.py script. So our “off-by-default” CuPy tests are being run, and they fail because of your broken environment. We had not foreseen your testing scenario, otherwise we would have provided an alternative mechanism to disable these tests (for example, some environment variable). These tests cannot be disabled by name: a common set of tests is run with different “buffer providers”. Also remember that we keep our dependencies to a minimum (mpi4py does not even depend on NumPy!), and that includes testing, so we do not have at hand all the features of specialized testing tools.
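
To illustrate what such a gate might look like, here is a purely hypothetical sketch: the backend table, the add_backend helper and the MPI4PY_TEST_NO_CUDA variable are invented for this example and do not exist in arrayimpl.py (the same idea would apply to a CuPy provider):

    import os

    # Hypothetical sketch, not mpi4py's actual test code: register GPU-backed
    # "buffer providers" only when they are importable, usable, and not opted out.
    ARRAY_BACKENDS = {}

    def add_backend(name, factory):
        ARRAY_BACKENDS[name] = factory

    try:
        from numba import cuda
    except ImportError:
        cuda = None

    # MPI4PY_TEST_NO_CUDA is an invented name, used here only for illustration.
    if (cuda is not None
            and cuda.is_available()
            and not os.environ.get("MPI4PY_TEST_NO_CUDA")):
        add_backend("numba.cuda", cuda.device_array)

With something along those lines, setting the opt-out variable (or simply having no usable device) would keep the GPU providers out of the common test set without having to disable individual tests by name.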

The only quick fix I can think of is asking you to run your tests in a virtual environment without CuPy installed. If you have alternative suggestions for providing (optional) support for other testing tools like pytest, we will happily incorporate them.
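
In case it helps the discussion, one possible shape for optional pytest support is a conftest.py hook that skips GPU-dependent tests when no device is usable. Everything below is a sketch: the “cuda” marker convention is invented here, and mpi4py’s tests are not currently marked this way.

    # conftest.py -- hypothetical sketch of optional pytest integration.
    import pytest

    def pytest_collection_modifyitems(config, items):
        # Detect once whether Numba can actually use a CUDA device.
        try:
            from numba import cuda
            have_cuda = cuda.is_available()
        except ImportError:
            have_cuda = False
        if have_cuda:
            return
        # Skip anything carrying the (invented) "cuda" marker.
        skip_cuda = pytest.mark.skip(reason="no usable CUDA device")
        for item in items:
            if "cuda" in item.keywords:
                item.add_marker(skip_cuda)

A real implementation would also need to register the marker in the pytest configuration and, more importantly, some way for arrayimpl.py to tag which parametrizations actually touch the GPU.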