cmssw: Unable to choose current device because CUDAService is disabled.

In CMSSW_11_3_X_2020-12-08-2300, we are getting the following error in workflows 136.885522, 136.888522, 10824.522, and 11634.522:

----- Begin Fatal Exception 09-Dec-2020 10:02:40 CET-----------------------
An exception of category 'CUDAError' occurred while
   [0] Processing  Event run: 320822 lumi: 40 event: 64112784 stream: 2
   [1] Running path 'dqmoffline_step'
   [2] Prefetching for module RecHitTask/'recHitPreRecoTask'
   [3] Prefetching for module HcalCPURecHitsProducer/'hbheprereco'
   [4] Prefetching for module HBHERecHitProducerGPU/'hbheRecHitProducerGPU'
   [5] Calling method for module HcalDigisProducerGPU/'hcalDigisGPU'
   [6] Calling cms::cuda::chooseDevice()
Exception Message:
Unable to choose current device because CUDAService is disabled. If CUDAService was not explicitly
disabled in the configuration, the probable cause is that there is no GPU or there is some problem
in the CUDA runtime or drivers.
----- End Fatal Exception -------------------------------------------------

It seems related to #31720. It looks like an expected error, caused by the missing GPU on the IB test machines.
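If the failure is indeed only due to the missing GPU, one way to avoid it would be to skip the GPU workflows on machines where no device is visible. The following is a minimal sketch of that idea, not how the IB machinery actually works. It assumes that finding `nvidia-smi` on the PATH is an acceptable rough proxy for "a GPU is present". The helper names `gpu_probably_available` and `workflows_to_run` are hypothetical, and `1.0` in the usage lines is only a placeholder workflow number.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Workflow numbers quoted in this report; assumed here to be the GPU workflows.
GPU_WORKFLOWS = ["136.885522", "136.888522", "10824.522", "11634.522"]


def gpu_probably_available(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> bool:
    """Rough heuristic (assumption): a GPU is present if nvidia-smi is on PATH."""
    return which("nvidia-smi") is not None


def workflows_to_run(
    requested: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Drop the GPU workflows from the request when no GPU appears to be present."""
    if gpu_probably_available(which):
        return list(requested)
    return [wf for wf in requested if wf not in GPU_WORKFLOWS]


# Usage: simulate a machine without nvidia-smi by injecting a lookup
# function that never finds the tool.
no_gpu = workflows_to_run(["1.0", "10824.522"], which=lambda _name: None)
print(no_gpu)
```

The lookup function is passed in as a parameter so that the behaviour can be checked on any machine, with or without a GPU.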

About this issue

  • State: open
  • Created 4 years ago
  • Comments: 26 (26 by maintainers)

Most upvoted comments

A possible straightforward way to implement 3. would be to define a separate -w set for GPU-only workflows. Then the GPU-only workflows would not be run by default, and it would be easy to run only them with runTheMatrix.py -w gpu.
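To illustrate the idea of a separate set, here is a minimal sketch of name-based workflow selection. It is not the actual runTheMatrix.py implementation: the `WORKFLOW_SETS` table, the `standard` set name, the `select_workflows` helper, and the placeholder numbers `1.0` and `2.0` are all hypothetical. Only the four GPU workflow numbers come from this report.

```python
from typing import Dict, List, Optional

# Hypothetical table mapping a -w set name to workflow numbers.
WORKFLOW_SETS: Dict[str, List[str]] = {
    # Placeholder numbers, not taken from this thread.
    "standard": ["1.0", "2.0"],
    # The workflows reported as failing above, kept in their own set.
    "gpu": ["136.885522", "136.888522", "10824.522", "11634.522"],
}

# The set used when no -w option is given (assumption).
DEFAULT_SET = "standard"


def select_workflows(what: Optional[str] = None) -> List[str]:
    """Return the workflows of the requested set, or of the default set."""
    name = what or DEFAULT_SET
    if name not in WORKFLOW_SETS:
        raise ValueError(f"unknown workflow set: {name!r}")
    return list(WORKFLOW_SETS[name])


# Without an explicit set, GPU workflows are not selected.
default_selection = select_workflows()
# With the 'gpu' set, only GPU workflows are selected.
gpu_selection = select_workflows("gpu")
print(default_selection)
print(gpu_selection)
```

Keeping the GPU workflows out of the default set means machines without a GPU never attempt them, while a GPU machine can opt in with a single option.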

@silviodonato, I am working on improving the GPU PR tests. Currently, when GPU tests are enabled, the bot runs two jobs:

  1. Run standard PR tests (compilation of externals and cmssw, then unit tests, addon tests, and relvals)
  2. Run special PR tests (compilation of externals and cmssw, then GPU relvals)

https://github.com/cms-sw/cms-bot/pull/1459 should allow the GPU tests to run as an additional test within the standard PR test. This will avoid compiling externals and cmssw on the GPU machines. Once the cms-bot changes are merged, I can include -w gpu for the GPU relval tests.

As for the IB tests, I will add an extra GPU relval test that runs runTheMatrix with the -w gpu option.