cmssw: Unable to choose current device because CUDAService is disabled.
In CMSSW_11_3_X_2020-12-08-2300, workflows 136.885522, 136.888522, 10824.522 and 11634.522 fail with the following exception:
----- Begin Fatal Exception 09-Dec-2020 10:02:40 CET-----------------------
An exception of category 'CUDAError' occurred while
[0] Processing Event run: 320822 lumi: 40 event: 64112784 stream: 2
[1] Running path 'dqmoffline_step'
[2] Prefetching for module RecHitTask/'recHitPreRecoTask'
[3] Prefetching for module HcalCPURecHitsProducer/'hbheprereco'
[4] Prefetching for module HBHERecHitProducerGPU/'hbheRecHitProducerGPU'
[5] Calling method for module HcalDigisProducerGPU/'hcalDigisGPU'
[6] Calling cms::cuda::chooseDevice()
Exception Message:
Unable to choose current device because CUDAService is disabled. If CUDAService was not explicitly
disabled in the configuration, the probable cause is that there is no GPU or there is some problem
in the CUDA runtime or drivers.
----- End Fatal Exception -------------------------------------------------
It seems related to #31720. It looks like an expected error, caused by the lack of a GPU on the IB test machines.
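If the cause is indeed a missing GPU, a test driver could check for a usable device before launching GPU-only workflows. The following is a minimal sketch under stated assumptions: the function name gpu_available is hypothetical, and probing with the NVIDIA nvidia-smi tool is an assumption, not what CUDAService or cms-bot actually does.

```python
import shutil
import subprocess


def gpu_available(probe="nvidia-smi"):
    """Return True if the probe command is on PATH and exits with status 0.

    Assumption: a zero exit status from `nvidia-smi -L` (list GPUs) means the
    driver is loaded and a device is visible. This is only a heuristic.
    """
    exe = shutil.which(probe)
    if exe is None:
        # Probe tool not installed: treat the machine as having no GPU.
        return False
    try:
        result = subprocess.run(
            [exe, "-L"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The probe could not be executed or hung: report no usable GPU.
        return False
    return result.returncode == 0
```

A driver script could then skip the GPU workflows, rather than fail, when gpu_available() returns False.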
About this issue
- State: open
- Created 4 years ago
- Comments: 26 (26 by maintainers)
A possible straightforward way to implement 3. would be to define a separate -w set for GPU-only workflows. Then the GPU-only workflows would not be run by default, and it would be easy to run only them with runTheMatrix.py -w gpu.
@silviodonato, I am working on improving the GPU PR tests. Currently, when we enable GPU tests, the bot runs two jobs.
https://github.com/cms-sw/cms-bot/pull/1459 should allow running the GPU tests as an additional test within the standard PR test. This will avoid compiling the externals and CMSSW on GPU machines. Once the cms-bot changes are merged, I can include -w gpu for the GPU relval tests.
For the IB tests, I will add an extra GPU relval test that runs runTheMatrix with the -w gpu option.
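The idea of a separate workflow set that is not run by default can be illustrated as follows. This is only a sketch, not the runTheMatrix.py implementation: WORKFLOW_SETS, select_workflows and the contents of the "standard" set are hypothetical, while the "gpu" entries are the workflow numbers quoted in this issue.

```python
# Hypothetical workflow sets, keyed by the name that would be passed to -w.
WORKFLOW_SETS = {
    # Illustrative placeholder entries for the default set.
    "standard": ["10824.0", "11634.0"],
    # GPU-only workflows reported as failing in this issue.
    "gpu": ["136.885522", "136.888522", "10824.522", "11634.522"],
}


def select_workflows(what=None, sets=WORKFLOW_SETS, default="standard"):
    """Return the workflows of the requested set.

    When no set is requested, only the default set is returned, so the
    GPU-only workflows never run unless they are asked for explicitly.
    """
    name = default if what is None else what
    if name not in sets:
        raise ValueError(f"unknown workflow set: {name!r}")
    return list(sets[name])
```

With this layout, select_workflows() leaves the GPU workflows out, and select_workflows("gpu") returns only them, which mirrors the behaviour proposed for -w gpu.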