cmssw: HLT Farm crashes (PFRecHitSoAProducerHCAL@alpaka) when HCAL is out
The HLT farm reported many errors in run 379174 because HCAL had been removed from the global run.
The error is:
An exception of category 'StdException' occurred while
[0] Processing Event run: 379174 lumi: 1 event: 4626 stream: 12
[1] Running path 'DST_PFScouting_DatasetMuon_v1'
[2] Calling method for module PFRecHitSoAProducerHCAL@alpaka/'hltParticleFlowRecHitHBHESoA'
Exception Message:
A std::exception was thrown.
/data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_0_4-el8_amd64_gcc12/build/CMSSW_14_0_4-build/el8_amd64_gcc12/external/alpaka/1.1.0-c6af69ddd6f2ee5be4f2b069590bae19/include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp(259) 'TApi::setDevice(queue.m_spQueueImpl->m_dev.getNativeHandle())' A previous API call (not this one) set the error : 'cudaErrorInvalidConfiguration': 'invalid configuration argument'!
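The message says that the CUDA error was left pending by an earlier call, not by the `setDevice` call that alpaka was checking. One well-known source of `cudaErrorInvalidConfiguration` is a kernel launch with an invalid grid, for example zero blocks. This thread does not confirm that this is what happened here. The block below is a minimal C++ sketch that assumes the grid size is derived from the number of HCAL rechits, which would be zero when HCAL is out of the run. It shows how a launch could be guarded. `LaunchConfig` and `makeLaunchConfig` are hypothetical names and are not part of the CMSSW or alpaka API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical 1D launch configuration (illustrative only; not a CMSSW or
// alpaka type).
struct LaunchConfig {
  std::uint32_t blocks;
  std::uint32_t threadsPerBlock;
};

// Compute a 1D grid that covers nElements work items.
// Returns std::nullopt when there is nothing to process. Launching a CUDA
// kernel with a grid of 0 blocks fails with cudaErrorInvalidConfiguration,
// so the caller should skip the launch (and produce an empty output) instead.
std::optional<LaunchConfig> makeLaunchConfig(std::uint32_t nElements,
                                             std::uint32_t threadsPerBlock) {
  if (nElements == 0 || threadsPerBlock == 0) {
    return std::nullopt;
  }
  // Ceiling division, written without nElements + threadsPerBlock - 1,
  // which could overflow for very large nElements.
  std::uint32_t blocks = nElements / threadsPerBlock +
                         (nElements % threadsPerBlock != 0 ? 1u : 0u);
  // Sanity check: the grid must provide at least one thread per element.
  assert(static_cast<std::uint64_t>(blocks) * threadsPerBlock >= nElements);
  return LaunchConfig{blocks, threadsPerBlock};
}
```

For example, 257 rechits with 256 threads per block give 2 blocks. An event with no rechits gives `std::nullopt`, and the launch is skipped instead of failing.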
I will add the recipes to reproduce this error as soon as the data from the run without HCAL is available.
In run 379178, HCAL was added back and everything worked fine.
About this issue
- State: closed
- Created 3 months ago
- Comments: 23 (23 by maintainers)
@cmsbuild, please close
+heterogeneous
+hlt
CMSSW_14_0_5_patch1 (containing the fix) went online on Apr 16th 2024 (see e-log: http://cmsonline.cern.ch/cms-elog/1210531).
Just have to make the fix a bit more elegant and I will get a branch together for further testing.
I am taking a look
@jsamudio FYI
type pf
@mzarucki, as discussed elsewhere: by applying the same recipe as above, but adjusting the FED selection to exclude either Pixel or ECAL, one can also produce data without those FEDs. Running the same test as above on that data does not produce a crash.
assign hlt, heterogeneous
@cms-sw/pf-l2 FYI