cmssw: [GPU] Multiple RelVals failing with memory allocation error
Hello,
Multiple RelVals are failing in the GPU IBs with the following exception:
----- Begin Fatal Exception 05-Feb-2024 04:19:50 CET-----------------------
An exception of category 'StdException' occurred while
[0] Processing Event run: 366727 lumi: 89 event: 131642946 stream: 3
[1] Running path 'MC_Run3_PFScoutingPixelTracking_v22'
[2] Calling method for module HBHERecHitProducerGPU/'hltHbherecoGPU'
Exception Message:
A std::exception was thrown.
/data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/5569e690981e3c5d49d7743adaadedca/opt/cmssw/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_GPU_X_2024-02-04-2300/src/HeterogeneousCore/CUDAUtilities/src/CachingDeviceAllocator.h, line 489:
cudaCheck(error = cudaMalloc(&search_key.d_ptr, search_key.bytes));
cudaErrorMemoryAllocation: out of memory
----- End Fatal Exception -------------------------------------------------
It seems to be caused by the modifications in https://github.com/cms-sw/cmssw/pull/43804.
FYI, @iarspider
Thanks, Andrea
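To check which paths and modules are hit across the failing RelVals, the exception block above can be summarized with a short script. This is only a minimal sketch: the function name `summarize_exception` and the regular expressions are assumptions derived from the single log shown here, not part of CMSSW, and other exception layouts may need different patterns.

```python
import re

# Sample copied from the report above; the long source-file path line is
# left out because this sketch does not parse it.
SAMPLE = """\
----- Begin Fatal Exception 05-Feb-2024 04:19:50 CET-----------------------
An exception of category 'StdException' occurred while
[0] Processing Event run: 366727 lumi: 89 event: 131642946 stream: 3
[1] Running path 'MC_Run3_PFScoutingPixelTracking_v22'
[2] Calling method for module HBHERecHitProducerGPU/'hltHbherecoGPU'
Exception Message:
A std::exception was thrown.
cudaCheck(error = cudaMalloc(&search_key.d_ptr, search_key.bytes));
cudaErrorMemoryAllocation: out of memory
----- End Fatal Exception -------------------------------------------------
"""

# Hypothetical patterns, written against the one sample above.
EVENT_RE = re.compile(r"run: (\d+) lumi: (\d+) event: (\d+) stream: (\d+)")
PATH_RE = re.compile(r"Running path '([^']+)'")
MODULE_RE = re.compile(r"Calling method for module (\S+)/'([^']+)'")
CUDA_ERR_RE = re.compile(r"^(cudaError\w+): (.+)$", re.MULTILINE)


def summarize_exception(text):
    """Return the fields found in one exception block; absent fields are omitted."""
    summary = {}
    match = EVENT_RE.search(text)
    if match:
        keys = ("run", "lumi", "event", "stream")
        summary.update(zip(keys, map(int, match.groups())))
    match = PATH_RE.search(text)
    if match:
        summary["path"] = match.group(1)
    match = MODULE_RE.search(text)
    if match:
        summary["module_type"], summary["module_label"] = match.groups()
    match = CUDA_ERR_RE.search(text)
    if match:
        summary["cuda_error"], summary["cuda_message"] = match.groups()
    return summary


if __name__ == "__main__":
    for key, value in summarize_exception(SAMPLE).items():
        print(f"{key}: {value}")
```

Running the same function over the logs of each failing workflow and grouping by `module_label` or `path` would show whether the allocation failure is tied to one module or spread across the GPU menu.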
About this issue
- State: open
- Created 5 months ago
- Comments: 21 (20 by maintainers)
type tracking (even though the association is not strong, it does look related to the pixel tracking Alpaka migration)
assign heterogeneous, hlt