cmssw: HLT Farm crashes in run 378940

Reporting a large number of GPU-related HLT crashes last night (elog)

  • Related to illegal memory access
  • Not fully understood as HLT menus were unchanged with respect to the previous runs

Here is a recipe to reproduce the crashes (tested with CMSSW_14_0_4 on lxplus8-gpu):

cmsrel CMSSW_14_0_4
cd CMSSW_14_0_4/src
cmsenv

https_proxy=http://cmsproxy.cms:3128 hltConfigFromDB --runNumber 378940 > hlt_run378940.py
cat <<@EOF >> hlt_run378940.py
from EventFilter.Utilities.EvFDaqDirector_cfi import EvFDaqDirector as _EvFDaqDirector
process.EvFDaqDirector = _EvFDaqDirector.clone(
    buBaseDir = '/eos/cms/store/group/phys_muon/wjun/error_stream',
    runNumber = 378940
)
from EventFilter.Utilities.FedRawDataInputSource_cfi import source as _source
process.source = _source.clone(
    fileListMode = True,
    fileNames = (
        '/eos/cms/store/group/phys_muon/wjun/error_stream/run378940/run378940_ls0021_index000036_fu-c2b02-31-01_pid1363776.raw',
    )
)
process.options.wantSummary = True

process.options.numberOfThreads = 1
process.options.numberOfStreams = 0
@EOF

mkdir run378940
cmsRun hlt_run378940.py &> crash_run378940.log
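
To confirm that a local run hits the same failure mode, the resulting log can be scanned for the CUDA error text. The following is only a minimal sketch: it assumes the log contains the standard CUDA runtime message "an illegal memory access was encountered", and the helper name `find_illegal_access` is hypothetical (it is not part of CMSSW). The exact layout of the cmsRun log around the error is not assumed.

```python
import re
from pathlib import Path

# Text returned by the CUDA runtime for cudaErrorIllegalAddress.
# Matching is case-insensitive and only on the key phrase, since the
# surrounding log format is an assumption.
ILLEGAL_ACCESS = re.compile(r"illegal memory access", re.IGNORECASE)


def find_illegal_access(log_text, context=2):
    """Return (line_number, snippet) pairs for each line mentioning the error.

    line_number is 1-based; snippet includes `context` lines before and after.
    """
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if ILLEGAL_ACCESS.search(line):
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            hits.append((i + 1, "\n".join(lines[start:end])))
    return hits


if __name__ == "__main__":
    # Log file name taken from the recipe above.
    log = Path("crash_run378940.log")
    if log.exists():
        for lineno, snippet in find_illegal_access(log.read_text(errors="replace")):
            print(f"--- line {lineno} ---")
            print(snippet)
    else:
        print(f"{log} not found; run the recipe first")
```

An empty result would suggest the crash did not reproduce on the local GPU, or that the error is reported with different wording.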

@cms-sw/hlt-l2 FYI @cms-sw/heterogeneous-l2 FYI

About this issue

  • State: open
  • Created 3 months ago
  • Comments: 24 (24 by maintainers)

Most upvoted comments

type pf

assign hlt, heterogeneous