cmssw: Crash in Tier0 replay of Run-2 data using 12_0_0

A crash was observed while running a Tier0 replay on Run-2 data (https://github.com/dmwm/T0/pull/4602, more info in this HN message) to test the PCL workflows and the new AlCaRecos: SiPixelCalSingleMuonLoose, SiPixelCalSingleMuonTight, TkAlDiMuonAndVertex.

The crash was reported in https://hypernews.cern.ch/HyperNews/CMS/get/tier0-Ops/2261/1.html as:

Current Modules:

Module: L1TTauOffline:l1tTauOfflineDQMEmu (crashed)
Module: METMonitor:PFMET110_PFMHT110_IDTight_METmonitoring
Module: LowPtGsfElectronSeedProducer:lowPtGsfElectronSeeds
Module: PATPackedCandidateProducer:packedPFCandidates
Module: HTMonitor:hltHT_HT650_DisplacedDijet60_Inclusive_Prommonitoring
Module: FastjetJetProducer:ak8PFJetsPuppi
Module: CandSecondaryVertexProducer:pfInclusiveSecondaryVertexFinderCvsLTagInfosPuppi
Module: CandSecondaryVertexProducer:pfInclusiveSecondaryVertexFinderTagInfos

A fatal system signal has occurred: segmentation violation
Segmentation fault (core dumped)

@tvami spotted the exact event for which this crash happens: Run 317696, Event 59331484, LumiSection 63. The crash can therefore be easily reproduced by doing:

cmsrel CMSSW_12_0_0
cd CMSSW_12_0_0/src
cmsenv
cp /afs/cern.ch/work/f/fbrivio/public/ALCA/replay_Run2data_PR4602/job_config.py .
cmsRun job_config.py

@makortel reported in https://hypernews.cern.ch/HyperNews/CMS/get/tier0-Ops/2261/1/1/1.html the stack trace of the crashing module:

#5  0x00002b4a0a4e17b2 in L1TTauOffline::getProbeTaus(edm::Event const&, edm::Handle<std::vector<reco::PFTau, std::allocator<reco::PFTau> > > const&, edm::Handle<std::vector<reco::Muon, std::allocator<reco::Muon> > > const&, reco::Vertex const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_0/lib/slc7_amd64_gcc900/pluginDQMOfflineL1Trigger.so
#6  0x00002b4a0a4e2032 in L1TTauOffline::analyze(edm::Event const&, edm::EventSetup const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_0/lib/slc7_amd64_gcc900/pluginDQMOfflineL1Trigger.so
#7  0x00002b49973221cc in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_0/lib/slc7_amd64_gcc900/libFWCoreFramework.so

which seems related to L1T.

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 17 (17 by maintainers)

Most upvoted comments

@francescobrivio I think the crashing code is actually in the L1T DQM: https://github.com/cms-sw/cmssw/blob/0967201290508fa815a95781424a5c0b15f0f9b8/DQMOffline/L1Trigger/src/L1TTauOffline.cc#L245. Perhaps it should be assigned to DQM as well.

+1, for the record.

Ok I will make the PR

https://github.com/cms-sw/cmssw/blob/master/DQMOffline/L1Trigger/src/L1TTauOffline.cc#L709 is the problematic line (the antiele part) that triggers the crash. A quick and dirty fix seems to be simply making sure the antielectron discriminators were computed before reading them:

if ((*antiele)[tauCandidate].workingPoints.size() == 0) continue;
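
To illustrate why that guard avoids the segmentation fault, here is a minimal standalone sketch. It does not use the real CMSSW types: `TauDiscriminatorResult` and `selectTaus` are hypothetical stand-ins, and only the `workingPoints` member name is taken from the comment above. The assumption is that the crash comes from indexing an empty `workingPoints` vector for a tau whose antielectron discriminator was never filled.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-tau discriminator payload.
// Only the member name workingPoints mirrors the code quoted in the thread.
struct TauDiscriminatorResult {
  std::vector<bool> workingPoints;
};

// Returns the indices of tau candidates that pass the working point at
// wpIndex. Candidates whose discriminator was not computed are skipped
// instead of being read out of bounds.
std::vector<std::size_t> selectTaus(const std::vector<TauDiscriminatorResult>& antiele,
                                    std::size_t wpIndex) {
  std::vector<std::size_t> selected;
  for (std::size_t tauCandidate = 0; tauCandidate < antiele.size(); ++tauCandidate) {
    const std::vector<bool>& wps = antiele[tauCandidate].workingPoints;

    // Guard proposed in the thread: an empty vector means the
    // discriminator was not computed for this candidate.
    if (wps.size() == 0)
      continue;

    // Slightly stricter variant (an addition in this sketch, not in the
    // thread): also skip when the requested index is past the end.
    if (wpIndex >= wps.size())
      continue;

    if (wps[wpIndex])
      selected.push_back(tauCandidate);
  }
  return selected;
}
```

Without the first check, `wps[wpIndex]` on an empty vector is undefined behaviour, which is one plausible way to end up with the segmentation violation reported above. Whether skipping such candidates is the right physics choice for the DQM plots is a separate question that this sketch does not address.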