cmssw: Crash in Tier0 replay of Run-2 data using 12_0_0
A crash was observed while running a Tier0 replay on Run-2 data (https://github.com/dmwm/T0/pull/4602, more info in this HN message) to test the PCL workflows and the new AlCaRecos: SiPixelCalSingleMuonLoose, SiPixelCalSingleMuonTight, TkAlDiMuonAndVertex.
The crash has been reported in https://hypernews.cern.ch/HyperNews/CMS/get/tier0-Ops/2261/1.html as
Current Modules:
Module: L1TTauOffline:l1tTauOfflineDQMEmu (crashed)
Module: METMonitor:PFMET110_PFMHT110_IDTight_METmonitoring
Module: LowPtGsfElectronSeedProducer:lowPtGsfElectronSeeds
Module: PATPackedCandidateProducer:packedPFCandidates
Module: HTMonitor:hltHT_HT650_DisplacedDijet60_Inclusive_Prommonitoring
Module: FastjetJetProducer:ak8PFJetsPuppi
Module: CandSecondaryVertexProducer:pfInclusiveSecondaryVertexFinderCvsLTagInfosPuppi
Module: CandSecondaryVertexProducer:pfInclusiveSecondaryVertexFinderTagInfos
A fatal system signal has occurred: segmentation violation
Segmentation fault (core dumped)
@tvami identified the exact event on which the crash happens: Run 317696, LumiSection 63, Event 59331484. The crash can therefore be easily reproduced with:
cmsrel CMSSW_12_0_0
cd CMSSW_12_0_0/src
cmsenv
cp /afs/cern.ch/work/f/fbrivio/public/ALCA/replay_Run2data_PR4602/job_config.py .
cmsRun job_config.py
@makortel reported in https://hypernews.cern.ch/HyperNews/CMS/get/tier0-Ops/2261/1/1/1.html the stack trace of the crashing module:
#5 0x00002b4a0a4e17b2 in L1TTauOffline::getProbeTaus(edm::Event const&, edm::Handle<std::vector<reco::PFTau, std::allocator<reco::PFTau> > > const&, edm::Handle<std::vector<reco::Muon, std::allocator<reco::Muon> > > const&, reco::Vertex const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_0/lib/slc7_amd64_gcc900/pluginDQMOfflineL1Trigger.so
#6 0x00002b4a0a4e2032 in L1TTauOffline::analyze(edm::Event const&, edm::EventSetup const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_0/lib/slc7_amd64_gcc900/pluginDQMOfflineL1Trigger.so
#7 0x00002b49973221cc in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_0/lib/slc7_amd64_gcc900/libFWCoreFramework.so
The crash therefore seems to be related to L1T.
About this issue
- State: closed
- Created 3 years ago
- Comments: 17 (17 by maintainers)
@francescobrivio I think the crashing code is actually in the L1T DQM: https://github.com/cms-sw/cmssw/blob/0967201290508fa815a95781424a5c0b15f0f9b8/DQMOffline/L1Trigger/src/L1TTauOffline.cc#L245. Perhaps the issue should be assigned to DQM as well.
+1, for the record.
Ok I will make the PR
https://github.com/cms-sw/cmssw/blob/master/DQMOffline/L1Trigger/src/L1TTauOffline.cc#L709 This is the problematic line (the antiele part) that triggers the crash. A quick and dirty fix seems to be to simply make sure the antielectron discriminators were computed before they are accessed:
if((*antiele)[tauCandidate].workingPoints.size() ==0) continue;
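To illustrate why that guard avoids the segmentation fault, here is a minimal standalone sketch. It does not use the CMSSW classes: `TauIdResult` and `selectProbeTaus` are hypothetical stand-ins, and only the `workingPoints` member name is taken from the line above. The assumption is that the crash comes from indexing into an empty `workingPoints` vector for a tau whose discriminator was never filled.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-tau discriminator payload.
// Only the member name `workingPoints` mirrors the snippet above.
struct TauIdResult {
  std::vector<bool> workingPoints;
};

// Returns the indices of the tau candidates that pass the working point
// `wpIndex`. Candidates whose working points were never computed (empty or
// too-short vector) are skipped instead of being indexed out of bounds.
std::vector<std::size_t> selectProbeTaus(const std::vector<TauIdResult>& antiele,
                                         std::size_t wpIndex) {
  std::vector<std::size_t> selected;
  for (std::size_t i = 0; i < antiele.size(); ++i) {
    const std::vector<bool>& wps = antiele[i].workingPoints;
    // Guard: covers the size() == 0 case from the quick fix, and also a
    // vector that is shorter than the requested index.
    if (wpIndex >= wps.size())
      continue;
    if (wps[wpIndex])
      selected.push_back(i);
  }
  return selected;
}
```

Checking `wpIndex >= wps.size()` is slightly stricter than the `size() == 0` test quoted above; whether the real module needs the stricter form depends on how the discriminators are produced, which this sketch does not attempt to model.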