cmssw: Failures in NanoDQMIO production
We have recently started producing a new type of DQMIO dataset, as requested here. The local tests ran fine, but many jobs fail in production, at several different sites. Here is an example with the crash information:
cmsRun1
CMSSWStepFailure (Exit Code: 139)
Adding last 25 lines of CMSSW stdout:
#18 0x00002b4e00301c2f in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc10/cms/cmssw/CMSSW_12_4_12/lib/el8_amd64_gcc10/libFWCoreFramework.so
#19 0x00002b4e00259e55 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) () from /cvmfs/cms.cern.ch/el8_amd64_gcc10/cms/cmssw/CMSSW_12_4_12/lib/el8_amd64_gcc10/libFWCoreFramework.so
#20 0x00002b4e0025a14b in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc10/cms/cmssw/CMSSW_12_4_12/lib/el8_amd64_gcc10/libFWCoreFramework.so
#21 0x00002b4e0025c735 in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() () from /cvmfs/cms.cern.ch/el8_amd64_gcc10/cms/cmssw/CMSSW_12_4_12/lib/el8_amd64_gcc10/libFWCoreFramework.so
#22 0x00002b4dffff37b5 in tbb::detail::d1::function_task<edm::WaitingTaskList::announce()::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc10/cms/cmssw/CMSSW_12_4_12/lib/el8_amd64_gcc10/libFWCoreConcurrency.so
#23 0x00002b4e01aadbec in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=0x2b4e93b81600, this=0x2b4e04623d80) at /data/cmsbld/jenkins/workspace/jenkins-test-bootstrap/toolconf/BUILD/el8_amd64_gcc10/external/tbb/v2021.5.0-e966a5acb1e4d5fd7605074bafbb079c/tbb-v2021.5.0/src/tbb/task_dispatcher.h:322
#24 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x2b4e04623d80) at /data/cmsbld/jenkins/workspace/jenkins-test-bootstrap/toolconf/BUILD/el8_amd64_gcc10/external/tbb/v2021.5.0-e966a5acb1e4d5fd7605074bafbb079c/tbb-v2021.5.0/src/tbb/task_dispatcher.h:463
#25 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /data/cmsbld/jenkins/workspace/jenkins-test-bootstrap/toolconf/BUILD/el8_amd64_gcc10/external/tbb/v2021.5.0-e966a5acb1e4d5fd7605074bafbb079c/tbb-v2021.5.0/src/tbb/task_dispatcher.cpp:168
#26 0x00002b4e001ca2d8 in edm::EventProcessor::processLumis(std::shared_ptr<void> const&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc10/cms/cmssw/CMSSW_12_4_12/lib/el8_amd64_gcc10/libFWCoreFramework.so
#27 0x00002b4e001d51fb in edm::EventProcessor::runToCompletion() () from /cvmfs/cms.cern.ch/el8_amd64_gcc10/cms/cmssw/CMSSW_12_4_12/lib/el8_amd64_gcc10/libFWCoreFramework.so
#28 0x000000000040a266 in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const ()
#29 0x00002b4e01a9c0eb in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...) at /data/cmsbld/jenkins/workspace/jenkins-test-bootstrap/toolconf/BUILD/el8_amd64_gcc10/external/tbb/v2021.5.0-e966a5acb1e4d5fd7605074bafbb079c/tbb-v2021.5.0/src/tbb/arena.cpp:698
#30 0x000000000040b094 in main::{lambda()#1}::operator()() const ()
#31 0x000000000040970c in main ()
Current Modules:
Module: HLTElePhoTagAndProbeOfflineSource:egHLTElePhoHighEtaDQMOfflineTnPSource (crashed)
Module: TrackProducer:lowPtTripletStepTracks
Module: CkfTrackCandidateMaker:lowPtQuadStepTrackCandidates
Module: CkfTrackCandidateMaker:muonSeededTrackCandidatesInOut
In this example, the input dataset is /EGamma/Run2022E-v1/RAW and the full log is at /store/unmerged/logs/prod/2023/1/26/pdmvserv_Run2022E_EGamma_19Jan2023_230119_090450_268/DataProcessing/0002/3/33f0e71f-9554-40ba-87f3-80522baac221-0-3-logArchive.tar.gz.
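Since the failures show up at several sites, it may help to check whether every failing job points at the same module. The following is a minimal sketch, assuming the "Current Modules" block always has the `Module: <type>:<label>` shape seen in the excerpt above, with the failing module tagged `(crashed)`. The helper names `crashed_modules` and `signal_from_exit_code` are hypothetical and not part of CMSSW or WMCore. The second helper assumes the exit code follows the common shell convention of 128 plus the signal number, under which 139 would correspond to signal 11 (a segmentation fault on Linux).

```python
import re

# Assumed line shape, taken from the log excerpt above:
#   Module: <type>:<label>            (module that was running)
#   Module: <type>:<label> (crashed)  (module in which the crash occurred)
MODULE_LINE = re.compile(
    r"^Module:\s*(?P<type>[^:\s]+):(?P<label>\S+)(?P<crashed>\s+\(crashed\))?\s*$"
)


def crashed_modules(log_text):
    """Return (module_type, module_label) pairs tagged "(crashed)" in log_text."""
    found = []
    for line in log_text.splitlines():
        match = MODULE_LINE.match(line.strip())
        if match and match.group("crashed"):
            found.append((match.group("type"), match.group("label")))
    return found


def signal_from_exit_code(exit_code):
    """Return the signal number if exit_code looks like 128 + N, else None.

    This relies on the usual shell convention and is an assumption about
    how the job wrapper reports the exit code.
    """
    return exit_code - 128 if exit_code > 128 else None


if __name__ == "__main__":
    # Sample text copied from the "Current Modules" block of the report.
    sample = """Current Modules:
Module: HLTElePhoTagAndProbeOfflineSource:egHLTElePhoHighEtaDQMOfflineTnPSource (crashed)
Module: TrackProducer:lowPtTripletStepTracks
Module: CkfTrackCandidateMaker:lowPtQuadStepTrackCandidates
"""
    print(crashed_modules(sample))
    print(signal_from_exit_code(139))
```

Running the same helper over the stdout of each failed job and counting the returned pairs would show whether the crash is always attributed to the same DQM module or is spread across several.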
About this issue
- State: open
- Created a year ago
- Comments: 24 (24 by maintainers)
This is still very much connected to DQM.