cmssw: Crash on multicore jobs when running on condor with CMSSW_11_3_0_pre6

Here is the report from https://hypernews.cern.ch/HyperNews/CMS/get/edmFramework/3920.html

With the following cmsDriver commands (*), I hit the issue when trying to run on condor. Note that everything runs fine on lxplus.

  1. If I run ZEE, I get (**). The issue also happens when I register with nThreads=8.
  2. If I run ZEE locally and then read the root file from the SIM step, the SIM and DIGI steps run fine. All jobs fail at the RECO step with (***). No information is printed beyond what I posted.

(*) cmsDriver.py ZEE_14TeV_TuneCP5_cfi --mc --conditions auto:phase1_2021_realistic -n 500 --era Run3 --eventcontent FEVTDEBUG -s GEN --datatier GEN --geometry DB:Extended --beamspot Run3RoundOptics25ns13TeVLowSigmaZ --python step1_ZEE_GEN_temp.py --no_exec --fileout file:step1_GEN.root --nThreads 1 --customise_commands "from IOMC.RandomEngine.RandomServiceHelper import RandomNumberServiceHelper ; randSvc = RandomNumberServiceHelper(process.RandomNumberGeneratorService) ; randSvc.populate()\n process.source.firstLuminosityBlock = cms.untracked.uint32(3)"

cmsDriver.py step2 --mc --conditions auto:phase1_2021_realistic -n -1 --era Run3 --eventcontent FEVTDEBUG -s SIM --datatier GEN-SIM --beamspot Run3RoundOptics25ns13TeVLowSigmaZ --geometry DB:Extended --python step2_SIM_GFlashNo.py --no_exec --filein file:step1_GEN.root --fileout file:step2_SIM.root --nThreads 8 --customise_commands "from IOMC.RandomEngine.RandomServiceHelper import RandomNumberServiceHelper ; randSvc = RandomNumberServiceHelper(process.RandomNumberGeneratorService) ; randSvc.populate()" --customise Configuration/DataProcessing/Utils.addMonitoring

cmsDriver.py step3 --mc --conditions auto:phase1_2021_realistic -s DIGI:pdigi_valid,L1,DIGI2RAW,HLT:@relval2021 --datatier GEN-SIM-DIGI-RAW -n -1 --geometry DB:Extended --era Run3 --eventcontent FEVTDEBUGHLT --python step3_DIGIL1HLT.py --no_exec --filein file:step2_SIM.root --fileout file:step3_DIGIL1HLT.root --nThreads 8 --customise_commands "from IOMC.RandomEngine.RandomServiceHelper import RandomNumberServiceHelper ; randSvc = RandomNumberServiceHelper(process.RandomNumberGeneratorService) ; randSvc.populate()"

cmsDriver.py step4 --mc --conditions auto:phase1_2021_realistic -n -1 --era Run3 --eventcontent MINIAODSIM,DQM -s RAW2DIGI,L1Reco,RECO,RECOSIM,EI,PAT,VALIDATION:@standardValidation+@miniAODValidation,DQM:@standardDQM+@ExtraHLT+@miniAODDQM --datatier MINIAODSIM,DQMIO --geometry DB:Extended --python step4_RECO.py --no_exec --filein file:step3_DIGIL1HLT.root --fileout file:step4_RECO.root --nThreads 8

(**)

Thread 2 (Thread 0x2b1e8e0f3700 (LWP 233)):
#0  0x00002b1e6f48a1d9 in waitpid () from /lib64/libpthread.so.0
#1  0x00002b1e7582a8d7 in edm::service::cmssw_stacktrace_fork() () from /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_0_pre6/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#2  0x00002b1e7582b49a in edm::service::InitRootHandlers::stacktraceHelperThread() () from /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_0_pre6/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#3  0x00002b1e6f054af0 in std::execute_native_thread_routine (__p=0x2b1e8d48e4e0) at ../../../../../libstdc++-v3/src/c++11/thread.cc:80
#4  0x00002b1e6f482ea5 in start_thread () from /lib64/libpthread.so.0
#5  0x00002b1e6f7959fd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2b1e71391a80 (LWP 215)):
#0  0x00002b1e6f78accd in poll () from /lib64/libc.so.6
#1  0x00002b1e7582acd7 in full_read.constprop () from /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_0_pre6/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#2  0x00002b1e7582b56c in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_0_pre6/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#3  0x00002b1e7582c922 in sig_dostack_then_abort () from /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_0_pre6/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00002b1e6f6cd3d7 in raise () from /lib64/libc.so.6
#6  0x00002b1e6f6ceac8 in abort () from /lib64/libc.so.6
#7  0x00002b1e6f01f683 in __gnu_cxx::__verbose_terminate_handler () at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#8  0x00002b1e6f02b0a6 in __cxxabiv1::__terminate (handler=<optimized out>) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
#9  0x00002b1e6f02a1a9 in __cxa_call_terminate (ue_header=ue_header@entry=0x2b1e98ca1de0) at ../../../../libstdc++-v3/libsupc++/eh_call.cc:54
#10 0x00002b1e6f02aad4 in __cxxabiv1::__gxx_personality_v0 (version=<optimized out>, actions=6, exception_class=5138137972254386944, ue_header=0x2b1e98ca1de0, context=<optimized out>) at ../../../../libstdc++-v3/libsupc++/eh_personality.cc:677
#11 0x00002b1e6f4717b3 in _Unwind_RaiseException_Phase2 (exc=0x2b1e98ca1de0, context=0x7ffe85db3e00, frames_p=0x7ffe85db3d08) at ../../../libgcc/unwind.inc:64
#12 0x00002b1e6f472016 in _Unwind_Resume (exc=0x2b1e98ca1de0) at ../../../libgcc/unwind.inc:241
#13 0x00002b1e9eff8738 in _GLOBAL__sub_I_TGClient.cxx.cold () from /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_0_pre6/external/slc7_amd64_gcc900/lib/libGui.so
#14 0x00002b1e6cab89c3 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#15 0x00002b1e6cabd59e in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#16 0x00002b1e6cab87d4 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#17 0x00002b1e6cabcb8b in _dl_open () from /lib64/ld-linux-x86-64.so.2
#18 0x00002b1e6eb61fab in dlopen_doit () from /lib64/libdl.so.2
#19 0x00002b1e6cab87d4 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#20 0x00002b1e6eb625ad in _dlerror_run () from /lib64/libdl.so.2
#21 0x00002b1e6eb62041 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
#22 0x00002b1e6cc66167 in edmplugin::SharedLibrary::SharedLibrary(std::filesystem::__cxx11::path const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_0_pre6/lib/slc7_amd64_gcc900/libFWCorePluginManager.so
#23 0x00002b1e6cc60676 in edmplugin::PluginManager::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_0_pre6/lib/slc7_amd64_gcc900/libFWCorePluginManager.so
#24 0x00002b1e6cc5a7f5 in edmplugin::PluginFactoryBase::findPMaker(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const () from /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_0_pre6/lib/slc7_amd64_gcc900/libFWCorePluginManager.so
#25 0x00002b1e6ced4b78 in edm::Factory::findMaker(edm::MakeModuleParams const&) const () from /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_0_pre6/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#26 0x00002b1e6ced4d72 in edm::Factory::makeModule(edm::MakeModuleParams const&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&) const () from /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_0_pre6/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#27 0x00002b1e6cee66e8 in edm::ModuleRegistry::getModule(edm::MakeModuleParams const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_0_pre6/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#28 0x00002b1e6cf91c01 in edm::WorkerRegistry::getWorker(edm::WorkerParams const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_0_pre6/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#29 0x00002b1e6cf8fe16 in edm::WorkerManager::getWorker(edm::ParameterSet&, edm::ProductRegistry&, edm::PreallocationConfiguration const*, std::shared_ptr<edm::ProcessConfiguration const>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_0_pre6/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#30 0x00002b1e6cf90aa9 in edm::WorkerManager::addToUnscheduledWorkers(edm::ParameterSet&, edm::ProductRegistry&, edm::PreallocationConfiguration const*, std::shared_ptr<edm::ProcessConfiguration>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_0_pre6/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#31 0x00002b1e6cf6e52a in edm::StreamSchedule::StreamSchedule(std::shared_ptr<edm::TriggerResultInserter>, std::vector<edm::propagate_const<std::shared_ptr<edm::PathStatusInserter> >, std::allocator<edm::propagate_const<std::shared_ptr<edm::PathStatusInserter> > > >&, std::vector<edm::propagate_const<std::shared_ptr<edm::EndPathStatusInserter> >, std::allocator<edm::propagate_const<std::shared_ptr<edm::EndPathStatusInserter> > > >&, std::shared_ptr<edm::ModuleRegistry>, edm::ParameterSet&, edm::service::TriggerNamesService const&, edm::PreallocationConfiguration const&, edm::ProductRegistry&, edm::BranchIDListHelper&, edm::ExceptionToActionTable const&, std::shared_ptr<edm::ActivityRegistry>, std::shared_ptr<edm::ProcessConfiguration>, bool, edm::StreamID, edm::ProcessContext const*) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_0_pre6/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#32 0x00002b1e6cf4eda3 in edm::Schedule::Schedule(edm::ParameterSet&, edm::service::TriggerNamesService const&, edm::ProductRegistry&, edm::BranchIDListHelper&, edm::ThinnedAssociationsHelper&, edm::SubProcessParentageHelper const*, edm::ExceptionToActionTable const&, std::shared_ptr<edm::ActivityRegistry>, std::shared_ptr<edm::ProcessConfiguration>, bool, edm::PreallocationConfiguration const&, edm::ProcessContext const*) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_0_pre6/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#33 0x00002b1e6cf5f8cc in edm::ScheduleItems::initSchedule(edm::ParameterSet&, bool, edm::PreallocationConfiguration const&, edm::ProcessContext const*) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_0_pre6/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#34 0x00002b1e6ce7402a in edm::EventProcessor::init(std::shared_ptr<edm::ProcessDesc>&, edm::ServiceToken const&, edm::serviceregistry::ServiceLegacy) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_0_pre6/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#35 0x00002b1e6ce75eda in edm::EventProcessor::EventProcessor(std::shared_ptr<edm::ProcessDesc>, edm::ServiceToken const&, edm::serviceregistry::ServiceLegacy) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_0_pre6/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#36 0x000000000040ba11 in tbb::interface7::internal::delegated_function<main::{lambda()#1}::operator()() const::{lambda()#1} const, void>::operator()() const ()
#37 0x00002b1e6e65c552 in tbb::interface7::internal::task_arena_base::internal_execute (this=0x7ffe85db6150, d=...) at ../../src/tbb/arena.cpp:1105
#38 0x000000000040ca13 in main::{lambda()#1}::operator()() const ()
#39 0x000000000040b62c in main ()
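
When reading a backtrace like the one above, it can help to list which shared library each frame belongs to. The following is a minimal Python sketch and not part of CMSSW: the helper name `parse_frames` and the regular expression are assumptions made for illustration, and the three sample frames are copied from the trace above. It only handles frames of the form `#N ADDRESS in FUNCTION (...) from LIBRARY` that fit on a single line.

```python
import os
import re

# Matches gdb frames such as "#14 0x... in func () from /path/lib.so".
# Frames located with "at file:line" instead of "from lib" are skipped.
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9a-f]+ in (.+?) \(.*\) from (\S+)$")

# Three frames copied verbatim from the backtrace in this report.
SAMPLE = """\
#13 0x00002b1e9eff8738 in _GLOBAL__sub_I_TGClient.cxx.cold () from /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_0_pre6/external/slc7_amd64_gcc900/lib/libGui.so
#14 0x00002b1e6cab89c3 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#21 0x00002b1e6eb62041 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
"""


def parse_frames(text):
    """Return (frame number, function, library file name) for each matching line."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            number, function, library = match.groups()
            frames.append((int(number), function, os.path.basename(library)))
    return frames


frames = parse_frames(SAMPLE)
for number, function, library in frames:
    print(f"#{number:<3} {library:<24} {function}")
```

In this trace the frame just below the exception-unwinding frames is in libGui.so, and it is reached through dlopen while the framework is loading a plugin.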

(***)

----- Begin Fatal Exception 18-Apr-2021 18:51:02 CEST-----------------------
An exception of category 'FatalRootError' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing module: class=RecoTauCleaner label='pfTausProducerSansRefs'
   Additional Info:
      [a] Fatal Root Error: @SUB=TSystem::ExpandFileName
input: $HOME/.root.mimes, output: $HOME/.root.mimes

----- End Fatal Exception -------------------------------------------------
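
The exception above reports the same string as both the input and the output of TSystem::ExpandFileName. This suggests that $HOME was not defined in the condor job environment, though that is an assumption and not something the log states. The sketch below is plain Python, not ROOT's implementation; `expand_path` and `home_problem` are hypothetical helpers that imitate shell-style expansion to show how an unset variable leaves the path untouched.

```python
import os
from string import Template

ROOT_MIMES = "$HOME/.root.mimes"  # path taken from the exception message


def expand_path(path, env):
    """Substitute $VAR using env; variables missing from env are left as-is."""
    return Template(path).safe_substitute(env)


def home_problem(env):
    """Return a short description of the problem, or None if HOME is set."""
    if not env.get("HOME"):
        return "HOME is not set"
    return None


# "/home/user" is a made-up directory used only for illustration.
with_home = expand_path(ROOT_MIMES, {"HOME": "/home/user"})
without_home = expand_path(ROOT_MIMES, {})

print(with_home)      # the variable is replaced
print(without_home)   # the path comes back unchanged, as in the exception
print(home_problem(os.environ) or "HOME is set")
```

Running the last check inside the batch job, for example from a wrapper script, would show whether the worker node defines HOME at all.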

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 69 (57 by maintainers)

Most upvoted comments

Looking at the TypeWithDict code, we can see that the string "vector<ROOT::Experimental::REveTableEntry>::size_type" comes from TMethodArg::GetTypeName(): https://github.com/cms-sw/cmssw/blob/b2180834deb82044f9e8817f22d149b9f77ba7a9/FWCore/Reflection/src/TypeWithDict.cc#L342-L343