cmssw: Failure in refined Nano of UL16

We have seen a failure in the production of refined Nano, reported at https://cms-unified.web.cern.ch/cms-unified/report/cmsunified_ACDC0_task_SUS-RunIISummer20UL16NanoAODv9-01329__v1_T_240129_014628_5603

To reproduce the error, use CMSSW_10_6_36.

Input: /store/mc/RunIISummer20UL16MiniAODv2/SMS-T5bbbbZH_TuneCP5_13TeV-madgraphMLM-pythia8/MINIAODSIM/FSUL16_106X_mcRun2_asymptotic_v17-v1/80000/78AD6EFE-932B-234F-9382-FE58B9B36D1F.root

cmsDriver: cmsDriver.py --python_filename SUS-RunIISummer20UL16NanoAODv9-01329_1_cfg.py --eventcontent NANOEDMAODSIM --customise PhysicsTools/NanoAOD/jets_cff.nanoAOD_refineFastSim_bTagDeepFlav,Configuration/DataProcessing/Utils.addMonitoring --datatier NANOAODSIM --fileout file:SUS-RunIISummer20UL16NanoAODv9-01329.root --conditions 106X_mcRun2_asymptotic_v17 --step NANO --filein file:78AD6EFE-932B-234F-9382-FE58B9B36D1F.root --era Run2_2016,run2_nanoAOD_106Xv2 --fast --no_exec --mc -n -1 --customise_commands "process.source.lumisToProcess = cms.untracked.VLuminosityBlockRange('1:23760-1:23760') \n process.source.firstEvent = cms.untracked.uint32(198226907)"

Error log:

cmsRun: /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_10_6_36-slc7_amd64_gcc700/build/CMSSW_10_6_36-build/tmp/BUILDROOT/3024d5a531af97e6e3d52fc4f2af117e/opt/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_36/src/PhysicsTools/ONNXRuntime/src/ONNXRuntime.cc:87: cms::Ort::FloatArrays cms::Ort::ONNXRuntime::run(const std::vector<std::__cxx11::basic_string<char> >&, cms::Ort::FloatArrays&, const std::vector<std::vector<long int> >&, const std::vector<std::__cxx11::basic_string<char> >&, int64_t) const: Assertion `batch_size > 0' failed.


A fatal system signal has occurred: abort signal
The following is the call stack containing the origin of the signal.

Mon Feb  5 09:59:24 CET 2024
Thread 3 (Thread 0x7f34ab7a7700 (LWP 1319)):
#0  0x00007f350c326a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f350c9070ac in __gthread_cond_wait (__mutex=<optimized out>, __cond=<optimized out>) at /build/cmsbld/auto-builds/CMSSW_10_6_0_pre4-slc7_amd64_gcc700/build/CMSSW_10_6_0_pre4-build/BUILD/slc7_amd64_gcc700/external/gcc/7.0.0-pafccj/gcc-branches_gcc-7-branch-268351/obj/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu/bits/gthr-default.h:864
#2  std::condition_variable::wait (this=<optimized out>, __lock=...) at ../../../../../libstdc++-v3/src/c++11/condition_variable.cc:53
#3  0x00007f34ca250bba in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_36/external/slc7_amd64_gcc700/lib/libtensorflow_framework.so
#4  0x00007f34ca24d687 in std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_36/external/slc7_amd64_gcc700/lib/libtensorflow_framework.so
#5  0x00007f350c90ccbf in std::execute_native_thread_routine (__p=0x7f34b77d7350) at ../../../../../libstdc++-v3/src/c++11/thread.cc:83
#6  0x00007f350c322ea5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007f350c04bb0d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f34efb3c700 (LWP 1013)):
#0  0x00007f350c32a1d9 in waitpid () from /lib64/libpthread.so.0
#1  0x00007f34fe145297 in edm::service::cmssw_stacktrace_fork() () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_36/lib/slc7_amd64_gcc700/pluginFWCoreServicesPlugins.so
#2  0x00007f34fe145d7a in edm::service::InitRootHandlers::stacktraceHelperThread() () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_36/lib/slc7_amd64_gcc700/pluginFWCoreServicesPlugins.so
#3  0x00007f350c90ccbf in std::execute_native_thread_routine (__p=0x7f34ffab4d10) at ../../../../../libstdc++-v3/src/c++11/thread.cc:83
#4  0x00007f350c322ea5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f350c04bb0d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f350a6d6480 (LWP 842)):
#0  0x00007f350c040ddd in poll () from /lib64/libc.so.6
#1  0x00007f34fe1457c7 in full_read.constprop () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_36/lib/slc7_amd64_gcc700/pluginFWCoreServicesPlugins.so
#2  0x00007f34fe145e5c in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_36/lib/slc7_amd64_gcc700/pluginFWCoreServicesPlugins.so
#3  0x00007f34fe146ec8 in sig_dostack_then_abort () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_36/lib/slc7_amd64_gcc700/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00007f350bf83387 in raise () from /lib64/libc.so.6
#6  0x00007f350bf84a78 in abort () from /lib64/libc.so.6
#7  0x00007f350bf7c1a6 in __assert_fail_base () from /lib64/libc.so.6
#8  0x00007f350bf7c252 in __assert_fail () from /lib64/libc.so.6
#9  0x00007f34cf7a4d80 in cms::Ort::ONNXRuntime::run(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::vector<float, std::allocator<float> >, std::allocator<std::vector<float, std::allocator<float> > > >&, std::vector<std::vector<long, std::allocator<long> >, std::allocator<std::vector<long, std::allocator<long> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, long) const () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_36/lib/slc7_amd64_gcc700/libPhysicsToolsONNXRuntime.so
#10 0x00007f34cfb13c14 in BaseMVAValueMapProducer<pat::Jet>::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_36/lib/slc7_amd64_gcc700/pluginPhysicsToolsNanoAODPlugins.so
#11 0x00007f350ef55fc7 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventPrincipal const&, edm::EventSetupImpl const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_36/lib/slc7_amd64_gcc700/libFWCoreFramework.so
#12 0x00007f350ee6bb22 in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventPrincipal const&, edm::EventSetupImpl const&, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_36/lib/slc7_amd64_gcc700/libFWCoreFramework.so
#13 0x00007f350ee1e09a in decltype ({parm#1}()) edm::convertException::wrap<bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetupImpl const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetupImpl const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_36/lib/slc7_amd64_gcc700/libFWCoreFramework.so
#14 0x00007f350ee1e25d in bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetupImpl const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_36/lib/slc7_amd64_gcc700/libFWCoreFramework.so
#15 0x00007f350ee1f8eb in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetupImpl const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_36/lib/slc7_amd64_gcc700/libFWCoreFramework.so
#16 0x00007f350ee208e0 in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_36/lib/slc7_amd64_gcc700/libFWCoreFramework.so
#17 0x00007f350d68d931 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all (this=0x7f350914f200, parent=..., child=<optimized out>) at ../../src/tbb/custom_scheduler.h:521
#18 0x00007f350eeb1400 in edm::EventProcessor::processLumis(std::shared_ptr<void> const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_36/lib/slc7_amd64_gcc700/libFWCoreFramework.so
#19 0x00007f350eebae51 in edm::EventProcessor::runToCompletion() () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_36/lib/slc7_amd64_gcc700/libFWCoreFramework.so
#20 0x000000000040fc20 in main::{lambda()#1}::operator()() const ()
#21 0x000000000040def2 in main ()

Current Modules:

Module: JetBaseMVAValueMapProducer:btagDeepFlavRefineNN (crashed)
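
For illustration only: the log shows the assertion `batch_size > 0` failing inside `cms::Ort::ONNXRuntime::run`, called from `BaseMVAValueMapProducer<pat::Jet>::produce` in module `btagDeepFlavRefineNN`. The report does not say why the batch size was not positive; one possibility (an assumption, not confirmed here) is that the producer was handed an event with no objects to evaluate. The sketch below is plain Python, not CMSSW code, and every name in it (`run_model`, `evaluate_guarded`) is hypothetical. It only shows the general pattern of skipping inference when the batch is empty instead of passing it to a backend that asserts on it.

```python
from typing import Callable, List, Sequence

Batch = Sequence[Sequence[float]]


def run_model(batch: Batch) -> List[float]:
    """Hypothetical stand-in for an inference backend.

    Like the assertion in the log, it refuses an empty batch.
    The 'model' here just sums each row's features.
    """
    assert len(batch) > 0, "batch_size > 0"
    return [sum(row) for row in batch]


def evaluate_guarded(batch: Batch,
                     model: Callable[[Batch], List[float]] = run_model) -> List[float]:
    """Return one output per input row, or an empty list for an empty batch.

    The early return keeps the empty case away from the backend,
    so the backend's assertion is never reached.
    """
    if len(batch) == 0:
        return []
    return model(batch)


if __name__ == "__main__":
    print(evaluate_guarded([]))                    # empty input, no inference call
    print(evaluate_guarded([[1.0, 2.0], [3.0]]))   # two rows, two outputs
```

Whether the right place for such a guard is the caller or the inference wrapper itself is a design choice for the maintainers; the sketch puts it in the caller only to keep the example short.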

About this issue

  • State: closed
  • Created 5 months ago
  • Comments: 23 (23 by maintainers)

Most upvoted comments

please close

+1

assign xpog,fastsim