cmssw: igprof pp segfault in 12_3_0, 12_4_0_pre3 in the Run3 reco step
In two recent releases, igprof pp crashes in the reco step of workflow 11834.21:
$ tail -n10 /eos/cms/store/user/cmsbuild/profiling/data/CMSSW_12_3_0/slc7_amd64_gcc10/11834.21/step3_igprof_cpu.txt
#14 0x00007f36c6833f45 in IgHookTrace::stacktrace (addresses=addresses@entry=0x7ffc58699700, nmax=nmax@entry=800) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_0_pre5-slc7_amd64_gcc10/build/CMSSW_12_3_0_pre5-build/BUILD/slc7_amd64_gcc10/external/igprof/5.9.16-f8a2b39c36d2a318d6c7c0f619242bdb/igprof-6cc73b59d83ed6c9d73b455dc40857e700ef6ee4/src/walk-syms.cc:175
#15 0x00007f36c683d508 in profileSignalHandler () at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_0_pre5-slc7_amd64_gcc10/build/CMSSW_12_3_0_pre5-build/BUILD/slc7_amd64_gcc10/external/igprof/5.9.16-f8a2b39c36d2a318d6c7c0f619242bdb/igprof-6cc73b59d83ed6c9d73b455dc40857e700ef6ee4/src/profile-perf.cc:66
#16 <signal handler called>
#17 0x00007f368f898ee2 in mkfit::kalmanOperation(int, Matriplex::MatriplexSym<float, 6, 4> const&, Matriplex::Matriplex<float, 6, 1, 4> const&, Matriplex::MatriplexSym<float, 3, 4> const&, Matriplex::Matriplex<float, 3, 1, 4> const&, Matriplex::MatriplexSym<float, 6, 4>&, Matriplex::Matriplex<float, 6, 1, 4>&, Matriplex::Matriplex<float, 1, 1, 4>&, int) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_0/lib/slc7_amd64_gcc10/libRecoTrackerMkFitCore.so
Current Modules:
Module: MkFitProducer:detachedTripletStepTrackCandidatesMkFit (crashed)
A fatal system signal has occurred: segmentation violation
$ tail -n10 /eos/cms/store/user/cmsbuild/profiling/data/CMSSW_12_4_0_pre3/slc7_amd64_gcc10/11834.21/step3_igprof_cpu.txt
#14 0x00007f9899389f45 in IgHookTrace::stacktrace (addresses=addresses@entry=0x7ffd8674d480, nmax=nmax@entry=800) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_4_0_pre2-slc7_amd64_gcc10/build/CMSSW_12_4_0_pre2-build/BUILD/slc7_amd64_gcc10/external/igprof/5.9.16-95dc8f7dd3ee3d76c20fd25518fc6fa9/igprof-6cc73b59d83ed6c9d73b455dc40857e700ef6ee4/src/walk-syms.cc:175
#15 0x00007f9899393508 in profileSignalHandler () at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_4_0_pre2-slc7_amd64_gcc10/build/CMSSW_12_4_0_pre2-build/BUILD/slc7_amd64_gcc10/external/igprof/5.9.16-95dc8f7dd3ee3d76c20fd25518fc6fa9/igprof-6cc73b59d83ed6c9d73b455dc40857e700ef6ee4/src/profile-perf.cc:66
#16 <signal handler called>
#17 0x00007f9860bcf8ea in mkfit::propagateHelixToZMPlex(Matriplex::MatriplexSym<float, 6, 4> const&, Matriplex::Matriplex<float, 6, 1, 4> const&, Matriplex::Matriplex<int, 1, 1, 4> const&, Matriplex::Matriplex<float, 1, 1, 4> const&, Matriplex::MatriplexSym<float, 6, 4>&, Matriplex::Matriplex<float, 6, 1, 4>&, int, mkfit::PropagationFlags, Matriplex::Matriplex<int, 1, 1, 4> const*) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_4_0_pre3/lib/slc7_amd64_gcc10/libRecoTrackerMkFitCore.so
Current Modules:
Module: MkFitProducer:highPtTripletStepTrackCandidatesMkFit (crashed)
A fatal system signal has occurred: segmentation violation
In both cases, the current module is MkFitProducer. Is it a coincidence, or do we have a regression?
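To make the comparison between the two log tails less manual, here is a minimal sketch that pulls out the function that was running when the profiling signal arrived (the first frame below `<signal handler called>`) and the module marked as crashed. The function name `summarize_crash` and the regular expressions are my own assumptions about the log layout, based only on the two tails above. The sample text is an abbreviated copy of the 12_3_0 tail with paths and argument lists shortened.

```python
import re

# Assumed layout: "#<n> [0x<addr> in ]<rest>" for frames,
# "Module: <type>:<label> (crashed)" for the crashed module.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(.*)$")
MODULE_RE = re.compile(r"^Module:\s+(\w+):(\w+)\s+\(crashed\)")


def summarize_crash(log_text):
    """Return (interrupted_function, module_type, module_label)."""
    interrupted = None
    module_type = module_label = None
    after_handler = False
    for raw in log_text.splitlines():
        line = raw.strip()
        frame = FRAME_RE.match(line)
        if frame:
            body = frame.group(2)
            if body.startswith("<signal handler called>"):
                after_handler = True
            elif after_handler and interrupted is None:
                # First frame below the handler: the code that was
                # interrupted by the profiling signal.
                interrupted = body.split("(")[0].strip()
            continue
        mod = MODULE_RE.match(line)
        if mod:
            module_type, module_label = mod.group(1), mod.group(2)
    return interrupted, module_type, module_label


# Abbreviated from the 12_3_0 tail above (paths and arguments shortened).
sample = """\
#15 0x00007f36c683d508 in profileSignalHandler () at profile-perf.cc:66
#16 <signal handler called>
#17 0x00007f368f898ee2 in mkfit::kalmanOperation(int, ...) () from libRecoTrackerMkFitCore.so
Current Modules:
Module: MkFitProducer:detachedTripletStepTrackCandidatesMkFit (crashed)
"""

print(summarize_crash(sample))
```

Running this on both tails gives different interrupted functions (`mkfit::kalmanOperation` and `mkfit::propagateHelixToZMPlex`) but the same module type, which is the pattern the question above is about. It says nothing about whether the fault is in mkFit or in the profiler's stack walk.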
Note that igprof mp does not crash in these workflows, and the crash happens around events 230-260. Since Jenkins retries igprof several times on failure, the crash looks reproducible.
About this issue
- State: closed
- Created 2 years ago
- Comments: 60 (60 by maintainers)
@smuzaffar could this issue be reopened, just to avoid continuing the discussion on a closed issue? (We never signed from reco.)
It looks like a different bug in libunwind. I will also test with gperftools, since it uses libunwind too.
Looks like this bug might be addressed by updating libunwind.
There were no recent updates in the propagate or Kalman update routines. This still seems similar to the previous case, where the issue was the profiler itself having some outdated dependencies (was it TBB or pthread?).