cmssw: Problem with mxnet on CC8
Running workflows on CC8 that use mxnet with multiple threads leads to a crash at the end of the job. The crash can sometimes be reproduced when running under gdb, which yields the following traceback:
#0 0x000000003442bf80 in ?? ()
#1 0x00007fff9f0f0472 in mxnet::resource::ResourceManagerImpl::~ResourceManagerImpl() () from /cvmfs/cms-ib.cern.ch/week0/cc8_amd64_gcc8/cms/cmssw-patch/CMSSW_11_2_X_2020-06-26-1100/external/cc8_amd64_gcc8/lib/libmxnet.so
#2 0x00007fff9f0f0ca5 in dmlc::ThreadLocalStore<mxnet::resource::ResourceManagerImpl>::~ThreadLocalStore() ()
from /cvmfs/cms-ib.cern.ch/week0/cc8_amd64_gcc8/cms/cmssw-patch/CMSSW_11_2_X_2020-06-26-1100/external/cc8_amd64_gcc8/lib/libmxnet.so
#3 0x00007ffff552406c in __run_exit_handlers () from /lib64/libc.so.6
#4 0x00007ffff55241a0 in exit () from /lib64/libc.so.6
#5 0x00007ffff550d87a in __libc_start_main () from /lib64/libc.so.6
#6 0x000000000041145e in _start ()
One question: had the libmxnet.so shared library already been unloaded from the process before this happened?
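One way to answer that question from inside the process is to list the shared objects that are still mapped at the point of interest. The following is only a minimal sketch, assuming Linux with glibc, where `dl_iterate_phdr` from `<link.h>` is available. The helper names `loadedSharedObjects` and `isSharedObjectLoaded` are hypothetical and are not part of CMSSW or mxnet.

```cpp
// Sketch: check whether a shared object is still mapped into the current process.
// Assumes Linux with glibc; dl_iterate_phdr is declared in <link.h>.
#include <link.h>

#include <cstddef>
#include <string>
#include <vector>

namespace {
  // Callback invoked once per loaded object; stores its path in the vector passed via `data`.
  int collectName(struct dl_phdr_info* info, std::size_t /*size*/, void* data) {
    auto* names = static_cast<std::vector<std::string>*>(data);
    names->emplace_back(info->dlpi_name ? info->dlpi_name : "");
    return 0;  // returning 0 tells dl_iterate_phdr to continue with the next object
  }
}  // namespace

// Hypothetical helper: paths of all objects currently loaded.
// The main executable is usually reported with an empty name.
std::vector<std::string> loadedSharedObjects() {
  std::vector<std::string> names;
  dl_iterate_phdr(&collectName, &names);
  return names;
}

// Hypothetical helper: true if any loaded object's path contains `needle`.
bool isSharedObjectLoaded(const std::string& needle) {
  for (const auto& name : loadedSharedObjects()) {
    if (name.find(needle) != std::string::npos) {
      return true;
    }
  }
  return false;
}
```

Calling `isSharedObjectLoaded("libmxnet.so")` from code that runs late in shutdown would show whether the library is still mapped at that moment. When the crash is caught under gdb, the `info sharedlibrary` command gives the same information without any code changes.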
About this issue
- State: closed
- Created 4 years ago
- Comments: 49 (49 by maintainers)
Commits related to this issue
- Drop mxnet (cms-sw/cmssw#30432) — committed to cms-sw/cmsdist by iarspider a year ago
- Remove PhysicsTools/MXNet (cms-sw/cmssw#30432) — committed to iarspider-cmssw/cmssw by iarspider a year ago
- Merge pull request #41377 from iarspider/drop-mxnet Remove PhysicsTools/MXNet (cms-sw/cmssw#30432) — committed to cms-sw/cmssw by cmsbuild a year ago
- Merge pull request #8451 from cms-sw/drop-mxnet Drop mxnet (cms-sw/cmssw#30432) — committed to cms-sw/cmsdist by smuzaffar a year ago
@Dr15Jones (et al.): after the merge of #41377, this can probably be considered fixed and closed.