cmssw: Multiple failures in NONLTO, CLANG and ASAN Unit Tests and RelVals due to `PluginNotFound`

Hello,

There are multiple failures in the NONLTO, CLANG and ASAN IBs (both in Unit Tests and RelVals) in the latest IBs (CMSSW_14_1_[FLAVOR]_X_2024-04-22-2300), reporting:

===== Test "testROCmTestDeviceAdditionModule" ====
----- Begin Fatal Exception 23-Apr-2024 12:18:10 CEST-----------------------
An exception of category 'PluginNotFound' occurred while
   [0] Initializing message logger
Exception Message:
Unable to find plugin 'SingleThreadMSPresence' because the category 'CMS EDM Framework Presence' has no known plugins
----- End Fatal Exception -------------------------------------------------

---> test testROCmTestDeviceAdditionModule had ERRORS
TestTime:0
^^^^ End Test testROCmTestDeviceAdditionModule ^^^^

There are other variants of the exception, for example:

  • CondCore/SiPixelPlugins:
===== Test "testPixelPayloadInspector" ====
terminate called after throwing an instance of 'cms::Exception'
  what():  An exception of category 'PluginNotFound' occurred.
Exception Message:
Unable to find plugin 'SiteLocalConfigService' because the category 'CMS EDM Framework Service' has no known plugins
  • CondCore/CondDB:
===== Test "testConditionDatabase_1" ====
> Connecting with db in sqlite_file:cms_conditions_1.db
ERROR: An exception of category 'PluginNotFound' occurred.
Exception Message:
Unable to find plugin 'COND/Services/RelationalAuthenticationService' because the category 'CoralService' has no known plugins

I am not sure if it is related, but we had a ROCm update yesterday in #44777, and ROCm device builds fine (see log). However, a similar issue was reported in the past at cmssw#40680; it was related to a ROCm update, and the missing plugins were not properly registered in the .edmplugincache file.

Thanks, Andrea

About this issue

  • State: open
  • Created 2 months ago
  • Comments: 21 (21 by maintainers)

Most upvoted comments

The libDD4hepGaudiPluginMgr.so has

0000000000035320 W std::filesystem::__cxx11::path::~path()

For LTO builds (where dd4hep is also built with LTO flags), the libDD4hepGaudiPluginMgr.so library does not contain this symbol. It only has

Singularity> nm -D external/el8_amd64_gcc12/lib/libDD4hepGaudiPluginMgr.so | c++filt | grep ::path::
                 U std::filesystem::__cxx11::path::_M_find_extension() const@GLIBCXX_3.4.26
                 U std::filesystem::__cxx11::path::_List::_Impl_deleter::operator()(std::filesystem::__cxx11::path::_List::_Impl*) const@GLIBCXX_3.4.26
                 U std::filesystem::__cxx11::path::_List::end() const@GLIBCXX_3.4.26
                 U std::filesystem::__cxx11::path::compare(std::filesystem::__cxx11::path const&) const@GLIBCXX_3.4.26
                 U std::filesystem::__cxx11::path::_M_split_cmpts()@GLIBCXX_3.4.26
                 U std::filesystem::__cxx11::path::_List::_List(std::filesystem::__cxx11::path::_List const&)@GLIBCXX_3.4.26
                 U std::filesystem::__cxx11::path::_List::_List()@GLIBCXX_3.4.26

So maybe that is why LTO-enabled IBs are not failing.
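To make this comparison repeatable across build flavors, the demangled `nm -D` output can be split into symbols a library defines itself (such as the weak `W` entry for `~path()`) and symbols it only references (`U`). The following is a minimal sketch, not part of any CMSSW tooling; the function name `classify_symbols` is hypothetical, and it assumes the text comes from `nm -D <lib> | c++filt` as in the command above.

```python
def classify_symbols(nm_output, needle):
    """Split demangled `nm -D` lines containing `needle` into
    (defined, undefined) lists of (type, name) tuples."""
    defined, undefined = [], []
    for line in nm_output.splitlines():
        if needle not in line:
            continue
        fields = line.split(None, 1)
        if len(fields) < 2:
            continue
        first, rest = fields
        if len(first) > 1:
            # First field is an address; the symbol type letter follows it.
            sub = rest.split(None, 1)
            if len(sub) < 2:
                continue
            first, rest = sub
        entry = (first, rest.strip())
        # "U" means the symbol is only referenced and must be resolved
        # from another library at load time.
        (undefined if first == "U" else defined).append(entry)
    return defined, undefined


if __name__ == "__main__":
    import sys

    # Usage: nm -D some_library.so | c++filt | python this_script.py
    found_defined, found_undefined = classify_symbols(sys.stdin.read(), "::path::")
    for sym_type, name in found_defined:
        print("defined  ", sym_type, name)
    for sym_type, name in found_undefined:
        print("undefined", sym_type, name)
```

A library that lists `std::filesystem::__cxx11::path::~path()` under "defined" carries its own copy of the destructor, which is the case that differs between the builds discussed here.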

https://github.com/cms-sw/cmsdist/blob/IB/CMSSW_14_1_X/master/scram-project-build.file#L228 is where we run edmPluginRefresh. Although there was a crash, edmPluginRefresh did not exit with a non-zero code, which is why the build process did not stop.

https://github.com/cms-sw/cmssw/pull/44838 fixes edmPluginRefresh to return a non-zero exit code if the child process fails.
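The general pattern behind such a fix is that a parent process must turn a failed or crashed child into its own non-zero exit status, otherwise the calling build script sees success. The sketch below illustrates that pattern only; it is not the code from the pull request (edmPluginRefresh is not written in Python), and the helper name `run_child` is hypothetical.

```python
import subprocess
import sys


def run_child(cmd):
    """Run `cmd` and return an exit status that is non-zero whenever the
    child failed, including when it was killed by a signal."""
    result = subprocess.run(cmd)
    if result.returncode < 0:
        # On POSIX, subprocess reports death by signal N as -N.
        # Map it to 128 + N, the convention used by common shells.
        return 128 - result.returncode
    return result.returncode


if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit("usage: wrapper.py command [args...]")
    # Propagate the child's status so a calling build script stops on failure.
    sys.exit(run_child(sys.argv[1:]))
```

With this in place, a segmentation fault in the child (signal 11 on Linux) surfaces as exit status 139 instead of being silently swallowed.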

Disassembling things, the instructions of ~path() in rocprofiler-register.so.0.so match the instructions in libstdc++fs.a from GCC 8. The instructions in libFWCorePluginManager.so match the instructions in libDD4hepGaudiPluginMgr.so. The instructions in the GCC 8 rocprofiler-register.so.0.so/libstdc++fs.a are (very) different from the instructions in the GCC 12 libFWCorePluginManager.so/libDD4hepGaudiPluginMgr.so.

It seems like we have an ODR violation from trying to mix libraries that were built with (very) different versions of libstdc++, and thus if we need to keep the rocprofiler, we’d have to build it ourselves.

assign heterogeneous