cmssw: [ASAN] Unit Test `testTriggerEventAnalyzers` failing with heap-buffer-overflow

Hello,

Unit Test testTriggerEventAnalyzers (from module HLTrigger/HLTcore) is failing with heap-buffer-overflow. Here the summary:

SUMMARY: AddressSanitizer: heap-buffer-overflow (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02775/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-10-2300/lib/el8_amd64_gcc11/pluginHLTriggerHLTcorePlugins.so+0xf1638) in void HLTEventAnalyzerRAW::showObjects<std::vector<int, std::allocator<int> >, std::vector<edm::Ref<std::vector<reco::PFJet,
std::allocator<reco::PFJet> >, reco::PFJet, edm::refhelper::FindUsingAdvance<std::vector<reco::PFJet,
std::allocator<reco::PFJet> >, reco::PFJet> >, std::allocator<edm::Ref<std::vector<reco::PFJet,
std::allocator<reco::PFJet> >, reco::PFJet, edm::refhelper::FindUsingAdvance<std::vector<reco::PFJet,
std::allocator<reco::PFJet> >, reco::PFJet> > > > >(std::vector<int, std::allocator<int> > const&, std::vector<edm::Ref<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::PFJet,
edm::refhelper::FindUsingAdvance<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::PFJet> >,
std::allocator<edm::Ref<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::P

Find the full log in this link.

Thanks!

FYI, @missirol (I see you added this test to CMSSW)

About this issue

Original URL
State: closed
Created a year ago
Comments: 22 (22 by maintainers)

Most upvoted comments

I guess my question was even more naive, and is about this line in HLTJetTag. There, the key can be larger than the size of *h_Jets. When this happens, is it possible that this could crash at runtime? And why don’t we see ASAN errors from HLTJetTag itself? (sorry if these questions are a bit too basic)

@makortel , would you mind explaining this ?

The edm::Ref constructor does not try to access the element pointed by the index. The invalid memory access can therefore only occur when the Ref is de-referenced.

makortel on May 5, 2023

Slowly converging on this issue.

step 0 (#41350): the unit-test failure can be fixed by using a different input file.
step 1 (CMSHLT-2750): this issue highlighted a long-standing misconfiguration of a few b-tagging modules of the HLT pp menu (according to my checks, there are 7 affected modules). The first step is to fix this in the HLT menu, by replacing the ill-defined instances of HLTJetTag<T> with instances of HLTJetTagWithMatching<T> (which uses delta-R matching, appropriate for cases where the two jets collection are not identical);
step 2 (#41351, #41352): part of the problem is that the implementation of the plugin HLTJetTag<T> is fragile, so it should be improved. This step-2 must go after step-1, because the improved/stricter version of HLTJetTag<T> would lead to runtime errors with the current HLT menu. This also implies that step-2 can be completed only after we remove the frozen 2022 HLT menu from 13_0_X and 13_1_X (otherwise, 2022 RelVals would fail). Retiring the 2022 HLT menu in those releases was going to happen in any case, as HLT usually runs only one non-fake HLT menu in a given release (for 13_0_X and 13_1_X, that is the 2023 HLT menu).
step 3: since the new version of HLTJetTag<T> will have an option to use delta-R matching, HLTJetTag<T> will become a superset of HLTJetTagWithMatching<T>, so I plan to replace all instances HLTJetTagWithMatching<T> in the HLT menus with properly-configured instances of the new HLTJetTag<T>. After that, HLTJetTagWithMatching<T> can be removed from the master branch of CMSSW.

Once there is data/MC processed with an updated HLT menu, the input file of the unit test could be updated again (with a proper input, more is tested, as this issue shows).

By the way, thanks a lot for opening this issue (@aandvalenzuela). I doubt we would have caught this bug otherwise (and clearly, the bug went unnoticed for a long time).

One naive question: why didn’t ‘anything’ catch this earlier? It was sort-of caught by accident (in a unit test of something pretty unrelated, in ASAN). Was this issue triggering ‘alarms’ anywhere else (e.g. other tests in ASAN, or else)? If not, why was it going unnoticed?

missirol on Apr 16, 2023