cmssw: how to skip events which throw a `cms::Exception`
The reproducer in [1] (`CMSSW_13_0_5_patch1`, input file on lxplus) tries to use `options.SkipEvent` (documented, for example, in SWGuideEdmExceptionUse#Framework_Exception_Handling) in order to skip an event which is known to throw an exception of type `"InvalidGlobalAlgBlkBxCollection"` from the module `hltGtStage2Digis`.
Naively, I was expecting the job to skip the event and succeed. Instead, I see that the job fails because a different module on one EndPath throws a different exception (“ProductNotFound”) while attempting to access the products of `hltGtStage2Digis` (which are likely not produced because `hltGtStage2Digis` fails due to `"InvalidGlobalAlgBlkBxCollection"`). The error message of the reproducer is in [2]. @fwyzard spotted that the message quotes `"Begin IgnoreCompletely"`, and does not quote `"Begin SkipEvent"`. A simple search leads me to this:
// If we are processing an endpath and the module was scheduled, treat SkipEvent or FailPath
// as IgnoreCompletely, so any subsequent OutputModules are still run.
// For unscheduled modules only treat FailPath as IgnoreCompletely but still allow SkipEvent to throw
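The rule stated in that comment can be sketched as a small standalone Python function. This is only a toy model, not the framework's code: the function name `effective_action` and its arguments are hypothetical, and it assumes that the second sentence of the comment (about unscheduled modules) also applies only while an EndPath is being processed.

```python
def effective_action(configured, on_end_path, scheduled):
    """Toy model of the rule quoted above; this is not CMSSW code.

    configured  : action requested for the exception category,
                  e.g. 'SkipEvent', 'FailPath' or 'IgnoreCompletely'
    on_end_path : True if the exception is seen while running an EndPath
    scheduled   : True if the throwing module was scheduled on that path
    """
    if on_end_path:
        if scheduled and configured in ("SkipEvent", "FailPath"):
            # Scheduled module on an EndPath: both actions are downgraded,
            # so that any subsequent OutputModules are still run.
            return "IgnoreCompletely"
        if not scheduled and configured == "FailPath":
            # Unscheduled module: only FailPath is downgraded,
            # SkipEvent is still allowed to throw.
            return "IgnoreCompletely"
    # Everywhere else the configured action is kept (assumption).
    return configured


# A scheduled module on an EndPath, with the category listed in SkipEvent:
print(effective_action("SkipEvent", on_end_path=True, scheduled=True))
```

Under these assumptions, a category listed in `SkipEvent` is reported as `IgnoreCompletely` when the throwing module is scheduled on an EndPath, which is consistent with the header seen in [2].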
One workaround is to include `ProductNotFound` in `options.SkipEvent`.
Question: are there “better” ways?
Context: this issue is related to https://github.com/cms-sw/cmssw/issues/41489#issuecomment-1532696126, as we look into the feasibility of using `options.SkipEvent` to avoid the frequent HLT crashes seen online these days due to the L1T unpacker (CMSLITOPS-411).
FYI: @silviodonato @cms-sw/hlt-l2
[1]
#!/bin/bash
# cmsrel CMSSW_13_0_5_patch1
# cd CMSSW_13_0_5_patch1/src
# cmsenv
INPUTF=/eos/cms/store/group/dpg_trigger/comm_trigger/TriggerStudiesGroup/FOG/error_stream/run366497/run366497_ls0196_index000095_fu-c2b01-26-01_pid1955211.raw
[ $# -eq 0 ] || INPUTF="${1}"
rm -rf run000000
mkdir run000000
hltConfigFromDB --runNumber 366469 > hlt.py
cat <<@EOF >> hlt.py
process.options.numberOfThreads = 1
process.options.numberOfStreams = 0
del process.MessageLogger
process.load('FWCore.MessageService.MessageLogger_cfi')
process.MessageLogger.cerr.FwkReport.reportEvery = 1
process.MessageLogger.cerr.enableStatistics = False
process.MessageLogger.cerr.threshold = 'INFO'
process.source.fileListMode = True
process.source.fileNames = [ "${INPUTF}" ]
process.options.SkipEvent = cms.untracked.vstring(
'InvalidGlobalAlgBlkBxCollection',
# 'ProductNotFound',
)
#del process.DQMHistograms
@EOF
cmsRun hlt.py #&> hlt.log
[2]
----- Begin IgnoreCompletely Exception 03-May-2023 18:48:06 CEST-----------------------
An exception of category 'InvalidGlobalAlgBlkBxCollection' occurred while
[0] Processing Event run: 366497 lumi: 196 event: 251484500 stream: 0
[1] Running path 'RatesMonitoring'
[2] Calling method for module L1TRawToDigi/'hltGtStage2Digis'
Exception Message:
The GlobalAlgBlk unpacker result vector is empty, but is not receiving the first expected header ID! This may be due to corrupted, or poorly formatted events.
uGTBoard: 0
BX: -2
First expected block: 33
Received block: 37
----- End IgnoreCompletely Exception -------------------------------------------------
%MSG-e L1TriggerJSONMonitoring: L1TriggerJSONMonitoring:hltL1TriggerJSONMonitoring 03-May-2023 18:48:06 CEST Run: 366497 Event: 251484500
L1 trigger results with label [hltGtStage2Digis] not present or invalid
%MSG
----- Begin Fatal Exception 03-May-2023 18:48:06 CEST-----------------------
An exception of category 'ProductNotFound' occurred while
[0] Processing Event run: 366497 lumi: 196 event: 251484500 stream: 0
[1] Running path 'DQMHistograms'
[2] Calling method for module TriggerBxMonitor/'hltTriggerBxMonitor'
Exception Message:
Principal::getByToken: Found zero products matching all criteria
Looking for type: BXVector<GlobalAlgBlk>
Looking for module label: hltGtStage2Digis
Looking for productInstanceName:
Additional Info:
[a] If you wish to continue processing events after a ProductNotFound exception,
add "SkipEvent = cms.untracked.vstring('ProductNotFound')" to the "options" PSet in the configuration.
----- End Fatal Exception -------------------------------------------------
%MSG-w FastMonitoringService: PostProcessPath 03-May-2023 18:48:06 CEST Run: 366497 Event: 251484500
STREAM 0 earlyTermination -: ID:run: 366497 lumi: 196 event: 251484500 LS:196 FromThisContext
%MSG
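To check which label the framework printed for each exception category in a log like [2], the headers can be extracted with a short script. This is a minimal sketch: the helper name `exception_labels` is hypothetical, the regular expressions are inferred only from the layout of the log above, and `hlt.log` assumes that the redirect commented out in [1] was enabled.

```python
import re

# Patterns inferred from the log layout in [2] (assumption, not a framework spec).
BEGIN = re.compile(r"^-+ Begin (\w+) Exception")
CATEGORY = re.compile(r"An exception of category '([^']+)' occurred")


def exception_labels(log_text):
    """Return (label, category) pairs in the order they appear in the log."""
    pairs = []
    label = None
    for line in log_text.splitlines():
        begin = BEGIN.match(line.strip())
        if begin:
            # Remember the label from the 'Begin ... Exception' header.
            label = begin.group(1)
            continue
        category = CATEGORY.search(line)
        if category and label is not None:
            pairs.append((label, category.group(1)))
            label = None
    return pairs


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py hlt.log
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as log_file:
            for label, category in exception_labels(log_file.read()):
                print(f"{category}: {label}")
```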
About this issue
- State: open
- Created a year ago
- Comments: 41 (41 by maintainers)
Yes. It also means any modules needed by those modules which created the data products to ‘keep’.
Not exactly. A Path which sees an exception (either because a module on the Path throws the exception OR an unscheduled module needed by the module on the Path throws an exception) will be marked as having an ‘error’ status. The OutputModule will only run if at least one of its Paths doesn’t see the exception and the Path succeeds as normal.
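The rule in the previous paragraph can be written down as a toy model in plain Python. It is a sketch under stated assumptions, not framework code: all function names are hypothetical, filter decisions are ignored (a Path is only 'ok' or 'error'), and the module names `trackFilter`, `trackProducer`, `muonFilter`, `muonPath` and `allOut` are made up for the example.

```python
def sees_exception(module, failing, needs):
    """True if `module` throws, or (recursively) needs a module that throws.

    `needs` maps a module to the unscheduled modules it consumes from;
    the dependency graph is assumed to be acyclic.
    """
    if module in failing:
        return True
    return any(sees_exception(dep, failing, needs) for dep in needs.get(module, ()))


def path_status(path_modules, failing, needs):
    """A Path is in the 'error' state if any of its modules sees an exception."""
    if any(sees_exception(m, failing, needs) for m in path_modules):
        return "error"
    return "ok"


def output_module_runs(selected_paths, statuses):
    """The OutputModule runs only if at least one of its Paths succeeded."""
    return any(statuses[p] == "ok" for p in selected_paths)


# Hypothetical configuration: trackFilter needs trackProducer, which always throws.
paths = {"trackPath": ["trackFilter"], "muonPath": ["muonFilter"]}
needs = {"trackFilter": ["trackProducer"]}
failing = {"trackProducer"}

statuses = {name: path_status(mods, failing, needs) for name, mods in paths.items()}
print(statuses)
print("trackOut runs:", output_module_runs(["trackPath"], statuses))
print("allOut runs:", output_module_runs(["trackPath", "muonPath"], statuses))
```

In this model an OutputModule that depends only on a Path in the error state never runs, while one that also depends on a successful Path still does.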
To illustrate, I’ve put together a small program using some dummy test modules. (NOTE: I modified AddIntsProducer so that it would ignore any data products which are missing from the Event).
The following is a trivialized representation of the HLT.
`shouldTryToContinue()` so that even if one of the filters depends on a module that fails, the globalTrigger will still run.
When run, all the OutputModules write the 3 events except for ‘trackOut’, which writes no events because the only Path it depends upon never succeeds (i.e. trackPath is set to the error state for each Event).
From the summary we see