cmssw: how to skip events which throw a `cms::Exception`

The reproducer in [1] (CMSSW_13_0_5_patch1, input file on lxplus) tries to use options.SkipEvent (documented, for example, in SWGuideEdmExceptionUse#Framework_Exception_Handling) in order to skip an event which is known to throw an exception of type "InvalidGlobalAlgBlkBxCollection" from the module hltGtStage2Digis.

Naively, I was expecting the job to skip the event and succeed. Instead, the job fails because a different module on an EndPath throws a different exception (“ProductNotFound”) while attempting to access the products of hltGtStage2Digis (which are likely not produced because hltGtStage2Digis fails with "InvalidGlobalAlgBlkBxCollection"). The error message of the reproducer is in [2]. @fwyzard spotted that the message quotes "Begin IgnoreCompletely" and does not quote "Begin SkipEvent". A simple search leads me to this:

      // If we are processing an endpath and the module was scheduled, treat SkipEvent or FailPath
      // as IgnoreCompletely, so any subsequent OutputModules are still run.
      // For unscheduled modules only treat FailPath as IgnoreCompletely but still allow SkipEvent to throw

One workaround is to include ProductNotFound in options.SkipEvent.

Question: are there “better” ways?

Context: this issue is related to https://github.com/cms-sw/cmssw/issues/41489#issuecomment-1532696126, as we look into the feasibility of using options.SkipEvent to avoid the frequent HLT crashes seen online these days due to the L1T unpacker (CMSLITOPS-411).

FYI: @silviodonato @cms-sw/hlt-l2

[1]

#!/bin/bash

# cmsrel CMSSW_13_0_5_patch1
# cd CMSSW_13_0_5_patch1/src
# cmsenv

INPUTF=/eos/cms/store/group/dpg_trigger/comm_trigger/TriggerStudiesGroup/FOG/error_stream/run366497/run366497_ls0196_index000095_fu-c2b01-26-01_pid1955211.raw
[ $# -eq 0 ] || INPUTF="${1}"

rm -rf run000000
mkdir run000000

hltConfigFromDB --runNumber 366469 > hlt.py

cat <<@EOF >> hlt.py
process.options.numberOfThreads = 1
process.options.numberOfStreams = 0

del process.MessageLogger
process.load('FWCore.MessageService.MessageLogger_cfi')
process.MessageLogger.cerr.FwkReport.reportEvery = 1
process.MessageLogger.cerr.enableStatistics = False
process.MessageLogger.cerr.threshold = 'INFO'

process.source.fileListMode = True
process.source.fileNames = [ "${INPUTF}" ]

process.options.SkipEvent = cms.untracked.vstring(
  'InvalidGlobalAlgBlkBxCollection',
#  'ProductNotFound',
)

#del process.DQMHistograms
@EOF

cmsRun hlt.py #&> hlt.log

[2]

----- Begin IgnoreCompletely Exception 03-May-2023 18:48:06 CEST-----------------------
An exception of category 'InvalidGlobalAlgBlkBxCollection' occurred while
   [0] Processing  Event run: 366497 lumi: 196 event: 251484500 stream: 0
   [1] Running path 'RatesMonitoring'
   [2] Calling method for module L1TRawToDigi/'hltGtStage2Digis'
Exception Message:
The GlobalAlgBlk unpacker result vector is empty, but is not receiving the first expected header ID! This may be due to corrupted, or poorly formatted events.
uGTBoard: 0
BX: -2
First expected block: 33
Received block: 37
----- End IgnoreCompletely Exception -------------------------------------------------
%MSG-e L1TriggerJSONMonitoring:   L1TriggerJSONMonitoring:hltL1TriggerJSONMonitoring 03-May-2023 18:48:06 CEST  Run: 366497 Event: 251484500
L1 trigger results with label [hltGtStage2Digis] not present or invalid
%MSG
----- Begin Fatal Exception 03-May-2023 18:48:06 CEST-----------------------
An exception of category 'ProductNotFound' occurred while
   [0] Processing  Event run: 366497 lumi: 196 event: 251484500 stream: 0
   [1] Running path 'DQMHistograms'
   [2] Calling method for module TriggerBxMonitor/'hltTriggerBxMonitor'
Exception Message:
Principal::getByToken: Found zero products matching all criteria
Looking for type: BXVector<GlobalAlgBlk>
Looking for module label: hltGtStage2Digis
Looking for productInstanceName: 

   Additional Info:
      [a] If you wish to continue processing events after a ProductNotFound exception,
add "SkipEvent = cms.untracked.vstring('ProductNotFound')" to the "options" PSet in the configuration.

----- End Fatal Exception -------------------------------------------------
%MSG-w FastMonitoringService:  PostProcessPath 03-May-2023 18:48:06 CEST  Run: 366497 Event: 251484500
 STREAM 0 earlyTermination -: ID:run: 366497 lumi: 196 event: 251484500 LS:196  FromThisContext 
%MSG

Most upvoted comments

In particular, what does it mean that an output module depends on a module that throws an exception? Does it mean that it has a matching keep statement?

Yes. It also covers any modules needed by the modules that create the data products to ‘keep’.

Does it mean that it has a SelectEvents entry for a Path that includes that module?

Not exactly. A Path which sees an exception (either because a module on the Path throws the exception OR an unscheduled module needed by the module on the Path throws an exception) will be marked as having an ‘error’ status. The OutputModule will only run if at least one of its Paths doesn’t see the exception and the Path succeeds as normal.
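
As a rough illustration of the rule above, here is a toy model in plain Python. It is not the framework's code: `PathStatus`, `output_module_runs`, and the path names `pathA`/`pathB` are hypothetical names made up for this sketch. It also assumes that an empty SelectEvents list places no requirement on Paths and that an "exception@*" entry matches any Path in the error state, which is how those two cases are used in the example config further down.

```python
from enum import Enum


class PathStatus(Enum):
    """Hypothetical per-event outcome of a Path."""
    PASS = "pass"    # all filters on the Path accepted the event
    FAIL = "fail"    # a filter on the Path rejected the event
    ERROR = "error"  # the Path saw an exception


def output_module_runs(select_events, path_status):
    """Toy rule: decide whether an OutputModule writes the event.

    select_events: list of SelectEvents entries (Path names or "exception@*")
    path_status:   dict mapping Path name -> PathStatus for this event
    """
    # Assumption: no SelectEvents entries means no requirement on Paths.
    if not select_events:
        return True
    for entry in select_events:
        if entry == "exception@*":
            # Assumption: matches if any Path is in the error state.
            if any(s is PathStatus.ERROR for s in path_status.values()):
                return True
        elif path_status.get(entry) is PathStatus.PASS:
            # At least one selected Path succeeded as normal.
            return True
    return False


# One event in which pathA saw an exception and pathB passed.
status = {"pathA": PathStatus.ERROR, "pathB": PathStatus.PASS}
for selection in (["pathA"], ["pathB"], ["pathA", "pathB"], ["exception@*"], []):
    print(selection, output_module_runs(selection, status))
```

In this sketch a module selecting only the Path that saw the exception does not run, while one that also selects a Path that succeeded still does.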

To illustrate, I’ve put together a small program using some dummy test modules. (NOTE: I modified AddIntsProducer so that it would ignore any data products which are missing from the Event).

The following is a trivialized representation of the HLT.

  • It only has ‘Track’ and ‘Calo’ related paths.
  • There is a ‘globalTrigger’ data product which records the output of the Filters at the end of the two paths. It is marked as shouldTryToContinue() so that even if one of the filters depends on a module that fails, the globalTrigger will still run.
  • The outputs consist of
    • one dependent only on trackPath
    • one dependent only on caloPath
    • one dependent on either trackPath or caloPath
    • one that doesn’t care about Paths and only cares about globalTrigger (to show how that works with exceptions)
    • one that only runs if there was an exception
  • The ‘Track’ path has an exception happen early on as part of the ‘track hits’.

import FWCore.ParameterSet.Config as cms

process = cms.Process("TEST")

process.source = cms.Source("EmptySource")

process.maxEvents.input = 3
#this is the type thrown by FailingProducer
process.options.TryToContinue = ["NotFound"]
process.options.wantSummary = True

process.trackingHits = cms.EDProducer("FailingProducer")
process.tracks = cms.EDProducer("AddIntsProducer", labels = cms.VInputTag("trackingHits"))
process.trackFilter = cms.EDFilter("IntProductFilter",
   label = cms.InputTag("tracks"),
   threshold = cms.int32(0),
   shouldProduce = cms.bool(True)
)

process.caloClusters = cms.EDProducer("IntProducer", ivalue = cms.int32(1))
process.caloFilter = cms.EDFilter("IntProductFilter",
   label = cms.InputTag("caloClusters"),
   threshold = cms.int32(0),
   shouldProduce = cms.bool(True)
)

process.globalTrigger = cms.EDProducer("AddIntsProducer", labels = cms.VInputTag("trackFilter","caloFilter"))
process.globalTrigger.shouldTryToContinue()

process.trackPath = cms.Path(process.trackingHits+process.tracks+process.trackFilter)
process.caloPath = cms.Path(process.caloClusters+process.caloFilter)
process.globalTriggerPath = cms.Path(process.globalTrigger)

outputTemplate_ = cms.OutputModule("AsciiOutputModule",
                                        outputCommands = cms.untracked.vstring("drop *", "keep edmTriggerResults_*_*_*"),
                                        SelectEvents = cms.untracked.PSet(SelectEvents = cms.vstring()))

process.trackOut = outputTemplate_.clone(SelectEvents = dict(SelectEvents=["trackPath"]))
process.caloOut = outputTemplate_.clone(SelectEvents = dict(SelectEvents=["caloPath"]))
process.trackAndCaloOut = outputTemplate_.clone(SelectEvents=dict(SelectEvents=["trackPath","caloPath"]))
process.globalTriggerOut = outputTemplate_.clone(outputCommands = ["drop *", "keep edmTriggerResults_*_*_*","keep *_globalTrigger__TEST"])
process.exceptionOut = outputTemplate_.clone(SelectEvents=dict(SelectEvents=["exception@*"]))

process.out = cms.EndPath(process.trackOut+process.caloOut+process.trackAndCaloOut+process.exceptionOut+process.globalTriggerOut)

When run, all the OutputModules write the 3 events except for ‘trackOut’, which writes no events because the only path it depends upon never succeeds (i.e. trackPath is set to the error state for each Event).

From the summary we see:

TrigReport ---------- Path   Summary ------------
TrigReport  Trig Bit#   Executed     Passed     Failed      Error Name
TrigReport     1    0          3          0          0          3 trackPath
TrigReport     1    1          3          3          0          0 caloPath
TrigReport     1    2          3          3          0          0 globalTriggerPath

TrigReport ---------- Module Summary ------------
TrigReport    Visited   Executed     Passed     Failed      Error Name
TrigReport          3          3          3          0          0 TriggerResults
TrigReport          3          3          3          0          0 caloClusters
TrigReport          3          3          3          0          0 caloFilter
TrigReport          3          3          3          0          0 caloOut
TrigReport          3          3          3          0          0 caloPath
TrigReport          3          3          3          0          0 exceptionOut
TrigReport          3          3          3          0          0 globalTrigger
TrigReport          3          3          3          0          0 globalTriggerOut
TrigReport          3          3          3          0          0 globalTriggerPath
TrigReport          3          3          3          0          0 out
TrigReport          3          3          3          0          0 trackAndCaloOut
TrigReport          3          0          0          0          3 trackFilter
TrigReport          3          3          3          0          0 trackOut
TrigReport          3          3          3          0          0 trackPath
TrigReport          3          3          0          0          3 trackingHits
TrigReport          3          0          0          0          3 tracks
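
If one wants to check such a summary from a log file rather than by eye, a small parser is enough. The following is only a sketch in plain Python: `paths_with_errors` is a hypothetical helper, and it assumes the Path Summary rows keep the column layout shown above (Trig, Bit#, Executed, Passed, Failed, Error, Name).

```python
import re

# A Path Summary row has six integer columns followed by the Path name.
# Module Summary rows have only five integer columns, so they do not match.
PATH_ROW = re.compile(
    r"^TrigReport\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s*$"
)


def paths_with_errors(report):
    """Return {path name: error count} for Paths with a non-zero Error column."""
    errors = {}
    for line in report.splitlines():
        match = PATH_ROW.match(line)
        if match is None:
            continue  # header, separator, or Module Summary row
        error_count = int(match.group(6))
        if error_count > 0:
            errors[match.group(7)] = error_count
    return errors


SAMPLE = """\
TrigReport ---------- Path   Summary ------------
TrigReport  Trig Bit#   Executed     Passed     Failed      Error Name
TrigReport     1    0          3          0          0          3 trackPath
TrigReport     1    1          3          3          0          0 caloPath
TrigReport     1    2          3          3          0          0 globalTriggerPath
TrigReport ---------- Module Summary ------------
TrigReport    Visited   Executed     Passed     Failed      Error Name
TrigReport          3          3          0          0          3 trackingHits
"""

print(paths_with_errors(SAMPLE))
```

For the sample rows copied from the summary above, only trackPath is reported, with an error count of 3.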