cmssw: ROOT files with duplicated GUIDs observed on production T0 replay workflows

This is related to a WMCore issue:

https://github.com/dmwm/WMCore/issues/10870

Bug description
When deploying T0 replays with a significant number of jobs, one of the WMCore components fails, complaining about duplicated LFNs. Our LFN patterns look like this:

/store/unmerged/HG2202_Val/RelValProdMinBias/GEN-SIM/HG2202_Val_OLD_Alanv4-v22/00000/2AE85F14-94A1-EC11-BBF5-FA163EC7AA59.root

where 2AE85F14-94A1-EC11-BBF5-FA163EC7AA59 is the GUID extracted from the ROOT file via the framework XML job report.
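Because the file name is the GUID, two jobs that produce the same GUID for the same output dataset and block end up with identical LFNs. The sketch below illustrates the kind of cross-job check that surfaces this. It is not WMCore's actual code. It assumes that the basename of each LFN, minus `.root`, is the GUID, and that each job's output LFNs are available as a list. The helper names `guid_from_lfn` and `find_duplicate_guids` are hypothetical.

```python
from __future__ import annotations

import posixpath
from collections import defaultdict


def guid_from_lfn(lfn: str) -> str:
    """Return the GUID part of an LFN: the file name without the .root suffix."""
    name = posixpath.basename(lfn)
    if not name.endswith(".root"):
        raise ValueError(f"not a ROOT file LFN: {lfn}")
    return name[: -len(".root")]


def find_duplicate_guids(job_outputs: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each GUID produced by more than one job to the sorted list of those jobs."""
    jobs_by_guid: dict[str, set[str]] = defaultdict(set)
    for job, lfns in job_outputs.items():
        for lfn in lfns:
            jobs_by_guid[guid_from_lfn(lfn)].add(job)
    # Keep only GUIDs that were seen in two or more distinct jobs.
    return {guid: sorted(jobs) for guid, jobs in jobs_by_guid.items() if len(jobs) > 1}


# Hypothetical example: two jobs writing the same GUID into the same block directory.
base = "/store/unmerged/HG2202_Val/RelValProdMinBias/GEN-SIM/HG2202_Val_OLD_Alanv4-v22/00000/"
outputs = {
    "job_1": [base + "2AE85F14-94A1-EC11-BBF5-FA163EC7AA59.root"],
    "job_2": [base + "2AE85F14-94A1-EC11-BBF5-FA163EC7AA59.root"],
}
print(find_duplicate_guids(outputs))
# {'2AE85F14-94A1-EC11-BBF5-FA163EC7AA59': ['job_1', 'job_2']}
```

Grouping by GUID rather than by full LFN also flags collisions across different directories, even though only same-directory collisions produce the identical LFNs reported above.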

So we are effectively observing two different jobs generating files with the same GUID. We get the GUID from the framework XML job report here:

I'm reporting the issue here because the GUID in the framework report appears to be generated in this code: https://github.com/cms-sw/cmssw/blob/master/FWCore/Utilities/src/Guid.cc#L18-L28

How to reproduce
Deploy a Tier0 replay with a significant number of jobs. I think @germanfgv can help with this if needed. Recently, at least one incident per week has been reported this year.

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 41 (31 by maintainers)

Most upvoted comments

CMSSW_12_2_3 is going to be built: https://github.com/cms-sw/cmssw/issues/37433

I think it would be better to proceed without #37417 (even if it would lead to two separate patch releases).