cmssw: ROOT files with duplicated GUIDs observed on production T0 replay workflows
This is related to a WMCore issue:
https://github.com/dmwm/WMCore/issues/10870
Bug description
When deploying T0 replays with a large number of jobs, one of the WMCore components fails, complaining about duplicated LFNs. Our LFN patterns look like this:
/store/unmerged/HG2202_Val/RelValProdMinBias/GEN-SIM/HG2202_Val_OLD_Alanv4-v22/00000/2AE85F14-94A1-EC11-BBF5-FA163EC7AA59.root
where 2AE85F14-94A1-EC11-BBF5-FA163EC7AA59 is the GUID extracted from the ROOT file through the framework XML job report.
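To make the LFN pattern concrete, here is a minimal Python sketch of recovering the GUID from an LFN of this shape. The helper name `guid_from_lfn` and the regular expression are illustrative assumptions, not WMCore code; the only thing taken from the report is that the file basename (without `.root`) is the GUID.

```python
import re
from pathlib import PurePosixPath

# Assumed GUID layout: 8-4-4-4-12 hexadecimal digits, as in the example LFN.
GUID_RE = re.compile(
    r"^[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}$",
    re.IGNORECASE,
)


def guid_from_lfn(lfn: str) -> str:
    """Return the GUID embedded in the LFN basename (hypothetical helper)."""
    stem = PurePosixPath(lfn).stem  # basename without the ".root" suffix
    if not GUID_RE.match(stem):
        raise ValueError(f"basename is not a GUID: {stem!r}")
    return stem.upper()


example_lfn = (
    "/store/unmerged/HG2202_Val/RelValProdMinBias/GEN-SIM/"
    "HG2202_Val_OLD_Alanv4-v22/00000/"
    "2AE85F14-94A1-EC11-BBF5-FA163EC7AA59.root"
)
print(guid_from_lfn(example_lfn))
```

Because the GUID is the only per-file part of the name within a given directory, two files with the same GUID in the same directory end up with the same LFN.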
So we are basically observing two different jobs generating files with the same GUID. We get the GUID from the framework XML job report here:
The GUID in the FW report seems to be generated in FWCore/Utilities/src/Guid.cc (https://github.com/cms-sw/cmssw/blob/master/FWCore/Utilities/src/Guid.cc#L18-L28), so I’m reporting the issue here.
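For anyone trying to confirm the symptom from a list of job outputs, a check along these lines might help. It is only a sketch: the function `find_duplicate_guids`, the `(job_id, lfn)` input format, and the job identifiers are hypothetical and do not reflect how WMCore stores this information.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_duplicate_guids(job_outputs):
    """Map each GUID produced by more than one job to the sorted job ids.

    job_outputs is an iterable of (job_id, lfn) pairs (assumed format).
    """
    jobs_by_guid = defaultdict(set)
    for job_id, lfn in job_outputs:
        # The GUID is taken to be the file basename without ".root".
        guid = PurePosixPath(lfn).stem.upper()
        jobs_by_guid[guid].add(job_id)
    return {
        guid: sorted(jobs)
        for guid, jobs in jobs_by_guid.items()
        if len(jobs) > 1
    }


# Hypothetical sample: two jobs report the same GUID, a third does not.
sample = [
    ("job-1", "/store/unmerged/x/00000/2AE85F14-94A1-EC11-BBF5-FA163EC7AA59.root"),
    ("job-2", "/store/unmerged/x/00000/2AE85F14-94A1-EC11-BBF5-FA163EC7AA59.root"),
    ("job-3", "/store/unmerged/x/00000/3BF96A25-94A1-EC11-BBF5-FA163EC7AA59.root"),
]
duplicates = find_duplicate_guids(sample)
print(duplicates)
```

Grouping by GUID rather than by full LFN also catches collisions between files that land in different directories.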
How to reproduce
Deploy a Tier0 replay with a large number of jobs. I think @germanfgv can help with this if needed. At least one incident per week has been reported recently this year.
About this issue
- State: closed
- Created 2 years ago
- Comments: 41 (31 by maintainers)
CMSSW_12_2_3 is going to be built: https://github.com/cms-sw/cmssw/issues/37433
I think it would be better to proceed without #37417 (even if it would lead to two separate patch releases).