cmssw: Bug in frontier client leading to failed CMS jobs at a KIT subsite
Dear CMSSW experts,
At one of our subsites at KIT, we have encountered a number of failed jobs (example), which have the following error:
Setting up Frontier log level
Beginning CMSSW wrapper script
slc7_amd64_gcc700 scramv1 CMSSW
Performing SCRAM setup...
Completed SCRAM setup
Retrieving SCRAM project...
Completed SCRAM project
Executing CMSSW
cmsRun -j FrameworkJobReport.xml PSet.py
%MSG-i ThreadStreamSetup: (NoModuleName) 04-Feb-2023 08:56:27 UTC pre-events
setting # threads 4
setting # streams 4
%MSG
error [fn-urlparse.c:59]: config error: bad url 10.3.0.123:3128
error [fn-urlparse.c:59]: config error: bad url 10.3.0.123:3128
error [fn-urlparse.c:59]: config error: bad url 10.3.0.123:3128
error [fn-urlparse.c:59]: config error: bad url 10.3.0.123:3128
error [fn-urlparse.c:59]: config error: bad url 10.3.0.123:3128
error [fn-urlparse.c:59]: config error: bad url 10.3.0.123:3128
error [fn-urlparse.c:59]: config error: bad url 10.3.0.123:3128
----- Begin Fatal Exception 04-Feb-2023 08:57:39 UTC-----------------------
An exception of category 'StdException' occurred while
[0] Constructing the EventProcessor
[1] Constructing ESSource: class=PoolDBESSource label='GlobalTag'
Exception Message:
A std::exception was thrown.
Connection on "frontier://(loadbalance=proxies)(proxyconfigurl=file:///etc/wpad.dat)(backupproxyurl=http://cmsbpfrontier.cern.ch:3128)(backupproxyurl=http://cmsbproxy.fnal.gov:3128)(serverurl=http://cmsfrontier.cern.ch:8000/FrontierProd)(serverurl=http://cmsfrontier1.cern.ch:8000/FrontierProd)(serverurl=http://cmsfrontier2.cern.ch:8000/FrontierProd)(serverurl=http://cmsfrontier3.cern.ch:8000/FrontierProd)/CMS_CONDITIONS" cannot be established ( CORAL : "ConnectionPool::getSessionFromNewConnection" from "CORAL/Services/ConnectionService" )
----- End Fatal Exception -------------------------------------------------
Complete
process id is 9962 status is 66
This has happened for jobs running with the CMSSW_10_2_16_UL release.
According to our investigation, it seems to be a bug in the frontier client in tag cms/2.8.20:
The client expects an http:// prefix in front of the IP or hostname, which is contrary to what is written in the PAC specification.
It is fixed in tag cms/2.9.1.
So we assume that CMS would require a new patch release picking up the new tag.
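To illustrate the mismatch, here is a minimal Python sketch of a tolerant parser for the string returned by a PAC file's FindProxyForURL, which per the PAC format lists proxies as "PROXY host:port" without a scheme. This is not the frontier client's code; the function name parse_pac_result is hypothetical, and the example return value is only an assumption about what our /etc/wpad.dat yields, based on the "bad url 10.3.0.123:3128" messages in the log.

```python
def parse_pac_result(result):
    """Turn a PAC FindProxyForURL return string into a list of proxy URLs.

    Entries are separated by ';'. "DIRECT" is mapped to None.
    "PROXY host:port" carries no scheme in the PAC format, so an
    "http://" prefix is added when it is missing.
    """
    proxies = []
    for entry in result.split(";"):
        parts = entry.split()
        if not parts:
            continue  # skip empty entries, e.g. after a trailing ';'
        keyword = parts[0].upper()
        if keyword == "DIRECT":
            proxies.append(None)
        elif keyword == "PROXY" and len(parts) == 2:
            hostport = parts[1]
            # Accept both the bare host:port form and an explicit URL.
            if "://" not in hostport:
                hostport = "http://" + hostport
            proxies.append(hostport)
        else:
            raise ValueError("unsupported PAC entry: %r" % entry.strip())
    return proxies


# Assumed PAC return value, for illustration only.
print(parse_pac_result("PROXY 10.3.0.123:3128; DIRECT"))
```

A parser that accepts only the "http://host:port" form would reject the first entry above, which matches the symptom we see.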
Thank you very much in advance for having a look into this.
Artur Gottmann
About this issue
- State: open
- Created a year ago
- Comments: 18 (15 by maintainers)
@makortel if there is a need to build a new release, we will. Right now is probably a not-so-crowded period release-wise, so we can do so.
The exact release to be built depends on the exact needs. As far as I can see, all updates added on top of 9_4_16 either add new features, or improve the procedures without affecting their physics content. As such, if I have to build a new release, I would rather opt for making a 9_4_22 with the top of the HEAD, and then a UL version of it.
In any case, I would do so if and only if @cms-sw/pdmv-l2 really plans to submit new workflows with it.