cmssw: Bug in frontier client leading to failed CMS jobs at a KIT subsite

Dear CMSSW experts,

At one of our KIT subsites, we have encountered a number of failed jobs (example) that show the following error:

Setting up Frontier log level
Beginning CMSSW wrapper script
 slc7_amd64_gcc700 scramv1 CMSSW
Performing SCRAM setup...
Completed SCRAM setup
Retrieving SCRAM project...
Completed SCRAM project
Executing CMSSW
cmsRun  -j FrameworkJobReport.xml PSet.py
%MSG-i ThreadStreamSetup:  (NoModuleName) 04-Feb-2023 08:56:27 UTC pre-events
setting # threads 4
setting # streams 4
%MSG
error [fn-urlparse.c:59]: config error: bad url 10.3.0.123:3128
error [fn-urlparse.c:59]: config error: bad url 10.3.0.123:3128
error [fn-urlparse.c:59]: config error: bad url 10.3.0.123:3128
error [fn-urlparse.c:59]: config error: bad url 10.3.0.123:3128
error [fn-urlparse.c:59]: config error: bad url 10.3.0.123:3128
error [fn-urlparse.c:59]: config error: bad url 10.3.0.123:3128
error [fn-urlparse.c:59]: config error: bad url 10.3.0.123:3128
----- Begin Fatal Exception 04-Feb-2023 08:57:39 UTC-----------------------
An exception of category 'StdException' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing ESSource: class=PoolDBESSource label='GlobalTag'
Exception Message:
A std::exception was thrown.
Connection on "frontier://(loadbalance=proxies)(proxyconfigurl=file:///etc/wpad.dat)(backupproxyurl=http://cmsbpfrontier.cern.ch:3128)(backupproxyurl=http://cmsbproxy.fnal.gov:3128)(serverurl=http://cmsfrontier.cern.ch:8000/FrontierProd)(serverurl=http://cmsfrontier1.cern.ch:8000/FrontierProd)(serverurl=http://cmsfrontier2.cern.ch:8000/FrontierProd)(serverurl=http://cmsfrontier3.cern.ch:8000/FrontierProd)/CMS_CONDITIONS" cannot be established ( CORAL : "ConnectionPool::getSessionFromNewConnection" from "CORAL/Services/ConnectionService" )
----- End Fatal Exception -------------------------------------------------
Complete
process id is 9962 status is 66

This has happened for jobs running with the CMSSW_10_2_16_UL release.

According to our investigation, this appears to be a bug in the frontier client, tag cms/2.8.20:

https://github.com/cms-externals/frontier_client/blob/e96f07fe14a188580470cbbd27ad3fc9b458b5ca/http/fn-urlparse.c#L57-L62

The client expects an http:// prefix in front of the IP address or hostname. This contradicts the PAC specification, in which a proxy is returned as PROXY host:port, without a URL scheme. This is fixed in tag cms/2.9.1.
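To make the failure mode concrete: a PAC file's FindProxyForURL returns entries such as "PROXY 10.3.0.123:3128; DIRECT", and the host:port part is presumably what reaches the URL parser and is rejected as "bad url 10.3.0.123:3128" in the log above. The following is a minimal sketch of lenient handling for one such entry. The helper name pac_proxy_to_url is hypothetical; this is not the frontier_client code and not the actual change in tag cms/2.9.1.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Skip spaces and tabs. */
static const char *skip_ws(const char *p)
{
    while (*p == ' ' || *p == '\t')
        p++;
    return p;
}

/*
 * Hypothetical helper (not part of frontier_client): convert one PAC
 * entry such as "PROXY 10.3.0.123:3128" into "http://10.3.0.123:3128".
 * An entry whose address already starts with "http://" is copied as is.
 * Returns 0 on success, -1 for non-PROXY entries (e.g. DIRECT), an
 * empty address, or an output buffer that is too small.
 */
static int pac_proxy_to_url(const char *entry, char *out, size_t outlen)
{
    const char *p = skip_ws(entry);
    size_t len;
    int n;

    /* The keyword must be "PROXY" followed by whitespace. */
    if (strncmp(p, "PROXY", 5) != 0 || (p[5] != ' ' && p[5] != '\t'))
        return -1;
    p = skip_ws(p + 5);

    /* Trim trailing whitespace left over from splitting on ';'. */
    len = strlen(p);
    while (len > 0 && (p[len - 1] == ' ' || p[len - 1] == '\t'))
        len--;
    if (len == 0)
        return -1;

    /* Accept both the PAC form (host:port) and http://host:port. */
    if (len >= 7 && strncmp(p, "http://", 7) == 0)
        n = snprintf(out, outlen, "%.*s", (int)len, p);
    else
        n = snprintf(out, outlen, "http://%.*s", (int)len, p);

    /* snprintf reports the untruncated length; treat truncation as failure. */
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

For a full PAC result such as "PROXY 10.3.0.123:3128; DIRECT", a caller would split on ';' and pass each piece to the helper, skipping the pieces for which it returns -1 (here, DIRECT).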

We therefore assume that CMS would need a new patch release that picks up the new tag.

Thank you very much in advance for looking into this.

Artur Gottmann

About this issue

  • State: open
  • Created a year ago
  • Comments: 18 (15 by maintainers)

Most upvoted comments

@makortel if there is a need to build a new release, we will. This is probably not a crowded period release-wise, so we can do so.

The exact release to build depends on the actual needs. As far as I can see, all updates added on top of 9_4_16 either add new features or improve procedures without affecting physics content. So, if I have to build a new release, I would rather make a 9_4_22 from the top of HEAD, and then a UL version of it.

In any case, I would do so if and only if @cms-sw/pdmv-l2 really plans to submit new workflows with it.