rucio: Database Exceptions MariaDB: Commands out of sync; you can't run this command now

Motivation

I’ve recently started seeing various mysql exceptions.

First, listing file replicas in a dataset gives a database exception message in the client:

(ligo-rucio) ➜  ~ rucio list-file-replicas O1:L-L1_R-11266
2018-11-06 14:16:35,616	ERROR	Database exception.
Details: (_mysql_exceptions.ProgrammingError) (2014, "Commands out of sync; you can't run this command now") [SQL: u'SELECT rses.id AS rses_id, rses.rse AS rses_rse, rses.rse_type AS rses_rse_type, rses.`deterministic` AS rses_deterministic, rses.volatile AS rses_volatile, rses.staging_area AS rses_staging_area, rses.city AS rses_city, rses.region_code AS rses_region_code, rses.country_name AS rses_country_name, rses.continent AS rses_continent, rses.time_zone AS rses_time_zone, rses.`ISP` AS `rses_ISP`, rses.`ASN` AS `rses_ASN`, rses.longitude AS rses_longitude, rses.latitude AS rses_latitude, rses.availability AS rses_availability, rses.updated_at AS rses_updated_at, rses.created_at AS rses_created_at, rses.deleted AS rses_deleted, rses.deleted_at AS rses_deleted_at \nFROM rses \nWHERE rses.deleted = false AND rses.rse = %s'] [parameters: ('LIGO-WA',)] (Background on this error at: http://sqlalche.me/e/f405)

Next, the judge-cleaner log shows the following message several times a second:

2018-11-06 11:20:17,659 3158    DEBUG   rule_cleaner[0/0] index query time 0.002273 fetch size is 1
2018-11-06 11:20:17,660 3158    INFO    rule_cleaner[0/0]: Deleting rule eb8794198a854dcd88846ca95998385c with expression UNL
2018-11-06 11:20:17,703 3158    DEBUG   Deleting lock O1:H-H1_R-1126621184-64.gwf for rule eb8794198a854dcd88846ca95998385c
Exception _mysql_exceptions.InterfaceError: (0, '') in <bound method SSCursor.__del__ of <MySQLdb.cursors.SSCursor object at 0x7f8c531e8950>> ignored
(raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (_mysql_exceptions.ProgrammingError) (2014, "Commands out of sync; you can't run this command now") [SQL: u'DELETE FROM locks WHERE locks.scope = %s AND locks.name = %s AND locks.rule_id = %s AND locks.rse_id = %s'] [parameters: ('O1', 'H-H1_R-1126621184-64.gwf', '\xeb\x87\x94\x19\x8a\x85M\xcd\x88\x84l\xa9Y\x988\\', 'NI\t\xdc\xf5\x13I\xb0\x93\xb0\x1a\xf41\xb6\xf5\xc6')] (Background on this error at: http://sqlalche.me/e/f405)
2018-11-06 11:20:17,710 3158    ERROR   Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/rucio/daemons/judge/cleaner.py", line 104, in rule_cleaner
    delete_rule(rule_id=rule_id, nowait=True)
  File "/usr/lib/python2.7/site-packages/rucio/db/sqla/session.py", line 360, in new_funct
    raise DatabaseException(str(error))
DatabaseException: Database exception.
Details: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (_mysql_exceptions.ProgrammingError) (2014, "Commands out of sync; you can't run this command now") [SQL: u'DELETE FROM locks WHERE locks.scope = %s AND locks.name = %s AND locks.rule_id = %s AND locks.rse_id = %s'] [parameters: ('O1', 'H-H1_R-1126621184-64.gwf', '\xeb\x87\x94\x19\x8a\x85M\xcd\x88\x84l\xa9Y\x988\\', 'NI\t\xdc\xf5\x13I\xb0\x93\xb0\x1a\xf41\xb6\xf5\xc6')] (Background on this error at: http://sqlalche.me/e/f405)

Finally, the reaper log shows the following message which may or may not be related:

2018-11-02 13:59:22,418 2582    INFO    Reaper 0-1: Running on RSE LIGO-CIT-ARCHIVE None
2018-11-02 13:59:22,441 2582    DEBUG   Reaper 0-1: list_unlocked_replicas on LIGO-CIT-ARCHIVE for None bytes in 0.0183210372925 seconds: 1 replicas
2018-11-02 13:59:22,442 2582    CRITICAL        Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/rucio/daemons/reaper/reaper.py", line 215, in reaper
    prot = rsemgr.create_protocol(rse_info, 'delete', scheme=scheme)
  File "/usr/lib/python2.7/site-packages/rucio/rse/rsemanager.py", line 171, in create_protocol
    protocol_attr = select_protocol(rse_settings, operation, scheme, domain)
  File "/usr/lib/python2.7/site-packages/rucio/rse/rsemanager.py", line 146, in select_protocol
    candidates = _get_possible_protocols(rse_settings, operation, scheme, domain)
  File "/usr/lib/python2.7/site-packages/rucio/rse/rsemanager.py", line 122, in _get_possible_protocols
    ' found : %s.' % str(rse_settings))
RSEProtocolNotSupported: RSE does not support requested protocol.
Details: No protocol for provided settings found : {'third_party_copy_protocol': 1, 'rse_type': 'DISK', 'domain': ['lan', 'wan'], 'availability_delete': True, 'delete_protocol': 1, 'rse': u'LIGO-CIT-ARCHIVE', 'deterministic': True, 'write_protocol': 1, 'read_protocol': 1, 'staging_area': False, 'lfn2pfn_algorithm': 'ligo', 'availability_write': True, 'volatile': False, 'availability_read': True, 'credentials': None, 'verify_checksum': True, 'id': '65b733dd9afb4ec3beee2ac5c0fbc5c6', 'protocols': [{'extended_attributes': None, 'hostname': u'ldas-pcdev5.ligo.caltech.edu', 'prefix': u'/hdfs/frames', 'domains': {'wan': {'read': 1L, 'write': 0L, 'third_party_copy': 1L, 'delete': 0L}, 'lan': {'read': 0L, 'write': 0L, 'delete': 0L}}, 'scheme': u'gsiftp', 'port': 2811L, 'impl': u'rucio.rse.protocols.gfalv2.Default'}]}.

Note that these errors only seem to be affecting deletion of rules and replicas; actual transfer operations appear to be unaffected.

Modification

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Comments: 15 (12 by maintainers)

Commits related to this issue

Most upvoted comments

WIPAC was bitten by this bug when we fired a one-off run of rucio-judge-cleaner to delete a rule:

bash-4.2# rucio-judge-cleaner --run-once
/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py:552: Warning: Truncated incorrect DOUBLE value: 'fe946249600df76ab57e5331127514f2'
  cursor.execute(statement, parameters)
2020-04-17 16:43:47,845	21475	INFO	rule_cleaner[0/0]: Deleting rule 4472de8c65214c3991028b91ed2097ff with expression NERSC-LTA
Exception _mysql_exceptions.InterfaceError: (0, '') in <bound method SSCursor.__del__ of <MySQLdb.cursors.SSCursor object at 0x7f98709a5510>> ignored
(raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)
(_mysql_exceptions.ProgrammingError) (2014, "Commands out of sync; you can't run this command now")
[SQL: DELETE FROM locks WHERE locks.scope = %s AND locks.name = %s AND locks.rule_id = %s AND locks.rse_id = %s]
[parameters: ('lta', '00757c0a62d211eabaa642cd496541f0.zip', 'Dr\xde\x8ce!L9\x91\x02\x8b\x91\xed \x97\xff', '\r\xb4\x1d\x9c\xd2\xc0D$\xbb\x0e\xfb"\xad\x99\xca\'')]
(Background on this error at: http://sqlalche.me/e/f405)
2020-04-17 16:43:47,888	21475	ERROR	Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/rucio/daemons/judge/cleaner.py", line 107, in rule_cleaner
    delete_rule(rule_id=rule_id, nowait=True)
  File "/usr/lib/python2.7/site-packages/rucio/db/sqla/session.py", line 365, in new_funct
    raise DatabaseException(str(error))
DatabaseException: Database exception.
Details: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)
(_mysql_exceptions.ProgrammingError) (2014, "Commands out of sync; you can't run this command now")
[SQL: DELETE FROM locks WHERE locks.scope = %s AND locks.name = %s AND locks.rule_id = %s AND locks.rse_id = %s]
[parameters: ('lta', '00757c0a62d211eabaa642cd496541f0.zip', 'Dr\xde\x8ce!L9\x91\x02\x8b\x91\xed \x97\xff', '\r\xb4\x1d\x9c\xd2\xc0D$\xbb\x0e\xfb"\xad\x99\xca\'')]
(Background on this error at: http://sqlalche.me/e/f405)

Running Rucio 1.20.1.post1:

bash-4.2# rucio --version
rucio 1.20.1.post1

And MariaDB 10.3.22:

mysql Ver 15.1 Distrib 10.3.22-MariaDB, for Linux (x86_64) using readline 5.1

We are looking into using MariaDB 10.4, per the suggestion in #support on Slack. Will follow up with any results on this issue.