rucio: Database Exceptions MariaDB: Commands out of sync; you can't run this command now
Motivation
I’ve recently started seeing various mysql exceptions.
First, listing file replicas in a dataset gives a database exception message in the client:
(ligo-rucio) ➜ ~ rucio list-file-replicas O1:L-L1_R-11266
2018-11-06 14:16:35,616 ERROR Database exception.
Details: (_mysql_exceptions.ProgrammingError) (2014, "Commands out of sync; you can't run this command now") [SQL: u'SELECT rses.id AS rses_id, rses.rse AS rses_rse, rses.rse_type AS rses_rse_type, rses.`deterministic` AS rses_deterministic, rses.volatile AS rses_volatile, rses.staging_area AS rses_staging_area, rses.city AS rses_city, rses.region_code AS rses_region_code, rses.country_name AS rses_country_name, rses.continent AS rses_continent, rses.time_zone AS rses_time_zone, rses.`ISP` AS `rses_ISP`, rses.`ASN` AS `rses_ASN`, rses.longitude AS rses_longitude, rses.latitude AS rses_latitude, rses.availability AS rses_availability, rses.updated_at AS rses_updated_at, rses.created_at AS rses_created_at, rses.deleted AS rses_deleted, rses.deleted_at AS rses_deleted_at \nFROM rses \nWHERE rses.deleted = false AND rses.rse = %s'] [parameters: ('LIGO-WA',)] (Background on this error at: http://sqlalche.me/e/f405)
Next, the judge-cleaner log shows the following message several times a second:
2018-11-06 11:20:17,659 3158 DEBUG rule_cleaner[0/0] index query time 0.002273 fetch size is 1
2018-11-06 11:20:17,660 3158 INFO rule_cleaner[0/0]: Deleting rule eb8794198a854dcd88846ca95998385c with expression UNL
2018-11-06 11:20:17,703 3158 DEBUG Deleting lock O1:H-H1_R-1126621184-64.gwf for rule eb8794198a854dcd88846ca95998385c
Exception _mysql_exceptions.InterfaceError: (0, '') in <bound method SSCursor.__del__ of <MySQLdb.cursors.SSCursor object at 0x7f8c531e8950>> ignored
(raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (_mysql_exceptions.ProgrammingError) (2014, "Commands out of sync; you can't run this command now") [SQL: u'DELETE FROM locks WHERE locks.scope = %s AND locks.name = %s AND locks.rule_id = %s AND locks.rse_id = %s'] [parameters: ('O1', 'H-H1_R-1126621184-64.gwf', '\xeb\x87\x94\x19\x8a\x85M\xcd\x88\x84l\xa9Y\x988\\', 'NI\t\xdc\xf5\x13I\xb0\x93\xb0\x1a\xf41\xb6\xf5\xc6')] (Background on this error at: http://sqlalche.me/e/f405)
2018-11-06 11:20:17,710 3158 ERROR Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/rucio/daemons/judge/cleaner.py", line 104, in rule_cleaner
delete_rule(rule_id=rule_id, nowait=True)
File "/usr/lib/python2.7/site-packages/rucio/db/sqla/session.py", line 360, in new_funct
raise DatabaseException(str(error))
DatabaseException: Database exception.
Details: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (_mysql_exceptions.ProgrammingError) (2014, "Commands out of sync; you can't run this command now") [SQL: u'DELETE FROM locks WHERE locks.scope = %s AND locks.name = %s AND locks.rule_id = %s AND locks.rse_id = %s'] [parameters: ('O1', 'H-H1_R-1126621184-64.gwf', '\xeb\x87\x94\x19\x8a\x85M\xcd\x88\x84l\xa9Y\x988\\', 'NI\t\xdc\xf5\x13I\xb0\x93\xb0\x1a\xf41\xb6\xf5\xc6')] (Background on this error at: http://sqlalche.me/e/f405)
Finally, the reaper log shows the following message which may or may not be related:
2018-11-02 13:59:22,418 2582 INFO Reaper 0-1: Running on RSE LIGO-CIT-ARCHIVE None
2018-11-02 13:59:22,441 2582 DEBUG Reaper 0-1: list_unlocked_replicas on LIGO-CIT-ARCHIVE for None bytes in 0.0183210372925 seconds: 1 replicas
2018-11-02 13:59:22,442 2582 CRITICAL Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/rucio/daemons/reaper/reaper.py", line 215, in reaper
prot = rsemgr.create_protocol(rse_info, 'delete', scheme=scheme)
File "/usr/lib/python2.7/site-packages/rucio/rse/rsemanager.py", line 171, in create_protocol
protocol_attr = select_protocol(rse_settings, operation, scheme, domain)
File "/usr/lib/python2.7/site-packages/rucio/rse/rsemanager.py", line 146, in select_protocol
candidates = _get_possible_protocols(rse_settings, operation, scheme, domain)
File "/usr/lib/python2.7/site-packages/rucio/rse/rsemanager.py", line 122, in _get_possible_protocols
' found : %s.' % str(rse_settings))
RSEProtocolNotSupported: RSE does not support requested protocol.
Details: No protocol for provided settings found : {'third_party_copy_protocol': 1, 'rse_type': 'DISK', 'domain': ['lan', 'wan'], 'availability_delete': True, 'delete_protocol': 1, 'rse': u'LIGO-CIT-ARCHIVE', 'deterministic': True, 'write_protocol': 1, 'read_protocol': 1, 'staging_area': False, 'lfn2pfn_algorithm': 'ligo', 'availability_write': True, 'volatile': False, 'availability_read': True, 'credentials': None, 'verify_checksum': True, 'id': '65b733dd9afb4ec3beee2ac5c0fbc5c6', 'protocols': [{'extended_attributes': None, 'hostname': u'ldas-pcdev5.ligo.caltech.edu', 'prefix': u'/hdfs/frames', 'domains': {'wan': {'read': 1L, 'write': 0L, 'third_party_copy': 1L, 'delete': 0L}, 'lan': {'read': 0L, 'write': 0L, 'delete': 0L}}, 'scheme': u'gsiftp', 'port': 2811L, 'impl': u'rucio.rse.protocols.gfalv2.Default'}]}.
Note that these errors only seem to be affecting deletion of rules and replicas; actual transfer operations appear to be unaffected.
Modification
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Comments: 15 (12 by maintainers)
Commits related to this issue
- Testing: Add MariaDB 10 in testing; #1759 — committed to bziemons/rucio by bziemons 3 years ago
- Testing: Add MariaDB 10 in testing; #1759 — committed to bziemons/rucio by bziemons 3 years ago
- Testing: Add MariaDB 10 in testing; #1759 — committed to bziemons/rucio by bziemons 3 years ago
- Testing: Add MariaDB 10 in testing; #1759 — committed to bziemons/rucio by bziemons 3 years ago
- Testing: Add MariaDB 10 in testing; #1759 — committed to bziemons/rucio by bziemons 3 years ago
- fixup! Testing: Add MariaDB 10 in testing; #1759 — committed to bziemons/rucio by bziemons 3 years ago
- fixup! Testing: Add MariaDB 10 in testing; #1759 — committed to bziemons/rucio by bziemons 3 years ago
- fixup! Testing: Add MariaDB 10 in testing; #1759 — committed to bziemons/rucio by bziemons 3 years ago
- fixup! Testing: Add MariaDB 10 in testing; #1759 — committed to bziemons/rucio by bziemons 3 years ago
- fixup! Testing: Add MariaDB 10 in testing; #1759 — committed to bziemons/rucio by bziemons 3 years ago
- Testing: Add MariaDB 10 in testing; #1759 — committed to bziemons/rucio by bziemons 3 years ago
- Testing: Add MariaDB 10 in testing; #1759 — committed to bziemons/rucio by bziemons 3 years ago
- Testing: Add MariaDB 10 in testing; #1759 — committed to bziemons/rucio by bziemons 3 years ago
- Testing: Add MariaDB 10 in testing; #1759 — committed to bziemons/rucio by bziemons 3 years ago
- Testing: Add MariaDB 10 in testing; #1759 — committed to bziemons/rucio by bziemons 3 years ago
- Testing: Add MariaDB 10 in testing; #1759 — committed to bziemons/rucio by bziemons 3 years ago
- Testing: Add MariaDB 10 in testing; #1759 — committed to bziemons/rucio by bziemons 3 years ago
- Testing: Add MariaDB 10 in testing; #1759 — committed to bziemons/rucio by bziemons 3 years ago
- Testing: Add MariaDB 10 in testing; #1759 — committed to bziemons/rucio by bziemons 3 years ago
- Testing: Add MariaDB 10 in testing; #1759 — committed to bziemons/rucio by bziemons 3 years ago
WIPAC was bitten by this bug when we fired a one-off run of
rucio-judge-cleanerto delete a rule:Running Rucio 1.20.1.post1:
And MariaDB 10.3.22:
We are looking into using MariaDB 10.4, per the suggestion in #support on Slack. Will follow up with any results on this issue.