airflow: scheduler dies with "MySQLdb._exceptions.OperationalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction')"
Apache Airflow version: 2.0.0
Kubernetes version (if you are using kubernetes) (use kubectl version):
Environment:
- Cloud provider or hardware configuration: Tencent Cloud
- OS (e.g. from /etc/os-release): CentOS 7
- Kernel (e.g. uname -a): 3.10
- Install tools:
- Others: Server version: 8.0.22 MySQL Community Server - GPL
What happened:
The scheduler dies when I try to restart it. The logs are as follows:
{2021-01-14 13:29:05,424} {{scheduler_job.py:1293}} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
Traceback (most recent call last):
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 609, in do_execute
cursor.execute(statement, parameters)
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/MySQLdb/cursors.py", line 209, in execute
res = self._query(query)
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/MySQLdb/cursors.py", line 315, in _query
db.query(q)
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/MySQLdb/connections.py", line 239, in query
_mysql.connection.query(self, query)
MySQLdb._exceptions.OperationalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1275, in _execute
self._run_scheduler_loop()
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1349, in _run_scheduler_loop
self.adopt_or_reset_orphaned_tasks()
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/airflow/utils/session.py", line 65, in wrapper
return func(*args, session=session, **kwargs)
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1758, in adopt_or_reset_orphaned_tasks
session.query(SchedulerJob)
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 4063, in update
update_op.exec_()
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
self._do_exec()
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
self._execute_stmt(update_stmt)
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
self.result = self.query._execute_crud(stmt, self.mapper)
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
return conn.execute(stmt, self._params)
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
ret = self._execute_context(
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
self._handle_dbapi_exception(
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
util.raise_(
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 609, in do_execute
cursor.execute(statement, parameters)
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/MySQLdb/cursors.py", line 209, in execute
res = self._query(query)
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/MySQLdb/cursors.py", line 315, in _query
db.query(q)
File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/MySQLdb/connections.py", line 239, in query
_mysql.connection.query(self, query)
sqlalchemy.exc.OperationalError: (MySQLdb._exceptions.OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction')
[SQL: UPDATE job SET state=%s WHERE job.state = %s AND job.latest_heartbeat < %s]
[parameters: ('failed', 'running', datetime.datetime(2021, 1, 14, 5, 28, 35, 157941))]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
{2021-01-14 13:29:06,435} {{process_utils.py:95}} INFO - Sending Signals.SIGTERM to GPID 6293
{2021-01-14 13:29:06,677} {{process_utils.py:61}} INFO - Process psutil.Process(pid=6318, status='terminated') (6318) terminated with exit code None
{2021-01-14 13:29:06,767} {{process_utils.py:201}} INFO - Waiting up to 5 seconds for processes to exit...
{2021-01-14 13:29:06,850} {{process_utils.py:61}} INFO - Process psutil.Process(pid=6320, status='terminated') (6320) terminated with exit code None
{2021-01-14 13:29:06,850} {{process_utils.py:61}} INFO - Process psutil.Process(pid=6319, status='terminated') (6319) terminated with exit code None
{2021-01-14 13:29:06,858} {{process_utils.py:61}} INFO - Process psutil.Process(pid=6321, status='terminated') (6321) terminated with exit code None
{2021-01-14 13:29:06,864} {{process_utils.py:201}} INFO - Waiting up to 5 seconds for processes to exit...
{2021-01-14 13:29:06,876} {{process_utils.py:61}} INFO - Process psutil.Process(pid=6293, status='terminated') (6293) terminated with exit code 0
{2021-01-14 13:29:06,876} {{scheduler_job.py:1296}} INFO - Exited execute loop
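For reference, the UPDATE that deadlocks comes from adopt_or_reset_orphaned_tasks (see the traceback above). Reconstructed from the SQL and its parameters, it is roughly equivalent to the ORM call below; the names and the 30-second threshold are my approximation, not a verbatim copy of scheduler_job.py:

from datetime import timedelta

from airflow.jobs.scheduler_job import SchedulerJob
from airflow.utils import timezone
from airflow.utils.session import create_session
from airflow.utils.state import State

# Mark scheduler jobs whose heartbeat is too old as failed so their tasks
# can be adopted or reset. This is the statement that hits the MySQL
# deadlock (error 1213) in my setup.
limit_dttm = timezone.utcnow() - timedelta(seconds=30)
with create_session() as session:
    session.query(SchedulerJob).filter(
        SchedulerJob.state == State.RUNNING,
        SchedulerJob.latest_heartbeat < limit_dttm,
    ).update({SchedulerJob.state: State.FAILED}, synchronize_session=False)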
What you expected to happen:
The scheduler should not die.
How to reproduce it:
I don’t know how to reproduce it
Anything else we need to know:
I just upgraded Airflow from 1.10.14. For now I am working around it temporarily by catching the exception in scheduler_job.py:
for dag_run in dag_runs:
    try:
        self._schedule_dag_run(dag_run, active_runs_by_dag_id.get(dag_run.dag_id, set()), session)
    except Exception as e:
        self.log.exception(e)
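A narrower temporary workaround is to retry only on transient database errors instead of swallowing every exception. The sketch below assumes that approach; schedule_dag_run_with_retry is a hypothetical helper, not Airflow code, and the retry count is arbitrary:

from sqlalchemy.exc import OperationalError

MAX_RETRIES = 3

def schedule_dag_run_with_retry(scheduler_job, dag_run, active_runs, session):
    # Retry only OperationalError (e.g. MySQL deadlock 1213), rolling back
    # the failed transaction before each new attempt; re-raise after the
    # last attempt so real failures still surface.
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            scheduler_job._schedule_dag_run(dag_run, active_runs, session)
            return
        except OperationalError:
            session.rollback()
            if attempt == MAX_RETRIES:
                raise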
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 16 (9 by maintainers)
Commits related to this issue
- Disable row level locking for Mariadb and MySQL <8 (#14031) closes #11899 closes #13668 This PR disable row-level locking for MySQL variants that do not support skip_locked and no_wait -- MySQL < 8 ... — committed to apache/airflow by kaxil 3 years ago
- Retry critical methods in Scheduler loop in case of OperationalError (#14032) In the case of OperationalError (caused deadlocks, network blips), the scheduler will now retry those methods 3 times. ... — committed to apache/airflow by kaxil 3 years ago
- Disable row level locking for Mariadb and MySQL <8 (#14031) closes #11899 closes #13668 This PR disable row-level locking for MySQL variants that do not support skip_locked and no_wait -- MySQL < 8 ... — committed to astronomer/airflow by kaxil 3 years ago
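For context, the row-level locking that #14031 makes conditional corresponds to SELECT ... FOR UPDATE with SKIP LOCKED / NOWAIT, which MariaDB and MySQL older than 8 do not support. A minimal SQLAlchemy sketch of the idea, using a simplified stand-in model and a placeholder connection string rather than Airflow's actual code:

from sqlalchemy import Column, DateTime, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Job(Base):
    # Simplified stand-in for Airflow's job table (illustration only).
    __tablename__ = "job"
    id = Column(Integer, primary_key=True)
    state = Column(String(20))
    latest_heartbeat = Column(DateTime)

engine = create_engine("mysql+mysqldb://user:pass@localhost/airflow")  # placeholder DSN
session = sessionmaker(bind=engine)()

# FOR UPDATE SKIP LOCKED lets a second scheduler skip rows that another
# transaction already holds instead of blocking and eventually deadlocking.
# MySQL 8+ and PostgreSQL support it; MariaDB and MySQL < 8 do not, which
# is why row-level locking is disabled there.
stale_jobs = (
    session.query(Job)
    .filter(Job.state == "running")
    .with_for_update(skip_locked=True)
    .all()
)

If I understand the change correctly, the behaviour can also be controlled explicitly via the scheduler's use_row_level_locking setting in airflow.cfg.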
Upgrade. You are 4 years behind. 1.10 has been end-of-life for almost 3 years. You are one of the last few users out there, and there is no one who would be able to help you. There is no other way it can be addressed.