airflow: scheduler dies with "MySQLdb._exceptions.OperationalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction')"

Apache Airflow version: 2.0.0

Kubernetes version (if you are using kubernetes) (use kubectl version):

Environment:

  • Cloud provider or hardware configuration: Tencent Cloud
  • OS (e.g. from /etc/os-release): CentOS 7
  • Kernel (e.g. uname -a): 3.10
  • Install tools:
  • Others: MySQL 8.0.22 (MySQL Community Server - GPL)

What happened:

The scheduler dies when I try to restart it. The logs are as follows:

{2021-01-14 13:29:05,424} {{scheduler_job.py:1293}} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
Traceback (most recent call last):
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 609, in do_execute
    cursor.execute(statement, parameters)
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/MySQLdb/cursors.py", line 209, in execute
    res = self._query(query)
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/MySQLdb/cursors.py", line 315, in _query
    db.query(q)
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/MySQLdb/connections.py", line 239, in query
    _mysql.connection.query(self, query)
MySQLdb._exceptions.OperationalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1275, in _execute
    self._run_scheduler_loop()
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1349, in _run_scheduler_loop
    self.adopt_or_reset_orphaned_tasks()
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/airflow/utils/session.py", line 65, in wrapper
    return func(*args, session=session, **kwargs)
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1758, in adopt_or_reset_orphaned_tasks
    session.query(SchedulerJob)
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 4063, in update
    update_op.exec_()
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
    self._do_exec()
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
    self._execute_stmt(update_stmt)
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
    self.result = self.query._execute_crud(stmt, self.mapper)
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
    return conn.execute(stmt, self._params)
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
    return meth(self, multiparams, params)
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
    ret = self._execute_context(
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
    self._handle_dbapi_exception(
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
    util.raise_(
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 609, in do_execute
    cursor.execute(statement, parameters)
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/MySQLdb/cursors.py", line 209, in execute
    res = self._query(query)
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/MySQLdb/cursors.py", line 315, in _query
    db.query(q)
  File "/home/app/.pyenv/versions/3.8.1/envs/airflow-py381/lib/python3.8/site-packages/MySQLdb/connections.py", line 239, in query
    _mysql.connection.query(self, query)
sqlalchemy.exc.OperationalError: (MySQLdb._exceptions.OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction')
[SQL: UPDATE job SET state=%s WHERE job.state = %s AND job.latest_heartbeat < %s]
[parameters: ('failed', 'running', datetime.datetime(2021, 1, 14, 5, 28, 35, 157941))]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
{2021-01-14 13:29:06,435} {{process_utils.py:95}} INFO - Sending Signals.SIGTERM to GPID 6293
{2021-01-14 13:29:06,677} {{process_utils.py:61}} INFO - Process psutil.Process(pid=6318, status='terminated') (6318) terminated with exit code None
{2021-01-14 13:29:06,767} {{process_utils.py:201}} INFO - Waiting up to 5 seconds for processes to exit...
{2021-01-14 13:29:06,850} {{process_utils.py:61}} INFO - Process psutil.Process(pid=6320, status='terminated') (6320) terminated with exit code None
{2021-01-14 13:29:06,850} {{process_utils.py:61}} INFO - Process psutil.Process(pid=6319, status='terminated') (6319) terminated with exit code None
{2021-01-14 13:29:06,858} {{process_utils.py:61}} INFO - Process psutil.Process(pid=6321, status='terminated') (6321) terminated with exit code None
{2021-01-14 13:29:06,864} {{process_utils.py:201}} INFO - Waiting up to 5 seconds for processes to exit...
{2021-01-14 13:29:06,876} {{process_utils.py:61}} INFO - Process psutil.Process(pid=6293, status='terminated') (6293) terminated with exit code 0
{2021-01-14 13:29:06,876} {{scheduler_job.py:1296}} INFO - Exited execute loop
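
For reference, the UPDATE that deadlocks is the one adopt_or_reset_orphaned_tasks runs at scheduler startup to mark stale running jobs as failed. Below is a minimal sketch of how a transient MySQL deadlock (error 1213) could be retried instead of propagating out of the scheduler loop; the helper name and retry policy are my own illustration, not part of Airflow:

import time

from sqlalchemy.exc import OperationalError

MYSQL_DEADLOCK = 1213  # "Deadlock found when trying to get lock"

def run_with_deadlock_retry(session, operation, retries=3, backoff=0.5):
    # Run operation(session), retrying a few times if MySQL reports a deadlock.
    for attempt in range(1, retries + 1):
        try:
            result = operation(session)
            session.commit()
            return result
        except OperationalError as exc:
            session.rollback()
            # For MySQLdb, exc.orig.args[0] carries the MySQL error code.
            if exc.orig.args[0] != MYSQL_DEADLOCK or attempt == retries:
                raise
            time.sleep(backoff * attempt)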

What you expected to happen:

The scheduler should not die.

How to reproduce it:

I don’t know how to reproduce it.

Anything else we need to know:

I just upgraded Airflow from 1.10.14. For now I am working around it by catching the exception in scheduler_job.py:

# Workaround: log and skip a DAG run that raises, instead of letting the
# exception propagate and kill the scheduler loop.
for dag_run in dag_runs:
    try:
        self._schedule_dag_run(dag_run, active_runs_by_dag_id.get(dag_run.dag_id, set()), session)
    except Exception as e:
        self.log.exception(e)
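
Note that the traceback above actually fails inside adopt_or_reset_orphaned_tasks rather than _schedule_dag_run, so the same guard would be needed around that call as well. A rough sketch of what that could look like around the startup call in _run_scheduler_loop (my own untested patch against the 2.0.0 code, not an official fix):

from sqlalchemy.exc import OperationalError

try:
    self.adopt_or_reset_orphaned_tasks()
except OperationalError as e:
    # Deadlocks (MySQL error 1213) are transient; log and let the next
    # scheduled check retry instead of crashing the scheduler.
    self.log.exception(e)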

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 16 (9 by maintainers)

Most upvoted comments

Upgrade. You are 4 years behind. 1.10 has been end-of-life for almost 3 years. You are one of the last few users out there, and there is no one who can help you. There is no other way it can be addressed.