airflow: `airflow db upgrade` Failed to write serialized DAG

Apache Airflow version

2.4.1

What happened

Running airflow db upgrade on an Airflow installation with 100 DAGs fails with this error:

ERROR [airflow.models.dagbag.DagBag] Failed to write serialized DAG: /usr/local/airflow/dags/REDACTED.py
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airflow/models/dagbag.py", line 615, in _serialize_dag_capturing_errors
    dag_was_updated = SerializedDagModel.write_dag(
  File "/usr/local/lib/python3.9/site-packages/airflow/utils/session.py", line 72, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/serialized_dag.py", line 146, in write_dag
    session.query(literal(True))
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 2810, in first
    return self.limit(1)._iter().first()
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 2894, in _iter
    result = self.session.execute(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 1688, in execute
    conn = self._connection_for_bind(bind)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 1529, in _connection_for_bind
    return self._transaction._connection_for_bind(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 721, in _connection_for_bind
    self._assert_active()
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 601, in _assert_active
    raise sa_exc.PendingRollbackError(
sqlalchemy.exc.PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "serialized_dag_pkey"
DETAIL:  Key (dag_id)=(REDACTED) already exists.

[SQL: INSERT INTO serialized_dag (dag_id, fileloc, fileloc_hash, data, data_compressed, last_updated, dag_hash, ...

What you think should happen instead

airflow db upgrade should successfully reserialize DAGs at the end of the upgrade just like the airflow dags reserialize command.

How to reproduce

  1. Upgrade an existing Airflow installation to 2.4.1
  2. Run airflow db upgrade (a minimal sketch of both steps follows)
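
A minimal reproduction sketch, assuming a plain pip-based install (a Docker- or Helm-based deployment would upgrade the image instead, but the db upgrade step is the same):

     # Sketch only: bump an existing installation to 2.4.1, then migrate the metadata DB.
     pip install "apache-airflow==2.4.1"
     airflow db upgrade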

Operating System

Debian GNU/Linux 10 (buster)

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==5.1.0
apache-airflow-providers-celery==3.0.0
apache-airflow-providers-cncf-kubernetes==4.3.0
apache-airflow-providers-common-sql==1.2.0
apache-airflow-providers-datadog==3.0.0
apache-airflow-providers-ftp==3.1.0
apache-airflow-providers-http==4.0.0
apache-airflow-providers-imap==3.0.0
apache-airflow-providers-postgres==5.2.1
apache-airflow-providers-redis==3.0.0
apache-airflow-providers-sendgrid==3.0.0
apache-airflow-providers-sftp==4.0.0
apache-airflow-providers-slack==5.1.0
apache-airflow-providers-sqlite==3.2.1
apache-airflow-providers-ssh==3.1.0

Deployment

Other Docker-based deployment

Deployment details

k8s deployment

Anything else

Fails consistently in these two scenarios:

  1. Run db upgrade only:

     airflow db upgrade
    
  2. Run it along with reserialize:

     airflow dags reserialize --clear-only
     airflow db upgrade
    

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 19 (16 by maintainers)

Most upvoted comments

Ah so airflow db check-migrations -t 0 || airflow db upgrade || true would work then.
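
For anyone wiring that into a deploy script, a rough sketch of the guard (the surrounding script is illustrative, not the official Helm chart's migration job):

     #!/usr/bin/env bash
     # Only run the potentially slow upgrade when the schema is actually behind.
     # check-migrations exits non-zero while migrations are still pending, and
     # -t 0 makes it return immediately instead of waiting for the default timeout.
     if airflow db check-migrations -t 0; then
         echo "Metadata database is already up to date; skipping airflow db upgrade."
     else
         # "|| true" mirrors the one-liner above so a failure to reserialize
         # DAGs (the error in this issue) does not fail the whole deploy.
         airflow db upgrade || true
     fi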

Today we upgraded to airflow 2.4.2 and did not notice this issue during the migration this time around.

We are using the official Helm chart, so the migration occurred on deploy via the migration job. We currently run 900+ DAGs.

When we upgraded to airflow 2.4.1, the migration took >20 minutes. After upgrading to airflow 2.4.2, the migration took <2 minutes.

If there are other data points needed I am happy to help provide some.

(I think even -t 0 is not needed)

Also it may be a good idea for us to add something like airflow db up-to-date-check (pending command name bikeshedding) that checks whether the database needs upgrading. This would be much more reliable than grepping stdout.

It’s already there: airflow db check-migrations

Thanks for sharing @troyharvey. I think we are going to do the same.