airflow: Random "duplicate key value violates unique constraint" errors when initializing the postgres database
Apache Airflow version
2.3.0 (latest released)
What happened
while testing airflow 2.3.0 locally (using postgresql 12.4), the webserver container shows random errors:
webserver_1 | + airflow db init
...
webserver_1 | + exec airflow webserver
...
webserver_1 | [2022-05-04 18:58:46,011] {{manager.py:568}} INFO - Added Permission menu access on Permissions to role Admin
postgres_1 | 2022-05-04 18:58:46.013 UTC [41] ERROR: duplicate key value violates unique constraint "ab_permission_view_role_permission_view_id_role_id_key"
postgres_1 | 2022-05-04 18:58:46.013 UTC [41] DETAIL: Key (permission_view_id, role_id)=(204, 1) already exists.
postgres_1 | 2022-05-04 18:58:46.013 UTC [41] STATEMENT: INSERT INTO ab_permission_view_role (id, permission_view_id, role_id) VALUES (nextval('ab_permission_view_role_id_seq'), 204, 1) RETURNING ab_permission_view_role.id
webserver_1 | [2022-05-04 18:58:46,015] {{manager.py:570}} ERROR - Add Permission to Role Error: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "ab_permission_view_role_permission_view_id_role_id_key"
webserver_1 | DETAIL: Key (permission_view_id, role_id)=(204, 1) already exists.
webserver_1 |
webserver_1 | [SQL: INSERT INTO ab_permission_view_role (id, permission_view_id, role_id) VALUES (nextval('ab_permission_view_role_id_seq'), %(permission_view_id)s, %(role_id)s) RETURNING ab_permission_view_role.id]
webserver_1 | [parameters: {'permission_view_id': 204, 'role_id': 1}]
notes:
- when the db is first initialized, i have ~40 errors like this (with ~40 different
permission_view_id
but always the same'role_id': 1
) - when it’s not the first time initializing db, i always have 1 error like this but it shows different
permission_view_id
each time - all these errors don’t seem to have any real negative effects, the webserver is still running and airflow is still running and scheduling tasks
- “occasionally” i do get real exceptions which render the webserver workers all dead:
postgres_1 | 2022-05-05 20:03:30.580 UTC [44] ERROR: duplicate key value violates unique constraint "ab_permission_view_role_permission_view_id_role_id_key"
postgres_1 | 2022-05-05 20:03:30.580 UTC [44] DETAIL: Key (permission_view_id, role_id)=(214, 1) already exists.
postgres_1 | 2022-05-05 20:03:30.580 UTC [44] STATEMENT: INSERT INTO ab_permission_view_role (id, permission_view_id, role_id) VALUES (nextval('ab_permission_view_role_id_seq'), 214, 1) RETURNING ab_permission_view_role.id
webserver_1 | [2022-05-05 20:03:30 +0000] [121] [ERROR] Exception in worker process
webserver_1 | Traceback (most recent call last):
webserver_1 | File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1705, in _execute_context
webserver_1 | self.dialect.do_execute(
webserver_1 | File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 716, in do_execute
webserver_1 | cursor.execute(statement, parameters)
webserver_1 | psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "ab_permission_view_role_permission_view_id_role_id_key"
webserver_1 | DETAIL: Key (permission_view_id, role_id)=(214, 1) already exists.
webserver_1 |
webserver_1 |
webserver_1 | The above exception was the direct cause of the following exception:
webserver_1 |
webserver_1 | Traceback (most recent call last):
webserver_1 | File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
webserver_1 | worker.init_process()
webserver_1 | File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 134, in init_process
webserver_1 | self.load_wsgi()
webserver_1 | File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
webserver_1 | self.wsgi = self.app.wsgi()
webserver_1 | File "/usr/local/lib/python3.8/site-packages/gunicorn/app/base.py", line 67, in wsgi
webserver_1 | self.callable = self.load()
webserver_1 | File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
webserver_1 | return self.load_wsgiapp()
webserver_1 | File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
webserver_1 | return util.import_app(self.app_uri)
webserver_1 | File "/usr/local/lib/python3.8/site-packages/gunicorn/util.py", line 412, in import_app
webserver_1 | app = app(*args, **kwargs)
webserver_1 | File "/usr/local/lib/python3.8/site-packages/airflow/www/app.py", line 158, in cached_app
webserver_1 | app = create_app(config=config, testing=testing)
webserver_1 | File "/usr/local/lib/python3.8/site-packages/airflow/www/app.py", line 146, in create_app
webserver_1 | sync_appbuilder_roles(flask_app)
webserver_1 | File "/usr/local/lib/python3.8/site-packages/airflow/www/app.py", line 68, in sync_appbuilder_roles
webserver_1 | flask_app.appbuilder.sm.sync_roles()
webserver_1 | File "/usr/local/lib/python3.8/site-packages/airflow/www/security.py", line 580, in sync_roles
webserver_1 | self.update_admin_permission()
webserver_1 | File "/usr/local/lib/python3.8/site-packages/airflow/www/security.py", line 562, in update_admin_permission
webserver_1 | self.get_session.commit()
webserver_1 | File "<string>", line 2, in commit
webserver_1 | File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1423, in commit
webserver_1 | self._transaction.commit(_to_root=self.future)
webserver_1 | File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 829, in commit
webserver_1 | self._prepare_impl()
webserver_1 | File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 808, in _prepare_impl
webserver_1 | self.session.flush()
webserver_1 | File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 3255, in flush
webserver_1 | self._flush(objects)
webserver_1 | File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 3395, in _flush
webserver_1 | transaction.rollback(_capture_exception=True)
webserver_1 | File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
webserver_1 | compat.raise_(
webserver_1 | File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
webserver_1 | raise exception
webserver_1 | File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 3355, in _flush
webserver_1 | flush_context.execute()
webserver_1 | File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/unitofwork.py", line 453, in execute
webserver_1 | rec.execute(self)
webserver_1 | File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/unitofwork.py", line 576, in execute
webserver_1 | self.dependency_processor.process_saves(uow, states)
webserver_1 | File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/dependency.py", line 1182, in process_saves
webserver_1 | self._run_crud(
webserver_1 | File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/dependency.py", line 1245, in _run_crud
webserver_1 | connection.execute(statement, secondary_insert)
webserver_1 | File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1200, in execute
webserver_1 | return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
webserver_1 | File "/usr/local/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 313, in _execute_on_connection
webserver_1 | return connection._execute_clauseelement(
webserver_1 | File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1389, in _execute_clauseelement
webserver_1 | ret = self._execute_context(
webserver_1 | File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1748, in _execute_context
webserver_1 | self._handle_dbapi_exception(
webserver_1 | File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1929, in _handle_dbapi_exception
webserver_1 | util.raise_(
webserver_1 | File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
webserver_1 | raise exception
webserver_1 | File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1705, in _execute_context
webserver_1 | self.dialect.do_execute(
webserver_1 | File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 716, in do_execute
webserver_1 | cursor.execute(statement, parameters)
webserver_1 | sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "ab_permission_view_role_permission_view_id_role_id_key"
webserver_1 | DETAIL: Key (permission_view_id, role_id)=(214, 1) already exists.
webserver_1 |
webserver_1 | [SQL: INSERT INTO ab_permission_view_role (id, permission_view_id, role_id) VALUES (nextval('ab_permission_view_role_id_seq'), %(permission_view_id)s, %(role_id)s) RETURNING ab_permission_view_role.id]
webserver_1 | [parameters: {'permission_view_id': 214, 'role_id': 1}]
webserver_1 | (Background on this error at: http://sqlalche.me/e/14/gkpj)
webserver_1 | [2022-05-05 20:03:30 +0000] [121] [INFO] Worker exiting (pid: 121)
flower_1 | + exec airflow celery flower
scheduler_1 | + exec airflow scheduler
webserver_1 | [2022-05-05 20:03:31 +0000] [118] [INFO] Worker exiting (pid: 118)
webserver_1 | [2022-05-05 20:03:31 +0000] [119] [INFO] Worker exiting (pid: 119)
webserver_1 | [2022-05-05 20:03:31 +0000] [120] [INFO] Worker exiting (pid: 120)
worker_1 | + exec airflow celery worker
However such exceptions are rare and pure random, i can’t find a way to reproduce them consistently.
What you think should happen instead
prior to 2.3.0 there were no such errors
How to reproduce
No response
Operating System
Linux Mint 20.3
Versions of Apache Airflow Providers
No response
Deployment
Docker-Compose
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project’s Code of Conduct
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 16
- Comments: 70 (55 by maintainers)
I don’t think downgrade library will work? Seems like we have to wait for v2.3.1.
I am also facing exact thing for v2.3.0 using docker-compose installation
Not sure if this helps speed up the troubleshooting process but I’m seeing the same
duplicate key value violates unique constraint
error when running airflow in standalone mode with one gunicorn worker.interestingly, I only started getting this error message when I switched
AUTH_TYPE
fromAUTH_DB
toAUTH_OAUTH
How do you get past the error? Every time our webserver pod spins up it fails on this, kills the gunicorn works, and then restarts.
if the preload fix is indeed the fix, you can suppress this for now by setting the
GUNICORN_CMD_ARGS
environment variable to--preload
No confusion. There were two similar issues - one of them (for some users) fixed in 2.4.2 and another similar issue for some other users that will be released in 2.4.3. The milestone is always “current” thinking and if it moves, it moves.
i checked out
2.4.2
and didthen built my own airflow
then installed it
no more “duplicate key value violates unique constraint” errors 🚀 🚀 🚀 🚀 🚀 🚀
Just to repeat it @jhtimmins - I think you are the only one who somewhat knows what the problem is about and what the solution might be (and you attempted to fix it in #24065). Do you have a clue what it could be? I think I would be willing to help with that and maybe even be able to provide some proposals to improve, but I have no clue where to start and what the reason might be (and seems that this one is still plaguing some users).
Any please where we could start looking at/ideas/hypotheses what could be wrong?
both https://github.com/apache/airflow/blob/ee100a592e07761f3b32294f3ad82c4a6c7cf74d/airflow/www/app.py#L136 and https://github.com/apache/airflow/blob/ee100a592e07761f3b32294f3ad82c4a6c7cf74d/airflow/www/app.py#L146 are doing clashing reads/writes to the database once this line https://github.com/apache/airflow/blob/ee100a592e07761f3b32294f3ad82c4a6c7cf74d/airflow/cli/commands/webserver_command.py#L475 spins up 4 (by default) gunicorn workers. so i guess if i do this
i know it’s weird to alter it on the fly. an alternative is to set an env var such as
and then check it along side with
webserver.update_fab_perms
in https://github.com/apache/airflow/blob/ee100a592e07761f3b32294f3ad82c4a6c7cf74d/airflow/www/app.py#L67 and https://github.com/apache/airflow/blob/ee100a592e07761f3b32294f3ad82c4a6c7cf74d/airflow/www/extensions/init_appbuilder.py#L643i tried the env var solution, it worked. but it’s janky. it requires an extra env var 🤔