airflow: Scheduler errors when using MSSQL as the backend database
Apache Airflow version: 1.10.11
Kubernetes version (if you are using kubernetes) (use kubectl version
):
Environment:
- Cloud provider or hardware configuration:
- OS (e.g. from /etc/os-release):
- Kernel (e.g.
uname -a
): - Install tools:
- Others:
What happened:
What you expected to happen:
How to reproduce it:
Set MSSQL as the sql connection backend and examine the airflow scheduler logs
[2020-07-22 08:27:16,334] {settings.py:306} DEBUG - Disposing DB connection pool (PID 7714)
Process DagFileProcessor1036-Process:
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1278, in _execute_context
cursor, statement, parameters, context
File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
cursor.execute(statement, parameters)
pyodbc.ProgrammingError: ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near '1'. (102) (SQLExecDirectW)")
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 159, in _run_file_processor
pickle_dags)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/db.py", line 74, in wrapper
return func(*args, **kwargs)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1593, in process_file
paused_dag_ids = models.DagModel.get_paused_dag_ids(dag_ids=dagbag.dag_ids)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/db.py", line 74, in wrapper
return func(*args, **kwargs)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/dag.py", line 1808, in get_paused_dag_ids
.filter(DagModel.dag_id.in_(dag_ids))
File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3341, in all
return list(self)
File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3503, in __iter__
return self._execute_and_instances(context)
File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3528, in _execute_and_instances
result = conn.execute(querycontext.statement, self._params)
File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1014, in execute
return meth(self, multiparams, params)
File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1133, in _execute_clauseelement
distilled_params,
File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1318, in _execute_context
e, statement, parameters, cursor, context
File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1512, in _handle_dbapi_exception
sqlalchemy_exception, with_traceback=exc_info[2], from_=e
File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
raise exception
File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1278, in _execute_context
cursor, statement, parameters, context
File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near '1'. (102) (SQLExecDirectW)")
[SQL: SELECT dag.dag_id AS dag_dag_id
FROM dag
WHERE dag.is_paused IS 1 AND dag.dag_id IN (?)]
[parameters: ('test-dag',)]
(Background on this error at: http://sqlalche.me/e/13/f405)
Anything else we need to know:
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 3
- Comments: 17 (13 by maintainers)
I would indeed like to use the survey (I think this is the only way) to just query and ask the community what are their preferences - we do not need to submit to the preferences - but I think right now we have very little data to make a conscious decision on all the points you raised (which versions to support, which one to drop). I think asking is no harm 😃.
Airflow does not support MSSQL officially. Maybe you would like to add tests/support for it in CI @apache/airflow-committers ? WDYT? I think MSSQL comes rather often in the discussions/issues and maybe it’s worth to consider official support for it?
Hi, I can bring some informations on:
what versions do we support, should we add support to more versions.
→ Targeting an older version of SQL Server = newer versions can generally run it. If anything, the user can always set the compatibility level of his newer SQL Server to target an older version, and the issue is solved.
SQL Server 2016 is fine for testing. Given that Airflow is a ‘new’ technology, I think that people will modernize themself if they have the guarantee that Airflow will run smoothly.
I reflected this afternoon, as part of the StackOverFlow 2020 surver, SQL Server is the third most used Database, not far away from Postgre.
If we talk only of Relation Databases, the major ones are : MySQL, Postgre, SQL Server, Oracle, MariaDB (DB2 almost not used, and mainly used for legacy mainframe, and SQL lite not a good fit for Airflow Production).
MySQL-MariaDB are kind of similar, the latter is a fork of the other. Both Postgre & Oracle speak PL/SQL, even though there might be slight differences, and different isolation level support. There remains just SQL Server, alone. So there are points of supporting it for the sake of completness of RDBMS support. Ms-sql is also a way to open Airflow to a new segment of users.
However, this has a technical/humain cost, add on that the fact that the Python ecosystem/developpers are more used to work with Postgre/MySQL. I hope the community will succeed in achieving its major Airflow milestones so that this may give some room & time to add support for SQL Server.
Bear in mind we already need to test Airflow with newer versions of Postgres and MySQL too
Postgres-version: [9.6, 10] Mysql-version: [5.7]
Soon we should add support for Postgres 12 (maybe drop 9.6 - separate discussion)
Should we add support for MySQL 8 since it was GA since 19 April 2018
The test suite will keep on increasing.
Same with MSSQL - if we say yes, what versions do we support, should we add support to more versions. The test matrices will keep on increasing.
My recommendation would be to support Postgres and MySQL and improve the support across multiple versions like I mentioned above.
Adding it to the CI would be a first step. Curious what we will encounter. I’m all for it.
got around it by using