airflow: Scheduler errors when using MSSQL as the backend database

Apache Airflow version: 1.10.11

Kubernetes version (if you are using kubernetes) (use kubectl version):

Environment:

  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

What happened:

What you expected to happen:

How to reproduce it:

Set MSSQL as the sql connection backend and examine the airflow scheduler logs

[2020-07-22 08:27:16,334] {settings.py:306} DEBUG - Disposing DB connection pool (PID 7714)
Process DagFileProcessor1036-Process:
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1278, in _execute_context
    cursor, statement, parameters, context
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
    cursor.execute(statement, parameters)
pyodbc.ProgrammingError: ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near '1'. (102) (SQLExecDirectW)")
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 159, in _run_file_processor
    pickle_dags)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1593, in process_file
    paused_dag_ids = models.DagModel.get_paused_dag_ids(dag_ids=dagbag.dag_ids)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/dag.py", line 1808, in get_paused_dag_ids
    .filter(DagModel.dag_id.in_(dag_ids))
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3341, in all
    return list(self)
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3503, in __iter__
    return self._execute_and_instances(context)
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3528, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1014, in execute
    return meth(self, multiparams, params)
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1133, in _execute_clauseelement
    distilled_params,
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1318, in _execute_context
    e, statement, parameters, cursor, context
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1512, in _handle_dbapi_exception
    sqlalchemy_exception, with_traceback=exc_info[2], from_=e
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
    raise exception
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1278, in _execute_context
    cursor, statement, parameters, context
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near '1'. (102) (SQLExecDirectW)")
[SQL: SELECT dag.dag_id AS dag_dag_id 
FROM dag 
WHERE dag.is_paused IS 1 AND dag.dag_id IN (?)]
[parameters: ('test-dag',)]
(Background on this error at: http://sqlalche.me/e/13/f405)

Anything else we need to know:

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 3
  • Comments: 17 (13 by maintainers)

Most upvoted comments

I would indeed like to use the survey (I think this is the only way) to just query and ask the community what are their preferences - we do not need to submit to the preferences - but I think right now we have very little data to make a conscious decision on all the points you raised (which versions to support, which one to drop). I think asking is no harm 😃.

Airflow does not support MSSQL officially. Maybe you would like to add tests/support for it in CI @apache/airflow-committers ? WDYT? I think MSSQL comes rather often in the discussions/issues and maybe it’s worth to consider official support for it?

Hi, I can bring some informations on: what versions do we support, should we add support to more versions.

  1. In general, ms-sql is quite robust in term of newer versions, in the sense that newer versions do not break old ones, especially since SQL Server 2008-R2.
  2. Whatever SQL Server version you are running, you can set, on a Database basis, the Compatibility Level to a prior version. E.g. for an instance of SQL Server 2016, having multiple DBs inside it, you can set one DB to kind of run with the code/syntax/rules of SQL Server 2012, another with the code/syntax/rules of SQL Server 2014, etc.

→ Targeting an older version of SQL Server = newer versions can generally run it. If anything, the user can always set the compatibility level of his newer SQL Server to target an older version, and the issue is solved.

  1. Today, there are still people running older version of (SQL Server 2000!), but those would be for legacy business applications. For an app like Airflow, you can assume that people will use a server not hosting those kind of legacy applications.
  2. The easiest would be to test on SQL Server 2016. Having Airflow working on SQL Server 2016 would be super! And I think that all ms-sql folks will get by this. I am not aware of any break since SQL Server 2016 to the last released version (SQL Server 2019).
  3. For the sake of informations, here are some major changes with older verions in term of syntax:
  • SQL Server 2014: does not support for XML/Json data types, nor some functions such as string splitting, has less support on columnar indexes.
  • SQL Server 2012: not aware of major change compared to SQL Server 2014. Don’t support columnar indexes.
  • SQL Server 2008R2: this version doesn’t support analytical functions (count() over (partition by…)).
  • There is no point to go further, I don’t think that anyone will install Airflow on SQL Server 2008 anyway.

SQL Server 2016 is fine for testing. Given that Airflow is a ‘new’ technology, I think that people will modernize themself if they have the guarantee that Airflow will run smoothly.


I reflected this afternoon, as part of the StackOverFlow 2020 surver, SQL Server is the third most used Database, not far away from Postgre.

If we talk only of Relation Databases, the major ones are : MySQL, Postgre, SQL Server, Oracle, MariaDB (DB2 almost not used, and mainly used for legacy mainframe, and SQL lite not a good fit for Airflow Production).

MySQL-MariaDB are kind of similar, the latter is a fork of the other. Both Postgre & Oracle speak PL/SQL, even though there might be slight differences, and different isolation level support. There remains just SQL Server, alone. So there are points of supporting it for the sake of completness of RDBMS support. Ms-sql is also a way to open Airflow to a new segment of users.

However, this has a technical/humain cost, add on that the fact that the Python ecosystem/developpers are more used to work with Postgre/MySQL. I hope the community will succeed in achieving its major Airflow milestones so that this may give some room & time to add support for SQL Server.

Bear in mind we already need to test Airflow with newer versions of Postgres and MySQL too

Postgres-version: [9.6, 10] Mysql-version: [5.7]

Soon we should add support for Postgres 12 (maybe drop 9.6 - separate discussion)

Should we add support for MySQL 8 since it was GA since 19 April 2018

The test suite will keep on increasing.

Same with MSSQL - if we say yes, what versions do we support, should we add support to more versions. The test matrices will keep on increasing.

My recommendation would be to support Postgres and MySQL and improve the support across multiple versions like I mentioned above.

Adding it to the CI would be a first step. Curious what we will encounter. I’m all for it.

got around it by using

sed -i 's/DagModel.is_paused.is_(True)/DagModel.is_paused == True/g' /home/airflow/.local/lib/python3.7/site-packages/airflow/models/dag.py