apscheduler: BackgroundScheduler.get_jobs() hangs when used with Flask and Sqlalchemy

Motivation

On startup, I’d like to be able to add a persistent job store and add a job only if that job store is empty.

What works

With sqlalchemy, but without flask, the following code works:

from time import sleep
import sqlalchemy as sa

# Connect to example.sqlite and add a new job only if there are no jobs already.

from apscheduler.schedulers.background import BackgroundScheduler

log = print

engine = sa.create_engine('sqlite:///{}'.format('example.sqlite'))

def alarm():
    print('Alarm')

if __name__ == '__main__':
    scheduler = BackgroundScheduler()
    log("created scheduler")

    scheduler.add_jobstore('sqlalchemy', engine=engine)
    log("Added jobstore")

    scheduler.start()
    log("Started scheduler")
    if not scheduler.get_jobs():
        log("Added job")
        scheduler.add_job(alarm, 'interval', seconds=20)
    else:
        log("Didn't add job.")
    try:
        while True:
            sleep(2)
    except (KeyboardInterrupt, SystemExit):
        pass

With Flask but without sqlalchemy, the following works. It doesn’t make use of persistent storage, of course, but get_jobs() returns []:

import os

from time import sleep
import flask

from apscheduler.schedulers.background import BackgroundScheduler

# Verify that apscheduler works with flask, as long as we don't use
# persistent storage.

# run with
# & { $env:FLASK_APP='demo.py'; $env:FLASK_DEBUG=1; python -m flask run }

app = flask.Flask(__name__)
log = app.logger.info
log("Created App")


def alarm():
    print('Alarm')


if not app.debug or os.environ.get("WERKZEUG_RUN_MAIN") == 'true':
    scheduler = BackgroundScheduler()
    log("created scheduler")

    scheduler.start()
    log("Started scheduler")
    if not scheduler.get_jobs():
        app.logger.info("Added job")
        scheduler.add_job(alarm, 'interval', seconds=20)
    else:
        app.logger.info("Didn't add job.")

What doesn’t work

When I try to add a persistent job store to this flask app, scheduler.get_jobs() hangs:

import os

from time import sleep
import sqlalchemy as sa
import flask

from apscheduler.schedulers.background import BackgroundScheduler

# Hangs at get_jobs()

app = flask.Flask(__name__)
log = app.logger.info
log("Created App")

### NEW ###
engine = sa.create_engine('sqlite:///{}'.format('example.sqlite'))
###########

def alarm():
    print('Alarm')

# Don't create two schedulers when running in debug mode.
if not app.debug or os.environ.get("WERKZEUG_RUN_MAIN") == 'true':
    scheduler = BackgroundScheduler()
    log("created scheduler")

    ### NEW ###
    scheduler.add_jobstore('sqlalchemy', engine=engine)
    log("Added jobstore")
    ###########

    scheduler.start()
    log("Started scheduler")
    if not scheduler.get_jobs():
        app.logger.info("Added job")
        scheduler.add_job(alarm, 'interval', seconds=20)
    else:
        app.logger.info("Didn't add job.")

Environment

Windows 10 Python 3.5.3

Running in a virtual environment with:

Package      Version
------------ -------
APScheduler  3.4.0
click        6.7
Flask        0.12.2
itsdangerous 0.24
Jinja2       2.10
MarkupSafe   1.0
pip          9.0.1
pytz         2017.3
setuptools   37.0.0
six          1.11.0
SQLAlchemy   1.1.15
tzlocal      1.4
Werkzeug     0.12.2
wheel        0.30.0

Edit:

This also appears on Linux with Python 3.6.2. All the package versions are the same.

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Reactions: 1
  • Comments: 23 (9 by maintainers)

Most upvoted comments

Thanks for responding @agronholm. I dropped flask-apscheduler and am now using APScheduler directly, and I am able to replicate the issue. I’ve done some debugging and found a deadlock around _jobstores_lock.

After APScheduler starts, a thread is spun off that executes _process_jobs, which acquires _jobstores_lock here: https://github.com/agronholm/apscheduler/blob/cbf2eeb21695343c1996e59732adbc8fbbab6842/apscheduler/schedulers/base.py#L929

A query issued by get_due_jobs while this lock is held never returns. I followed the execution down to the last executed line, and it seems there is an issue obtaining a new cursor from SQLAlchemy’s connection pool (https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/pool.py#L970).

Simultaneously, while the request above hangs, the main thread executes add_job, which also tries to acquire _jobstores_lock. That lock acquisition happens here: https://github.com/agronholm/apscheduler/blob/cbf2eeb21695343c1996e59732adbc8fbbab6842/apscheduler/schedulers/base.py#L428
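
For illustration, here is a minimal sketch of that deadlock shape in plain Python. This is simplified stand-in code, not APScheduler’s actual implementation; the lock and semaphore are placeholders for _jobstores_lock and the connection-pool checkout. One thread takes the lock and then blocks on something else while holding it, so the main thread can never acquire the same lock.

import threading
import time

jobstores_lock = threading.RLock()       # stands in for BaseScheduler._jobstores_lock
pool_checkout = threading.Semaphore(0)   # stands in for the pool checkout that never returns

def process_jobs():
    # Scheduler thread: the equivalent of _process_jobs() -> get_due_jobs()
    # runs under the lock, then blocks on the connection pool while holding it.
    with jobstores_lock:
        pool_checkout.acquire()

threading.Thread(target=process_jobs, daemon=True).start()
time.sleep(0.1)  # let the scheduler thread grab the lock first

# Main thread: the equivalent of add_job() also needs the lock. A timeout is
# used here only so the sketch terminates instead of hanging like the real app.
if jobstores_lock.acquire(timeout=2):
    print("acquired the lock (no deadlock)")
else:
    print("still waiting on _jobstores_lock -- this is the hang")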

My config

__init__.py
from apscheduler.schedulers.background import BackgroundScheduler
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
from flask_migrate import Migrate

app = Flask(__name__)

app.config.from_object('config.default')

db = SQLAlchemy(app)
migrate = Migrate(app, db)

scheduler = BackgroundScheduler()
scheduler.add_jobstore('sqlalchemy', engine=db.engine)
scheduler.start()

def job():
    print("HELLO JOB")

app.logger.info("Adding job")
scheduler.add_job(job, 'interval', minutes=10, replace_existing=True, id='test_job')
app.logger.info("Added job")

config/default.py
DEBUG=True
SQLALCHEMY_TRACK_MODIFICATIONS=False
SECRET_KEY="SOMETHINGSECRET"
MAIL_SUPPRESS_SEND=True
SQLALCHEMY_DATABASE_URI="postgresql://wf@localhost:5432/wf_development"
SQLALCHEMY_ECHO=True
PROPAGATE_EXCEPTIONS=True

Output

 * Serving Flask app "app"
 * Forcing debug mode on
2017-11-29 13:33:49,194 INFO sqlalchemy.engine.base.Engine select version()
2017-11-29 13:33:49,195 INFO sqlalchemy.engine.base.Engine {}
2017-11-29 13:33:49,197 INFO sqlalchemy.engine.base.Engine select current_schema()
2017-11-29 13:33:49,197 INFO sqlalchemy.engine.base.Engine {}
2017-11-29 13:33:49,198 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
2017-11-29 13:33:49,198 INFO sqlalchemy.engine.base.Engine {}
2017-11-29 13:33:49,199 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
2017-11-29 13:33:49,199 INFO sqlalchemy.engine.base.Engine {}
2017-11-29 13:33:49,200 INFO sqlalchemy.engine.base.Engine show standard_conforming_strings
2017-11-29 13:33:49,200 INFO sqlalchemy.engine.base.Engine {}
2017-11-29 13:33:49,201 INFO sqlalchemy.engine.base.Engine select relname from pg_class c join pg_namespace n on n.oid=c.relnamespace where pg_catalog.pg_table_is_visible(c.oid) and relname=%(name)s
2017-11-29 13:33:49,201 INFO sqlalchemy.engine.base.Engine {'name': u'apscheduler_jobs'}
--------------------------------------------------------------------------------
INFO in __init__ [./app/__init__.py:20]:
Adding job
--------------------------------------------------------------------------------

Yes, this is still relevant. I ran into this issue a few days ago using gunicorn, Flask, APScheduler and SQLAlchemy.

ref_to_obj (called by get_jobs) was able to import correctly before we added a job. However, once add_job has been invoked, it no longer works. We also hit the same issue when we try to add_job with a Flask/SQLAlchemy/APScheduler combination: the lock is not released after a job is added (even though the first add_job returns), so we cannot add a second job (the second add_job hangs). Since other people have the same issue, I’m just wondering whether anyone has solved it.
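
A minimal sketch of the sequence being described (this is an assumed shape based on the comment, not the commenter’s actual code; the engine URL, job ids and intervals are illustrative): the first add_job returns, while the second reportedly never does.

import flask
import sqlalchemy as sa

from apscheduler.schedulers.background import BackgroundScheduler

app = flask.Flask(__name__)
engine = sa.create_engine('sqlite:///example.sqlite')

def task():
    print('tick')

scheduler = BackgroundScheduler()
scheduler.add_jobstore('sqlalchemy', engine=engine)
scheduler.start()

# First call returns normally.
scheduler.add_job(task, 'interval', seconds=30, id='first', replace_existing=True)
# Second call reportedly blocks forever waiting on _jobstores_lock.
scheduler.add_job(task, 'interval', seconds=60, id='second', replace_existing=True)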

I’m experiencing a very similar issue using APScheduler through the https://github.com/viniciuschiele/flask-apscheduler project. I too am trying to use the SQLAlchemy jobstore, in my case backed by a PostgreSQL database. I initialize and start the scheduler, then call add_job, which causes my project to hang. If I disable the SQLAlchemy job store, everything works as expected.

Environment

Mac 10.13.1 Python 2.7.13

Package                       Version    
----------------------------- -----------    
APScheduler                   3.4.0 
click                         6.7   
Flask                         0.12.2     
Flask-APScheduler             1.7.1      
Flask-Cors                    3.0.3      
Flask-Mail                    0.9.1      
flask-marshmallow             0.8.0      
Flask-Migrate                 2.1.1      
Flask-SQLAlchemy              2.3.2     
Jinja2                        2.10 
MarkupSafe                    1.0 
pip                           9.0.1     
pytz                          2017.3     
setuptools                    38.2.3 
simplejson                    3.13.2     
six                           1.11.0     
SQLAlchemy                    1.1.15     
sqlalchemy-json               0.2.1      
SQLAlchemy-Utils              0.32.21     
tzlocal                       1.4 
webargs                       1.8.1      
Werkzeug                      0.12.2     
wheel                         0.30.0     

I ran into a problem like https://github.com/unbit/uwsgi/issues/844 and solved it by adding --enable-threads. Maybe that helps with similar issues. DEBUG should also be false in a production environment.

This is definitely still an issue, but it’s not with APScheduler; it’s with the way most Flask apps use a global app instance, and with the fact that the Flask-APScheduler docs show calling start and then defining your tasks.

The root cause is that when the state data is loaded from the jobstore, the module being imported with __import__ hasn’t actually been read by the interpreter yet, and thus isn’t in the namespace. Why does it hang? That’s a good question; I believe that since the module hasn’t finished loading, it enters some kind of circular dependency that the interpreter can’t detect.

Simply ensuring that the entire module is loaded before calling scheduler.start() resolved this issue for me.

I placed scheduler.start() at the bottom of the module file where I defined the scheduler, roughly as sketched below.
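
A sketch of that ordering (the engine URL, job name and layout are illustrative assumptions, not the commenter’s actual app): everything the jobstore might need to import is defined first, and scheduler.start() is the last statement in the module.

import flask
import sqlalchemy as sa

from apscheduler.schedulers.background import BackgroundScheduler

app = flask.Flask(__name__)
engine = sa.create_engine('sqlite:///example.sqlite')

scheduler = BackgroundScheduler()
scheduler.add_jobstore('sqlalchemy', engine=engine)

def alarm():
    print('Alarm')

# Register jobs before starting; replace_existing avoids duplicates across restarts.
scheduler.add_job(alarm, 'interval', seconds=20, id='alarm', replace_existing=True)

# Last statement in the module: by the time the jobstore tries to resolve job
# references via __import__, everything above has been fully defined.
scheduler.start()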

Just curious, does anyone know what the root cause is? We have the same issue: add_job hangs with a Flask/SQLAlchemy/APScheduler combination. We traced it back to __import__ in ref_to_obj in util.py, which hangs forever.