docker-airflow: Airflow Worker and Scheduler are not picking up jobs

Hello everyone,

I am trying to run docker-compose-CeleryExecutor.yml; however, the worker is not picking up any jobs.

I am seeing the following message:

webserver_1  | [2017-06-24 02:59:43 +0000] [29] [INFO] Handling signal: ttin
webserver_1  | [2017-06-24 02:59:43 +0000] [68] [INFO] Booting worker with pid: 68

After that, I do not see the scheduler or the worker picking up the job and executing it.
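A few checks can help narrow down whether the scheduler is actually queuing tasks and whether the Celery worker is consuming them. This is only a sketch: it assumes the service names from docker-compose-CeleryExecutor.yml (scheduler, worker, redis), the default Celery queue name "default", and that the Celery app for your Airflow version lives at airflow.executors.celery_executor.

# Are all containers actually up?
docker-compose -f docker-compose-CeleryExecutor.yml ps

# Is the scheduler heartbeating and queuing task instances?
docker-compose -f docker-compose-CeleryExecutor.yml logs --tail=100 scheduler

# Messages sitting in the broker list mean the worker never picked them up.
docker-compose -f docker-compose-CeleryExecutor.yml exec redis redis-cli LLEN default

# Is the Celery worker registered with the broker and running anything?
docker-compose -f docker-compose-CeleryExecutor.yml exec worker \
    celery -A airflow.executors.celery_executor inspect active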

About this issue

  • State: open
  • Created 7 years ago
  • Reactions: 42
  • Comments: 39 (1 by maintainers)

Most upvoted comments

Any update on this from the Airflow team? This seems like a critical issue: tasks are not getting run.

I am seeing this issue intermittently as well, with Airflow running on Amazon ECS.

Same here. I’m using Airflow as a standalone app (not in Docker or Kubernetes). When I start the DAG, it shows its state as running, but none of the tasks show as queued or started for a long time. I don’t have any other DAGs running or anybody else using this Airflow instance.

I’m running Airflow on Kubernetes based on this Dockerfile plus some adjustments, and I am facing a similar issue. When I manually trigger a DAG, some tasks will run, but after that all remaining tasks get stuck in a queued status. I use CeleryExecutor with Redis.

[INFO] Handling signal: ttin
[INFO] Booting worker with pid: xxx

I also see this log on the web container, but I’m not sure if it’s related. The web server cannot retrieve a log from a worker directly, but the log can eventually be seen via S3 when a task is complete, so I thought it was not a critical problem. Is the log retrieval related to this issue?

So far, every time I see this issue I manually “clear” the task which gets stuck, and then it will run. I really have no clue what the root cause of this problem is 🙁
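For reference, the same workaround can be scripted from the CLI instead of the web UI. This is a sketch using Airflow 1.x command syntax (Airflow 2.x uses airflow tasks clear); my_dag, my_task and the dates are placeholders.

# Clear the stuck task instance so the scheduler re-queues it.
airflow clear my_dag -t my_task -s 2019-03-01 -e 2019-03-01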

Same problem as @mukeshnayak1

[2019-03-01 11:45:05 +0200] [49524] [INFO] Handling signal: ttin
[2019-03-01 11:45:05 +0200] [9318] [INFO] Booting worker with pid: 9318
[2019-03-01 11:45:05,631] {__init__.py:51} INFO - Using executor SequentialExecutor
[2019-03-01 11:45:05,832] {models.py:273} INFO - Filling up the DagBag from /Users/alexeyd/airflow/dags
[2019-03-01 11:45:06 +0200] [49524] [INFO] Handling signal: ttou
[2019-03-01 11:45:06 +0200] [9115] [INFO] Worker exiting (pid: 9115)
[2019-03-01 11:45:36 +0200] [49524] [INFO] Handling signal: ttin
[2019-03-01 11:45:36 +0200] [9373] [INFO] Booting worker with pid: 9373
[2019-03-01 11:45:36,937] {__init__.py:51} INFO - Using executor SequentialExecutor
[2019-03-01 11:45:37,142] {models.py:273} INFO - Filling up the DagBag from /Users/alexeyd/airflow/dags
[2019-03-01 11:45:38 +0200] [49524] [INFO] Handling signal: ttou
[2019-03-01 11:45:38 +0200] [9166] [INFO] Worker exiting (pid: 9166)

Do you have any idea how to deal with it? Running tasks works, though…

I’m having the same issue, but only when using CeleryExecutor. LocalExecutor seems to work.
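One way to confirm that observation is to switch the executor via the standard AIRFLOW__SECTION__KEY environment overrides and re-run the same DAG: if it runs under LocalExecutor but stalls under CeleryExecutor, the problem is in the broker/worker path rather than in the DAG itself. A sketch:

# Temporarily run with LocalExecutor to rule out the DAG definition.
export AIRFLOW__CORE__EXECUTOR=LocalExecutor

# Switch back to CeleryExecutor to reproduce the stuck-in-queued behaviour.
export AIRFLOW__CORE__EXECUTOR=CeleryExecutor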

  • check if the airflow scheduler is running
  • check if the airflow webserver is running
  • check if all DAGs are set to On in the web UI
  • check if the DAGs have a start date which is in the past
  • check if the DAGs have a proper schedule and that the scheduled date shown in the web UI has already passed
  • check if the DAG’s tasks have the proper pool and queue (see the sketch after this list)
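Most of that checklist can also be run through quickly from the CLI on the scheduler host. A sketch using Airflow 1.x command names, with my_dag as a placeholder DAG id:

# scheduler and webserver processes actually alive?
ps aux | grep -E "airflow (scheduler|webserver)" | grep -v grep

# DAG parsed without import errors and known to Airflow?
airflow list_dags

# DAG switched to On (this unpauses it if it was paused)?
airflow unpause my_dag

# tasks defined as expected (then check their pool/queue in the DAG file)?
airflow list_tasks my_dag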

Solution 1: try running export C_FORCE_ROOT='true'

Solution 2: run the airflow worker as a non-root user
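A minimal sketch of both suggestions, run inside the worker container; the non-root user name "airflow" is an assumption about your image:

# Solution 1: let Celery run as root by exporting C_FORCE_ROOT
# before the worker starts.
export C_FORCE_ROOT='true'
airflow worker

# Solution 2: start the Celery worker as a non-root user instead.
su - airflow -c "airflow worker"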

We are running this Docker image in ECS. One or more tasks in the same DAG get queued but don’t start, and the workers and scheduler seem to have low CPU utilization. All tasks are Databricks operators which sleep 60 seconds after checking job status.

I am having the same issue running in ECS and using Databricks operators. I have 4 main DAGs, two with 2 tasks and two with 1 task. The scheduling seems to work fine for a while, then it stalls, with the DAG still “running” but its tasks completed. It stays this way indefinitely, and no new tasks trigger. Restarting the service allows it to continue.

As an added complication for debugging, I’m running in Fargate and it’s not possible (or not easy) to see the internals of the container.

Has anyone been able to solve this issue? I am running into the exact same error, with the DAG in a running state but not actually picked up by the scheduler.

[INFO] Handling signal: ttou
[INFO] Worker exiting (pid: 31418)
[INFO] Handling signal: ttin
[INFO] Booting worker with pid: 32308

I read in a Stack Overflow post that the TTOU/TTIN handling is normal gunicorn behaviour for refreshing its workers and is expected, but that doesn’t quite explain why the scheduler is not picking up anything.
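For context, that refresh cycle is driven by the gunicorn master serving the web UI: it periodically spawns a fresh web worker (TTIN) and retires an old one (TTOU). It is controlled by the [webserver] settings below, shown here as their environment-variable equivalents with illustrative values, and it has no bearing on whether the scheduler queues tasks.

# gunicorn web-worker refresh cadence (seconds) and batch size;
# the values here are illustrative, not recommendations.
export AIRFLOW__WEBSERVER__WORKER_REFRESH_INTERVAL=30
export AIRFLOW__WEBSERVER__WORKER_REFRESH_BATCH_SIZE=1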