docker-airflow: Airflow worker and scheduler are not picking up jobs
Hello everyone,
I am trying to run docker-compose-CeleryExecutor.yml; however, the worker is not picking up jobs.
I am seeing the following message:
webserver_1 | [2017-06-24 02:59:43 +0000] [29] [INFO] Handling signal: ttin
webserver_1 | [2017-06-24 02:59:43 +0000] [68] [INFO] Booting worker with pid: 68
After that, I do not see the scheduler or worker pick up and execute the job.
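In case it helps anyone debugging the same compose file, here is a sketch of how I'd check whether a Celery worker is registered with the broker at all (this assumes the compose service is named "worker", as in docker-compose-CeleryExecutor.yml, and the Airflow 1.x Celery app module):

```
docker-compose -f docker-compose-CeleryExecutor.yml exec worker \
  celery -A airflow.executors.celery_executor inspect active_queues
```

If no worker replies, tasks will sit in the queued state because nothing is consuming the default queue.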
About this issue
- State: open
- Created 7 years ago
- Reactions: 42
- Comments: 39 (1 by maintainers)
Any update on this from the Airflow team? This seems like a critical issue: tasks are not getting run.
I am seeing this issue intermittently as well, with Airflow running on Amazon ECS.
Same here. I'm using Airflow as a standalone app (not in Docker or Kubernetes). When I trigger the DAG, it shows its state as running, but none of the tasks ever show as queued or started, and it stays that way for a long time. I don't have any other DAG running, and nobody else is using this Airflow instance.
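One thing worth ruling out in a standalone setup (an assumption on my part, not a diagnosis): whether a scheduler process is running at all. The webserver alone will happily mark a DAG run as "running" without ever queuing its tasks.

```
# check for a live scheduler process; the [a] trick keeps grep from matching itself
ps aux | grep "[a]irflow scheduler"
# if nothing shows up, start one:
airflow scheduler
```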
I'm running Airflow on Kubernetes based on this Dockerfile plus some adjustments, and I'm facing a similar issue. When I manually trigger a DAG, some tasks will run, but after that all remaining tasks get stuck in a queued status. I use CeleryExecutor with Redis.
I also see this log on the web container, but I'm not sure if it's related. The web server cannot retrieve a log from a worker directly, but the log eventually becomes visible via S3 once the task completes, so I assumed it isn't a critical problem. Is the log retrieval related to this issue?
So far, every time I hit this issue I manually "clear" the task that gets stuck, and then it runs. I really have no clue what the root cause of this problem is 🙁
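For reference, the manual workaround described above can be scripted with the Airflow 1.x CLI (the dag and task names below are placeholders):

```
# clear the stuck task instance so the scheduler re-queues it
airflow clear -t stuck_task_id -s 2017-06-24 -e 2017-06-24 --no_confirm my_dag
```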
Same problem as @mukeshnayak1
Do you have any idea how to deal with it? Manually running the tasks works, though…
I’m having the same issue, but only when using CeleryExecutor. LocalExecutor seems to work.
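For anyone who can live without distributed workers, a workaround sketch (not a root-cause fix) is to fall back to LocalExecutor in airflow.cfg:

```
# switch executors; the scheduler then runs tasks in local subprocesses,
# bypassing the Celery broker entirely
sed -i 's/^executor = CeleryExecutor/executor = LocalExecutor/' "$AIRFLOW_HOME/airflow.cfg"
```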
Check your pool and queue settings.
Solution 1: try running export C_FORCE_ROOT='true' before starting the worker.
Solution 2: run the airflow worker as a non-root user.
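For context, Celery refuses to start under a root user unless C_FORCE_ROOT is set. One way to apply solution 1 in this compose setup (the "worker" service name is taken from docker-compose-CeleryExecutor.yml; whether this is the root cause here is an assumption):

```
# pass the flag to a one-off worker container...
docker-compose -f docker-compose-CeleryExecutor.yml run -e C_FORCE_ROOT=true worker
# ...or add it permanently under the worker service in the compose file:
#   worker:
#     environment:
#       - C_FORCE_ROOT=true
```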
We are running this Docker image in ECS. One or more tasks in the same DAG get queued but don't start, and the workers and scheduler seem to have low CPU utilization. All tasks are Databricks operators, which sleep 60 seconds between job status checks.
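If every worker slot is occupied by a long-polling Databricks task, new tasks will queue even though CPU looks idle, so the concurrency limits are worth checking (a guess, not a confirmed diagnosis; key names vary across 1.x versions):

```
# inspect the relevant throttles in airflow.cfg:
# parallelism and dag_concurrency under [core],
# celeryd_concurrency / worker_concurrency under [celery]
grep -E "parallelism|dag_concurrency|celeryd_concurrency|worker_concurrency" \
  "$AIRFLOW_HOME/airflow.cfg"
```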
I am having the same issue running in ECS and using Databricks operators. I have 4 main DAGs: two with 2 tasks and two with 1 task. Scheduling works fine for a while, then it stalls, with the DAG still "running" but its tasks completed. It stays this way indefinitely and no new tasks trigger. Restarting the service allows it to continue.
As an added complication for debugging, I'm running in Fargate, where it's not possible (or at least not easy) to inspect the internals of the container.
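Since restarting unblocks it, a common 1.x-era mitigation (a workaround, not a fix) was to have the scheduler exit periodically and let the orchestrator, ECS in this case, restart the container:

```
# exit after 10 scheduling loops; the container restart policy brings it back fresh
airflow scheduler --num_runs 10
```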
Has anyone been able to solve this issue? I'm running into the exact same error, with the DAG in a running state but not actually picked up by the scheduler.
[INFO] Handling signal: ttou
[INFO] Worker exiting (pid: 31418)
[INFO] Handling signal: ttin
[INFO] Booting worker with pid: 32308
I read in a Stack Overflow post that the ttou/ttin signal handling is expected gunicorn behaviour for refreshing workers, but that doesn't quite explain why the scheduler isn't picking anything up.
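To expand on that: the webserver rotates its gunicorn workers on a timer, which is exactly what produces those ttin/ttou lines, so they are harmless on their own. The rotation cadence is controlled by these airflow.cfg settings (Airflow 1.x defaults shown):

```
grep "worker_refresh" "$AIRFLOW_HOME/airflow.cfg"
# worker_refresh_batch_size = 1
# worker_refresh_interval = 30
```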