airflow: Constant "The scheduler does not appear to be running" warning on the UI following 2.6.0 upgrade
Apache Airflow version
2.6.0
What happened
Since upgrading from Airflow 2.5.2 to 2.6.0, the UI intermittently shows the warning “The scheduler does not appear to be running”.
The warning goes away after simply refreshing the page, which is consistent with our finding that the scheduler has not been down at any point. By calling the /health endpoint repeatedly, we can get it to report an “unhealthy” status:
These two responses were taken only about 6 seconds apart:
{"metadatabase": {"status": "healthy"}, "scheduler": {"latest_scheduler_heartbeat": "2023-05-11T07:42:36.857007+00:00", "status": "healthy"}}
{"metadatabase": {"status": "healthy"}, "scheduler": {"latest_scheduler_heartbeat": "2023-05-11T07:42:42.409344+00:00", "status": "unhealthy"}}
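To see how far apart the two reported heartbeats are, the responses can be parsed with a short script. This is a minimal sketch that uses only the two payloads quoted above; the helper name summarize_health is hypothetical and not part of Airflow.

```python
import json
from datetime import datetime
from typing import Tuple


def summarize_health(payload: str) -> Tuple[str, datetime]:
    """Return (scheduler status, latest heartbeat) from a /health response body."""
    scheduler = json.loads(payload)["scheduler"]
    heartbeat = datetime.fromisoformat(scheduler["latest_scheduler_heartbeat"])
    return scheduler["status"], heartbeat


# The two response bodies quoted in this report.
first = (
    '{"metadatabase": {"status": "healthy"}, "scheduler": '
    '{"latest_scheduler_heartbeat": "2023-05-11T07:42:36.857007+00:00", '
    '"status": "healthy"}}'
)
second = (
    '{"metadatabase": {"status": "healthy"}, "scheduler": '
    '{"latest_scheduler_heartbeat": "2023-05-11T07:42:42.409344+00:00", '
    '"status": "unhealthy"}}'
)

status_1, beat_1 = summarize_health(first)
status_2, beat_2 = summarize_health(second)

# Seconds between the two heartbeats recorded by the scheduler.
gap = (beat_2 - beat_1).total_seconds()
print(status_1, status_2, f"{gap:.3f}s between heartbeats")
```

The two heartbeats are roughly 5.6 seconds apart, so the scheduler was still heartbeating between the two requests even though the second response reports "unhealthy".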
This causes no operational issues, but it is misleading for end-users. What could be causing this?
What you think should happen instead
The warning should not be shown unless the last heartbeat was at least 30 sec earlier (default config).
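The expected rule can be sketched as below. This only illustrates the behavior described in this section and is not Airflow's actual implementation; the function name expected_status and its now parameter are hypothetical, and 30 seconds is the default threshold mentioned above.

```python
from datetime import datetime, timedelta, timezone

# Default health check threshold referred to in this report (assumption: 30 s).
DEFAULT_THRESHOLD_SEC = 30


def expected_status(latest_heartbeat, now, threshold_sec=DEFAULT_THRESHOLD_SEC):
    """Report "unhealthy" only when the last heartbeat is at least threshold_sec old."""
    age = (now - latest_heartbeat).total_seconds()
    return "healthy" if age < threshold_sec else "unhealthy"


# Heartbeat timestamp taken from the second /health response above.
heartbeat = datetime(2023, 5, 11, 7, 42, 42, 409344, tzinfo=timezone.utc)

# A request six seconds after the heartbeat should still be healthy.
print(expected_status(heartbeat, heartbeat + timedelta(seconds=6)))
# A request thirty seconds after the heartbeat crosses the threshold.
print(expected_status(heartbeat, heartbeat + timedelta(seconds=30)))
```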
How to reproduce
There are no concrete steps to reproduce it, but the warning appears in the UI after a few seconds of browsing around. Alternatively, refresh the /health endpoint repeatedly until it reports an “unhealthy” status.
Operating System
Debian GNU/Linux 11
Versions of Apache Airflow Providers
apache-airflow-providers-amazon==8.0.0
apache-airflow-providers-celery==3.1.0
apache-airflow-providers-cncf-kubernetes==6.1.0
apache-airflow-providers-common-sql==1.4.0
apache-airflow-providers-docker==3.6.0
apache-airflow-providers-elasticsearch==4.4.0
apache-airflow-providers-ftp==3.3.1
apache-airflow-providers-google==10.0.0
apache-airflow-providers-grpc==3.1.0
apache-airflow-providers-hashicorp==3.3.1
apache-airflow-providers-http==4.3.0
apache-airflow-providers-imap==3.1.1
apache-airflow-providers-microsoft-azure==6.0.0
apache-airflow-providers-microsoft-mssql==3.3.2
apache-airflow-providers-microsoft-psrp==2.2.0
apache-airflow-providers-microsoft-winrm==3.0.0
apache-airflow-providers-mysql==5.0.0
apache-airflow-providers-odbc==3.2.1
apache-airflow-providers-oracle==3.0.0
apache-airflow-providers-postgres==5.4.0
apache-airflow-providers-redis==3.1.0
apache-airflow-providers-sendgrid==3.1.0
apache-airflow-providers-sftp==4.2.4
apache-airflow-providers-slack==7.2.0
apache-airflow-providers-snowflake==4.0.5
apache-airflow-providers-sqlite==3.3.2
apache-airflow-providers-ssh==3.6.0
Deployment
Official Apache Airflow Helm Chart
Deployment details
Deployed on AKS with helm
Anything else
Nothing beyond the description above.
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project’s Code of Conduct
About this issue
- State: closed
- Created a year ago
- Reactions: 3
- Comments: 24 (17 by maintainers)
Commits related to this issue
- Fix calculation of health check threshold for SchedulerJob The change #30302 split Job from JobRunner, but it missed the fact that SchedulerJob had a special case of checking the threshold - instead... — committed to potiuk/airflow by potiuk a year ago
- Fix calculation of health check threshold for SchedulerJob (#31277) The change #30302 split Job from JobRunner, but it missed the fact that SchedulerJob had a special case of checking the threshold... — committed to apache/airflow by potiuk a year ago
- Fix calculation of health check threshold for SchedulerJob (#31277) The change #30302 split Job from JobRunner, but it missed the fact that SchedulerJob had a special case of checking the threshold ... — committed to apache/airflow by potiuk a year ago
For anyone watching who wants a workaround for Airflow 2.6.0, you can simply set AIRFLOW__SCHEDULER__JOB_HEARTBEAT_SEC to 30, or whatever you had AIRFLOW__SCHEDULER__SCHEDULER_HEALTH_CHECK_THRESHOLD set to.
We are facing the same problem after upgrading Airflow from 2.5.3 to 2.6.0.
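The 2.6.0 workaround suggested above amounts to giving AIRFLOW__SCHEDULER__JOB_HEARTBEAT_SEC the same value as AIRFLOW__SCHEDULER__SCHEDULER_HEALTH_CHECK_THRESHOLD, falling back to 30 when the threshold was never set. A minimal sketch of that mapping follows; the helper workaround_env is hypothetical, and how the resulting variable is applied depends on the deployment.

```python
import os

THRESHOLD_VAR = "AIRFLOW__SCHEDULER__SCHEDULER_HEALTH_CHECK_THRESHOLD"
HEARTBEAT_VAR = "AIRFLOW__SCHEDULER__JOB_HEARTBEAT_SEC"


def workaround_env(environ):
    """Return the extra environment variable for the 2.6.0 workaround.

    Reuses the configured health check threshold if present,
    otherwise the value 30 mentioned in the comment above.
    """
    threshold = environ.get(THRESHOLD_VAR, "30")
    return {HEARTBEAT_VAR: threshold}


# Show what would be set for the current process environment.
print(workaround_env(dict(os.environ)))
```

As noted in the thread, this is only meant for 2.6.0; the original settings can be restored once the fix from #31277 is installed.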
Yes. Thanks @arjunanan6. We also just merged #31277 with the fix for 2.6.1, and rc3 will be up shortly for voting/testing. If you want to take it for a spin on 2.6.1 and revert to the original settings/configuration, that would be perfect.