airflow: Constant "The scheduler does not appear to be running" warning on the UI following 2.6.0 upgrade

Apache Airflow version

2.6.0

What happened

Since upgrading from Airflow 2.5.2 to 2.6.0, the UI intermittently shows the warning “The scheduler does not appear to be running”.

Refreshing the page makes the warning go away, which is consistent with our finding that the scheduler has not been down at any point. By calling the /health endpoint repeatedly, we can get it to report an “unhealthy” status.

These two responses are only about 6 seconds apart:

{"metadatabase": {"status": "healthy"}, "scheduler": {"latest_scheduler_heartbeat": "2023-05-11T07:42:36.857007+00:00", "status": "healthy"}}

{"metadatabase": {"status": "healthy"}, "scheduler": {"latest_scheduler_heartbeat": "2023-05-11T07:42:42.409344+00:00", "status": "unhealthy"}}

This causes no operational issues, but it is misleading for end-users. What could be causing this?
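
To make the flapping easier to capture, here is a minimal polling sketch. It assumes the /health response has the shape shown above and that the heartbeat field always holds an ISO 8601 timestamp; the function names and the example URL are hypothetical and not part of Airflow.

```python
import json
import time
import urllib.request
from datetime import datetime, timezone


def summarize_health(payload: str, observed_at: datetime) -> dict:
    """Extract the scheduler status and heartbeat age from a /health body."""
    scheduler = json.loads(payload)["scheduler"]
    # Assumption: the heartbeat is an ISO 8601 string with a UTC offset.
    heartbeat = datetime.fromisoformat(scheduler["latest_scheduler_heartbeat"])
    return {
        "status": scheduler["status"],
        "heartbeat_age_sec": (observed_at - heartbeat).total_seconds(),
    }


def poll(url: str, interval_sec: float = 1.0, samples: int = 30) -> None:
    """Print one summary per request so status flips are easy to spot."""
    for _ in range(samples):
        with urllib.request.urlopen(url) as response:
            body = response.read().decode("utf-8")
        print(summarize_health(body, datetime.now(timezone.utc)))
        time.sleep(interval_sec)


# Example with a hypothetical webserver URL:
# poll("http://localhost:8080/health")
```

An “unhealthy” line whose heartbeat age is only a few seconds would confirm that the status is being reported too early.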

What you think should happen instead

The warning should not be shown unless the last heartbeat was at least 30 seconds earlier (the default scheduler_health_check_threshold).
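
The expectation can be written down as a small check. This is a sketch of the rule as I understand it, not Airflow's actual implementation; `warning_expected` and its parameters are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def warning_expected(
    latest_heartbeat: datetime,
    now: datetime,
    threshold_sec: int = 30,  # assumed default health check threshold
) -> bool:
    """Return True only when the heartbeat is at least threshold_sec old."""
    return now - latest_heartbeat >= timedelta(seconds=threshold_sec)


# Second heartbeat from the report, checked a few seconds after it was written.
heartbeat = datetime(2023, 5, 11, 7, 42, 42, 409344, tzinfo=timezone.utc)
print(warning_expected(heartbeat, heartbeat + timedelta(seconds=6)))
```

Under this rule, a heartbeat that is only a few seconds old should never produce the warning, which is what makes the “unhealthy” response above surprising.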

How to reproduce

There are no concrete steps to reproduce it, but the warning appears in the UI after a few seconds of browsing around. Alternatively, refresh the /health endpoint repeatedly until it reports “unhealthy”.

Operating System

Debian GNU/Linux 11

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==8.0.0
apache-airflow-providers-celery==3.1.0
apache-airflow-providers-cncf-kubernetes==6.1.0
apache-airflow-providers-common-sql==1.4.0
apache-airflow-providers-docker==3.6.0
apache-airflow-providers-elasticsearch==4.4.0
apache-airflow-providers-ftp==3.3.1
apache-airflow-providers-google==10.0.0
apache-airflow-providers-grpc==3.1.0
apache-airflow-providers-hashicorp==3.3.1
apache-airflow-providers-http==4.3.0
apache-airflow-providers-imap==3.1.1
apache-airflow-providers-microsoft-azure==6.0.0
apache-airflow-providers-microsoft-mssql==3.3.2
apache-airflow-providers-microsoft-psrp==2.2.0
apache-airflow-providers-microsoft-winrm==3.0.0
apache-airflow-providers-mysql==5.0.0
apache-airflow-providers-odbc==3.2.1
apache-airflow-providers-oracle==3.0.0
apache-airflow-providers-postgres==5.4.0
apache-airflow-providers-redis==3.1.0
apache-airflow-providers-sendgrid==3.1.0
apache-airflow-providers-sftp==4.2.4
apache-airflow-providers-slack==7.2.0
apache-airflow-providers-snowflake==4.0.5
apache-airflow-providers-sqlite==3.3.2
apache-airflow-providers-ssh==3.6.0

Deployment

Official Apache Airflow Helm Chart

Deployment details

Deployed on AKS with helm

Anything else

Nothing beyond the description above.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 3
  • Comments: 24 (17 by maintainers)

Most upvoted comments

For anyone watching who wants a workaround for Airflow 2.6.0, you can simply set AIRFLOW__SCHEDULER__JOB_HEARTBEAT_SEC to 30, or whatever you had AIRFLOW__SCHEDULER__SCHEDULER_HEALTH_CHECK_THRESHOLD set to.
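
As a sketch of that workaround, the snippet below derives the override from whatever threshold is already configured. It relies on Airflow's AIRFLOW__{SECTION}__{KEY} environment variable convention; the helper names are hypothetical, and the fallback of 30 is assumed to be the default threshold.

```python
import os


def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name for an Airflow config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def workaround_env(environ=os.environ, default_threshold: str = "30") -> dict:
    """Return the override that aligns the job heartbeat with the threshold."""
    threshold = environ.get(
        airflow_env_var("scheduler", "scheduler_health_check_threshold"),
        default_threshold,  # assumed default when the threshold is unset
    )
    return {airflow_env_var("scheduler", "job_heartbeat_sec"): threshold}


print(workaround_env({}))
```

The resulting mapping can then be passed to the scheduler through whatever mechanism the deployment uses for environment variables.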

We are facing the same problem after upgrading Airflow from 2.5.3 to 2.6.0.

Yes. Thanks @arjunanan6. We also just merged #31277 with the fix for 2.6.1, and rc3 will be up shortly for voting and testing. If you want to take 2.6.1 for a spin and revert to your original settings/configuration, that would be perfect.