airflow: Airflow Scheduler liveness probe crashing (version 2.0)

Apache Airflow version: 2.0

Kubernetes version 1.18.14

Environment: Azure - AKS

What happened:

I have just upgraded my Airflow from 1.10.13 to 2.0. I am running it in Kubernetes (Azure AKS) with the Kubernetes Executor. Unfortunately, my scheduler gets killed every 15-20 minutes because its liveness probe fails, so the pod keeps restarting.

Liveness probe

import os
os.environ['AIRFLOW__CORE__LOGGING_LEVEL'] = 'ERROR'
os.environ['AIRFLOW__LOGGING__LOGGING_LEVEL'] = 'ERROR'

from airflow.jobs.scheduler_job import SchedulerJob
from airflow.utils.db import create_session
from airflow.utils.net import get_hostname
import sys

with create_session() as session:
    job = session.query(SchedulerJob).filter_by(hostname=get_hostname()).order_by(
        SchedulerJob.latest_heartbeat.desc()).limit(1).first()

sys.exit(0 if job.is_alive() else 1)
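
Note on the probe (hedged, based on my reading of the 2.0 code; please verify against your version): SchedulerJob.is_alive() appears to check that the job state is RUNNING and that latest_heartbeat is newer than [scheduler] scheduler_health_check_threshold, so any long stall in the scheduler loop trips the probe. Also, if the query above returns no row (for example right after a restart, or if get_hostname() does not match the hostname the scheduler recorded), job is None and job.is_alive() raises, which the probe counts as a failure too. A minimal defensive sketch of the same check, using only the imports already shown above:

import os
import sys

# Keep the probe output clean by silencing Airflow logging.
os.environ['AIRFLOW__CORE__LOGGING_LEVEL'] = 'ERROR'
os.environ['AIRFLOW__LOGGING__LOGGING_LEVEL'] = 'ERROR'

from airflow.jobs.scheduler_job import SchedulerJob
from airflow.utils.db import create_session
from airflow.utils.net import get_hostname

with create_session() as session:
    # Most recent scheduler job recorded for this hostname, if any.
    job = (
        session.query(SchedulerJob)
        .filter_by(hostname=get_hostname())
        .order_by(SchedulerJob.latest_heartbeat.desc())
        .limit(1)
        .first()
    )

# Treat "no job row found" as unhealthy without raising an AttributeError.
sys.exit(0 if job is not None and job.is_alive() else 1)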

Scheduler logs

[2021-02-16 12:18:22,422] {scheduler_job.py:933} DEBUG - No tasks to consider for execution.
[2021-02-16 12:18:22,426] {base_executor.py:147} DEBUG - 0 running task instances
[2021-02-16 12:18:22,426] {base_executor.py:148} DEBUG - 0 in queue
[2021-02-16 12:18:22,426] {base_executor.py:149} DEBUG - 32 open slots
[2021-02-16 12:18:22,427] {base_executor.py:158} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
[2021-02-16 12:18:22,427] {kubernetes_executor.py:337} DEBUG - Syncing KubernetesExecutor
[2021-02-16 12:18:22,427] {kubernetes_executor.py:263} DEBUG - KubeJobWatcher alive, continuing
[2021-02-16 12:18:22,439] {scheduler_job.py:1751} INFO - Resetting orphaned tasks for active dag runs
[2021-02-16 12:18:22,452] {settings.py:290} DEBUG - Disposing DB connection pool (PID 12819)
[2021-02-16 12:18:22,460] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor490-Process' pid=12819 parent=9286 stopped exitcode=0>
[2021-02-16 12:18:23,009] {settings.py:290} DEBUG - Disposing DB connection pool (PID 12826)
[2021-02-16 12:18:23,017] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor491-Process' pid=12826 parent=9286 stopped exitcode=0>
[2021-02-16 12:18:23,594] {settings.py:290} DEBUG - Disposing DB connection pool (PID 12833)

... many more of these "Disposing DB connection pool" entries here ...

[2021-02-16 12:20:08,212] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor675-Process' pid=14146 parent=9286 stopped exitcode=0>
[2021-02-16 12:20:08,916] {settings.py:290} DEBUG - Disposing DB connection pool (PID 14153)
[2021-02-16 12:20:08,924] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor676-Process' pid=14153 parent=9286 stopped exitcode=0>
[2021-02-16 12:20:09,475] {settings.py:290} DEBUG - Disposing DB connection pool (PID 14160)
[2021-02-16 12:20:09,484] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor677-Process' pid=14160 parent=9286 stopped exitcode=0>
[2021-02-16 12:20:10,044] {settings.py:290} DEBUG - Disposing DB connection pool (PID 14167)
[2021-02-16 12:20:10,053] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor678-Process' pid=14167 parent=9286 stopped exitcode=0>
[2021-02-16 12:20:10,610] {settings.py:290} DEBUG - Disposing DB connection pool (PID 14180)
[2021-02-16 12:23:42,287] {scheduler_job.py:746} INFO - Exiting gracefully upon receiving signal 15
[2021-02-16 12:23:43,290] {process_utils.py:95} INFO - Sending Signals.SIGTERM to GPID 9286
[2021-02-16 12:23:43,494] {process_utils.py:201} INFO - Waiting up to 5 seconds for processes to exit...
[2021-02-16 12:23:43,503] {process_utils.py:61} INFO - Process psutil.Process(pid=14180, status='terminated', started='12:20:09') (14180) terminated with exit code None
[2021-02-16 12:23:43,503] {process_utils.py:61} INFO - Process psutil.Process(pid=9286, status='terminated', exitcode=0, started='12:13:35') (9286) terminated with exit code 0
[2021-02-16 12:23:43,506] {process_utils.py:95} INFO - Sending Signals.SIGTERM to GPID 9286
[2021-02-16 12:23:43,506] {scheduler_job.py:1296} INFO - Exited execute loop
[2021-02-16 12:23:43,523] {cli_action_loggers.py:84} DEBUG - Calling callbacks: []
[2021-02-16 12:23:43,525] {settings.py:290} DEBUG - Disposing DB connection pool (PID 7)

Scheduler deployment

---
################################
## Airflow Scheduler Deployment/StatefulSet
#################################
kind: Deployment
apiVersion: apps/v1
metadata:
  name: airflow-scheduler
  namespace: airflow
  labels:
    tier: airflow
    component: scheduler
spec:
  replicas: 1
  selector:
    matchLabels:
      tier: airflow
      component: scheduler
  template:
    metadata:
      labels:
        tier: airflow
        component: scheduler
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    spec:
      nodeSelector:
        {}
      affinity:
        {}
      tolerations:
        []
      restartPolicy: Always
      terminationGracePeriodSeconds: 10
      serviceAccountName: airflow-scheduler
      securityContext:
        runAsUser: 50000
        fsGroup: 50000
      initContainers:
        - name: run-airflow-migrations
          image: apache/airflow:2.0.0-python3.8
          imagePullPolicy: IfNotPresent
          # Support running against 1.10.x and 2.0.0dev/master
          args: ["bash", "-c", "airflow db upgrade"]
          env:          
            # Dynamically created environment variables
            # Dynamically created secret envs
                      
            # Hard Coded Airflow Envs
            - name: AIRFLOW__CORE__FERNET_KEY
              valueFrom:
                secretKeyRef:
                  name: fernet-key
                  key: fernet-key
            - name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
              valueFrom:
                secretKeyRef:
                  name: airflow-airflow-metadata
                  key: connection
            - name: AIRFLOW_CONN_AIRFLOW_DB
              valueFrom:
                secretKeyRef:
                  name: airflow-airflow-metadata
                  key: connection
      containers:
        # Always run the main scheduler container.
        - name: scheduler
          image: apache/airflow:2.0.0-python3.8
          imagePullPolicy: Always
          args: ["bash", "-c", "exec airflow scheduler"]
          env:          
            # Dynamically created environment variables
            # Dynamically created secret envs
                      
            # Hard Coded Airflow Envs
            - name: AIRFLOW__CORE__FERNET_KEY
              valueFrom:
                secretKeyRef:
                  name: fernet-key
                  key: fernet-key
            - name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
              valueFrom:
                secretKeyRef:
                  name: airflow-airflow-metadata
                  key: connection
            - name: AIRFLOW_CONN_AIRFLOW_DB
              valueFrom:
                secretKeyRef:
                  name: airflow-airflow-metadata
                  key: connection
            - name: DEPENDENCIES
              value: "/opt/airflow/dags/repo/dags/dependencies/"
          # If the scheduler stops heartbeating for 5 minutes (10*30s) kill the
          # scheduler and let Kubernetes restart it
          livenessProbe:
            failureThreshold: 10
            periodSeconds: 30
            exec:
              command:
                - python
                - -Wignore
                - -c
                - |
                  import os
                  os.environ['AIRFLOW__CORE__LOGGING_LEVEL'] = 'ERROR'
                  os.environ['AIRFLOW__LOGGING__LOGGING_LEVEL'] = 'ERROR'

                  from airflow.jobs.scheduler_job import SchedulerJob
                  from airflow.utils.db import create_session
                  from airflow.utils.net import get_hostname
                  import sys

                  with create_session() as session:
                      job = session.query(SchedulerJob).filter_by(hostname=get_hostname()).order_by(
                          SchedulerJob.latest_heartbeat.desc()).limit(1).first()

                  sys.exit(0 if job.is_alive() else 1)
          resources:
            {}
          volumeMounts:
            - name: config
              mountPath: /opt/airflow/pod_templates/pod_template_file.yaml
              subPath: pod_template_file.yaml
              readOnly: true
            - name: logs
              mountPath: "/opt/airflow/logs"
            - name: config
              mountPath: "/opt/airflow/airflow.cfg"
              subPath: airflow.cfg
              readOnly: true
            - name: dags
              mountPath: /opt/airflow/dags
            - name: logs-conf
              mountPath: "/opt/airflow/config/log_config.py"
              subPath: log_config.py
              readOnly: true
            - name: logs-conf-ini
              mountPath: "/opt/airflow/config/__init__.py"
              subPath: __init__.py
              readOnly: true
        - name: git-sync
          image: "k8s.gcr.io/git-sync:v3.1.6"
          securityContext:
            runAsUser: 65533
          env:
            - name: GIT_SYNC_REV
              value: "HEAD"
            - name: GIT_SYNC_BRANCH
              value: "master"
            - name: GIT_SYNC_REPO
              value:  HIDDEN
            - name: GIT_SYNC_DEPTH
              value: "1"
            - name: GIT_SYNC_ROOT
              value: "/git"
            - name: GIT_SYNC_DEST
              value: "repo"
            - name: GIT_SYNC_ADD_USER
              value: "true"
            - name: GIT_SYNC_WAIT
              value: "60"
            - name: GIT_SYNC_MAX_SYNC_FAILURES
              value: "0"
            - name: GIT_SYNC_USERNAME
              valueFrom:
                secretKeyRef:
                  name: 'codecommit-key'
                  key: username
            - name: GIT_SYNC_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: 'codecommit-key'
                  key: password
          volumeMounts:
          - name: dags
            mountPath: /git
        # Always start the garbage collector sidecar.
        - name: scheduler-gc
          image: apache/airflow:2.0.0-python3.8
          imagePullPolicy: Always
          args: ["bash", "/clean-logs"]
          volumeMounts:
            - name: logs
              mountPath: "/opt/airflow/logs"
            - name: logs-conf
              mountPath: "/opt/airflow/config/log_config.py"
              subPath: log_config.py
              readOnly: true
            - name: logs-conf-ini
              mountPath: "/opt/airflow/config/__init__.py"
              subPath: __init__.py
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: airflow-airflow-config
        - name: dags
          emptyDir: {}
        - name: logs
          emptyDir: {}
        - name: logs-conf
          configMap:
            name: airflow-airflow-config
        - name: logs-conf-ini
          configMap:
            name: airflow-airflow-config

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 3
  • Comments: 23 (7 by maintainers)

Most upvoted comments

I managed to fix my scheduler restarts by setting the following configs:

[kubernetes]
...
delete_option_kwargs = {"grace_period_seconds": 10}
enable_tcp_keepalive = True
tcp_keep_idle = 30
tcp_keep_intvl = 30
tcp_keep_cnt = 30

I have another Airflow instance running on Kubernetes in AWS. That one runs fine with any version, so I realized the problem is with Azure Kubernetes: the REST API calls to the Kubernetes API server.
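
In case it helps, here is a minimal sketch of the same [kubernetes] options applied through the scheduler Deployment instead of airflow.cfg, using Airflow's AIRFLOW__{SECTION}__{KEY} environment-variable convention. The variable names below are derived from the config keys in the snippet above; double-check them against your Airflow version:

# Hypothetical additions to the scheduler container's env list in the Deployment above.
- name: AIRFLOW__KUBERNETES__DELETE_OPTION_KWARGS
  value: '{"grace_period_seconds": 10}'
- name: AIRFLOW__KUBERNETES__ENABLE_TCP_KEEPALIVE
  value: "True"
- name: AIRFLOW__KUBERNETES__TCP_KEEP_IDLE
  value: "30"
- name: AIRFLOW__KUBERNETES__TCP_KEEP_INTVL
  value: "30"
- name: AIRFLOW__KUBERNETES__TCP_KEEP_CNT
  value: "30"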

We are facing the same issue (scheduler liveness probe always failing and restarting the scheduler). Details:

Airflow: version 1.10.14 & 1.10.13
Kubernetes: version 1.20.2 (DigitalOcean)
Helm chart airflow-stable/airflow: version 7.16.0

Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  27m                default-scheduler  Successfully assigned airflow/airflow-scheduler-75c6c96d68-r9j4m to apollo-kaon3thg1-882c2
  Normal   Pulled     27m                kubelet            Container image "alpine/git:latest" already present on machine
  Normal   Created    27m                kubelet            Created container git-clone
  Normal   Started    27m                kubelet            Started container git-clone
  Normal   Pulled     26m                kubelet            Container image "alpine/git:latest" already present on machine
  Normal   Created    26m                kubelet            Created container git-sync
  Normal   Started    26m                kubelet            Started container git-sync
  Normal   Killing    12m (x2 over 19m)  kubelet            Container airflow-scheduler failed liveness probe, will be restarted
  Normal   Pulled     11m (x3 over 26m)  kubelet            Container image "apache/airflow:1.10.14-python3.7" already present on machine
  Normal   Created    11m (x3 over 26m)  kubelet            Created container airflow-scheduler
  Normal   Started    11m (x3 over 26m)  kubelet            Started container airflow-scheduler
  Warning  Unhealthy  6m (x12 over 21m)  kubelet            Liveness probe failed:

And the logs are basically stuck in a loop:

1] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor409-Process, stopped)>
[2021-02-23 22:58:35,578] {scheduler_job.py:1435} DEBUG - Starting Loop...
[2021-02-23 22:58:35,578] {scheduler_job.py:1446} DEBUG - Harvesting DAG parsing results
[2021-02-23 22:58:35,579] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:35,579] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:35,580] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:35,580] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:35,580] {scheduler_job.py:1448} DEBUG - Harvested 0 SimpleDAGs
[2021-02-23 22:58:35,581] {scheduler_job.py:1514} DEBUG - Heartbeating the executor
[2021-02-23 22:58:35,581] {base_executor.py:122} DEBUG - 0 running task instances
[2021-02-23 22:58:35,582] {base_executor.py:123} DEBUG - 0 in queue
[2021-02-23 22:58:35,582] {base_executor.py:124} DEBUG - 32 open slots
[2021-02-23 22:58:35,582] {base_executor.py:133} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
[2021-02-23 22:58:35,587] {scheduler_job.py:1469} DEBUG - Ran scheduling loop in 0.01 seconds
[2021-02-23 22:58:35,587] {scheduler_job.py:1472} DEBUG - Sleeping for 1.00 seconds
[2021-02-23 22:58:36,589] {scheduler_job.py:1484} DEBUG - Sleeping for 0.99 seconds to prevent excessive logging
[2021-02-23 22:58:36,729] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6719)
[2021-02-23 22:58:36,930] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6717)
[2021-02-23 22:58:37,258] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor410-Process, stopped)>
[2021-02-23 22:58:37,259] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor411-Process, stopped)>
[2021-02-23 22:58:37,582] {scheduler_job.py:1435} DEBUG - Starting Loop...
[2021-02-23 22:58:37,583] {scheduler_job.py:1446} DEBUG - Harvesting DAG parsing results
[2021-02-23 22:58:37,584] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:37,586] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:37,588] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:37,589] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:37,591] {scheduler_job.py:1448} DEBUG - Harvested 0 SimpleDAGs
[2021-02-23 22:58:37,592] {scheduler_job.py:1514} DEBUG - Heartbeating the executor
[2021-02-23 22:58:37,593] {base_executor.py:122} DEBUG - 0 running task instances
[2021-02-23 22:58:37,602] {base_executor.py:123} DEBUG - 0 in queue
[2021-02-23 22:58:37,604] {base_executor.py:124} DEBUG - 32 open slots
[2021-02-23 22:58:37,605] {base_executor.py:133} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
[2021-02-23 22:58:37,607] {scheduler_job.py:1460} DEBUG - Heartbeating the scheduler
[2021-02-23 22:58:37,620] {base_job.py:197} DEBUG - [heartbeat]
[2021-02-23 22:58:37,630] {scheduler_job.py:1469} DEBUG - Ran scheduling loop in 0.05 seconds
[2021-02-23 22:58:37,631] {scheduler_job.py:1472} DEBUG - Sleeping for 1.00 seconds
[2021-02-23 22:58:38,165] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6769)
[2021-02-23 22:58:38,268] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6765)
[2021-02-23 22:58:38,276] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor412-Process, started)>
[2021-02-23 22:58:38,284] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor413-Process, stopped)>
[2021-02-23 22:58:38,633] {scheduler_job.py:1484} DEBUG - Sleeping for 0.95 seconds to prevent excessive logging
[2021-02-23 22:58:39,331] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6797)
[2021-02-23 22:58:39,361] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6801)
[2021-02-23 22:58:39,589] {scheduler_job.py:1435} DEBUG - Starting Loop...
[2021-02-23 22:58:39,589] {scheduler_job.py:1446} DEBUG - Harvesting DAG parsing results
[2021-02-23 22:58:39,590] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:39,590] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:39,590] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:39,590] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:39,591] {scheduler_job.py:1448} DEBUG - Harvested 0 SimpleDAGs
[2021-02-23 22:58:39,591] {scheduler_job.py:1514} DEBUG - Heartbeating the executor
[2021-02-23 22:58:39,591] {base_executor.py:122} DEBUG - 0 running task instances
[2021-02-23 22:58:39,592] {base_executor.py:123} DEBUG - 0 in queue
[2021-02-23 22:58:39,593] {base_executor.py:124} DEBUG - 32 open slots
[2021-02-23 22:58:39,594] {base_executor.py:133} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
[2021-02-23 22:58:39,596] {scheduler_job.py:1469} DEBUG - Ran scheduling loop in 0.01 seconds
[2021-02-23 22:58:39,597] {scheduler_job.py:1472} DEBUG - Sleeping for 1.00 seconds
[2021-02-23 22:58:40,305] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor414-Process, stopped)>
[2021-02-23 22:58:40,306] {scheduler_job.py:280} DEBUG - Waiting for <ForkProcess(DagFileProcessor415-Process, stopped)>
[2021-02-23 22:58:40,599] {scheduler_job.py:1484} DEBUG - Sleeping for 0.99 seconds to prevent excessive logging
[2021-02-23 22:58:41,349] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6829)
[2021-02-23 22:58:41,386] {settings.py:310} DEBUG - Disposing DB connection pool (PID 6831)
[2021-02-23 22:58:41,595] {scheduler_job.py:1435} DEBUG - Starting Loop...
[2021-02-23 22:58:41,595] {scheduler_job.py:1446} DEBUG - Harvesting DAG parsing results
[2021-02-23 22:58:41,596] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:41,597] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:41,598] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:41,599] {dag_processing.py:658} DEBUG - Received message of type DagParsingStat
[2021-02-23 22:58:41,600] {scheduler_job.py:1448} DEBUG - Harvested 0 SimpleDAGs
[2021-02-23 22:58:41,601] {scheduler_job.py:1514} DEBUG - Heartbeating the executor
[2021-02-23 22:58:41,602] {base_executor.py:122} DEBUG - 0 running task instances
[2021-02-23 22:58:41,602] {base_executor.py:123} DEBUG - 0 in queue
[2021-02-23 22:58:41,604] {base_executor.py:124} DEBUG - 32 open slots
[2021-02-23 22:58:41,604] {base_executor.py:133} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
[2021-02-23 22:58:41,607] {scheduler_job.py:1469} DEBUG - Ran scheduling loop in 0.01 seconds
[2021-02-23 22:58:41,608] {scheduler_job.py:1472} DEBUG - Sleeping for 1.00 seconds

EDIT: Tried it on Airflow 1.10.13 and same thing. Updated versions above.

I’m seeing a similar issue when trying to run Airflow on minikube.

$ minikube version
minikube version: v1.17.1

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:50:19Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-13T13:20:00Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}

I can reproduce it always with:

$ minikube delete
$ minikube start
$ curl -OL https://github.com/apache/airflow/archive/master.zip
$ unzip master.zip
$ helm dep update airflow-master/chart/
$ helm install airflow ./airflow-master/chart

After running for a while, the scheduler restarts periodically:

$ kubectl get pods
NAME                                 READY   STATUS    RESTARTS   AGE
airflow-postgresql-0                 1/1     Running   0          17m
airflow-scheduler-5567f545c8-qv7cg   2/2     Running   3          17m
airflow-statsd-5556dc96bc-twbz9      1/1     Running   0          17m
airflow-webserver-65cc966d7c-68wnv   1/1     Running   0          17m
$ kubectl describe pod airflow-scheduler-5567f545c8-qv7cg
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  18m                   default-scheduler  Successfully assigned default/airflow-scheduler-5567f545c8-qv7cg to minikube
  Normal   Pulling    18m                   kubelet            Pulling image "apache/airflow:2.0.0"
  Normal   Pulled     17m                   kubelet            Successfully pulled image "apache/airflow:2.0.0" in 42.1015082s
  Normal   Created    17m                   kubelet            Created container wait-for-airflow-migrations
  Normal   Started    17m                   kubelet            Started container wait-for-airflow-migrations
  Normal   Started    16m                   kubelet            Started container scheduler-gc
  Normal   Pulled     16m                   kubelet            Container image "apache/airflow:2.0.0" already present on machine
  Normal   Created    16m                   kubelet            Created container scheduler-gc
  Normal   Killing    11m                   kubelet            Container scheduler failed liveness probe, will be restarted
  Normal   Pulled     11m (x2 over 16m)     kubelet            Container image "apache/airflow:2.0.0" already present on machine
  Normal   Started    11m (x2 over 16m)     kubelet            Started container scheduler
  Normal   Created    11m (x2 over 16m)     kubelet            Created container scheduler
  Warning  Unhealthy  3m20s (x27 over 16m)  kubelet            Liveness probe failed:

We are experiencing this when deploying the chart into a local installation of k3d.

% kubectl get pods -n airflow-test-local
NAME                                 READY   STATUS             RESTARTS   AGE
airflow-statsd-5556dc96bc-w28cz      1/1     Running            0          7m29s
airflow-postgresql-0                 1/1     Running            0          7m29s
airflow-webserver-7d5fbc5675-x6dc7   1/1     Running            0          7m29s
airflow-scheduler-7f59d9c69c-5v9pl   2/3     CrashLoopBackOff   7          7m29s
airflow-cleanup-1614276000-xbcmz     0/1     Completed          0          39s
airflow-scheduler-7f59d9c69c-cvzvx   2/3     CrashLoopBackOff   7          7m29s

We also found some interesting WARNINGs when looking into the wait-for-airflow-migrations container:

% kubectl logs airflow-webserver-7d5fbc5675-x6dc7 -c wait-for-airflow-migrations -n airflow-test-local
BACKEND=postgresql
DB_HOST=airflow-postgresql.airflow-test-local.svc.cluster.local
DB_PORT=5432
....
[2021-02-25 17:53:43,435] {migration.py:163} INFO - Context impl PostgresqlImpl.
[2021-02-25 17:53:43,436] {migration.py:170} INFO - Will assume transactional DDL.
[2021-02-25 17:53:49,416] {providers_manager.py:299} WARNING - Exception when importing 'airflow.providers.microsoft.azure.hooks.wasb.WasbHook' from 'apache-airflow-providers-microsoft-azure' package: No module named 'azure.storage.blob'
[2021-02-25 17:53:50,300] {providers_manager.py:299} WARNING - Exception when importing 'airflow.providers.microsoft.azure.hooks.wasb.WasbHook' from 'apache-airflow-providers-microsoft-azure' package: No module named 'azure.storage.blob'
[2021-02-25 17:53:51,345] {<string>:35} INFO - Waiting for migrations... 1 second(s)
[2021-02-25 17:53:52,349] {<string>:35} INFO - Waiting for migrations... 2 second(s)
[2021-02-25 17:53:53,352] {<string>:35} INFO - Waiting for migrations... 3 second(s)
[2021-02-25 17:53:54,355] {<string>:35} INFO - Waiting for migrations... 4 second(s)
[2021-02-25 17:53:55,358] {<string>:35} INFO - Waiting for migrations... 5 second(s)

I don’t think the Azure hooks should be imported by default…