istio: Istio galley failing liveness probes

Describe the bug

The istio-galley pod fails the Liveness probe which causes the container to be constantly restarted.

Expected behavior

I expect the istio-galley not to constantly fail

Steps to reproduce the bug

Install Istio using the helm template.

Version

$ kubectl get nodes
NAME                       STATUS    ROLES     AGE       VERSION
aks-agentpool-36280988-0   Ready     agent     2h        v1.10.6
aks-agentpool-36280988-1   Ready     agent     2h        v1.10.6
aks-agentpool-36280988-2   Ready     agent     2h        v1.10.6

$ istioctl version
Version: 1.0.0
GitRevision: 3a136c90ec5e308f236e0d7ebb5c4c5e405217f4
User: root@71a9470ea93c
Hub: gcr.io/istio-release
GolangVersion: go1.10.1
BuildStatus: Clean

Is Istio Auth enabled or not?

$ helm template install/kubernetes/helm/istio --name istio --namespace istio-system > $HOME/istio.yaml
$ kubectl create -f $HOME/istio.yaml

Environment Azure

Cluster state

istio-dump.tar.gz

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Comments: 31 (11 by maintainers)

Commits related to this issue

Most upvoted comments

@ChrisPei I’ve tried the latest daily build where is some fixes were added already, unfortunately, galley still not stable yet.

Readiness and liveliness probes were failing on some providers because GET requests were blocking for multiple seconds. Mitigate the issue by decreasing the frequency of GET (to avoid possible throttling) and increase the acceptable health check interval from 4s to 10s.

See https://github.com/istio/istio/pull/7605 for short term fix. Longer term fix requires switching to proper controller-style reconciliation. That work should be aligned with the sidecar injector.

related https://github.com/Azure/AKS/issues/620

Let’s close this issue since the galley specific problem seems to be addressed in the latest 1.0.1 release branch. Let’s use https://github.com/istio/istio/issues/7675 to track the telemetry/policy liveness issue.

istio-system   grafana-777bdf8c65-hvff5                    1/1       Running            0          7h
istio-system   istio-citadel-7bcf9c887-kkpdp               1/1       Running            0          7h
istio-system   istio-cleanup-secrets-tk2nh                 0/1       Completed          0          7h
istio-system   istio-egressgateway-ffb7b49b9-ctfqv         1/1       Running            0          7h
istio-system   istio-galley-5545644d94-vcjpt               1/1       Running            0          7h
istio-system   istio-grafana-post-install-ttx5p            0/1       Completed          2          7h
istio-system   istio-ingressgateway-5fcfd54b78-zn2w7       1/1       Running            0          7h
istio-system   istio-pilot-76454c4c7b-vzcxs                2/2       Running            0          7h
istio-system   istio-policy-6cdcf67597-8kmfp               1/2       CrashLoopBackOff   7          10m
istio-system   istio-policy-6cdcf67597-bzcx9               1/2       CrashLoopBackOff   137        6h
istio-system   istio-policy-6cdcf67597-qm7s9               2/2       Running            0          7h
istio-system   istio-policy-6cdcf67597-vm9ch               1/2       CrashLoopBackOff   119        5h
istio-system   istio-policy-6cdcf67597-x77nf               1/2       CrashLoopBackOff   99         4h
istio-system   istio-security-post-install-j59xq           0/1       Completed          2          7h
istio-system   istio-sidecar-injector-567d44fdd6-zrs79     1/1       Running            2          7h
istio-system   istio-statsd-prom-bridge-549d687fd9-bhfp6   1/1       Running            0          7h
istio-system   istio-telemetry-85d747d5fb-l49mk            2/2       Running            16         7h
istio-system   istio-telemetry-85d747d5fb-mr8zv            2/2       Running            0          7h
istio-system   istio-telemetry-85d747d5fb-p8hp9            2/2       Running            19         6h
istio-system   istio-telemetry-85d747d5fb-r7rtq            2/2       Running            51         6h
istio-system   istio-telemetry-85d747d5fb-zh792            1/2       CrashLoopBackOff   141        6h
istio-system   istio-tracing-7596597bd7-9z8dm              1/1       Running            0          7h
istio-system   prometheus-6ffc56584f-n9jk7                 1/1       Running            0          7h
istio-system   servicegraph-b8f946545-82zn2                1/1       Running            0          7h

Events:
  Type     Reason     Age                 From                               Message
  ----     ------     ----                ----                               -------
  Warning  Unhealthy  13m (x411 over 6h)  kubelet, aks-agentpool-10685398-2  Liveness probe failed: Get http://10.244.0.11:9093/version: dial tcp 10.244.0.11:9093: connect: connection refused
  Warning  BackOff    3m (x1625 over 6h)  kubelet, aks-agentpool-10685398-2  Back-off restarting failed container

I had to decouple the health check and reconciliation updates to make this work on AKS (see https://github.com/istio/istio/pull/7986). I ran this overnight and didn’t see any pod restarts.

I’ll check again tomorrow on my azure test cluster with the latest changes from #7605.