istio: Istio galley failing liveness probes
Describe the bug
The istio-galley pod fails the Liveness probe which causes the container to be constantly restarted.
Expected behavior
I expect the istio-galley not to constantly fail
Steps to reproduce the bug
Install Istio using the helm template.
Version
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
aks-agentpool-36280988-0 Ready agent 2h v1.10.6
aks-agentpool-36280988-1 Ready agent 2h v1.10.6
aks-agentpool-36280988-2 Ready agent 2h v1.10.6
$ istioctl version
Version: 1.0.0
GitRevision: 3a136c90ec5e308f236e0d7ebb5c4c5e405217f4
User: root@71a9470ea93c
Hub: gcr.io/istio-release
GolangVersion: go1.10.1
BuildStatus: Clean
Is Istio Auth enabled or not?
$ helm template install/kubernetes/helm/istio --name istio --namespace istio-system > $HOME/istio.yaml
$ kubectl create -f $HOME/istio.yaml
Environment Azure
Cluster state
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Comments: 31 (11 by maintainers)
Commits related to this issue
- Improve config validation health checks Readiness and liveliness probes were failing on some providers because GET requests were blocking for multiple seconds. Mitigate the issue by decreasing the fr... — committed to ayj/istio by ayj 6 years ago
- Improve config validation health checks (#7605) Readiness and liveliness probes were failing on some providers because GET requests were blocking for multiple seconds. Mitigate the issue by decreas... — committed to istio/istio by ayj 6 years ago
- Improve config validation health checks (#7605) Readiness and liveliness probes were failing on some providers because GET requests were blocking for multiple seconds. Mitigate the issue by decreasin... — committed to ayj/istio by ayj 6 years ago
- Improve config validation health checking (#8118) * Improve config validation health checks (#7605) Readiness and liveliness probes were failing on some providers because GET requests were block... — committed to istio/istio by ayj 6 years ago
@ChrisPei I’ve tried the latest daily build where is some fixes were added already, unfortunately, galley still not stable yet.
Readiness and liveliness probes were failing on some providers because GET requests were blocking for multiple seconds. Mitigate the issue by decreasing the frequency of GET (to avoid possible throttling) and increase the acceptable health check interval from 4s to 10s.
See https://github.com/istio/istio/pull/7605 for short term fix. Longer term fix requires switching to proper controller-style reconciliation. That work should be aligned with the sidecar injector.
related https://github.com/Azure/AKS/issues/620
Let’s close this issue since the galley specific problem seems to be addressed in the latest 1.0.1 release branch. Let’s use https://github.com/istio/istio/issues/7675 to track the telemetry/policy liveness issue.
I had to decouple the health check and reconciliation updates to make this work on AKS (see https://github.com/istio/istio/pull/7986). I ran this overnight and didn’t see any pod restarts.
I’ll check again tomorrow on my azure test cluster with the latest changes from #7605.