istio: Istio Gateway does not have readiness/liveness enabled
Describe the bug
We rely on the Istio Gateway to route traffic into the cluster. During testing, we found that an Envoy process that is not yet ready can be added to the fleet and receive traffic. This results in Connection Refused errors on the client side.
More alarmingly, we also found that the Envoy process sometimes fails to establish a connection with Pilot. Please see the log from the gateway pod below:
Expected behavior
Only healthy gateways should be added to the fleet to receive traffic.
Steps to reproduce the bug
Deploy the Istio gateway and expose it as a NodePort service. Scale up the gateway several times while sending traffic to it; this should reproduce the bug.
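To put a number on the failures while the gateway is being scaled, a small client can open TCP connections to the NodePort in a loop and count how many are refused. This is a minimal sketch, not the test harness used for this report; the function name `count_refused` and its parameters are hypothetical.

```python
import socket
import time


def count_refused(host, port, attempts=100, interval=0.05, timeout=1.0):
    """Open `attempts` TCP connections and count how many are refused.

    Returns a (refused, succeeded, other_errors) tuple.
    """
    refused = succeeded = other = 0
    for _ in range(attempts):
        try:
            # create_connection completes the TCP handshake, then we close.
            with socket.create_connection((host, port), timeout=timeout):
                succeeded += 1
        except ConnectionRefusedError:
            # Nothing is listening on the target port (for example, the
            # proxy behind the NodePort has no listener yet).
            refused += 1
        except OSError:
            # Timeouts, resets, unreachable hosts, and so on.
            other += 1
        time.sleep(interval)
    return refused, succeeded, other
```

Run it against a node address and the gateway's NodePort while scaling the deployment; a non-zero first element of the result indicates that traffic reached a pod that was not accepting connections.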
I added the readinessProbe manually and it reduced the Connection Refused errors, but sometimes, as the error log above shows, the Envoy proxy fails to start, which still results in Connection Refused errors.
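For illustration, a manually added readinessProbe could be expressed as a strategic-merge patch for the gateway Deployment. The sketch below only builds the patch document. The container name `istio-proxy`, the path `/healthz/ready`, port `15020`, and the timing values are assumptions modeled on the sidecar readiness probe, not values taken from this report; adjust them to your installation.

```python
import json


def readiness_probe_patch(container="istio-proxy", path="/healthz/ready", port=15020):
    """Build a strategic-merge patch adding an HTTP readinessProbe.

    All default values are assumptions; verify them against your deployment.
    """
    probe = {
        "httpGet": {"path": path, "port": port},
        "initialDelaySeconds": 1,
        "periodSeconds": 2,
        "failureThreshold": 30,
    }
    # Containers in a pod template are merged by name, so only the
    # readinessProbe field of the named container is changed.
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [{"name": container, "readinessProbe": probe}]
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(readiness_probe_patch(), indent=2))
```

The printed JSON can be passed to `kubectl patch deployment <gateway-deployment> --patch '<json>'`. As noted above, a probe like this reduces the errors but does not cover the case where Envoy fails to start or to receive its configuration.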
Version
Kubernetes: 1.12
Istio: 1.1-snapshot-4
Installation
Using Helm
Environment
4.14.67-coreos
About this issue
- State: closed
- Created 5 years ago
- Reactions: 4
- Comments: 20 (16 by maintainers)
Commits related to this issue
- Add readiness check for Ingress Gateway (#3063) (#11001) Enabling the same readiness probe for Ingress Gateway that is being used for sidecars. — committed to philrud/istio by deleted user 5 years ago
- Add readiness check for Ingress Gateway (#3063) (#11001) (#11548) Enabling the same readiness probe for Ingress Gateway that is being used for sidecars. — committed to istio/istio by philrud 5 years ago
- Add readiness check for Ingress Gateway (#3063) (#11001) (#11548) Enabling the same readiness probe for Ingress Gateway that is being used for sidecars. — committed to louiscryan/istio by philrud 5 years ago
- Add readiness check for Ingress Gateway (#3063) (#11001) (#11548) Enabling the same readiness probe for Ingress Gateway that is being used for sidecars. — committed to smawson/istio by philrud 5 years ago
- Merge release-1.1 to master (#11722) * Incremental EDS only need updated service names (#11117) * Configure envoy_bootstrap_v2.json to use the configured admin port (#11214) * Configure envoy_b... — committed to istio/istio by deleted user 5 years ago
- Merge master into collab-galley (#12630) * Merge release-1.1 to master (#11722) * Incremental EDS only need updated service names (#11117) * Configure envoy_bootstrap_v2.json to use the configure... — committed to istio/istio by ozevren 5 years ago
@berstend Hi. As @tcnghia said, we just added the probe manually. But please read my comments in this thread: adding readinessProbe does not address the problem entirely. You still run a risk when simply scaling up the gateway deployment, for example, because there is no guarantee which will arrive first: the configuration streamed from Pilot or actual production requests. My suggestion is to prepare a new deployment and use istioctl proxy-status to do a sync check. Once all the checks pass, update the gateway service manifest. Hope it helps.
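The sync check described above could be scripted roughly as follows. This is a sketch under stated assumptions: it treats the `istioctl proxy-status` output as a whitespace-separated table with one header row and the proxy name in the first column, and it relies only on the status words SYNCED, STALE and NOT SENT. The exact column layout varies between Istio versions, and the helper names and the `istio-ingressgateway` prefix are hypothetical.

```python
import subprocess


def sync_state(status_text, name_prefix="", allow_not_sent=False):
    """Map each matching proxy name to True if its row looks fully synced."""
    state = {}
    rows = [ln for ln in status_text.splitlines() if ln.strip()]
    for row in rows[1:]:  # assume the first non-empty line is the header
        name = row.split()[0]
        if not name.startswith(name_prefix):
            continue
        ok = "SYNCED" in row and "STALE" not in row
        # NOT SENT can be legitimate when Pilot has nothing to send for an
        # xDS type; it is treated as unsynced by default to stay conservative.
        if not allow_not_sent and "NOT SENT" in row:
            ok = False
        state[name] = ok
    return state


def all_synced(state):
    """True only if at least one proxy matched and every match is synced."""
    return bool(state) and all(state.values())


def fetch_status():
    """Run `istioctl proxy-status` and return its standard output."""
    result = subprocess.run(
        ["istioctl", "proxy-status"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

A rollout script could call `all_synced(sync_state(fetch_status(), "istio-ingressgateway"))` in a retry loop and update the gateway service manifest only after it returns True.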