ingress-gce: Small amount of 502s during rollouts with NEG even with "mitigations"
From #583:
For the NEG Programming Latency issue, even with minReadySeconds and terminationGracePeriodSeconds set to 180, I’m still seeing a small number of 502s during rollouts. Is this expected? My test was a rollout of 11 pods while sending 100 requests per second to them overall; I saw 96 502s over the course of the rollout.
I’d like to understand the cause of this issue. My current guess is that Kubernetes terminates the pod and stops routing traffic to it, but the NEG is updated after the pod is terminated. If so, could we perhaps solve this with a pre-stop hook that detaches the pod from the NEG before terminating?
So when the pod gets deleted, two things happen in parallel:

A. Kubelet sends SIGTERM to the containers.
B. The endpoint gets deprogrammed from the load balancer.

B takes more time, as A usually happens very fast. So it is recommended to configure the pod to do two things when SIGTERM is received: keep serving requests for the duration of the graceful-termination period, and send `Connection: close` on responses so clients reconnect to a backend that is still programmed.
I should make it a bit more clear. When the pod is deleted (a deletion timestamp is added), these things happen in parallel:

1. The endpoints controller removes the pod's endpoint from the ready addresses of the Endpoints resource.
2. Kubelet sends SIGTERM to the containers.

After the Endpoints resource gets updated (pod endpoint removed from the ready addresses), service programming starts:

3. kube-proxy reprograms iptables on the nodes so traffic is no longer routed to the endpoint.
4. The NEG controller detaches the endpoint from the NEG, and the load balancer stops sending traffic to it.

There is a small time gap between step 1 and steps 3 & 4. Programming iptables is generally faster than programming LBs, so the gap is more visible on the LB side.
To avoid service disruption during pod deletion, the key is to keep serving requests during graceful termination. That leaves enough time for the LB or iptables to get fully programmed.

Ideally, K8s should remove the pod from the service backends first and only then send SIGTERM to the containers (assuming most containers do not handle SIGTERM properly). But currently there is no such mechanism in K8s, and the service life cycle is only loosely coupled with the pod life cycle. Hence, it is recommended to have proper SIGTERM handling in the pods.
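As a sketch of what that SIGTERM handling can look like for a Go HTTP server (the 30-second drain window and 10-second shutdown timeout below are illustrative values, not numbers from this thread; the drain window needs to fit inside the pod's terminationGracePeriodSeconds):

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok\n"))
	})
	srv := &http.Server{Addr: ":8080", Handler: mux}

	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatalf("server error: %v", err)
		}
	}()

	// Wait for the SIGTERM that kubelet sends when the pod is deleted.
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGTERM)
	<-sigCh

	// Keep serving while the Endpoints update propagates to iptables and to
	// the NEG / load balancer. 30s is an illustrative drain window.
	log.Print("SIGTERM received, draining before shutdown")
	time.Sleep(30 * time.Second)

	// Stop accepting new connections and let in-flight requests finish.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("graceful shutdown did not complete: %v", err)
	}
}
```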
For a generic HTTP server, yes, it means keep serving requests, but the server should send `Connection: close` on responses so clients have to reconnect. Servers that have a “lame duck” state and can actively wind down existing sessions with their clients would behave differently.
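A minimal sketch of that lame-duck behavior in Go, assuming a `draining` flag that the SIGTERM handler flips before the drain window starts (the flag and the `withDrain` wrapper are illustrative names, not from this thread):

```go
package main

import (
	"net/http"
	"sync/atomic"
)

// draining is set to true once SIGTERM has been received; the server keeps
// answering requests, but each response asks the client to drop the
// keep-alive connection, so the next request is routed to a backend that is
// still attached to the NEG.
var draining atomic.Bool

func withDrain(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if draining.Load() {
			// "Connection: close" makes the server close the connection
			// after this response instead of keeping it alive.
			w.Header().Set("Connection", "close")
		}
		next.ServeHTTP(w, r)
	})
}
```

In Go specifically, calling `srv.SetKeepAlivesEnabled(false)` at the start of the drain window has much the same effect at the server level.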