ingress-gce: Small amount of 502s during rollouts with NEG even with "mitigations"
From #583:
For the NEG Programming Latency issue, even with minReadySeconds and terminationGracePeriodSeconds set to 180, I’m still seeing a small number of 502s during rollouts. Is this expected? My test was a rollout of 11 pods while sending 100 requests per second to them overall; I saw 96 502s over the course of the rollout.
I’d like to understand the cause of this issue. My current guess is that Kubernetes terminates the pod and stops routing traffic to it, but the NEG is updated after the pod is terminated. If so, could we perhaps solve this with a pre-stop hook that detaches the pod from the NEG before terminating?
So when the pod gets deleted, two things happen in parallel:

A. Kubelet sends SIGTERM to the containers.
B. The endpoint gets deprogrammed from the load balancer.

B takes more time, as A usually happens very fast. So it is recommended to configure the pod to do two things when SIGTERM is received: keep serving requests for the duration of the graceful-termination period, and send `Connection: close` on responses so clients reconnect to a backend that is still programmed.
I should make it a bit more clear. When the pod is deleted (a deletion timestamp is added), these things happen in parallel:

1. The endpoints controller removes the pod's endpoint from the ready addresses of the Endpoints resource.
2. Kubelet sends SIGTERM to the containers.

After the Endpoints resource gets updated (pod endpoint removed from the ready addresses), service programming starts:

3. kube-proxy reprograms iptables on the nodes so traffic is no longer routed to the endpoint.
4. The NEG controller detaches the endpoint from the NEG, and the load balancer stops sending traffic to it.

There is a small time gap between step 1 and steps 3 & 4. Programming iptables is generally faster than programming LBs, so the gap is more visible on the LB side.
To avoid service disruption during pod deletion, the key is to keep serving requests during graceful termination. That leaves enough time for the LB or iptables to get fully programmed.

Ideally, K8s should remove the pod from the service backends first and only then send SIGTERM to the containers (assuming most containers do not handle SIGTERM properly). But currently there is no such mechanism in K8s, and the service life cycle is only loosely coupled with the pod life cycle. Hence, it is recommended to have proper SIGTERM handling in the pods.
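As a sketch of what that SIGTERM handling can look like for a Go HTTP server (the 30-second drain window and 10-second shutdown timeout below are illustrative values, not numbers from this thread; the drain window needs to fit inside the pod's terminationGracePeriodSeconds):

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok\n"))
	})
	srv := &http.Server{Addr: ":8080", Handler: mux}

	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatalf("server error: %v", err)
		}
	}()

	// Wait for the SIGTERM that kubelet sends when the pod is deleted.
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGTERM)
	<-sigCh

	// Keep serving while the Endpoints update propagates to iptables and to
	// the NEG / load balancer. 30s is an illustrative drain window.
	log.Print("SIGTERM received, draining before shutdown")
	time.Sleep(30 * time.Second)

	// Stop accepting new connections and let in-flight requests finish.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("graceful shutdown did not complete: %v", err)
	}
}
```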
For a generic HTTP server, yes, it means keep serving requests, but the server should send `Connection: close` on responses so clients have to reconnect. Servers that have a “lame duck” state and can actively wind down existing sessions with their clients would behave differently.
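A minimal sketch of that lame-duck behavior in Go, assuming a `draining` flag that the SIGTERM handler flips before the drain window starts (the flag and the `withDrain` wrapper are illustrative names, not from this thread):

```go
package main

import (
	"net/http"
	"sync/atomic"
)

// draining is set to true once SIGTERM has been received; the server keeps
// answering requests, but each response asks the client to drop the
// keep-alive connection, so the next request is routed to a backend that is
// still attached to the NEG.
var draining atomic.Bool

func withDrain(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if draining.Load() {
			// "Connection: close" makes the server close the connection
			// after this response instead of keeping it alive.
			w.Header().Set("Connection", "close")
		}
		next.ServeHTTP(w, r)
	})
}
```

In Go specifically, calling `srv.SetKeepAlivesEnabled(false)` at the start of the drain window has much the same effect at the server level.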