aws-load-balancer-controller: Support Custom Pod Status for PodReadinessGate to Block Premature Pod Termination

Presently, when using IP target mode on the ALB, it’s difficult to guarantee zero-downtime rolling updates. This is because the ALB Target Group holds additional state which is not propagated back to Kubernetes: initial, draining, and healthy.

This was discussed in: https://github.com/kubernetes-sigs/aws-alb-ingress-controller/issues/660 & https://github.com/kubernetes-sigs/aws-alb-ingress-controller/issues/814

This issue is to discuss adding support to the ALB controller for setting a custom Pod status condition which mirrors the ALB TargetGroup state. This condition can then be used with a PodReadinessGate, together with maxUnavailable / PodDisruptionBudget, to ensure a rolling update does not proceed faster than the new Pods become healthy on the ALB. A sketch of the intended usage follows.
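A rough sketch of what this could look like from the user’s side, assuming a hypothetical condition type published by the controller (the condition name, target group name, and image below are illustrative, not part of any existing API):

    # Pods declare a readiness gate on a condition the ALB controller would
    # publish; the pod is only considered Ready once that condition is True.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: my-app
      strategy:
        rollingUpdate:
          maxUnavailable: 0  # never remove an old pod before its replacement is fully Ready
      template:
        metadata:
          labels:
            app: my-app
        spec:
          readinessGates:
          # Hypothetical condition type mirroring ALB target health:
          - conditionType: target-health.alb.ingress.kubernetes.io/my-target-group
          containers:
          - name: app
            image: my-app:v2

A PodDisruptionBudget over the same label selector would extend the same protection to voluntary disruptions such as node drains.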

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 34
  • Comments: 29 (8 by maintainers)

Most upvoted comments

I’m working on this.

We were able to slow down the termination of pods to let the ALB catch up, thus preventing 504s, by using a lifecycle preStop hook:

        lifecycle:
          preStop:
            exec:
              command:
              - /bin/bash
              - -c
              # a single shell string: wait for the ALB to deregister the
              # target, then stop Apache gracefully
              - sleep 25; apachectl -k graceful-stop
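Note that the pod’s terminationGracePeriodSeconds (30s by default) includes the time spent in the preStop hook, so it needs to be raised if the sleep plus the graceful stop could exceed it.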

@nirnanaaa @devkid were you able to make some progress on this? I think we need to pick this up as the ingress controller is unable to do zero-downtime deployments in the current state.

I won’t have time to continue working on this in the next two weeks, but I have pushed an initial implementation here: https://github.com/devkid/aws-alb-ingress-controller/tree/feature/pod-readiness-gate (it compiles and has documentation, but I didn’t test it yet nor did I adjust the unit tests).

It works by reconciling the pod conditions in the same step where it reconciles the ALB target group state. However, I think it wouldn’t work in its current state, because right now only the Ready endpoints of a service are registered with the ALB. When a pod declares the ALB readiness gate, it would never turn Ready: it would be waiting for the readiness gate to be satisfied (turning healthy in the ALB), which would never happen because the pod was never registered with the ALB in the first place (because it was not Ready yet) - a deadlock. To solve this, the ALB ingress controller would need to support the publishNotReadyAddresses attribute of a Kubernetes Service, and that attribute would need to be set to true if one wants to use this pod readiness gate feature. Then all pods - including unready ones - would be registered with the target group, pass the health check, and only then appear Ready in Kubernetes.
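For reference, publishNotReadyAddresses is an existing field on the Service spec; a minimal Service opting into it would look like this (the name, selector, and ports are illustrative):

    apiVersion: v1
    kind: Service
    metadata:
      name: my-app
    spec:
      # Publish endpoints even for pods that are not yet Ready, so the
      # controller can register them with the target group and the ALB
      # health check can run before the readiness gate is satisfied.
      publishNotReadyAddresses: true
      selector:
        app: my-app
      ports:
      - port: 80
        targetPort: 8080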

If someone feels like picking up from where I left, feel free to do so. Otherwise I would probably continue in ~2-3 weeks.

@nirnanaaa et al.: were you able to get a new PR set up? It seems like this issue has stalled and it’s a pretty large blocker for anyone wanting to use IP target mode in a production environment. Thanks!

I will set up a new PR to propose such a controller.

We are having a similar issue as well. When pods are terminated, the ALB Ingress Controller rapidly updates the ALB target group; however, the changes take a few seconds to take effect, which causes a short window during which clients may get 504s.

We are using IP mode, meaning each Pod is registered in the target group directly, since we require the real client source IP as forwarded by the ALB (not the source IP kube-proxy substitutes) in order to use NetworkPolicies.

To work around this, we are taking the approach of running kube-proxy without source NAT (https://kubernetes.io/docs/tutorials/services/source-ip/); this has drawbacks as well, but it masks the issue to a degree.
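For context, the approach in that tutorial boils down to setting externalTrafficPolicy: Local on the Service, which makes kube-proxy deliver traffic only to pods on the receiving node and skip source NAT, preserving the client IP (the name and ports here are illustrative):

    apiVersion: v1
    kind: Service
    metadata:
      name: my-app
    spec:
      type: NodePort
      # Local policy: no SNAT, so the client source IP is preserved,
      # at the cost of traffic only reaching pods on the receiving node.
      externalTrafficPolicy: Local
      selector:
        app: my-app
      ports:
      - port: 80
        targetPort: 8080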

What I would really like to see is one of two things:

  • the ALB to keep pace with the rapid changes in k8s
  • k8s to give the ALB time to catch up by delaying termination of the pod.