aws-load-balancer-controller: ALB sending requests to pods after ingress controller deregisters them leading to 504s

I have the following Ingress defined; the relevant bits are that it uses target-type ip and a 30-second deregistration_delay.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: staging-ingress
  namespace: "default"
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/certificate-arn: REDACTED
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS-1-2-2017-01
    alb.ingress.kubernetes.io/healthcheck-path: /up
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/target-group-attributes: deregistration_delay.timeout_seconds=30
    alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=30,access_logs.s3.enabled=true,access_logs.s3.bucket=REDACTED,access_logs.s3.prefix=REDACTED
    alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'
spec:
  rules:
    - host: REDACTED
      http:
        paths:
          - path: /*
            backend:
              serviceName: ssl-redirect
              servicePort: use-annotation
          - path: /
            backend:
              serviceName: graphql-api
              servicePort: 80

When I delete a pod, either manually or as part of a rolling deploy, I see 504s returned from the ALB. A 504 is returned when the ALB cannot establish a connection to its target within 10 seconds. Here is one such message from the ALB logs:

https 2019-11-06T01:35:39.438256Z app/bd528925-default-stagingin-1b90/1a99d4560435cd18 54.248.220.14:24888 172.18.158.197:8080 -1 -1 -1 504 - 229 303 "GET REDACTED:443/up HTTP/1.1" "Amazon-Route53-Health-Check-Service (ref af4d4b0b-e40c-4f18-a18f-12f472889080; report http://amzn.to/1vsZADi)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:us-east-1:REDACTED:targetgroup/bd528925-7628515b83cd9ca7cdb/50afd948cb0a6587 "Root=1-5dc22361-b0d66bf23f845202c2c390ec" "REDACTED" "arn:aws:acm:us-east-1:REDACTED:certificate/REDACTED" 3 2019-11-06T01:35:29.436000Z "forward" "-" "-" "172.18.158.197:8080" "-"

There’s a lot going on there, but the important part is that the request is received at 2019-11-06T01:35:29.436000Z and the error is emitted 10s later at 2019-11-06T01:35:39.438256Z.

I investigated the ingress controller logs and can see that the pod in question, 172.18.158.197:8080, was deregistered at 2019-11-06T01:35:25.891546Z, 4 seconds before the above request was received.

I1106 01:35:25.891546       1 targets.go:95] default/staging-ingress: Removing targets from arn:aws:elasticloadbalancing:us-east-1:REDACTED:targetgroup/bd528925-7628515b83cd9ca7cdb/50afd948cb0a6587: 172.18.158.197:8080

My understanding is that once a target is set to “deregistering”, the ALB will not forward it any more requests. It’s unclear to me how this request seems to be breaking that rule - any thoughts?
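(A way to cross-check what the ALB itself reports during a deploy is to poll aws elbv2 describe-target-health --target-group-arn <the target group ARN from the log line above> while deleting a pod; a target the controller has deregistered should show up with state "draining".)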

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 29
  • Comments: 34 (1 by maintainers)

Most upvoted comments

Checking back in here. I think I was able to resolve this issue with the preStop sleep workaround to achieve 100% availability through deployments.

Couple things I changed from my initial config:

# Extend the pod's shutdown grace period from the default of 30s to 60s.
# This goes in the pod template spec.
terminationGracePeriodSeconds: 60

# Increase the sleep before SIGTERM to 25s. I had this as 5s previously and it wasn't enough.
lifecycle:
  preStop:
    exec:
      command: ["sleep", "25"]

Extending the sleep time allows the ALB to keep sending a few requests even after deregistration without the pod rejecting them. My pod needs at most 30s to gracefully answer all requests and terminate, so the 25s sleep + 30s = 55s. The extended terminationGracePeriodSeconds of 60s allows that whole shutdown process to complete without the pod being killed.
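Put together, the relevant parts of the pod template look roughly like this (container name and image are placeholders, not from the original config):

spec:
  template:
    spec:
      # Pod-level: give the whole shutdown (preStop sleep + graceful drain) time to finish.
      terminationGracePeriodSeconds: 60
      containers:
        - name: graphql-api   # placeholder container name
          image: REDACTED
          ports:
            - containerPort: 8080
          lifecycle:
            preStop:
              exec:
                # Container-level: keep serving while the ALB finishes deregistering the
                # target; SIGTERM is only sent to this container once the sleep completes.
                command: ["sleep", "25"]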

I used https://github.com/JoeDog/siege to send constant concurrent requests to the load balancer while deploying and achieved 100% availability.
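(Something along the lines of siege -c 25 -t 5M https://your-alb-host/ keeps steady concurrent load on the endpoint while the deployment rolls; the exact flags and host here are illustrative, not from the original comment.)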

We are able to reproduce this 100% of the time with target-type set to ip. We added a 30-second sleep in preStop, which resolved the 502s. However, 504s still happen even after the target has been removed from the ALB. We have verified that the 504s are caused by the ALB forwarding requests to a target that was already removed.

Any update on this? This should be reopened given the comments here, IMO.

@Elyahou I don’t feel too bad because this has been an ongoing issue for over a year at this point, but I moved on to kong-ingress-controller and have been very happy with it. The things that the ALB handled nicely - for me, primarily the AWS certs - are now all handled by adding a couple more pieces, namely cert-manager and external-dns, so I no longer worry about the ALB lifecycle conflicts.

ALB ingress seemed like an awesome idea and was dead-simple to use, but the inability to sync up the lifecycles between the pod and ALB delivery just became too much.

This is an issue for us too. A BIG one.

…It’s incredible to me how this has been allowed to go on this long without a concrete fix or resolution 😂. Talk about “Bias for Action” and “Ownership” 🎃

AWS philosophizes and writes literature on the 6 Well-Architected Pillars.

Also releases a Load Balancer Ingress Controller that regularly throws 50xs, because it doesn’t work as advertised/as expected, even though the customer using their tool is doing everything in exactly the way they’re being told to.

…And then lets the issue just kinda linger and languish for years…and then tells the customer that the “fix” is just a bunch of workarounds that impact the customer’s kubernetes cluster in other ways…like delayed pod termination, which then impacts the scale-speed and cost characteristics of the cluster. 👍🏼

/remove-lifecycle stale

I think I might be currently encountering this issue. I’ll have to do further tests to verify, but it seems like this issue should be reopened.

For those who use the following setup:

  • ALB
  • AWS CNI
  • ALB Ingress Controller
  • Linkerd v2

…here’s an example of how to avoid the problem:

https://github.com/linkerd/linkerd2/issues/3747#issuecomment-557593979

I am not an expert in ALB internals, but I suspect this happens because of the distributed, cross-AZ (and possibly multi-tenant) nature of ALB. It takes time to propagate changes to all of its nodes, so the ALB will keep sending some traffic for a while even after a target has been asked to deregister. So the overall “sleep” configuration for the preStop hook should be calculated based on the following (see the sketch after this list):

  • time for the ALB Ingress Controller to send the deregistration API call, especially if AWS WAF is in play
  • the deregistration delay configured on the target group (deregistration_delay.timeout_seconds)
  • some additional time for the ALB to propagate the change internally (from my tests it takes up to 5 seconds, but it can be very different in your setup)
  • and, after the main container exits, some additional time (yet another preStop hook) specifically for sidecar containers such as Istio, Linkerd or any other, so the pod’s internal network stays up until the very end.
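Under those assumptions, a sketch of how the numbers might add up (all values and container names are illustrative, not prescriptive):

# App preStop sleep ≈ controller API call + deregistration_delay + propagation buffer
#                   ≈ a few seconds       + 30s                  + ~5s   => ~40s
# The sidecar's sleep covers the app's sleep plus its graceful drain, so the pod's
# network stays up until the app has finished; terminationGracePeriodSeconds covers it all.
terminationGracePeriodSeconds: 75
containers:
  - name: app              # placeholder
    lifecycle:
      preStop:
        exec:
          command: ["sleep", "40"]
  - name: linkerd-proxy    # placeholder sidecar name
    lifecycle:
      preStop:
        exec:
          command: ["sleep", "70"]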