aws-load-balancer-controller: 502/503 during deploys and/or pod termination
Hi! First of all, I appreciate the community and all their work on this project, it is very helpful and a good solution to route directly to pods from an ALB.
However, during testing I’ve noticed intermittent 502/503s during deploys of our StatefulSet. My current hypothesis: when a deploy starts, the StatefulSet controller terminates a pod that needs updating, and there is latency between that termination and the ALB ingress controller moving the corresponding ALB target to draining. During this window, requests are still routed to the terminating pod and come back as 502 (from our nginx sidecar) and/or 503 (from the AWS ALB).
Has anyone else seen this problem, and does anyone have a solution for it? Ideally we’d deregister the pod from the ALB target group before the pod is killed, if that is in fact what is happening.
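For context while reading the manifests below: one commonly suggested mitigation for this race (a sketch under assumptions, not something confirmed in this thread) is to delay container shutdown with a preStop hook so the controller has time to move the ALB target into draining before the pod stops answering. The names `svc`, `svc-headless`, and `dev` mirror the manifests below; the image, replica count, and timing values are placeholders.

```yaml
# Sketch only: preStop sleep to bridge the gap between pod deletion and
# ALB target deregistration. Tune the sleep and grace period per environment.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: svc
  namespace: dev
spec:
  serviceName: svc-headless
  replicas: 2
  selector:
    matchLabels:
      app: svc
  template:
    metadata:
      labels:
        app: svc
    spec:
      # Must be longer than the preStop sleep, or the pod is killed mid-sleep.
      terminationGracePeriodSeconds: 60
      containers:
        - name: app
          image: example/app:latest   # placeholder image
          ports:
            - name: http
              containerPort: 9000
          lifecycle:
            preStop:
              exec:
                # Keep the container serving for 30s after the pod is marked
                # for deletion, giving the controller time to move the ALB
                # target into "draining" before traffic is actually refused.
                command: ["sh", "-c", "sleep 30"]
```

Pairing this with a shorter target-group deregistration delay (for example via the `alb.ingress.kubernetes.io/target-group-attributes` annotation) keeps rollouts from slowing down too much; the 30s/60s values above are illustrative only.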
I have the following Service and Ingress:
```yaml
---
kind: Service
apiVersion: v1
metadata:
  name: svc-headless
  namespace: dev
spec:
  clusterIP: None
  selector:
    app: svc
  ports:
    - name: http
      port: 9000
```
Ingress
```yaml
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: svc-external
  namespace: dev
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig": {"Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/security-groups: sg-xxxxxxxxxx,sg-yyyyyyyyyyy
    alb.ingress.kubernetes.io/healthcheck-interval-seconds: '5'
    alb.ingress.kubernetes.io/healthcheck-timeout-seconds: '3'
    alb.ingress.kubernetes.io/success-codes: 200,201,401
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:XXXXXXXXXX:certificate/uuid
    alb.ingress.kubernetes.io/subnets: subnet-aaaaa,subnet-bbbbb,subnet-cccc
  labels:
    app: svc
spec:
  rules:
    - http:
        paths:
          - path: /*
            backend:
              serviceName: ssl-redirect
              servicePort: use-annotation
          - path: /*
            backend:
              serviceName: svc-headless
              servicePort: 9000
```
@douglaz See this thread which covers the same issue with a couple of solutions: https://github.com/kubernetes-sigs/aws-alb-ingress-controller/issues/1064
tl;dr: add `--feature-gates=waf=false` to the alb-ingress-controller container args. Right now the controller makes WAF requests on every deploy, and AWS throttling those requests can delay target updates. If you’re not using WAF, skipping those calls entirely prevents the delays.
/remove-lifecycle rotten
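For reference, a sketch of where that flag goes (mine, not from the linked thread): it is appended to the controller container's args. The Deployment name, namespace, service account, image tag, and cluster name below are assumptions based on a typical install; adjust to match yours.

```yaml
# Sketch only: disabling the WAF feature gate on the controller Deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alb-ingress-controller
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: alb-ingress-controller
  template:
    metadata:
      labels:
        app.kubernetes.io/name: alb-ingress-controller
    spec:
      serviceAccountName: alb-ingress-controller   # assumes RBAC is already in place
      containers:
        - name: alb-ingress-controller
          image: docker.io/amazon/aws-alb-ingress-controller:v1.1.8   # example tag
          args:
            - --ingress-class=alb
            - --cluster-name=my-cluster       # placeholder cluster name
            - --feature-gates=waf=false       # skip WAF API calls during reconciles
```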
Hi @M00nF1sh, thanks for the response.
That would work; however, it gets us back to the exact problem I’m trying to solve. We have a large number of instances across various node groups, which quickly balloons the number of instances attached to the target group. The pods we’d like to direct traffic to live on a small instance group, so this would work if we could select those EC2 instances (Kubernetes nodes) directly. Is there a way to filter or limit which cluster nodes get attached (via a Kubernetes node label, an EC2 tag, or otherwise)?