ingress-nginx: ingress-nginx-controller 1.5.1 is not registering new nodes on ELB
What happened:
After upgrading to ingress-nginx 1.5.1, EC2 instance nodes are no longer registered with the AWS load balancer when using a Classic ELB. As a result, the load balancer can end up with no registered instances, and ingresses using the ingress controller return 503 Service Unavailable: Back-end server is at capacity.
Manually updating seemingly any value in the ingress-nginx-controller service triggers the ELB to update, but without that manual nudge the node changes are never reconciled.
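One way to trigger the reconcile described above is to touch the Service with a throwaway annotation. This is a sketch, not part of the original report; the `ingress-nginx` namespace and `ingress-nginx-controller` service name are assumptions that should match your deployment:

```shell
# Bump an arbitrary annotation on the controller Service so the cloud
# controller re-syncs the ELB; the annotation key is made up and harmless.
kubectl -n ingress-nginx annotate service ingress-nginx-controller \
  force-reconcile="$(date +%s)" --overwrite
```

This only works around the symptom; the ELB will drift again the next time nodes change until the underlying bug is fixed.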
What you expected to happen:
Instances are registered with the load balancer as necessary and ingresses work normally.
NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):
NGINX Ingress controller
Release: v1.5.1
Build: d003aae913cc25f375deb74f898c7f3c65c06f05
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.21.6
Kubernetes version (use kubectl version):
1.24
Environment:
- AWS EKS 1.24
- Amazon Linux 2
How was the ingress-nginx-controller installed:
Customized helm chart
How to reproduce this issue:
This problem is specific to AWS load balancers and won’t be reproducible in minikube/kind
Anything else we need to know:
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 2
- Comments: 38 (4 by maintainers)
The TAM is going to give me an update from the internal team sometime this week.
Long story short: the AWS out-of-tree cloud controller manager introduced a change that caused the misbehavior reported here. The fix was merged and tagged for v1.26, which corresponds to Kubernetes 1.26, and I don't think they will backport it to versions earlier than 1.26. Since EKS 1.26 won't be released for at least another quarter or so, this leaves us all looking for a workaround or alternative. My workaround is to install the AWS Load Balancer Controller and use it only as the external cloud controller manager for the nginx-ingress load balancer/ASG/target group. Doing so will not preserve the existing LB, so a DNS change is expected as a result. Of course, you can also switch to the AWS ALB controller entirely for all ingress and service exposure to the ELB.
This appears to be a regression in Kubernetes 1.24. When `externalTrafficPolicy` is set to `Local`, new nodes will not be automatically registered with the ELB. It doesn't matter what version of the ingress controller you're using, because it's a problem with the in-tree LoadBalancer service type. If you change `externalTrafficPolicy` to `Cluster` then it will work as expected.

It's worth noting that the nlb-with-tls-termination manifest sets `externalTrafficPolicy: Local`. I suspect this is only a problem with Classic ELBs, but I haven't tested with other load balancer types.
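The change described above is made in the controller's Service spec. A minimal sketch, assuming the standard `ingress-nginx-controller` service name and namespace from the Helm chart (your ports, selectors, and annotations will differ):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  type: LoadBalancer
  # Cluster (the default) routes traffic via kube-proxy from any node,
  # which avoids the node-registration regression discussed above.
  # Local preserves client source IPs but is affected by this bug.
  externalTrafficPolicy: Cluster
  ports:
    - name: http
      port: 80
      targetPort: http
  selector:
    app.kubernetes.io/name: ingress-nginx
```

Note the trade-off: switching from `Local` to `Cluster` means client source IPs are no longer preserved at the pod level, which matters if you rely on them for logging or allowlisting.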
It's possible this may be an EKS/Kubernetes issue, since it seems to be more related to the `LoadBalancer` service type than anything to do with the actual ingress controller. I've opened a support case and will hopefully report back soon with more info.

Was the fix for this released in 1.25 after all? I haven't done extensive testing, but after upgrading my cluster to 1.25 I'm now seeing nodes correctly registered in Target Groups on the NLB. I understand the underlying bug was in the Cloud Controller Manager and not ingress-nginx itself. I've tested the behavior with both ingress-nginx and Traefik.
See https://github.com/kubernetes/ingress-nginx/issues/9041#issuecomment-1252255678