ingress-nginx: ingress-nginx-controller 1.5.1 is not registering new nodes on ELB
What happened:
After upgrading to ingress-nginx 1.5.1, EC2 instance nodes are no longer registered with the AWS load balancer when using a Classic ELB. As a result, the load balancer can end up with no registered instances, and ingresses using the ingress controller return 503 Service Unavailable: Back-end server is at capacity.
Manually updating seemingly any value in the ingress-nginx-controller service triggers the ELB to update, but without that manual nudge the node changes are never reconciled.
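One way to trigger the reconcile described above is to touch the Service with a throwaway annotation. This is a sketch, not part of the original report; the `ingress-nginx` namespace and `ingress-nginx-controller` service name are assumptions that should match your deployment:

```shell
# Bump an arbitrary annotation on the controller Service so the cloud
# controller re-syncs the ELB; the annotation key is made up and harmless.
kubectl -n ingress-nginx annotate service ingress-nginx-controller \
  force-reconcile="$(date +%s)" --overwrite
```

This only works around the symptom; the ELB will drift again the next time nodes change until the underlying bug is fixed.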
What you expected to happen:
Instances are registered with the load balancer as necessary and ingresses work normally.
NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):
NGINX Ingress controller
Release: v1.5.1
Build: d003aae913cc25f375deb74f898c7f3c65c06f05
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.21.6
Kubernetes version (use kubectl version):
1.24
Environment:
- AWS EKS 1.24
- Amazon Linux 2
How was the ingress-nginx-controller installed:
Customized helm chart
How to reproduce this issue:
This problem is specific to AWS load balancers and won’t be reproducible in minikube/kind
Anything else we need to know:
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 2
- Comments: 38 (4 by maintainers)
The TAM is going to give me an update from the internal team sometime this week.
Long story short: the AWS out-of-tree cloud controller manager introduced a change that caused the misbehavior reported here. The fix was merged and tagged for v1.26, which corresponds to Kubernetes 1.26, and I don't think they will backport it to versions earlier than 1.26. Since EKS 1.26 won't be released for at least another quarter or so, this leaves us all looking for a workaround or alternative. My workaround is to install the AWS Load Balancer Controller and use it only as the external cloud controller manager for the nginx-ingress load balancer/ASG/target group. Doing so will not preserve the existing LB, so a DNS change is expected as a result. Of course, you can also switch to the AWS ALB controller entirely for all ingress and service exposure to the ELB.
This appears to be a regression in Kubernetes 1.24. When `externalTrafficPolicy` is set to `Local`, new nodes will not be automatically registered with the ELB. It doesn't matter what version of the ingress controller you're using, because it's a problem with the in-tree LoadBalancer service type. If you change `externalTrafficPolicy` to `Cluster` then it will work as expected.

It's worth noting that the nlb-with-tls-termination manifest sets `externalTrafficPolicy: Local`. I suspect this is only a problem with Classic ELBs, but I haven't tested with other load balancer types.
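The change described above is made in the controller's Service spec. A minimal sketch, assuming the standard `ingress-nginx-controller` service name and namespace from the Helm chart (your ports, selectors, and annotations will differ):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  type: LoadBalancer
  # Cluster (the default) routes traffic via kube-proxy from any node,
  # which avoids the node-registration regression discussed above.
  # Local preserves client source IPs but is affected by this bug.
  externalTrafficPolicy: Cluster
  ports:
    - name: http
      port: 80
      targetPort: http
  selector:
    app.kubernetes.io/name: ingress-nginx
```

Note the trade-off: switching from `Local` to `Cluster` means client source IPs are no longer preserved at the pod level, which matters if you rely on them for logging or allowlisting.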
It's possible this may be an EKS/Kubernetes issue, since it seems to be more related to the `LoadBalancer` service type than anything to do with the actual ingress controller. I've opened a support case and will hopefully report back soon with more info.

Was the fix for this released in 1.25 after all? I haven't done extensive testing, but after upgrading my cluster to 1.25 I'm now seeing nodes correctly registered in Target Groups on the NLB. I understand the underlying bug was in the Cloud Controller Manager and not ingress-nginx itself. I've tested the behavior with both ingress-nginx and Traefik.
See https://github.com/kubernetes/ingress-nginx/issues/9041#issuecomment-1252255678