kubernetes: Changing an NLB service from externalTrafficPolicy Cluster to Local results in a stuck Service
What happened:
We had an NLB Service object (in AWS, naturally) where we needed to preserve the client source IP, so we changed externalTrafficPolicy from Cluster to Local. We did not delete the Service object, and did not want to, because that would break the Route53 DNS entries pointing at the existing load balancer.
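For concreteness, the change amounts to a single in-place patch; a minimal sketch, using the redacted namespace/name (XXX/YYY) from the event below as placeholders:

```sh
# XXX (namespace) and YYY (Service name) are placeholders matching the
# redacted event below. Switch the traffic policy in place, without deleting
# the Service (and therefore without disturbing the Route53 records).
kubectl -n XXX patch service YYY \
  --type merge \
  -p '{"spec":{"externalTrafficPolicy":"Local"}}'
```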
The Service then began emitting Warning events with 400 errors from the AWS API, and it did not self-heal:
```
Warning CreatingLoadBalancerFailed 9m17s service-controller Error creating load balancer (will retry): failed to ensure load balancer for service XXX/YYY: Error modifying target group health check: "InvalidConfigurationRequest: You cannot change the health check protocol for a target group with the TCP protocol\n\tstatus code: 400, request id: f8e9446d-b7d3-11e9-886c-3f71b1696d69"
```
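The 400 is ELBv2 refusing to modify the health check protocol of an existing TCP target group: switching to externalTrafficPolicy: Local makes the controller ask for HTTP health checks against the Service's healthCheckNodePort instead of the TCP checks used for Cluster. A rough way to see the conflicting settings from the AWS side (the load balancer ARN is a placeholder):

```sh
# Placeholder ARN; look it up first with `aws elbv2 describe-load-balancers`.
NLB_ARN="arn:aws:elasticloadbalancing:REGION:ACCOUNT_ID:loadbalancer/net/NAME/ID"

# List each target group's protocol vs. health check settings. With
# externalTrafficPolicy: Cluster these are typically TCP/TCP; the Local
# reconciliation tries to flip the health check to HTTP, which ELBv2
# rejects on an existing TCP target group.
aws elbv2 describe-target-groups \
  --load-balancer-arn "$NLB_ARN" \
  --query 'TargetGroups[].[TargetGroupName,Protocol,HealthCheckProtocol,HealthCheckPort]' \
  --output table
```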
What you expected to happen: The controller should delete the now-invalid target group and recreate it in order to configure the NLB as requested.
How to reproduce it (as minimally and precisely as possible):
- Create a LoadBalancer Service annotated as an NLB with the default externalTrafficPolicy, backed by some running pods.
- Wait for the NLB to be fully provisioned.
- Change the Service to externalTrafficPolicy: Local (a minimal repro is sketched after this list).
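A minimal sketch of those steps, assuming a set of pods labelled app: echo already exists to back the Service (names, ports, and selector are illustrative):

```sh
# 1. Create an NLB-backed LoadBalancer Service with the default
#    externalTrafficPolicy (Cluster). Selector and ports are illustrative.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: echo-nlb
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    app: echo
  ports:
  - port: 80
    targetPort: 8080
EOF

# 2. Wait until the NLB hostname shows up under EXTERNAL-IP.
kubectl get service echo-nlb -w

# 3. Flip the traffic policy; shortly afterwards the Service starts
#    emitting the CreatingLoadBalancerFailed warnings shown above.
kubectl patch service echo-nlb --type merge \
  -p '{"spec":{"externalTrafficPolicy":"Local"}}'
```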
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`): 1.13 (latest EKS version)
- Cloud provider or hardware configuration: AWS
- OS (e.g. `cat /etc/os-release`): unknown (EKS control plane)
- Kernel (e.g. `uname -a`): unknown (EKS control plane)
- Install tools: eksctl
- Network plugin and version (if this is a network-related bug): unknown (EKS control plane)
- Others:
About this issue
- State: closed
- Created 5 years ago
- Reactions: 4
- Comments: 22 (9 by maintainers)
Fix in #94546 is part of Kubernetes 1.20.
I was able to replicate this in the master branch. I’ll investigate a fix for this.
/reopen