kops: kops rolling-update doesn't de-register instances from ELB network load balancer gracefully

1. What kops version are you running? The command kops version will display this information.

1.20.0

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running, or provide the Kubernetes version specified as a kops flag.

1.20.5

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

kops rolling-update cluster <cluster-name> --instance-group <node-group-name> --yes --force

5. What happened after the commands executed?

Before running kops rolling-update, I ran a script in a loop to make HTTP requests to an NLB endpoint sitting in front of an echo server. When kops rolling-update reported that it was stopping an instance, the HTTP requests started hanging and only recovered after roughly 90 seconds.
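For context, the probe was essentially a curl loop like the sketch below; the NLB hostname and timeout are placeholders, not the actual values used:

  # Continuously probe the echo server behind the NLB and log failures.
  # Hostname and timeout are placeholders.
  while true; do
    curl --silent --show-error --max-time 5 \
      http://my-echo-nlb.elb.us-east-1.amazonaws.com/ \
      || echo "$(date +%T) request failed"
    sleep 1
  done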

6. What did you expect to happen?

I expected the HTTP requests to continue to be handled successfully.

7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else we need to know?

I believe this happens because kops rolling-update detaches instances from their Auto Scaling groups without first de-registering them from their NLB target groups. My target group has a de-registration delay of 90 seconds, which would explain the 90-second recovery time.
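As a rough illustration of the missing step (not something kops does today), draining an instance from the target group before termination would look roughly like this with the AWS CLI; the target group ARN and instance ID are placeholders:

  # Check the configured de-registration delay (90 seconds in my case).
  TG_ARN=arn:aws:elasticloadbalancing:us-east-1:111111111111:targetgroup/echo/0123456789abcdef
  INSTANCE_ID=i-0123456789abcdef0
  aws elbv2 describe-target-group-attributes --target-group-arn "$TG_ARN" \
    --query "Attributes[?Key=='deregistration_delay.timeout_seconds'].Value"

  # De-register the target and block until draining completes,
  # after which it would be safe to detach and terminate the instance.
  aws elbv2 deregister-targets --target-group-arn "$TG_ARN" --targets Id="$INSTANCE_ID"
  aws elbv2 wait target-deregistered --target-group-arn "$TG_ARN" --targets Id="$INSTANCE_ID"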

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 27 (15 by maintainers)

Most upvoted comments

Partial fix in #11273

Filed kubernetes/website#27639

Found in Kubernetes 1.19 release notes:

Service load balancers no longer exclude nodes marked unschedulable from the candidate nodes. The service load balancer exclusion label should be used instead.

Users upgrading from 1.18 who have cordoned nodes should set the node.kubernetes.io/exclude-from-external-load-balancers label on the impacted nodes before upgrading if they wish those nodes to remain excluded from service load balancers. (#90823, @smarterclayton) [SIG Apps, Cloud Provider and Network]
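For anyone upgrading with cordoned nodes, applying the exclusion label looks like this (the node name is a placeholder):

  # Keep a cordoned node excluded from service load balancers on 1.19+.
  kubectl label node ip-10-0-1-23.ec2.internal \
    node.kubernetes.io/exclude-from-external-load-balancers=true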