kubernetes: Starting in 1.25 clusters, services of type=LB and xTP=Local sometimes do not get their node backends updated on load balancers

What happened?

When upgrading nodes from 1.24 to 1.25 on a cluster where the master is already at 1.25, I noticed that my Services of type=LoadBalancer and xTP=Local have an incorrect set of nodes after the nodes have been upgraded. The set contains only the old nodes, which no longer exist, making my service unavailable through my load balancer.

What did you expect to happen?

I would expect the load balancer to be updated with the new set of nodes after the upgrade.

How can we reproduce it (as minimally and precisely as possible)?

  1. Create a 1.25 cluster with 1.24 nodes
  2. Deploy a Service of type=LoadBalancer and xTP=Local (an example is sketched after these steps)
  3. Upgrade the 1.24 nodes to 1.25
  4. After the upgrade is finished, look at the node list for the load balancer.
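
For step 2, a minimal sketch of such a Service built with the Go client types (the name, selector, and port below are placeholders; only type=LoadBalancer and externalTrafficPolicy=Local matter for the repro):

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	// Placeholder name, selector, and ports; the fields that matter for the
	// repro are Type and ExternalTrafficPolicy.
	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "example-lb"},
		Spec: corev1.ServiceSpec{
			Type:                  corev1.ServiceTypeLoadBalancer,
			ExternalTrafficPolicy: corev1.ServiceExternalTrafficPolicyTypeLocal,
			Selector:              map[string]string{"app": "example"},
			Ports: []corev1.ServicePort{{
				Port:       80,
				TargetPort: intstr.FromInt(8080),
			}},
		},
	}
	fmt.Printf("%+v\n", svc)
}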

Anything else we need to know?

The existing logging is not enough to diagnose the issue. I added some extra logging and ran the KCM at log level 5 to find the root cause.

There was a change introduced to reduce the number of syncs for xTP=Local services: #109706. With this change, there are situations where the load balancer for an xTP=Local service never gets updated.

The following is the chain of events.

  1. A node is created or deleted, which causes triggerNodeSync() to run:

https://github.com/kubernetes/kubernetes/blob/a866cbe2e5bbaa01cfd5e969aa3e033f3282a8a2/staging/src/k8s.io/cloud-provider/controllers/service/controller.go#L169-L192

  2. Inside triggerNodeSync(), the nodeLister (line 264) filters for Ready nodes only, so it either does not yet contain the new node or still contains the deleted node. This means that c.needFullSync = false when line 281 is executed.

https://github.com/kubernetes/kubernetes/blob/a866cbe2e5bbaa01cfd5e969aa3e033f3282a8a2/staging/src/k8s.io/cloud-provider/controllers/service/controller.go#L260-L288

  3. Following the chain of functions that are called (across goroutines communicating over nodeSyncCh), we end up in nodeSyncInternal. Because c.needFullSync = false, we only sync services that were marked for retry. If the state was previously good, this means c.servicesToUpdate has 0 services before entering updateLoadBalancerHosts. https://github.com/kubernetes/kubernetes/blob/a866cbe2e5bbaa01cfd5e969aa3e033f3282a8a2/staging/src/k8s.io/cloud-provider/controllers/service/controller.go#L725-L741

  4. Nodes are queried again from the NodeLister, but this time the new node or the deleted node is reflected. https://github.com/kubernetes/kubernetes/blob/a866cbe2e5bbaa01cfd5e969aa3e033f3282a8a2/staging/src/k8s.io/cloud-provider/controllers/service/controller.go#L782-L811

  5. nodeSyncService is then parallelized based on the number of services. In this case we have no services, so we do no updates, but on line 808 c.lastSyncedNodes is set to the nodes found in step 4 (a simplified sketch of this whole sequence follows the list). https://github.com/kubernetes/kubernetes/blob/a866cbe2e5bbaa01cfd5e969aa3e033f3282a8a2/staging/src/k8s.io/cloud-provider/controllers/service/controller.go#L808
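
To make the sequence concrete, here is a heavily simplified, paraphrased sketch (not the actual controller source; only the field and function names quoted above are taken from it) of why the node change gets absorbed without any load balancer update:

package main

import "fmt"

type controller struct {
	knownHosts      []string // readiness-filtered view used by the trigger path
	lastSyncedNodes []string // baseline later compared per service
	servicesToRetry []string
	needFullSync    bool
}

func (c *controller) triggerNodeSync(readyNodes []string) {
	// Crude stand-in for the real host comparison: the lister behind this is
	// filtered to Ready nodes, so a just-created (NotReady) node is invisible.
	c.needFullSync = len(readyNodes) != len(c.knownHosts)
	c.knownHosts = readyNodes
}

func (c *controller) nodeSync(currentNodes []string) {
	services := c.servicesToRetry
	if c.needFullSync {
		services = []string{"<all services>"}
	}
	for _, svc := range services {
		fmt.Println("would update load balancer hosts for", svc)
	}
	// This assignment runs even when there were zero services to process,
	// silently advancing the baseline used by later syncs.
	c.lastSyncedNodes = currentNodes
}

func main() {
	c := &controller{
		knownHosts:      []string{"old-node"},
		lastSyncedNodes: []string{"old-node"},
	}

	// Steps 1-2: a new node registers but is not Ready yet, so the
	// Ready-filtered view still shows only the old node and no full sync is
	// requested.
	c.triggerNodeSync([]string{"old-node"})

	// Steps 3-5: by the time hosts are updated, the lister does see the new
	// node; with no services to sync, only lastSyncedNodes moves forward.
	c.nodeSync([]string{"old-node", "new-node"})

	fmt.Println("needFullSync:", c.needFullSync)       // false
	fmt.Println("lastSyncedNodes:", c.lastSyncedNodes) // [old-node new-node]
}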

A subsequent full sync does not fix things. We go through steps 1 through 5 again, the difference being that c.needFullSync = true. This means that in steps 4 and 5, c.servicesToUpdate will not be empty, resulting in the following:

  6. Inside nodeSyncService we filter based on predicates for xTP=Local and xTP=Cluster. In the xTP=Local case, since we do not pay attention to Ready status, all of the nodes in c.lastSyncedNodes become the oldNodes. And since no node creations or deletions have occurred since then, the newNodes will be the same set. This results in no sync (line 767).

https://github.com/kubernetes/kubernetes/blob/a866cbe2e5bbaa01cfd5e969aa3e033f3282a8a2/staging/src/k8s.io/cloud-provider/controllers/service/controller.go#L759-L777

In the xTP=Cluster case, node Ready status is used to filter the nodes, so c.lastSyncedNodes would already contain the newly created node, but it was not Ready when recorded. This means the node will not be part of oldNodes but will exist in newNodes, which allows the sync to continue as expected.
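
To illustrate that asymmetry, here is another paraphrased sketch (not the real predicate code; filter below stands in for the per-service node predicates) of how the same two snapshots compare under the two traffic policies:

package main

import (
	"fmt"
	"reflect"
	"sort"
)

type node struct {
	name  string
	ready bool
}

// filter mimics the per-service node predicates: for xTP=Cluster the Ready
// condition is part of the filter, for xTP=Local it is ignored.
func filter(nodes []node, requireReady bool) []string {
	var out []string
	for _, n := range nodes {
		if requireReady && !n.ready {
			continue
		}
		out = append(out, n.name)
	}
	sort.Strings(out)
	return out
}

func main() {
	// lastSyncedNodes was already advanced by the empty sync, so the new node
	// is part of the "old" snapshot too, just recorded as NotReady.
	oldSnapshot := []node{{"old-node", true}, {"new-node", false}}
	newSnapshot := []node{{"old-node", true}, {"new-node", true}}

	// xTP=Local: Ready is ignored, so old and new sets are identical -> no sync.
	localChanged := !reflect.DeepEqual(filter(oldSnapshot, false), filter(newSnapshot, false))
	fmt.Println("xTP=Local sees a change:", localChanged) // false

	// xTP=Cluster: the NotReady node was excluded from the old set but is in
	// the new one -> the sync proceeds as expected.
	clusterChanged := !reflect.DeepEqual(filter(oldSnapshot, true), filter(newSnapshot, true))
	fmt.Println("xTP=Cluster sees a change:", clusterChanged) // true
}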

Kubernetes version

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.2", GitCommit:"5835544ca568b757a8ecae5c153f317e5736700e", GitTreeState:"clean", BuildDate:"2022-09-21T14:33:49Z", GoVersion:"go1.19.1", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.2-gke.300", GitCommit:"6f9a8e57036ff71785ef9c90998437413a3a8ff5", GitTreeState:"clean", BuildDate:"2022-09-26T09:26:16Z", GoVersion:"go1.19.1 X:boringcrypto", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

Tested on GKE; however, this should theoretically be reproducible on any 1.25+ cluster.

OS version

N/A

Install tools

N/A

Container runtime (CRI) and version (if applicable)

N/A

Related plugins (CNI, CSI, …) and versions (if applicable)

N/A

Most upvoted comments

I have done a test with the code on master and looked through the changes that have been made since 1.25 was cut, and this issue appears to be fixed. I believe this issue is specific to 1.25.