kubernetes: EndpointSlice controller sync fails when Node is not found

What happened?

The following appears in EndpointSlice controller logs when a Node specified by pod.spec.nodeName is not present in the Node informer cache:

Error syncing endpoint slices for service "example/service", retrying: Error: node: "example-123" not found.

This is caused by https://github.com/kubernetes/kubernetes/blob/baad1caee9dea68c6f9de9c14b23cfe6a96a80f8/pkg/controller/endpointslice/reconciler.go#L174-L177

The Node is used to populate the endpoint's zone; without it, that field would be empty for a given endpoint: https://github.com/kubernetes/kubernetes/blob/baad1caee9dea68c6f9de9c14b23cfe6a96a80f8/pkg/controller/endpointslice/utils.go#L73-L76
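For reference, the zone handling boils down to something like this (a paraphrase of the linked utils.go lines, not an exact copy; endpointZone is just an illustrative helper name):

package sketch

import corev1 "k8s.io/api/core/v1"

// endpointZone paraphrases how podToEndpoint derives an endpoint's zone:
// it reads the Node's topology.kubernetes.io/zone label, so without a Node
// object the zone stays nil.
func endpointZone(node *corev1.Node) *string {
        if node != nil && node.Labels[corev1.LabelTopologyZone] != "" {
                zone := node.Labels[corev1.LabelTopologyZone]
                return &zone
        }
        return nil
}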

What did you expect to happen?

I’m not sure. My biggest concern is that the controller could potentially get stuck indefinitely for a single Service if one or more of its Pods were orphaned. I have not seen it get stuck indefinitely, but I have seen the controller get stuck on a Service for ~6 minutes for this reason.
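For context on why it keeps retrying: failed syncs are re-queued through a rate-limited workqueue with per-item exponential backoff, so a persistent lister miss keeps the same Service key coming back with growing delays. A minimal sketch of that pattern (the base/max delays below are made up for illustration, not the controller's actual configuration):

package main

import (
        "fmt"
        "time"

        "k8s.io/client-go/util/workqueue"
)

func main() {
        // Per-item exponential backoff: each failure for the same key doubles the
        // delay, up to the cap. These delays are illustrative only.
        limiter := workqueue.NewItemExponentialFailureRateLimiter(1*time.Second, 100*time.Second)

        key := "example/service"
        for i := 0; i < 8; i++ {
                // AddRateLimited(key) uses When(key) internally to decide how long to
                // wait before the next sync attempt for this Service.
                fmt.Printf("failure %d -> next retry in %v\n", i+1, limiter.When(key))
        }

        // Once a sync finally succeeds, Forget(key) resets the backoff.
        limiter.Forget(key)
}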

None of the alternative options seem great here. Some options:

  1. Drop endpoints that don’t have a valid node in cache
  2. Log but don’t return error if node is not found, resulting in endpoints lacking a zone
  3. Leave as is

With 1, we wouldn’t retrigger the Service, so we might miss adding the node info back whenever it became available. We’d also risk dropping all endpoints if there were a bug in our Node informer cache. With 2, we would break topology hints (they bail out if one or more endpoints is missing a zone) and any other implementation that relies on or expects zone to be present. With 3, we risk the controller getting stuck on a Service indefinitely if it keeps including orphaned Pods.
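To make option 2's downside concrete: topology aware hints needs a zone on every endpoint before it will assign hints, so a single zone-less endpoint disables hints for the whole Service. A simplified, hypothetical stand-in for that kind of check (not the actual topologycache code):

package sketch

import discovery "k8s.io/api/discovery/v1"

// allEndpointsHaveZone is a simplified stand-in for the guard that makes
// topology hints bail out: if any endpoint is missing its zone, hints cannot
// be computed for the Service.
func allEndpointsHaveZone(slices []*discovery.EndpointSlice) bool {
        for _, slice := range slices {
                for _, endpoint := range slice.Endpoints {
                        if endpoint.Zone == nil || *endpoint.Zone == "" {
                                return false
                        }
                }
        }
        return true
}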

So, with that said, this may not be a bug, and there may not be anything better we can do here. I just wanted to open this up for broader discussion in case anyone else has run into something similar or has other approaches we could take.

How can we reproduce it (as minimally and precisely as possible)?

Put a bogus value in pod.spec.nodeName.

Anything else we need to know?

No response

Kubernetes version

Applies to any running EndpointSlice controller, so likely 1.18+.

Cloud provider

Any

OS version

N/A

Install tools

N/A

Container runtime (CRI) and version (if applicable)

N/A

Related plugins (CNI, CSI, …) and versions (if applicable)

N/A

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 37 (36 by maintainers)

Most upvoted comments

I think there are some exceptions where pod garbage collection does not happen immediately (cc @bobbypage).

pod garbage collection does not happen immediately - it will only run when the threshold is reached (terminatedPodThreshold, which defaults to 12500 in kube-controller-manager today). Pod GC will also happen for pods bound to a Node object which no longer exists, so-called “orphaned pods” xref.

heh, this is easy to repro with a unit test

  • Log but don’t return error if node is not found, resulting in endpoints lacking a zone

Since the Node object is only required for topology, I think it is more resilient to log and keep going; in an edge case with the current implementation, a Service with one endpoint will not work at all … Also, I see there is a handler for Node objects, so I think that when the Node event is received, the controller will reconcile and set the topology information…

I was thinking of something like:

diff --git a/pkg/controller/endpointslice/reconciler.go b/pkg/controller/endpointslice/reconciler.go
index 4756a0d3448..8aaab7b3a43 100644
--- a/pkg/controller/endpointslice/reconciler.go
+++ b/pkg/controller/endpointslice/reconciler.go
@@ -173,7 +173,7 @@ func (r *reconciler) reconcileByAddressType(service *corev1.Service, pods []*cor
 
                node, err := r.nodeLister.Get(pod.Spec.NodeName)
                if err != nil {
-                       return err
+                       klog.Warningf("Skipping topology information for pod %s: %v", pod.Namespace+"/"+pod.Name, err)
                }
                endpoint := podToEndpoint(pod, node, service, addressType)
                if len(endpoint.Addresses) > 0 {
diff --git a/pkg/controller/endpointslice/reconciler_test.go b/pkg/controller/endpointslice/reconciler_test.go
index 2e71419f352..b83ebedea63 100644
--- a/pkg/controller/endpointslice/reconciler_test.go
+++ b/pkg/controller/endpointslice/reconciler_test.go
@@ -460,6 +460,47 @@ func TestReconcile1Pod(t *testing.T) {
        }
 }
 
+// given one pod matching the service but no Node present in the informer cache,
+// a slice should be created and a message logged about the missing topology information
+func TestReconcileCreateEndpointSliceNodeMissing(t *testing.T) {
+       client := newClientset()
+       setupMetrics()
+       namespace := "test"
+       pods := []*corev1.Pod{newPod(1, namespace, true, 1, false)}
+       svc, _ := newServiceAndEndpointMeta("foo", namespace)
+
+       numActionsBefore := len(client.Actions())
+
+       r := newReconciler(client, []*corev1.Node{}, defaultMaxEndpointsPerSlice)
+       reconcileHelper(t, r, &svc, pods, []*discovery.EndpointSlice{}, time.Now())
+
+       assert.Len(t, client.Actions(), numActionsBefore+1, "Expected 1 additional clientset action")
+       actions := client.Actions()
+       assert.True(t, actions[numActionsBefore].Matches("create", "endpointslices"), "Action should be create endpoint slice")
+
+       slices := fetchEndpointSlices(t, client, namespace)
+
+       assert.Len(t, slices, 1, "Expected 1 endpoint slices")
+       assert.Regexp(t, "^"+svc.Name, slices[0].Name)
+       assert.Equal(t, svc.Name, slices[0].Labels[discovery.LabelServiceName])
+       protoTCP := corev1.ProtocolTCP
+       assert.EqualValues(t, []discovery.EndpointPort{{Name: utilpointer.StringPtr("foo"), Protocol: &protoTCP, Port: utilpointer.Int32Ptr(80)}}, slices[0].Ports)
+       expectedEndpoints := []discovery.Endpoint{{
+               Addresses:  []string{"1.2.3.5"},
+               Conditions: discovery.EndpointConditions{Ready: utilpointer.BoolPtr(true), Serving: utilpointer.BoolPtr(true), Terminating: utilpointer.BoolPtr(false)},
+               Zone:       nil,
+               NodeName:   utilpointer.StringPtr("node-1"),
+               TargetRef: &corev1.ObjectReference{
+                       Kind:      "Pod",
+                       Namespace: namespace,
+                       Name:      "pod1",
+               },
+       }}
+       assert.EqualValues(t, expectedEndpoints, slices[0].Endpoints)
+       expectTrackedGeneration(t, r.endpointSliceTracker, &slices[0], 1)
+       expectMetrics(t, expectedMetrics{desiredSlices: 1, actualSlices: 1, desiredEndpoints: 1, addedPerSync: 1, removedPerSync: 0, numCreated: 1, numUpdated: 0, numDeleted: 0, slicesChangedPerSync: 1})
+}
+

  • The following appears in EndpointSlice controller logs when a Node specified by pod.spec.nodeName is not present in the Node informer cache:

That is a bit surprising, isn’t it? For the pod to be created, there had to be a node 🤔 I wonder if we have something else here.