kubernetes: FailedToUpdateEndpointSlices Error updating Endpoint Slices for Service

What happened: Hi folks, after every deployment we see the event below for about an hour. It seems to be harmless, but we are wondering whether this is a bug in v1.17.3. Output of kubectl describe svc my-svc:

Events:
  Type     Reason                        Age   From                       Message
  ----     ------                        ----  ----                       -------
  Warning  FailedToUpdateEndpointSlices  35m   endpoint-slice-controller  Error updating Endpoint Slices for Service my-svc/my-app: Error updating my-app-h7q6v EndpointSlice for Service my-svc/my-app: Operation cannot be fulfilled on endpointslices.discovery.k8s.io "my-app-h7q6v": the object has been modified; please apply your changes to the latest version and try again

What you expected to happen: Events: <none>

How to reproduce it (as minimally and precisely as possible): kubectl rollout restart deployment my-deploy
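The transient events can then be observed during the rollout with kubectl get events --field-selector reason=FailedToUpdateEndpointSlices --watch (assuming the reason string from the event above).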

Anything else we need to know?:

Environment: Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"archive", BuildDate:"2020-03-20T16:41:14Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 6
  • Comments: 29 (16 by maintainers)

Most upvoted comments

Hey @jijojv, thanks for reporting this! This is not actually anything to worry about and I think the best solution will be for us to stop publishing that event if the error is related to an out of date cache like this. Due to the nature of the controller reacting to changes in Services and attempting to update related EndpointSlices, it can run into problems if the locally cached copy of EndpointSlices it has is out of date. It will naturally retry and resolve the issue when the cache updates. I’ll work on a fix here to lower the logging and see if there are some ways to reduce the probability of this happening.
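
For context, the message in the event ("the object has been modified; please apply your changes to the latest version and try again") is the API server's standard optimistic-concurrency conflict: the update carried a stale resourceVersion, so it was rejected and has to be retried against the latest copy. Below is a minimal sketch of that refetch-and-retry pattern using client-go's retry helper, with the namespace and EndpointSlice name taken from the event above; the kubeconfig loading is assumed and the label mutation is purely illustrative.

  package main

  import (
      "context"

      metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
      "k8s.io/client-go/kubernetes"
      "k8s.io/client-go/tools/clientcmd"
      "k8s.io/client-go/util/retry"
  )

  func main() {
      // Assumes a reachable cluster via the default kubeconfig.
      config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
      if err != nil {
          panic(err)
      }
      client, err := kubernetes.NewForConfig(config)
      if err != nil {
          panic(err)
      }

      // RetryOnConflict re-runs the closure whenever Update fails with a
      // 409 Conflict, refetching the latest resourceVersion first; this is
      // the same resolution the endpoint-slice controller performs on its
      // next sync once its cache catches up.
      err = retry.RetryOnConflict(retry.DefaultRetry, func() error {
          slice, err := client.DiscoveryV1().EndpointSlices("my-svc").
              Get(context.TODO(), "my-app-h7q6v", metav1.GetOptions{})
          if err != nil {
              return err
          }
          if slice.Labels == nil {
              slice.Labels = map[string]string{}
          }
          slice.Labels["example.com/touched"] = "true" // illustrative change only
          _, err = client.DiscoveryV1().EndpointSlices("my-svc").
              Update(context.TODO(), slice, metav1.UpdateOptions{})
          return err
      })
      if err != nil {
          panic(err)
      }
  }

As the comment above explains, the controller already retries and converges on its own; the later patch simply stops surfacing the transient conflict as a warning event.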

/remove-triage unresolved

I’m currently facing the same error, but not after every deployment: it suddenly shows up hours after a deployment has finished. Some pods emit DNS resolution errors within 10 seconds after the FailedToUpdateEndpointSlices event is emitted. Example log:

caused by: Post "https://sts.ap-southeast-1.amazonaws.com/": dial tcp: lookup sts.ap-southeast-1.amazonaws.com: i/o timeout

Is this related, or is it a different issue? This is on EKS v1.23.

I am still getting this error.

Hey @ltagliamonte-dd, unfortunately we can no longer patch v1.18, so the mitigation for this only made it back as far as v1.19.

@drawn4427 what version of Kubernetes are you using? For reference, the oldest version of Kubernetes that got this patch was v1.19.9.