kubernetes: Sporadic "too old resource version" errors from master pods

We see sporadic “too old resource version” errors from the master pods.

(We understand from http://stackoverflow.com/questions/34322969/cause-of-apiserver-received-an-error-that-is-not-an-unversioned-errors-from-ku/34330607 that this could be expected behavior during an upgrade – can we get confirmation of that, and will those be changed from “error” level to “warning” level at any point?)

Our bigger concern is that these errors continue to occur – not constantly, but routinely, anywhere from minutes to hours apart.

Is there something we need to do to alleviate this error spam?

Specifically, when we restart our HA masters (e.g., for an upgrade), we delete the master nodes, regenerate the pod YAMLs, restart hyperkube, and then patch the master nodes to be unschedulable:

kubectl --server=https://10.1.4.41:6443 --kubeconfig=... delete node sea5m1kmaster1
node "sea5m1kmaster1" deleted

kubectl --server=https://10.1.4.41:6443 --kubeconfig=... delete node sea5m1kmaster2
node "sea5m1kmaster2" deleted

# generate fresh yamls for apiserver / podmaster / scheduler / controller manager
# restart hyperkube

kubectl --server=https://10.1.4.41:6443 --kubeconfig=... patch node sea5m1kmaster1 -p {\"spec\":{\"unschedulable\":true}}
"sea5m1kmaster1" patched

kubectl --server=https://10.1.4.41:6443 --kubeconfig=... patch node sea5m1kmaster2 -p {\"spec\":{\"unschedulable\":true}}
"sea5m1kmaster2" patched

Every now and then, unrelated to any restarts or other visible errors, we get 1-2 of these errors on our masters:

2016-02-25T18:42:48.7850+00:00 tuk6r2kmaster2 [err] [docker] E0225 18:42:48.785758       1 errors.go:62] apiserver received an error that is not an unversioned.Status: too old resource version: 1785219 (1787027)
2016-02-25T18:42:48.7870+00:00 tuk6r2kmaster2 [err] [docker] E0225 18:42:48.787036       1 reflector.go:227] /usr/src/go/src/runtime/asm_amd64.s:2232: Failed to watch *api.Pod: too old resource version: 1785219 (1787027)
2016-02-25T18:46:00.7180+00:00 tuk6r2kmaster2 [err] [docker] E0225 18:46:00.718118       1 errors.go:62] apiserver received an error that is not an unversioned.Status: too old resource version: 1785219 (1787027)
2016-02-25T18:46:00.6850+00:00 tuk6r2kmaster1 [err] [docker] E0225 18:46:00.685606       1 errors.go:62] apiserver received an error that is not an unversioned.Status: too old resource version: 1785219 (1787027)
2016-02-25T18:46:00.7880+00:00 tuk6r2kmaster1 [err] [kubelet] E0225 18:46:00.716662    3999 reflector.go:227] pkg/kubelet/config/apiserver.go:43: Failed to watch *api.Pod: too old resource version: 1785219 (1787027)
2016-02-25T18:46:01.1700+00:00 tuk6r2kmaster2 [err] [kubelet] E0225 18:46:00.688601   62642 reflector.go:227] pkg/kubelet/config/apiserver.go:43: Failed to watch *api.Pod: too old resource version: 1785219 (1787027)

Is this indicative of a problem, perhaps in our upgrade flow? Or is this expected routine log behavior?

Version Info:

$ kubectl --server=https://10.1.4.41:6443 --... version
Client Version: version.Info{Major:"1", Minor:"1", GitVersion:"v1.1.3", GitCommit:"6a81b50c7e97bbe0ade075de55ab4fa34f049dc2", GitTreeState:"clean"}
Server Version: version.Info{Major:"1", Minor:"1", GitVersion:"v1.1.3", GitCommit:"6a81b50c7e97bbe0ade075de55ab4fa34f049dc2", GitTreeState:"clean"}

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 46 (23 by maintainers)

Most upvoted comments

@lavalamp First and foremost, many thanks for your help and attention!

That being said, will this still be an “error”-level log, or will it be lowered to “warning/info/etc”-level, since it’s expected behavior?

The problem with ignoring errors (quotes deliberately omitted) in Kubernetes log files is that it’s basically recommending putting tape over the engine warning lights.

For instance, one big problem is that the symptoms may not always be immediately apparent – they don’t all result in pods dying. (For instance, routing misconfiguration can cause “kubectl log” to fail even though scheduling works fine, which can be a nasty surprise once someone tries to check logs in prod… node advertise-address misconfiguration can result in etcd thrashing and steady error logs without technically “breaking” anything…) Kubernetes does self-heal from errors, but when you see the same error occurring steadily and consistently, that’s generally a sign that you’re looking at something that isn’t working properly, not simply a random error that Kubernetes recovered from.

Logging something as “error”-level means that you’re concerned about it, which means that I’M concerned about it too.

Waiting until something breaks heavily enough to get a phone call in the middle of the night, or until some new service pod runs into the previously-subtle symptom and fails a deployment – that’s not a sustainable approach.

(Also, as you get more adoption by companies using Kubernetes clusters in prod environments, you’re going to have more QA & prod requirements that any and all error logs be investigated fully to completion – saying “we don’t see anything obviously broken” isn’t sufficient for us to close our tickets, whether we personally suspect that we don’t need to worry about it or not.)

If it’s something we don’t have to act upon and don’t want to be polluted by, I would go even further and push it to debug level. Honestly, I would love not to see it at all. All our Prometheus instances are spitting that out every 3–5 seconds, and that means $$$ in Stackdriver bills on GKE 😦

These messages create noise in the logs at warn severity. Making them informational would be right, as this is just the “normal course of operations”. Thanks!

I’m debugging an issue in my cluster, and the only warning in my logs led me down this rabbit hole. Me too – and another thing: if this message is normal and expected, it should not be leveled as a "W"arning but as "I"nfo.

Yes, I’m totally open to the message being improved.

@connatix-cradulescu this is perfectly expected, no worries. The messages are several hours apart.

When nothing happens in your cluster, the watches established by the Kubernetes client don’t get a chance to get refreshed naturally, and eventually time out. These messages simply indicate that these watches are being re-created.
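
For anyone curious about the mechanics, here is a minimal, illustrative sketch in Go of how that list/watch/relist cycle plays out with client-go. It is not the actual Kubernetes reflector code; the kubeconfig path and the choice of watching Pods are just assumptions for the example:

package main

import (
    "context"
    "fmt"
    "path/filepath"

    apierrors "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/watch"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
    "k8s.io/client-go/util/homedir"
)

func main() {
    // Assumed kubeconfig location; adjust as needed.
    kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
    config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
    if err != nil {
        panic(err)
    }
    client := kubernetes.NewForConfigOrDie(config)
    ctx := context.Background()

    for {
        // LIST to obtain a resourceVersion to resume watching from.
        pods, err := client.CoreV1().Pods("").List(ctx, metav1.ListOptions{})
        if err != nil {
            panic(err)
        }
        rv := pods.ResourceVersion

        // WATCH from that resourceVersion. If the cluster has been quiet for
        // long enough, the server may no longer retain that version and rejects
        // the watch with 410 Gone ("too old resource version").
        w, err := client.CoreV1().Pods("").Watch(ctx, metav1.ListOptions{ResourceVersion: rv})
        if err != nil {
            if apierrors.IsGone(err) {
                fmt.Println("resourceVersion expired; re-listing and re-watching")
                continue
            }
            panic(err)
        }
        for ev := range w.ResultChan() {
            if ev.Type == watch.Error {
                // The error can also arrive as a watch event carrying a Status
                // with code 410, rather than as an error from Watch() itself.
                fmt.Printf("watch error: %v; re-listing\n", apierrors.FromObject(ev.Object))
                break
            }
            fmt.Printf("event: %s on %T\n", ev.Type, ev.Object)
        }
        w.Stop()
        // The watch ended (timeout or error); loop around and re-establish it.
    }
}

The built-in reflectors (in the kubelet and the master components) do essentially this on your behalf, which is why the messages are harmless: the watch is simply torn down and rebuilt from a fresh list.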

Please, is this going to be fixed? My operator is failing on this and restarts constantly, because it tries to write to a file in /tmp and can’t, since the file it tries to write to does not exist.

Spotted in v1.10.12

Yeah, those are multiple minutes apart, completely expected behavior. Also, this is almost certainly exercising the case that the previously mentioned Bookmarks feature will optimize.
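
For reference, the Bookmarks feature mentioned above can be opted into from client-go via ListOptions.AllowWatchBookmarks. A minimal sketch, building on the example earlier in the thread (it additionally assumes corev1 "k8s.io/api/core/v1" among the imports, and a server recent enough to support watch bookmarks):

// watchWithBookmarks asks the server to send periodic Bookmark events so the
// client's known resourceVersion keeps advancing even when nothing changes.
func watchWithBookmarks(ctx context.Context, client kubernetes.Interface, rv string) error {
    w, err := client.CoreV1().Pods("").Watch(ctx, metav1.ListOptions{
        ResourceVersion:     rv,
        AllowWatchBookmarks: true,
    })
    if err != nil {
        return err
    }
    defer w.Stop()
    for ev := range w.ResultChan() {
        if ev.Type == watch.Bookmark {
            // A bookmark carries no object change, only a newer resourceVersion,
            // so a later re-watch is unlikely to ask for a version the server
            // has already dropped ("too old resource version").
            rv = ev.Object.(*corev1.Pod).ResourceVersion
            continue
        }
        fmt.Printf("event: %s (would resume from %s)\n", ev.Type, rv)
    }
    return nil
}

With bookmarks enabled, quiet clusters see far fewer 410-driven relists, which is the optimization referred to above.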