kubernetes: node is NotReady because kubelet reports a meaningful conflict error

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

The kubelet reports an error, and kubectl get node shows the node as NotReady. The kubelet log contains:

    there is a meaningful conflict (firstResourceVersion: "104201", currentResourceVersion: "4293"):
    diff1={"metadata":{"resourceVersion":"4293"},"status":{"conditions":[{"lastHeartbeatTime":"2018-01-03T07:38:24Z","lastTransitionTime":"2018-01-03T07:42:59Z","message":"Kubelet stopped posting node status.","reason":"NodeStatusUnknown","status":"Unknown","type":"DiskPressure"},{"lastHeartbeatTime":"2018-01-03T07:38:24Z","lastTransitionTime":"2018-01-03T07:42:59Z","message":"Kubelet stopped posting node status.","reason":"NodeStatusUnknown","status":"Unknown","type":"MemoryPressure"},{"lastHeartbeatTime":"2018-01-03T07:38:24Z","lastTransitionTime":"2018-01-03T07:42:59Z","message":"Kubelet stopped posting node status.","reason":"NodeStatusUnknown","status":"Unknown","type":"OutOfDisk"},{"lastHeartbeatTime":"2018-01-03T07:38:24Z","lastTransitionTime":"2018-01-03T07:42:59Z","message":"Kubelet stopped posting node status.","reason":"NodeStatusUnknown","status":"Unknown","type":"Ready"}]}}
    diff2={"status":{"conditions":[{"lastHeartbeatTime":"2018-01-04T09:31:09Z","lastTransitionTime":"2018-01-04T09:31:09Z","message":"kubelet has no disk pressure","reason":"KubeletHasNoDiskPressure","status":"False","type":"DiskPressure"},{"lastHeartbeatTime":"2018-01-04T09:31:09Z","lastTransitionTime":"2018-01-04T09:31:09Z","message":"kubelet has sufficient memory available","reason":"KubeletHasSufficientMemory","status":"False","type":"MemoryPressure"},{"lastHeartbeatTime":"2018-01-04T09:31:09Z","lastTransitionTime":"2018-01-04T09:31:09Z","message":"kubelet has sufficient disk space available","reason":"KubeletHasSufficientDisk","status":"False","type":"OutOfDisk"},{"lastHeartbeatTime":"2018-01-04T09:31:09Z","lastTransitionTime":"2018-01-04T09:31:09Z","message":"kubelet is posting ready status","reason":"KubeletReady","status":"True","type":"Ready"}],"nodeInfo":{"gpus":[]}}}
    E0104 17:31:09.779522 7223 kubelet_node_status.go:318] Unable to update node status: update node status exceeds retry count

Note that currentResourceVersion (4293) is lower than firstResourceVersion (104201), i.e. the API server appears to be returning an older version of the Node object than the kubelet last saw.
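For anyone debugging the same symptom, the resourceVersion the API server currently holds for the node can be read directly and compared against the values in the log (a minimal sketch; <node-name> is a placeholder):

    # resourceVersion the API server currently holds for the node
    kubectl get node <node-name> -o jsonpath='{.metadata.resourceVersion}'

    # full node object, including status.conditions, for comparison with the diffs above
    kubectl get node <node-name> -o yaml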

What you expected to happen:

I found a PR about this issue, #44788, which has been cherry-picked to 1.6. I want to know why this issue still happens.

How to reproduce it (as minimally and precisely as possible):

This issue is intermittent, but I found another issue that hits the same problem, #52498; the reporter worked around it by using etcd 3.1.10 instead of 3.2.7. When I change the leader of the etcd cluster, this issue goes away (a sketch of how to inspect the etcd leader is shown below). Is this a known incompatibility between Kubernetes and etcd, or an etcd bug?
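To see which etcd member currently leads the cluster before and after moving leadership, something like the following should work (a sketch, assuming the v3 API is available in your etcdctl build and that the endpoints are replaced with your members' client URLs):

    # the 'IS LEADER' column shows which member currently leads the cluster
    ETCDCTL_API=3 etcdctl \
      --endpoints=https://etcd-0:2379,https://etcd-1:2379,https://etcd-2:2379 \
      endpoint status --write-out=table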

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.6.9
  • Etcd version: 3.0.17

Most upvoted comments

I’ve seen several of these bug reports, and am hitting the same problem myself, but they all get ignored.

Here is the work-around to restore the node (concrete commands are sketched after the list):

  1. SSH onto the affected node (somehow)
  2. Stop the kubelet (systemctl stop kubelet).
  3. Delete the node from Kubernetes (kubectl delete nodes <node-name>).
  4. Restart the kubelet (systemctl start kubelet); it will re-register itself and clear the conflict.
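As shell commands (a sketch; <node-name> is a placeholder, and the kubectl steps must run from a machine with cluster credentials):

    # on the affected node
    systemctl stop kubelet

    # from a machine with cluster access
    kubectl delete nodes <node-name>

    # back on the affected node; the kubelet re-registers itself on start
    systemctl start kubelet

    # from a machine with cluster access: the node should reappear and eventually report Ready
    kubectl get nodes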

I still think this is a bug in the kubelet, though; I’m going to investigate that code.