kubernetes: kube-controller-manager spamming errors in log

Kubernetes version: 1.5.3
Platform: AWS
Deploy tool: kops

My master’s /var/log/kube-controller-manager.log is filling up with log entries like this:

I0302 21:51:11.715415       6 node_status_updater.go:74] Could not update node status. Failed to find node "ip-172-20-114-85.ec2.internal" in NodeInformer cache. <nil>
I0302 21:51:11.815662       6 node_status_updater.go:74] Could not update node status. Failed to find node "ip-172-20-114-85.ec2.internal" in NodeInformer cache. <nil>
I0302 21:51:11.915905       6 node_status_updater.go:74] Could not update node status. Failed to find node "ip-172-20-114-85.ec2.internal" in NodeInformer cache. <nil>
I0302 21:51:12.016217       6 node_status_updater.go:74] Could not update node status. Failed to find node "ip-172-20-114-85.ec2.internal" in NodeInformer cache. <nil>
I0302 21:51:12.116508       6 node_status_updater.go:74] Could not update node status. Failed to find node "ip-172-20-114-85.ec2.internal" in NodeInformer cache. <nil>
I0302 21:51:12.216725       6 node_status_updater.go:74] Could not update node status. Failed to find node "ip-172-20-114-85.ec2.internal" in NodeInformer cache. <nil>
I0302 21:51:12.316962       6 node_status_updater.go:74] Could not update node status ...

This repeats many times per second. The node “ip-172-20-114-85.ec2.internal” no longer exists in the cluster and is not shown by kubectl get nodes.

Shouldn’t it be removed from the status updater?

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 12
  • Comments: 24 (12 by maintainers)

Most upvoted comments

The NodeInformer cache is actually reporting the correct state. The problem is that inside the node status updater, there is another data structure keeping track of nodes that require a status update (see ActualStateOfWorld.GetVolumesToReportAttached(), called inside UpdateNodeStatuses() in node_status_updater.go). The entry for the deleted node is never removed from the data structure; rather, the updater is set to try to update the dead node again next time, which leads to the message being logged every 100ms.
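To make the failure mode concrete, here is a minimal, self-contained Go sketch of the loop shape described above. It is not the real controller code: the map names, the bounded loop, and the 100ms ticker are stand-ins. The point is that the bookkeeping map still holds the deleted node, the informer-cache stand-in correctly does not, and nothing ever removes the entry, so the same line is logged on every tick.

package main

import (
	"log"
	"time"
)

func main() {
	// Stand-in for the updater's bookkeeping (ActualStateOfWorld in the
	// real controller): node names that still have volumes to report.
	nodesToReport := map[string]bool{
		"ip-172-20-114-85.ec2.internal": true,
	}

	// Stand-in for the NodeInformer cache; the deleted node is
	// (correctly) no longer present here.
	informerCache := map[string]bool{}

	// The updater retries on a short interval; a few iterations are
	// enough to show the repeating log line.
	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()

	for i := 0; i < 5; i++ {
		<-ticker.C
		for nodeName := range nodesToReport {
			if !informerCache[nodeName] {
				// Nothing ever deletes the entry, so this fires on every tick.
				log.Printf("Could not update node status. Failed to find node %q in NodeInformer cache.", nodeName)
				continue
			}
			// ... otherwise, patch node.Status.VolumesAttached here ...
		}
	}
}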

The solution is to remove the corresponding node entry once the node is deleted. A fix is on its way.
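A minimal sketch of that fix pattern, using a simplified stand-in for the controller's in-memory state (the type name, method name, and volume name below are illustrative, not the real Kubernetes API): purge the node's entry from the bookkeeping when the node is deleted, so the updater has nothing left to retry.

package main

import "fmt"

// actualStateOfWorld is a simplified stand-in for the attach/detach
// controller's in-memory state.
type actualStateOfWorld struct {
	// node name -> volume names still to be reported as attached
	volumesToReport map[string][]string
}

// deleteNode drops all bookkeeping for a node so the status updater
// no longer has anything to retry for it.
func (asw *actualStateOfWorld) deleteNode(nodeName string) {
	delete(asw.volumesToReport, nodeName)
}

func main() {
	asw := &actualStateOfWorld{
		volumesToReport: map[string][]string{
			"ip-172-20-114-85.ec2.internal": {"example-volume"}, // volume name is made up
		},
	}

	// Hook this into the controller's node-delete handler: when the
	// informer reports the node as gone, purge it from the state too.
	onNodeDelete := func(nodeName string) { asw.deleteNode(nodeName) }

	onNodeDelete("ip-172-20-114-85.ec2.internal")
	fmt.Println(asw.volumesToReport) // map[] -- nothing left to retry
}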

Here you go, my apologies.

@verult I think that’s how the bug surfaced for me. When I force-killed my node, we had just started running a StatefulSet with an attached persistent volume, and I think one of its pods was running on the very node I killed.

@verult Let’s make sure the fix also gets ported back to all affected branches (1.6, 1.5, and 1.4).

CC @kubernetes/sig-storage-bugs

correct