kubernetes: kube-controller-manager spamming errors in log

Kubernetes version: 1.5.3
Platform: AWS
Deploy tool: kops

My master’s /var/log/kube-controller-manager.log is filling up with log entries like this:

I0302 21:51:11.715415       6 node_status_updater.go:74] Could not update node status. Failed to find node "ip-172-20-114-85.ec2.internal" in NodeInformer cache. <nil>
I0302 21:51:11.815662       6 node_status_updater.go:74] Could not update node status. Failed to find node "ip-172-20-114-85.ec2.internal" in NodeInformer cache. <nil>
I0302 21:51:11.915905       6 node_status_updater.go:74] Could not update node status. Failed to find node "ip-172-20-114-85.ec2.internal" in NodeInformer cache. <nil>
I0302 21:51:12.016217       6 node_status_updater.go:74] Could not update node status. Failed to find node "ip-172-20-114-85.ec2.internal" in NodeInformer cache. <nil>
I0302 21:51:12.116508       6 node_status_updater.go:74] Could not update node status. Failed to find node "ip-172-20-114-85.ec2.internal" in NodeInformer cache. <nil>
I0302 21:51:12.216725       6 node_status_updater.go:74] Could not update node status. Failed to find node "ip-172-20-114-85.ec2.internal" in NodeInformer cache. <nil>
I0302 21:51:12.316962       6 node_status_updater.go:74] Could not update node status ...

This repeats many times per second. The node “ip-172-20-114-85.ec2.internal” no longer exists in the cluster and is not shown by kubectl get nodes.

Shouldn’t it be removed from the status updater?

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 12
  • Comments: 24 (12 by maintainers)

Most upvoted comments

The NodeInformer cache is actually reporting the correct state. The problem is that inside the node status updater, there is another data structure keeping track of nodes that require a status update (see ActualStateOfWorld.GetVolumesToReportAttached(), called inside UpdateNodeStatuses() in node_status_updater.go). The entry for the deleted node is never removed from the data structure; rather, the updater is set to try to update the dead node again next time, which leads to the message being logged every 100ms.
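To make the failure mode concrete, here is a minimal, self-contained Go sketch of the loop shape described above. It is not the real controller code: the map names, the bounded loop, and the 100ms ticker are stand-ins. The point is that the bookkeeping map still holds the deleted node, the informer-cache stand-in correctly does not, and nothing ever removes the entry, so the same line is logged on every tick.

package main

import (
	"log"
	"time"
)

func main() {
	// Stand-in for the updater's bookkeeping (ActualStateOfWorld in the
	// real controller): node names that still have volumes to report.
	nodesToReport := map[string]bool{
		"ip-172-20-114-85.ec2.internal": true,
	}

	// Stand-in for the NodeInformer cache; the deleted node is
	// (correctly) no longer present here.
	informerCache := map[string]bool{}

	// The updater retries on a short interval; a few iterations are
	// enough to show the repeating log line.
	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()

	for i := 0; i < 5; i++ {
		<-ticker.C
		for nodeName := range nodesToReport {
			if !informerCache[nodeName] {
				// Nothing ever deletes the entry, so this fires on every tick.
				log.Printf("Could not update node status. Failed to find node %q in NodeInformer cache.", nodeName)
				continue
			}
			// ... otherwise, patch node.Status.VolumesAttached here ...
		}
	}
}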

The solution is to remove the corresponding node entry once the node is deleted. A fix is on its way.
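A minimal sketch of that fix pattern, using a simplified stand-in for the controller's in-memory state (the type name, method name, and volume name below are illustrative, not the real Kubernetes API): purge the node's entry from the bookkeeping when the node is deleted, so the updater has nothing left to retry.

package main

import "fmt"

// actualStateOfWorld is a simplified stand-in for the attach/detach
// controller's in-memory state.
type actualStateOfWorld struct {
	// node name -> volume names still to be reported as attached
	volumesToReport map[string][]string
}

// deleteNode drops all bookkeeping for a node so the status updater
// no longer has anything to retry for it.
func (asw *actualStateOfWorld) deleteNode(nodeName string) {
	delete(asw.volumesToReport, nodeName)
}

func main() {
	asw := &actualStateOfWorld{
		volumesToReport: map[string][]string{
			"ip-172-20-114-85.ec2.internal": {"example-volume"}, // volume name is made up
		},
	}

	// Hook this into the controller's node-delete handler: when the
	// informer reports the node as gone, purge it from the state too.
	onNodeDelete := func(nodeName string) { asw.deleteNode(nodeName) }

	onNodeDelete("ip-172-20-114-85.ec2.internal")
	fmt.Println(asw.volumesToReport) // map[] -- nothing left to retry
}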

Here you go, my apologies.

@verult I think that’s how the bug surfaced for me. When I force-killed my node, we had just started running a StatefulSet with an attached persistent volume, and I think one of its pods was running on the very node I killed.

@verult Let’s make sure the fix also gets ported back to all affected branches (1.6, 1.5, and 1.4).

CC @kubernetes/sig-storage-bugs

correct