kubernetes: kube-controller-manager spamming errors in log
Kubernetes version: 1.5.3
Platform: AWS
Deploy tool: kops
My master's /var/log/kube-controller-manager.log is filling with entries like this:
I0302 21:51:11.715415 6 node_status_updater.go:74] Could not update node status. Failed to find node "ip-172-20-114-85.ec2.internal" in NodeInformer cache. <nil>
I0302 21:51:11.815662 6 node_status_updater.go:74] Could not update node status. Failed to find node "ip-172-20-114-85.ec2.internal" in NodeInformer cache. <nil>
I0302 21:51:11.915905 6 node_status_updater.go:74] Could not update node status. Failed to find node "ip-172-20-114-85.ec2.internal" in NodeInformer cache. <nil>
I0302 21:51:12.016217 6 node_status_updater.go:74] Could not update node status. Failed to find node "ip-172-20-114-85.ec2.internal" in NodeInformer cache. <nil>
I0302 21:51:12.116508 6 node_status_updater.go:74] Could not update node status. Failed to find node "ip-172-20-114-85.ec2.internal" in NodeInformer cache. <nil>
I0302 21:51:12.216725 6 node_status_updater.go:74] Could not update node status. Failed to find node "ip-172-20-114-85.ec2.internal" in NodeInformer cache. <nil>
I0302 21:51:12.316962 6 node_status_updater.go:74] Could not update node status ...
These messages are logged many times per second. The node "ip-172-20-114-85.ec2.internal" no longer exists in the cluster and does not appear in kubectl get nodes.
Shouldn't it be removed from the status updater?
About this issue
- State: closed
- Created 7 years ago
- Reactions: 12
- Comments: 24 (12 by maintainers)
Commits related to this issue
- Node status updater now deletes the node entry in attach updates when node is missing in NodeInformer cache. Fixes #42438. - Added RemoveNodeFromAttachUpdates as part of node status updater operation... — committed to verult/kubernetes by verult 7 years ago
- Merge pull request #45923 from verult/cxing/NodeStatusUpdaterFix Automatic merge from submit-queue (batch tested with PRs 46383, 45645, 45923, 44884, 46294) Node status updater now deletes the node ... — committed to smarterclayton/kubernetes by deleted user 7 years ago
- Node status updater now deletes the node entry in attach updates when node is missing in NodeInformer cache. Fixes #42438. - Added RemoveNodeFromAttachUpdates as part of node status updater operation... — committed to huawei-cloudnative/kubernetes by verult 7 years ago
- Merge pull request #47007 from verult/NodeStatusUpdaterFix-1.4 Automatic merge from submit-queue Node status updater now deletes the node entry in attach updates when… - Added RemoveNodeFromAttachU... — committed to kubernetes/kubernetes by deleted user 7 years ago
- Merge pull request #46301 from verult/NodeStatusUpdaterFix-1.5 Automatic merge from submit-queue Node status updater now deletes the node entry in attach updates when node is missing in NodeInformer... — committed to kubernetes/kubernetes by deleted user 7 years ago
- Merge pull request #50806 from verult/VolumeNotYetAttached Automatic merge from submit-queue (batch tested with PRs 50806, 48789, 49922, 49935, 50438) On AttachDetachController node status update, d... — committed to kubernetes/kubernetes by deleted user 7 years ago
The NodeInformer cache is actually reporting the correct state. The problem is that the node status updater keeps a separate data structure of nodes that require a status update (see ActualStateOfWorld.GetVolumesToReportAttached(), called from UpdateNodeStatuses() in node_status_updater.go). The entry for the deleted node is never removed from that structure; instead, the updater schedules the dead node for another attempt on the next pass, so the message is logged every 100 ms.
The solution is to remove the corresponding node entry once the node is deleted. A fix is on its way.
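The bookkeeping bug can be sketched in Go. This is a minimal, hypothetical model, not the real Kubernetes implementation: the struct, its fields, and the helper methods here are simplifications, though the method names are modeled on the real ones (GetVolumesToReportAttached, RemoveNodeFromAttachUpdates) mentioned in this thread and its commits.

```go
package main

import "fmt"

// ActualStateOfWorld is a hypothetical, simplified model of the
// attach/detach controller's bookkeeping. The real structure
// (pkg/controller/volume/attachdetach/cache) is far richer; this
// sketch only illustrates the stale-entry bug and its fix.
type ActualStateOfWorld struct {
	// nodesToUpdateStatusFor maps node name -> volumes that should be
	// reported as attached in that node's status.
	nodesToUpdateStatusFor map[string][]string
}

func NewActualStateOfWorld() *ActualStateOfWorld {
	return &ActualStateOfWorld{nodesToUpdateStatusFor: map[string][]string{}}
}

// AddVolumeToReportAsAttached records that a node needs a status update
// for the given volume.
func (asw *ActualStateOfWorld) AddVolumeToReportAsAttached(node, volume string) {
	asw.nodesToUpdateStatusFor[node] = append(asw.nodesToUpdateStatusFor[node], volume)
}

// GetVolumesToReportAttached returns the pending updates. The updater
// loop walks this map on every pass (every 100 ms); any node listed here
// but absent from the NodeInformer cache produces the logged error.
func (asw *ActualStateOfWorld) GetVolumesToReportAttached() map[string][]string {
	return asw.nodesToUpdateStatusFor
}

// RemoveNodeFromAttachUpdates models the fix: when a node is deleted
// from the cluster, drop its entry so the updater stops retrying it.
func (asw *ActualStateOfWorld) RemoveNodeFromAttachUpdates(node string) {
	delete(asw.nodesToUpdateStatusFor, node)
}

func main() {
	asw := NewActualStateOfWorld()
	asw.AddVolumeToReportAsAttached("ip-172-20-114-85.ec2.internal", "vol-1")

	// Before the fix: the deleted node's entry lingers forever, so each
	// updater pass logs "Failed to find node ... in NodeInformer cache".
	fmt.Println("pending before fix:", len(asw.GetVolumesToReportAttached()))

	// After the fix: node deletion also removes the pending entry.
	asw.RemoveNodeFromAttachUpdates("ip-172-20-114-85.ec2.internal")
	fmt.Println("pending after fix:", len(asw.GetVolumesToReportAttached()))
}
```

The design point is simply that the node's lifecycle (deletion) must propagate into every cache that references it; otherwise the retry loop has no termination condition.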
Here you go, my apologies.
@verult I think that's how the bug surfaced for me. When I force-killed my node, we had just started running a StatefulSet with an attached persistent volume, and I believe one of its pods was running on the very node I killed.
@verult Let’s make sure the fix also gets ported back to all affected branches (1.6, 1.5, and 1.4).
CC @kubernetes/sig-storage-bugs
correct