kubernetes: kubelet fails to heartbeat with API server with stuck TCP connections
Is this a BUG REPORT or FEATURE REQUEST?: /kind bug
What happened:
The operator is running an HA master setup with an LB in front. The kubelet attempts to update node status, but tryUpdateNodeStatus
wedges. Based on the goroutine dump, the wedge happens when it attempts to GET the latest state of the node from the master. The operator observed 15-minute intervals between attempts to update node status when the kubelet could not contact the master; we assume this is when the LB ultimately closes the connection. The impact is that the node controller then marked the node as lost, and the workload was evicted.
What you expected to happen: expected the kubelet to time out client-side. Right now, no kubelet->master communication has a timeout. Ideally, the kubelet -> master communication would have a timeout derived from the configured node-status-update-frequency so that no single attempt to update status wedges future attempts.
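For illustration, a minimal sketch of the kind of client-side bound described above, using client-go's rest.Config request timeout. This is not the actual kubelet change; the helper name and the 2x factor are assumptions.

```go
// Illustrative sketch only: give the heartbeat client a request timeout derived
// from node-status-update-frequency so a single wedged request cannot block
// later status updates. The 2x factor is an assumption, not the real value.
package heartbeat

import (
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func NewHeartbeatClient(cfg *rest.Config, nodeStatusUpdateFrequency time.Duration) (*kubernetes.Clientset, error) {
	heartbeatCfg := rest.CopyConfig(cfg)
	// Each heartbeat request fails client-side after this long instead of
	// hanging forever on a dead TCP connection behind the LB.
	heartbeatCfg.Timeout = 2 * nodeStatusUpdateFrequency
	return kubernetes.NewForConfig(heartbeatCfg)
}
```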
How to reproduce it (as minimally and precisely as possible): see above.
About this issue
- State: closed
- Created 7 years ago
- Reactions: 11
- Comments: 36 (25 by maintainers)
Links to this issue
Commits related to this issue
- Merge pull request #52176 from liggitt/heartbeat-timeout Automatic merge from submit-queue (batch tested with PRs 52176, 43152). If you want to cherry-pick this change to another branch, please follo... — committed to kubernetes/kubernetes by deleted user 7 years ago
- Switch master ELB to NLB Primarily to mitigate kubernetes issue with kubeletes losing connection to masters ( ~06:20 in the morning for us ), and after a period ejecting pods, until that connection i... — committed to utilitywarehouse/tf_kube_aws by george-angel 6 years ago
- Merge pull request #63492 from liggitt/node-heartbeat-close-connections Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a ... — committed to liggitt/kubernetes by deleted user 6 years ago
Indeed: as far as I understand, the behaviour is not undefined, it’s just defined in Linux rather than in Go. I think the Go docs could be clearer on this; the relevant section is in dup(2). My code doesn’t modify flags after obtaining the fd; its only use is in a call to setsockopt(2). The docs for that call are fairly clear that it modifies properties of the socket referred to by the descriptor, not the descriptor itself. I agree that the original descriptor being set to blocking mode is annoying. Go’s code is clear that this will not prevent anything from working, just that more OS threads may be required for I/O:
https://github.com/golang/go/blob/516f5ccf57560ed402cdae28a36e1dc9e81444c3/src/net/fd_unix.go#L313-L315
Given that a single kubelet (or any other client-go consumer) establishes a small number of long-lived connections to the apiservers, and that this will be fixed in Go 1.11, I don’t think this is a significant issue.
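For illustration, a minimal sketch of the setsockopt(2) approach described here, assuming a helper along these lines rather than the exact Monzo patch: dup(2) the connection’s descriptor via (*net.TCPConn).File() and use the duplicate only to set TCP_USER_TIMEOUT, which applies to the underlying socket and therefore to the original connection.

```go
// Illustrative sketch only; not the actual patch. Sets TCP_USER_TIMEOUT on an
// established connection so the kernel aborts it when sent data goes
// unacknowledged for too long (Linux-specific option).
package sockopt

import (
	"net"
	"time"

	"golang.org/x/sys/unix"
)

func SetTCPUserTimeout(conn *net.TCPConn, timeout time.Duration) error {
	// File() dup(2)s the descriptor; on Go < 1.11 this also switches the
	// original descriptor to blocking mode, the annoyance noted above.
	f, err := conn.File()
	if err != nil {
		return err
	}
	defer f.Close()

	// setsockopt(2) modifies the socket the descriptor refers to, not the
	// descriptor itself, so the original connection picks up the option.
	return unix.SetsockoptInt(int(f.Fd()), unix.IPPROTO_TCP, unix.TCP_USER_TIMEOUT,
		int(timeout/time.Millisecond))
}
```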
I am happy for this to be fixed in another way, but given we know that this works and does not require invasive changes to the apiserver to achieve, I think it is a reasonable solution. I have heard from several production users of Kubernetes that this has bitten them in the same way it bit us.
@derekwaynecarr @liggitt #52176 doesn’t resolve this issue. See: https://github.com/kubernetes/kubernetes/pull/48670#issuecomment-352257836, https://github.com/kubernetes/kubernetes/pull/52176#issuecomment-349647651, and https://github.com/kubernetes/kubernetes/pull/52176#issuecomment-353319760.
Since this issue has been re-opened, would there be any value in me re-opening my PR for this commit? Monzo has been running this patch in production since last July and it has eliminated this problem entirely, for all uses of client-go.

We’ve had three major events in the last few weeks that come down to this problem. Watches set up through an ELB node that gets replaced or scaled down cause large numbers of nodes to go not ready for 15 minutes, causing very scary cluster turbulence (we’ve generally seen between a third and half of the nodes go not ready). We’re currently evaluating other ways to load balance the API servers for the components we currently send through the ELB (I haven’t pored through everything, but I think that boils down to the kubelet and the proxy, possibly flannel).
one issue at a time 😃
persistent kubelet heartbeat failure results in all workloads being evicted. kube-proxy network issues are disruptive for some workloads, but not necessarily all of them.
kube-proxy (and general client-go support) would need a different mechanism, since those components do not heartbeat with the api like the kubelet does. I’d recommend spawning a separate issue for kube-proxy handling of this condition.
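For context, the kubelet-side fix that eventually landed (the #63492 "node-heartbeat-close-connections" commit linked above) forces the heartbeat client's connections closed when a heartbeat times out. A minimal sketch of that idea, assuming client-go's connrotation dialer and with the kubelet wiring simplified:

```go
// Simplified sketch of the close-connections-on-heartbeat-timeout idea; the
// real kubelet wiring differs. Names other than the client-go types are
// illustrative.
package heartbeat

import (
	"net"
	"time"

	"k8s.io/client-go/rest"
	"k8s.io/client-go/util/connrotation"
)

// WithConnRotation wraps the config's dialer so every heartbeat connection can
// be dropped at once, forcing the next attempt to dial fresh through the LB.
func WithConnRotation(cfg *rest.Config) (*rest.Config, *connrotation.Dialer) {
	base := (&net.Dialer{Timeout: 30 * time.Second, KeepAlive: 30 * time.Second}).DialContext
	dialer := connrotation.NewDialer(connrotation.DialFunc(base))

	out := rest.CopyConfig(cfg)
	out.Dial = dialer.DialContext
	return out, dialer
}

// On a heartbeat timeout, the caller would invoke dialer.CloseAll() so stuck
// TCP connections are abandoned instead of being reused.
```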
This regressed and was re-fixed in 1.14.3.
See https://github.com/kubernetes/kubernetes/pull/78016
A few notes on these very valid concerns: