kubernetes: Node status is NotReady
I have a cluster, and over the weekend some of the pods went into Pending status. That led me to discover that several nodes are in the NotReady state. They were all Ready last week. I'm not sure how to debug what happened, or how to remove the broken nodes / fix the issue.
```
kc get nodes
gke-cluster-1-default-pool-777adf16-an5j   Ready      4d
gke-cluster-1-default-pool-777adf16-erra   NotReady   4d
gke-cluster-1-default-pool-777adf16-ge2r   Ready      4d
gke-cluster-1-default-pool-777adf16-t2aj   Ready      4d
gke-cluster-1-default-pool-777adf16-vvhx   Ready      4d
gke-cluster-1-default-pool-777adf16-w20k   NotReady   4d
gke-cluster-1-default-pool-777adf16-wib8   Ready      4d
gke-cluster-1-default-pool-777adf16-wizq   Ready      4d
gke-cluster-1-default-pool-777adf16-wteu   Ready      4d
gke-cluster-1-default-pool-777adf16-x07o   Ready      4d
gke-cluster-1-default-pool-777adf16-xhfh   Ready      4d
gke-cluster-1-default-pool-777adf16-xsix   NotReady   4d
gke-cluster-1-default-pool-777adf16-y98j   NotReady   4d
gke-cluster-1-default-pool-777adf16-yjxa   Ready      4d
gke-cluster-1-default-pool-777adf16-z7cz   Ready      4d
gke-cluster-1-default-pool-777adf16-z8cn   Ready      4d
```
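To focus on just the broken nodes, the tabular output above can be filtered by its STATUS column. A minimal sketch, where the heredoc stands in for live `kubectl get nodes` output (node names taken from the listing above):

```shell
# Sample of `kubectl get nodes` output (stand-in for a live cluster).
nodes=$(cat <<'EOF'
gke-cluster-1-default-pool-777adf16-an5j   Ready      4d
gke-cluster-1-default-pool-777adf16-erra   NotReady   4d
gke-cluster-1-default-pool-777adf16-w20k   NotReady   4d
gke-cluster-1-default-pool-777adf16-wib8   Ready      4d
EOF
)

# Print only the names of nodes whose STATUS column is NotReady.
echo "$nodes" | awk '$2 == "NotReady" {print $1}'
```

Against a real cluster this reduces to `kubectl get nodes | awk '$2 == "NotReady" {print $1}'`, and each resulting name can be fed to `kubectl describe node <name>` to inspect its Conditions and Events.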
Kubernetes version (use `kubectl version`):
```
Client Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.3", GitCommit:"c6411395e09da356c608896d3d9725acab821418", GitTreeState:"clean", BuildDate:"2016-07-22T20:29:38Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.5", GitCommit:"b0deb2eb8f4037421077f77cb163dbb4c0a2a9f5", GitTreeState:"clean", BuildDate:"2016-08-11T20:21:58Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"linux/amd64"}
```
Environment:
Google cloud platform container engine
What happened:
Nodes went offline
What you expected to happen:
Nodes to stay online
How to reproduce it (as minimally and precisely as possible):
Unsure
Anything else we need to know:
The pods that were on those nodes moved to Pending status:
```
some-api-2437792557-nm63f   0/1   Pending   0   1d
```
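The same column-filtering approach works for finding every pod stuck in Pending. A sketch over illustrative sample output (only the first pod name comes from the report above; newer kubectl versions can also do this server-side with `kubectl get pods --field-selector=status.phase=Pending`):

```shell
# Sample of `kubectl get pods` output; only the first pod name is from
# the report above, the other rows are made up for illustration.
pods=$(cat <<'EOF'
some-api-2437792557-nm63f    0/1   Pending   0   1d
some-api-2437792557-q8zrt    1/1   Running   0   1d
other-svc-1184312334-ab12c   0/1   Pending   0   2h
EOF
)

# Print the name of every pod whose STATUS column is Pending.
echo "$pods" | awk '$3 == "Pending" {print $1}'
```

Running `kubectl describe pod <name>` on one of these then shows in its Events why the scheduler cannot place it (for example, no Ready node with sufficient capacity).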
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Reactions: 3
- Comments: 23 (5 by maintainers)
I am running into this issue with my GKE clusters.
I think I solved it. TL;DR: I was overloading the cluster's CPU (I was using the smallest node size!!).
Environment
My issues
Nodes were in NotReady status according to `kubectl get nodes`. When I ran `kubectl describe node` on a NotReady node, I got the "Kubelet stopped posting node status" message (the same link appeared in this comment: https://github.com/kubernetes/kubernetes/issues/32522#issuecomment-246507657).
Diagnostics
I ran `kubectl describe node <NotReady node>` on a NotReady status node, then SSHed into the NotReady node's VM instance and ran `sudo journalctl -u kubelet --all | tail`. Found this interesting message: