kubernetes: Node `NotReady` status with "Kubelet stopped posting node status" error

On k8s 1.4, with the cluster provisioned using kubeadm:

I have the node and master on the same server. Suddenly my node is posting a NotReady status. Running

# kubectl describe node <NODE>

returns

Name:                   operate
Labels:                 beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/os=linux
                        kubeadm.alpha.kubernetes.io/role=master
                        kubernetes.io/hostname=operate
Taints:                 <none>
CreationTimestamp:      Thu, 06 Oct 2016 23:57:52 +0000
Phase:
Conditions:
  Type                  Status          LastHeartbeatTime                       LastTransitionTime                      Reason           Message
  ----                  ------          -----------------                       ------------------                      ------           -------
  OutOfDisk             Unknown         Fri, 07 Oct 2016 08:13:50 +0000         Fri, 07 Oct 2016 08:14:30 +0000         NodeStatusUnknown Kubelet stopped posting node status.
  MemoryPressure        False           Fri, 07 Oct 2016 08:13:50 +0000         Thu, 06 Oct 2016 23:57:52 +0000         KubeletHasSufficientMemory        kubelet has sufficient memory available
  DiskPressure          False           Fri, 07 Oct 2016 08:13:50 +0000         Thu, 06 Oct 2016 23:57:52 +0000         KubeletHasNoDiskPressure  kubelet has no disk pressure
  Ready                 Unknown         Fri, 07 Oct 2016 08:13:50 +0000         Fri, 07 Oct 2016 08:14:30 +0000         NodeStatusUnknown Kubelet stopped posting node status.
Addresses:              10.138.0.2,10.138.0.2
Capacity:
 alpha.kubernetes.io/nvidia-gpu:        0
 cpu:                                   1
 memory:                                1737208Ki
 pods:                                  110
Allocatable:
 alpha.kubernetes.io/nvidia-gpu:        0
 cpu:                                   1
 memory:                                1737208Ki
 pods:                                  110
System Info:
 Machine ID:                    af77f36e18459f0d0d262ed74e977e59
 System UUID:                   AF77F36E-1845-9F0D-0D26-2ED74E977E59
 Boot ID:                       617db356-a6da-4099-9b63-ad5f993178fd
 Kernel Version:                4.4.0-38-generic
 OS Image:                      Ubuntu 16.04.1 LTS
 Operating System:              linux
 Architecture:                  amd64
 Container Runtime Version:     docker://1.11.2
 Kubelet Version:               v1.4.0
 Kube-Proxy Version:            v1.4.0
ExternalID:                     operate
Non-terminated Pods:            (8 in total)
  Namespace                     Name                                            CPU Requests    CPU Limits      Memory Requests Memory Limits
  ---------                     ----                                            ------------    ----------      --------------- -------------
  kube-system                   etcd-operate                                    200m (20%)      0 (0%)          0 (0%)          0 (0%)
  kube-system                   kube-controller-manager-operate                 200m (20%)      0 (0%)          0 (0%)          0 (0%)
  kube-system                   kube-discovery-982812725-kkarx                  0 (0%)          0 (0%)          0 (0%)          0 (0%)
  kube-system                   kube-dns-2247936740-fse3h                       210m (21%)      210m (21%)      390Mi (22%)     390Mi (22%)
  kube-system                   kube-proxy-amd64-x3x3m                          0 (0%)          0 (0%)          0 (0%)          0 (0%)
  kube-system                   kube-scheduler-operate                          100m (10%)      0 (0%)          0 (0%)          0 (0%)
  kube-system                   kubernetes-dashboard-1655269645-0hzho           0 (0%)          0 (0%)          0 (0%)          0 (0%)
  kube-system                   weave-net-r38tz                                 20m (2%)        0 (0%)          0 (0%)          0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.
  CPU Requests  CPU Limits      Memory Requests Memory Limits
  ------------  ----------      --------------- -------------
  730m (73%)    210m (21%)      390Mi (22%)     390Mi (22%)

I’ve tried restarting the server with no success. How would I debug this? Thanks
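Edit: I assume the place to start is the kubelet itself on the node, since kubeadm runs it as a systemd service. A sketch of what I plan to check (run on the node as root; none of the output below is from the actual machine):

# systemctl status kubelet
# journalctl -u kubelet --no-pager | tail -n 100
# docker ps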

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 25 (1 by maintainers)

Most upvoted comments

I just ran into this on GKE 1.5.1 with alpha features turned on.

The problem appeared when the cluster auto-scaled. The first node went to NotReady status with the message: Kubelet stopped posting node status.

The node was non-responsive; I could not ssh into it. Restarting the node cleared the status.
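For anyone else hitting this on GKE, restarting boils down to resetting the underlying Compute Engine VM. A rough sketch, with the node name and zone as placeholders (the drain may hang if the kubelet is unresponsive, in which case it can be skipped):

kubectl drain <node-name> --ignore-daemonsets
gcloud compute instances reset <node-name> --zone <zone>
kubectl uncordon <node-name>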

This happens to me as well on AWS EKS. Any hints?

Conditions:
  Type             Status    LastHeartbeatTime                 LastTransitionTime                Reason                    Message
  ----             ------    -----------------                 ------------------                ------                    -------
  OutOfDisk        Unknown   Thu, 04 Jul 2019 10:12:19 -0400   Thu, 04 Jul 2019 10:13:04 -0400   NodeStatusUnknown         Kubelet stopped posting node status.
  MemoryPressure   Unknown   Thu, 04 Jul 2019 10:12:19 -0400   Thu, 04 Jul 2019 10:13:04 -0400   NodeStatusUnknown         Kubelet stopped posting node status.
  DiskPressure     Unknown   Thu, 04 Jul 2019 10:12:19 -0400   Thu, 04 Jul 2019 10:13:04 -0400   NodeStatusUnknown         Kubelet stopped posting node status.
  PIDPressure      False     Thu, 04 Jul 2019 10:12:19 -0400   Thu, 04 Jul 2019 08:26:42 -0400   KubeletHasSufficientPID   kubelet has sufficient PID available
  Ready            Unknown   Thu, 04 Jul 2019 10:12:19 -0400   Thu, 04 Jul 2019 10:13:04 -0400   NodeStatusUnknown         Kubelet stopped posting node status.

I can’t log into the instance to inspect the kubelet. It seems the instance is frozen or something.
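Without SSH, about all that is left is the API side (plus the EC2 console's instance status checks and system log). A sketch of the API-side checks, with the node name as a placeholder:

kubectl describe node <node-name>
kubectl get events --all-namespaces --field-selector involvedObject.name=<node-name>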

Edit: Follow up here https://github.com/awslabs/amazon-eks-ami/issues/79

I’m having the same issue on EKS with Kubernetes 1.12.

Minimal steps to reproduce (a rough kubectl sketch follows the steps):

  1. Create a deployment with 1 replica on a 2-node cluster.
  2. Create an HPA with a 50% CPU target, minPods 1, maxPods 3.
  3. Overload the CPU on the first Pod.
  4. Watch the HPA scaling with “kubectl get hpa -w”.
  5. After 1 minute, see 1 Node go down with NotReady status.
  6. After 30 minutes, the Node is still in NotReady status, even after the HPA has scaled back down to 1 Pod.

Rebooting the EC2 instance doesn’t help.
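For reference, a rough kubectl sketch of steps 1–3 (the deployment name and image are placeholders, not from my actual setup; the HPA needs CPU requests set in order to compute utilization):

kubectl create deployment cpu-demo --image=<cpu-heavy-image>
kubectl set resources deployment cpu-demo --requests=cpu=100m
kubectl autoscale deployment cpu-demo --cpu-percent=50 --min=1 --max=3
kubectl get hpa -w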

I am also facing the same issue.

I see an issue after deploying the app. The deployment is created successfully, but the number of available (running) pods stays at 0.

root@kubernetes:~# kubectl run kubernetes-bootcamp --image=docker.io/jocatalin/kubernetes-bootcamp:v1 --port=8080
deployment "kubernetes-bootcamp" created
root@kubernetes:~# kubectl get deployments
NAME                  DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kubernetes-bootcamp   1         1         1            0           15s
skycouch              1         1         1            0           2d
test                  1         1         1            0           3d

Can you suggest what might be wrong here?

root@kubernetes:~# kubectl get nodes
NAME         STATUS     ROLES    AGE   VERSION
kubenode1    NotReady   <none>   3d    v1.8.3
kubenode2    NotReady   <none>   3d    v1.8.3
kubernetes   NotReady   master   3d    v1.8.3
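Presumably, with all three nodes NotReady, the bootcamp pod cannot be scheduled and is stuck Pending. Something like the following should confirm it (a sketch; run=kubernetes-bootcamp is the default label kubectl run applies, and the pod name is a placeholder):

kubectl get pods -l run=kubernetes-bootcamp -o wide
kubectl describe pod <pod-name>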

Thanks Skylab