kubernetes: container runtime status check may not have completed yet, PLEG is not healthy

What happened:

When I upgraded kubelet from v1.18.8 to v1.19.0 (v1.19.9 behaves the same), the node went NotReady and kubelet reported the following error:

Apr 13 13:22:22 i-ie33j8hy kubelet[19792]: E0413 13:22:22.676030   19792 kubelet.go:1765] skipping pod synchronization - [container runtime status check may not have completed yet, PLEG is not healthy: pleg was last seen active 16m6.559177445s ago; threshold is 3m0s]
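
To narrow this down, the kubelet's own health endpoint and journal can be checked (a minimal sketch, assuming kubelet runs under systemd; port 10248 matches the healthzPort in the config below):

# Ask the kubelet which of its internal health checks fail (PLEG is one of them)
curl -s "http://127.0.0.1:10248/healthz?verbose"

# See how long the PLEG relist has been stuck
journalctl -u kubelet --since "30 min ago" | grep -i pleg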

I’ve tried restarting the node, but that didn’t solve the problem.

I checked Docker; both docker ps and docker inspect work fine.
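
Note that docker ps succeeding does not fully rule out the runtime: the PLEG relist can block on a single container whose shim or runc process is wedged. One rough check (a sketch, not exhaustive) is to look for runtime processes stuck in uninterruptible sleep:

# Runtime helper processes in state D (uninterruptible sleep) point at a
# hung container rather than at the kubelet itself
ps -eo pid,stat,wchan:32,cmd | grep -E 'containerd-shim|runc' | grep -v grep | awk '$2 ~ /D/'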

Anything else we need to know?:

The kubelet configuration:

$ cat /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
clusterDNS:
- 169.254.25.10
clusterDomain: cluster.local
cpuManagerReconcilePeriod: 0s
evictionHard:
  memory.available: 5%
evictionMaxPodGracePeriod: 120
evictionPressureTransitionPeriod: 30s
evictionSoft:
  memory.available: 10%
evictionSoftGracePeriod:
  memory.available: 2m
featureGates:
  CSINodeInfo: true
  ExpandCSIVolumes: true
  RotateKubeletClientCertificate: true
  RotateKubeletServerCertificate: true
  VolumeSnapshotDataSource: true
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
kubeReserved:
  cpu: 200m
  memory: 250Mi
maxPods: 110
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
rotateCertificates: true
runtimeRequestTimeout: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
systemReserved:
  cpu: 200m
  memory: 250Mi
volumeStatsAggPeriod: 0s
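
Note that the 3m0s threshold in the PLEG error is hard-coded in the kubelet, not set by this file. If useful, the configuration the running kubelet actually loaded can be dumped through the node proxy and compared against the file (a sketch; it assumes kubectl access with rights to the node proxy subresource):

# Dump the kubelet's effective config as JSON
kubectl get --raw "/api/v1/nodes/i-ie33j8hy/proxy/configz" | python -m json.tool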

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.0", GitCommit:"e19964183377d0ec2052d1f1fa930c4d7575bd50", GitTreeState:"clean", BuildDate:"2020-08-26T14:30:33Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.8", GitCommit:"9f2892aab98fe339f3bd70e3c470144299398ace", GitTreeState:"clean", BuildDate:"2020-08-13T16:04:18Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

  • Docker version
Client: Docker Engine - Community
 Version:           20.10.5
 API version:       1.40
 Go version:        go1.13.15
 Git commit:        55c4c88
 Built:             Tue Mar  2 20:33:55 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          19.03.8
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.17
  Git commit:       afacb8b
  Built:            Wed Mar 11 01:25:42 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.4
  GitCommit:        05f951a3781f4f2c1911b05e61c160e9c30eaa8e
 runc:
  Version:          1.0.0-rc93
  GitCommit:        12644e614e25b05da6fd08a38ffa0cfe1903fdec
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

  • OS (e.g. cat /etc/os-release):
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
  • Kernel (e.g. uname -a):
Linux i-ie33j8hy 3.10.0-1160.24.1.el7.x86_64 #1 SMP Thu Apr 8 19:51:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
kubeadm

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 7
  • Comments: 16 (5 by maintainers)

Most upvoted comments

Updating to containerd 1.4.6 fixed the issue for me too.
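
On CentOS 7 with containerd installed as the containerd.io package from Docker's yum repository, the upgrade is roughly the following (a sketch; drain the node first on a production cluster, and adjust the version glob to whatever 1.4.6 build the repo actually carries):

# List the containerd.io builds the repository offers
yum list containerd.io --showduplicates | sort -r

# Upgrade to a 1.4.6 build, then restart the runtime stack
yum update -y containerd.io-1.4.6*
systemctl restart containerd docker kubelet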

@qeternity I am not sure. The problem is that this is one of my production clusters, so I cannot try to reproduce or verify this whenever I like.

That’s why I’m asking for debugging steps, so I can work in a goal-oriented way instead of just trying different things.

I can also try to prepare a cluster just for debugging. A node with plenty of memory and very simple pods with low memory usage would be helpful, I think.
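
Something like the following could seed such a debug node with simple, low-memory pods (a sketch; the pod count is arbitrary, the node name is the one from this report, and kubectl run's --overrides requires the apiVersion field):

# Pin 20 tiny busybox pods to the node under test
for i in $(seq 1 20); do
  kubectl run "pleg-test-$i" --image=busybox --restart=Never \
    --overrides='{"apiVersion":"v1","spec":{"nodeName":"i-ie33j8hy"}}' \
    -- sleep 3600
done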