kubernetes: container runtime status check may not have completed yet, PLEG is not healthy
What happened:
When I upgraded kubelet from v1.18.8 to v1.19.0 (v1.19.9 behaves the same), the node went NotReady and kubelet reported the following error:
Apr 13 13:22:22 i-ie33j8hy kubelet[19792]: E0413 13:22:22.676030 19792 kubelet.go:1765] skipping pod synchronization - [container runtime status check may not have completed yet, PLEG is not healthy: pleg was last seen active 16m6.559177445s ago; threshold is 3m0s]
I’ve tried restarting the node and it doesn’t solve the problem.
I checked docker, and both docker ps and docker inspect are working fine.
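Concretely, the node-side checks looked roughly like this (a sketch; the container ID is a placeholder):

# confirm the runtime daemons are running
$ systemctl status docker containerd --no-pager
# list containers and inspect one of them (both returned promptly, no errors)
$ docker ps
$ docker inspect <container-id>
# look for the PLEG relist messages in the kubelet journal
$ journalctl -u kubelet --since "30 min ago" | grep -i pleg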
Anything else we need to know?:
The kubelet configuration:
$ cat /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
clusterDNS:
- 169.254.25.10
clusterDomain: cluster.local
cpuManagerReconcilePeriod: 0s
evictionHard:
  memory.available: 5%
evictionMaxPodGracePeriod: 120
evictionPressureTransitionPeriod: 30s
evictionSoft:
  memory.available: 10%
evictionSoftGracePeriod:
  memory.available: 2m
featureGates:
  CSINodeInfo: true
  ExpandCSIVolumes: true
  RotateKubeletClientCertificate: true
  RotateKubeletServerCertificate: true
  VolumeSnapshotDataSource: true
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
kubeReserved:
  cpu: 200m
  memory: 250Mi
maxPods: 110
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
rotateCertificates: true
runtimeRequestTimeout: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
systemReserved:
  cpu: 200m
  memory: 250Mi
volumeStatsAggPeriod: 0s
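With the healthzBindAddress and healthzPort shown above, the kubelet's own health endpoint can also be probed directly on the node (a quick sketch, assuming those settings):

# a healthy kubelet answers "ok"; while the sync loop is stuck it typically returns an error instead
$ curl -s http://127.0.0.1:10248/healthz; echo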
Environment:
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.0", GitCommit:"e19964183377d0ec2052d1f1fa930c4d7575bd50", GitTreeState:"clean", BuildDate:"2020-08-26T14:30:33Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.8", GitCommit:"9f2892aab98fe339f3bd70e3c470144299398ace", GitTreeState:"clean", BuildDate:"2020-08-13T16:04:18Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
- Docker version
Client: Docker Engine - Community
  Version:       20.10.5
  API version:   1.40
  Go version:    go1.13.15
  Git commit:    55c4c88
  Built:         Tue Mar 2 20:33:55 2021
  OS/Arch:       linux/amd64
  Context:       default
  Experimental:  true

Server: Docker Engine - Community
  Engine:
    Version:       19.03.8
    API version:   1.40 (minimum version 1.12)
    Go version:    go1.12.17
    Git commit:    afacb8b
    Built:         Wed Mar 11 01:25:42 2020
    OS/Arch:       linux/amd64
    Experimental:  false
  containerd:
    Version:       1.4.4
    GitCommit:     05f951a3781f4f2c1911b05e61c160e9c30eaa8e
  runc:
    Version:       1.0.0-rc93
    GitCommit:     12644e614e25b05da6fd08a38ffa0cfe1903fdec
  docker-init:
    Version:       0.18.0
    GitCommit:     fec3683
- OS (e.g. cat /etc/os-release):
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
- Kernel (e.g. uname -a):
Linux i-ie33j8hy 3.10.0-1160.24.1.el7.x86_64 #1 SMP Thu Apr 8 19:51:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: kubeadm
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 7
- Comments: 16 (5 by maintainers)
Updating to 1.4.6 fixed the issue for me too.
@qeternity I am not sure. The problem is that this is one of my production clusters, and I cannot try to reproduce or verify this at arbitrary times.
That's why I'm asking for debugging steps, so I can work in a goal-oriented way instead of just trying different things.
I can also try to prepare a cluster just for debugging. A node with plenty of memory and very simple pods with low memory usage would be helpful, I think.
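For reference, the affected node runs containerd 1.4.4, so the 1.4.6 upgrade mentioned above on CentOS 7 with the Docker CE yum repository would look roughly like this (a sketch; exact package versions depend on the repository, and <node-name> is a placeholder):

# check the current version and what the repository offers
$ containerd --version
$ yum list containerd.io --showduplicates
# drain the node, upgrade, restart the runtime stack, then bring the node back
$ kubectl drain <node-name> --ignore-daemonsets
$ yum update -y containerd.io
$ systemctl restart containerd docker kubelet
$ kubectl uncordon <node-name>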