kubernetes: PLEG is not healthy
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
I had a node misbehaving. The kubelet journal showed this:
Mar 13 16:02:43 vk-prod4-node-v18-w3fz kubelet[1450]: I0313 16:02:43.934473 1450 kubelet.go:1778] skipping pod synchronization - [PLEG is not healthy: pleg was last seen active 3m5.30015447s ago; threshold is 3m0s]
Mar 13 16:02:48 vk-prod4-node-v18-w3fz kubelet[1450]: I0313 16:02:48.934773 1450 kubelet.go:1778] skipping pod synchronization - [PLEG is not healthy: pleg was last seen active 3m10.30056416s ago; threshold is 3m0s]
Mar 13 16:02:53 vk-prod4-node-v18-w3fz kubelet[1450]: I0313 16:02:53.935030 1450 kubelet.go:1778] skipping pod synchronization - [PLEG is not healthy: pleg was last seen active 3m15.300823802s ago; threshold is 3m0s]
Mar 13 16:02:58 vk-prod4-node-v18-w3fz kubelet[1450]: I0313 16:02:58.935306 1450 kubelet.go:1778] skipping pod synchronization - [PLEG is not healthy: pleg was last seen active 3m20.301094771s ago; threshold is 3m0s]
Mar 13 16:03:03 vk-prod4-node-v18-w3fz kubelet[1450]: I0313 16:03:03.940675 1450 kubelet.go:1778] skipping pod synchronization - [PLEG is not healthy: pleg was last seen active 3m25.306459083s ago; threshold is 3m0s]
Mar 13 16:03:08 vk-prod4-node-v18-w3fz kubelet[1450]: I0313 16:03:08.940998 1450 kubelet.go:1778] skipping pod synchronization - [PLEG is not healthy: pleg was last seen active 3m30.306781014s ago; threshold is 3m0s]
Mar 13 16:03:13 vk-prod4-node-v18-w3fz kubelet[1450]: I0313 16:03:13.941284 1450 kubelet.go:1778] skipping pod synchronization - [PLEG is not healthy: pleg was last seen active 3m35.307062566s ago; threshold is 3m0s]
Environment:
- Kubernetes version (use kubectl version): 1.8.8
- Cloud provider or hardware configuration: GCE
- OS (e.g. from /etc/os-release): Ubuntu 16.04.3 LTS
- Kernel (e.g. uname -a): 4.13.0-1011-gcp
- Install tools: kubeadm
- Others:
  - Docker version: 1.12.6-cs13
  - I'm using calico for networking
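For anyone triaging this, a quick way to confirm the symptom is to check the kubelet journal and the node conditions. The commands below are a generic sketch (the node name is a placeholder, and a systemd-managed kubelet is assumed):
# look for PLEG messages in the kubelet journal
journalctl -u kubelet --since "1 hour ago" | grep "PLEG is not healthy"
# the node usually reports NotReady while PLEG is unhealthy
kubectl get nodes
kubectl describe node <node-name> | grep -A 10 "Conditions:"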
I have the same issue on K8S v1.10.11 with healthy CPU, memory and disk: v1.10.11 <none> CentOS Linux 7 (Core) 3.10.0-862.14.4.el7.x86_64 docker://1.13.1. Docker works normally when running "docker ps" or "docker info".
I found a way to fix the issue temporarily.
After restarting the docker daemon, there were no containers running, while the kubelet kept printing:
kubelet[788387]: W0412 00:46:38.054829 788387 pod_container_deletor.go:77] Container "621a89ecc8d299773098d740cf9057602df1f67aba6ba85b7cae88701a9b4b06" not found in pod's containers
kubelet[567286]: I0411 22:44:11.526244 567286 kubelet.go:1803] skipping pod synchronization - [PLEG is not healthy: pleg was last seen active 2h8m28.080201738s ago; threshold is 3m0s]
That gave me the idea to delete "/var/lib/kubelet/pods/*". A pod directory there looks like:
/var/lib/kubelet/pods/9a954722-5c0c-11e9-91fc-005056bd5f06: containers etc-hosts plugins volumes
Step 1. Stop the kubelet.
Step 2. Remove the pod directories, although the volume folders cannot be removed at all:
rm -rf /var/lib/kubelet/pods/*
rm: cannot remove '/var/lib/kubelet/pods/084cf8bd-5cd4-11e9-ad28-005056bd5f06/volumes/kubernetes.io~secret/default-token-c2xc7': Device or resource busy
Step 3. Start the kubelet (see the consolidated sketch below).
"docker ps" could then list the containers, and the node came back to the "Ready" state.
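Putting those steps together as a rough sketch. The paths follow the comment above; the umount loop for the busy secret volumes is my own assumption, not something the original comment did:
# stop the kubelet so it stops fighting the cleanup
systemctl stop kubelet
# the "Device or resource busy" errors come from tmpfs-backed secret volumes
# that are still mounted; unmounting them first is an assumption on my part
for m in $(mount | grep '/var/lib/kubelet/pods/' | awk '{print $3}'); do umount "$m"; done
# remove the stale pod directories
rm -rf /var/lib/kubelet/pods/*
# start the kubelet again and watch it rebuild the pods
systemctl start kubelet
journalctl -u kubelet -f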
Restarting docker.service resolved this issue for me.
“PLEG is not healthy: pleg was last seen active 17m16.107709513s ago; threshold is 3m0s”
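For reference, a minimal version of that workaround, assuming systemd units for both daemons (restarting the kubelet as well is my addition, not something the comment mentions):
systemctl restart docker
systemctl restart kubelet   # optional, not part of the original comment
# watch for the node to flip back to Ready
kubectl get nodes -w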
I am still facing this issue, and it fluctuates. Because of this, some pods get stuck. I am also using the Cluster Autoscaler, which starts adding nodes once pods cannot be scheduled because of the node in the error state.
Any help or clue?
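Not a fix, but one way to stop the autoscaler churn while you investigate is to take the misbehaving node out of scheduling. This is a generic suggestion, not something from this thread, and the node name is a placeholder:
# stop new pods from landing on the bad node
kubectl cordon <node-name>
# evict the stuck pods so they get recreated on healthy nodes
kubectl drain <node-name> --ignore-daemonsets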
I see this issue on 1.17.6, with lots of free RAM on the host
I have the same issue. I just reboot the problem node to solve it:
shutdown -r now
Also just had this wonderful experience (with calico 2.6) on one of my nodes in Azure.
@albertvaka probably abandoned it in favour of https://github.com/kubernetes/kubernetes/issues/45419
It seems to happen to me when I have too many application instances running and too few nodes. It doesn't matter what size the nodes are. I have a simple 3-node test cluster going. I create one project/namespace and run one instance of Odoo - all good. I add a few more instances of Odoo and after a week or so I'm plagued with PLEG errors. My nodes are beefy too. This has happened on Upcloud, Hetzner and Digital Ocean.
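A quick way to check whether a node is simply carrying too many containers, which is a common trigger for PLEG relist timeouts, is to compare the pod count on the node with the kubelet's pod capacity. This is a generic check, not something the commenter ran; the node name is a placeholder:
# pods currently assigned to the node
kubectl get pods --all-namespaces --no-headers --field-selector spec.nodeName=<node-name> | wc -l
# the node's configured pod capacity
kubectl get node <node-name> -o jsonpath='{.status.capacity.pods}'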
@albertvaka Was this bug fixed, or why was the issue closed? I am seeing the same thing.