kubernetes: PLEG is not healthy

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

I had a node misbehaving. The kubelet journal showed this:

Mar 13 16:02:43 vk-prod4-node-v18-w3fz kubelet[1450]: I0313 16:02:43.934473    1450 kubelet.go:1778] skipping pod synchronization - [PLEG is not healthy: pleg was last seen active 3m5.30015447s ago; threshold is 3m0s]
Mar 13 16:02:48 vk-prod4-node-v18-w3fz kubelet[1450]: I0313 16:02:48.934773    1450 kubelet.go:1778] skipping pod synchronization - [PLEG is not healthy: pleg was last seen active 3m10.30056416s ago; threshold is 3m0s]
Mar 13 16:02:53 vk-prod4-node-v18-w3fz kubelet[1450]: I0313 16:02:53.935030    1450 kubelet.go:1778] skipping pod synchronization - [PLEG is not healthy: pleg was last seen active 3m15.300823802s ago; threshold is 3m0s]
Mar 13 16:02:58 vk-prod4-node-v18-w3fz kubelet[1450]: I0313 16:02:58.935306    1450 kubelet.go:1778] skipping pod synchronization - [PLEG is not healthy: pleg was last seen active 3m20.301094771s ago; threshold is 3m0s]
Mar 13 16:03:03 vk-prod4-node-v18-w3fz kubelet[1450]: I0313 16:03:03.940675    1450 kubelet.go:1778] skipping pod synchronization - [PLEG is not healthy: pleg was last seen active 3m25.306459083s ago; threshold is 3m0s]
Mar 13 16:03:08 vk-prod4-node-v18-w3fz kubelet[1450]: I0313 16:03:08.940998    1450 kubelet.go:1778] skipping pod synchronization - [PLEG is not healthy: pleg was last seen active 3m30.306781014s ago; threshold is 3m0s]
Mar 13 16:03:13 vk-prod4-node-v18-w3fz kubelet[1450]: I0313 16:03:13.941284    1450 kubelet.go:1778] skipping pod synchronization - [PLEG is not healthy: pleg was last seen active 3m35.307062566s ago; threshold is 3m0s]
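
For anyone hitting this, the messages above come straight from the kubelet journal. A minimal sketch of how to pull them up on a systemd host (the node name in the kubectl command is just the example node from this report):

# follow the kubelet journal and filter for PLEG messages
journalctl -u kubelet -f | grep -i pleg

# from a machine with cluster access, check what the node reports to the API server
kubectl describe node vk-prod4-node-v18-w3fz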

Environment:

  • Kubernetes version (use kubectl version): 1.8.8
  • Cloud provider or hardware configuration: GCE
  • OS (e.g. from /etc/os-release): Ubuntu 16.04.3 LTS
  • Kernel (e.g. uname -a): 4.13.0-1011-gcp
  • Install tools: kubeadm
  • Others:
    • Docker Version: 1.12.6-cs13
    • I’m using calico for networking

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 5
  • Comments: 73 (1 by maintainers)

Most upvoted comments

I have the same issue on K8s v1.10.11 with healthy CPU, memory and disk: v1.10.11, <none>, CentOS Linux 7 (Core), 3.10.0-862.14.4.el7.x86_64, docker://1.13.1. Docker itself works normally when running “docker ps” or “docker info”.

The following workaround fixed the issue temporarily.

After restarting the docker daemon, there were no containers running, and meanwhile the kubelet logs kept printing:

kubelet[788387]: W0412 00:46:38.054829 788387 pod_container_deletor.go:77] Container “621a89ecc8d299773098d740cf9057602df1f67aba6ba85b7cae88701a9b4b06” not found in pod’s containers
kubelet[567286]: I0411 22:44:11.526244 567286 kubelet.go:1803] skipping pod synchronization - [PLEG is not healthy: pleg was last seen active 2h8m28.080201738s ago; threshold is 3m0s]

That gave me an idea: I could delete “/var/lib/kubelet/pods/*”. Each pod directory looks like this:

/var/lib/kubelet/pods/9a954722-5c0c-11e9-91fc-005056bd5f06:
containers  etc-hosts  plugins  volumes

Step 1. Stop the kubelet.

Step 2. Remove the pod directories. The volume folders could not be removed at all:

rm -rf /var/lib/kubelet/pods/*
rm: cannot remove '/var/lib/kubelet/pods/084cf8bd-5cd4-11e9-ad28-005056bd5f06/volumes/kubernetes.io~secret/default-token-c2xc7': Device or resource busy

Step 3. Start the kubelet.

“docker ps” could list the containers.

The node came back to the “Ready” state.
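
For completeness, a minimal shell sketch of the workaround described above, run as root on the affected node after the docker daemon had already been restarted. This is just a transcript of the steps, not a recommended general fix; the busy-volume errors from rm are expected and were ignored here:

systemctl stop kubelet                 # Step 1: stop the kubelet
rm -rf /var/lib/kubelet/pods/*         # Step 2: remove pod directories; mounted secret volumes may report "Device or resource busy"
systemctl start kubelet                # Step 3: start the kubelet again

# afterwards, on the node and from a machine with cluster access:
docker ps                              # containers are listed again
kubectl get nodes                      # the node returns to Ready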

Restarting docker.service resolved this issue for me.

“PLEG is not healthy: pleg was last seen active 17m16.107709513s ago; threshold is 3m0s”
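
If you want to try the same thing, a minimal sketch on a systemd host (kubectl run from anywhere with cluster access):

systemctl restart docker.service
kubectl get nodes -w                   # watch the affected node flip from NotReady back to Ready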

I am still facing this issue, and it fluctuates. Because of it, some pods get stuck. I am also using the Cluster Autoscaler, which starts adding nodes once pods cannot be scheduled because of the node being in an error state.

Any help or clue?
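
Nothing specific to this issue, but two standard kubectl commands that make the symptom visible from the outside, assuming a reasonably recent kubectl (-A needs 1.14+):

kubectl get nodes                                           # the affected node shows NotReady
kubectl get pods -A --field-selector=status.phase=Pending   # the pending pods the Cluster Autoscaler reacts to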

I see this issue on 1.17.6, with lots of free RAM on the host

free -g
              total        used        free      shared  buff/cache   available
Mem:             15           3           3           0           8          11
Swap:             0           0           0

I have the same issue; I just reboot the problem node to solve it: shutdown -r now

Also just had this wonderful experience (with Calico 2.6) on one of my nodes in Azure.

@albertvaka probably abandoned it in favour of https://github.com/kubernetes/kubernetes/issues/45419

It seems to happen to me when I have too many application instances running and too few nodes. It doesn’t matter what size the nodes are. I have a simple 3-node test cluster going. I create one project/namespace and run one instance of Odoo - all good. I add a few more instances of Odoo and after a week or so I’m plagued with PLEG errors. My nodes are beefy too. This has happened on Upcloud, Hetzner and Digital Ocean.

On Tue, Jun 18, 2019 at 8:08 AM Mohammed S Fadin notifications@github.com wrote:

I’m having the exact same issue on IBM Cloud Kubernetes as well.

@albertvaka Was this bug fixed, or why was the issue closed? I am seeing the same thing.