node-problem-detector: health-checker posting wrong status for kubelet - `KubeletUnhealthy`

I am trying to use the health-check-monitor to monitor kubelet and docker. I built the binaries from the source and created a docker image.

Although the pod is running fine, it consistently posts the following node condition:

Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  KubeletUnhealthy     True    Fri, 03 Jul 2020 17:04:55 +0100   Fri, 03 Jul 2020 16:54:53 +0100   KubeletUnhealthy             kubelet:kubelet was found unhealthy; repair flag : false

Pod logs:

I0703 16:16:01.211066       1 plugin.go:86] Start to run custom plugins
I0703 16:16:01.222034       1 plugin.go:110] Add check result {Rule:0xc00003b1f0 ExitStatus:1 Message:kubelet:kubelet was found unhealthy; repair flag : false} for rule &{Type:permanent Condition:KubeletUnhealthy Reason:KubeletUnhealthy Path:/home/kubernetes/bin/health-checker Args:[--component=kubelet --enable-repair=false --cooldown-time=1m --health-check-timeout=10s] TimeoutString:0xc0000a8a50 Timeout:3m0s}
I0703 16:16:01.222104       1 plugin.go:115] Finish running custom plugins
I0703 16:16:01.222139       1 custom_plugin_monitor.go:138] New status generated: &{Source:health-checker Events:[] Conditions:[{Type:KubeletUnhealthy Status:True Transition:2020-07-03 15:54:31.225162591 +0000 UTC m=+0.065349002 Reason:KubeletUnhealthy Message:kubelet:kubelet was found unhealthy; repair flag : false}]}

I exec'd into the pod and ran the health checker by hand, without arguments:

$ kubectl exec -it node-problem-detector-kl9kj -- /bin/sh
# /home/kubernetes/bin/health-checker
I0703 16:17:36.720049    1019 health_checker.go:136] health-checker: component is unhealthy, proceeding to repair
I0703 16:17:36.720251    1019 health_checker.go:156] health-checker: executing command : &{systemctl [systemctl show kubelet --property=ActiveEnterTimestamp] []  <nil> <nil> <nil> [] <nil> <nil> <nil> 0xc00008c840 0xc000096460 false [] [] [] [] <nil> <nil>}
I0703 16:17:36.720355    1019 health_checker.go:159] health-checker: command failed : exec: "systemctl": executable file not found in $PATH, []
I0703 16:17:36.720407    1019 health_checker.go:140] health-checker: exec: "systemctl": executable file not found in $PATH
I0703 16:17:36.720453    1019 health_checker.go:142] health-checker: component uptime: 0s
kubelet:kubelet was found unhealthy; repair flag : true
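
The `executable file not found in $PATH` lines above are the error text Go's os/exec package produces when a binary cannot be resolved inside the container. As a minimal sketch (not the health-checker's own code; `findBinary` is a hypothetical helper name), the same lookup can be reproduced in isolation to confirm whether an image ships `systemctl`:

```go
package main

import (
	"fmt"
	"os/exec"
)

// findBinary is a hypothetical helper: it reports where name resolves on
// $PATH, or returns a wrapped error if the lookup fails.
func findBinary(name string) (string, error) {
	path, err := exec.LookPath(name)
	if err != nil {
		return "", fmt.Errorf("%s is not available in this environment: %w", name, err)
	}
	return path, nil
}

func main() {
	// Run inside the container image to see whether systemctl is present.
	if path, err := findBinary("systemctl"); err != nil {
		fmt.Println("lookup failed:", err)
	} else {
		fmt.Println("found systemctl at", path)
	}
}
```

If the lookup fails in the image, any code path that shells out to `systemctl` will fail the same way, whatever the actual state of kubelet on the node.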

On the node:

m02:/usr/local/bin$ curl -m 100 -f -s -S http://127.0.0.1:10248/healthz
ok

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 20 (3 by maintainers)

Most upvoted comments

I was able to get further by mounting /var/run/dbus/system_bus_socket into the container and running clean-install systemd on the image. I then had to enable host networking, since the health checker appears to simply call 127.0.0.1:10248/healthz (see https://github.com/kubernetes/node-problem-detector/blob/f42281ee2658900bdb0571e1159a43f6ab712a19/pkg/healthchecker/health_checker.go#L110).