kubernetes: kubelet_running_pod_count metric is not correct

What happened: kubelet_running_pod_count is larger than the number of pods actually running on the node, as follows:

Getting the metric from Prometheus:

kubelet_running_pod_count{instance="k8s-node01"}

result is:

kubelet_running_pod_count{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_os="linux",instance="k8s-node01",job="kubernetes-nodes",kubernetes_io_hostname="k8s-node01"}  98

But using kubectl describe node:

kubectl describe node k8s-node01

result is:

Non-terminated Pods:         (95 in total)

kubelet_running_pod_count is 98, but there are actually only 95 pods running on the node.
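The comparison above can be sketched as a small script. This is only an illustration that parses the two output lines quoted in this report; the function names are hypothetical, and it assumes the sample value is the last whitespace-separated field of the Prometheus result line and that the `kubectl describe node` line has the form `(N in total)`.

```python
import re

# Sample lines copied from the report above.
METRIC_LINE = (
    'kubelet_running_pod_count{beta_kubernetes_io_arch="amd64",'
    'beta_kubernetes_io_os="linux",instance="k8s-node01",'
    'job="kubernetes-nodes",kubernetes_io_hostname="k8s-node01"}  98'
)
DESCRIBE_LINE = "Non-terminated Pods:         (95 in total)"


def metric_value(line: str) -> float:
    """Return the sample value (assumed to be the last field on the line)."""
    return float(line.rsplit(None, 1)[-1])


def describe_pod_count(line: str) -> int:
    """Extract N from a 'Non-terminated Pods: (N in total)' line."""
    match = re.search(r"\((\d+) in total\)", line)
    if match is None:
        raise ValueError(f"unexpected describe output: {line!r}")
    return int(match.group(1))


def discrepancy(metric_line: str, describe_line: str) -> int:
    """Pods reported by the metric minus pods listed by kubectl describe."""
    return int(metric_value(metric_line)) - describe_pod_count(describe_line)


if __name__ == "__main__":
    print(discrepancy(METRIC_LINE, DESCRIBE_LINE))
```

With the values from this report, the metric overcounts by 3 pods.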

What you expected to happen: kubelet_running_pod_count should equal the number of pods actually running on the node.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): v1.13.2
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release): centos linux 7
  • Kernel (e.g. uname -a): 4.14.67-2.el7.centos.x86_64
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 6
  • Comments: 30 (20 by maintainers)

Most upvoted comments

We run a lot of small jobs as part of Argo workflows. When they are done, the workflow and its pods are removed.

kubectl describe node and docker ps both show 12 pods, but kubelet_running_pod_count shows 562. Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.1", GitCommit:"7879fc12a63337efff607952a323df90cdc7a335", GitTreeState:"clean", BuildDate:"2020-04-08T17:30:47Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

Links to PRs trying to fix this:

#85983 was merged on 31 Jul 2020 for the v1.19 release. #92180 was closed because #85983 fixed the issue.

/close Feel free to reopen this if you think this is not resolved.

We run a CronJob (https://github.com/kubernetes-sigs/descheduler) and noticed that the “CompletedJobs” it periodically produces are included in the reported metric kubelet_running_pod_count. Ideally they shouldn’t be. We are using AWS EKS with kubelet version v1.16.8-eks-e16311.
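To picture how completed pods can inflate a count, here is a minimal sketch. It is not the kubelet's code. It assumes hypothetical pod records with a `phase` field, and it assumes that a "non-terminated" count excludes pods in the `Succeeded` and `Failed` phases, while a naive count includes every pod still being tracked.

```python
# Phases treated as terminal in this sketch (an assumption for illustration).
TERMINAL_PHASES = {"Succeeded", "Failed"}


def count_all(pods):
    """Naive count: every tracked pod, including completed ones."""
    return len(pods)


def count_non_terminated(pods):
    """Count only pods whose phase is not terminal."""
    return sum(1 for pod in pods if pod["phase"] not in TERMINAL_PHASES)


# Hypothetical pods on one node; the names are made up.
pods = [
    {"name": "web-0", "phase": "Running"},
    {"name": "web-1", "phase": "Running"},
    {"name": "descheduler-job-a", "phase": "Succeeded"},
    {"name": "descheduler-job-b", "phase": "Succeeded"},
    {"name": "batch-job-c", "phase": "Failed"},
]

if __name__ == "__main__":
    print(count_all(pods), count_non_terminated(pods))
```

In this made-up data the naive count is 5 and the filtered count is 2, which is the same shape of gap described in the comments above.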