kubernetes: kubelet_running_pod_count metric is not correct
What happened: kubelet_running_pod_count is larger than the number of pods actually running on the node, as follows:
Getting the metric from Prometheus:
kubelet_running_pod_count{instance="k8s-node01"}
result is:
kubelet_running_pod_count{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_os="linux",instance="k8s-node01",job="kubernetes-nodes",kubernetes_io_hostname="k8s-node01"} 98
But using kubectl describe node:
kubectl describe node k8s-node01
result is:
Non-terminated Pods: (95 in total)
kubelet_running_pod_count is 98, but only 95 pods are actually running on the node.
What you expected to happen: kubelet_running_pod_count should equal the number of pods actually running on the node.
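The comparison above can be sketched as a small client-side check. This is only an illustration, not how the kubelet computes the metric. It assumes the pod list was saved from kubectl get pods --all-namespaces -o json, and that "Non-terminated Pods" means pods on the node whose phase is neither Succeeded nor Failed. The function names and the sample data are hypothetical.

```python
import json

# Phases that are excluded when counting "Non-terminated Pods" (assumption
# stated in the lead-in: Succeeded and Failed pods are treated as terminated).
TERMINATED_PHASES = {"Succeeded", "Failed"}


def count_non_terminated(pod_list: dict, node: str) -> int:
    """Count pods scheduled on `node` whose phase is not Succeeded or Failed."""
    return sum(
        1
        for pod in pod_list.get("items", [])
        if pod.get("spec", {}).get("nodeName") == node
        and pod.get("status", {}).get("phase") not in TERMINATED_PHASES
    )


def metric_discrepancy(metric_value: int, pod_list: dict, node: str) -> int:
    """Return how far the reported metric is above the non-terminated pod count."""
    return metric_value - count_non_terminated(pod_list, node)


# Hypothetical sample in the shape of a Kubernetes PodList, trimmed to the
# fields this sketch reads.
SAMPLE = json.loads("""
{
  "items": [
    {"spec": {"nodeName": "k8s-node01"}, "status": {"phase": "Running"}},
    {"spec": {"nodeName": "k8s-node01"}, "status": {"phase": "Running"}},
    {"spec": {"nodeName": "k8s-node01"}, "status": {"phase": "Succeeded"}},
    {"spec": {"nodeName": "k8s-node02"}, "status": {"phase": "Running"}}
  ]
}
""")

if __name__ == "__main__":
    # 3 stands in for the value read from kubelet_running_pod_count.
    print(metric_discrepancy(3, SAMPLE, "k8s-node01"))
```

With the numbers from this report, a metric value of 98 against 95 non-terminated pods would give a discrepancy of 3.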
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version): v1.13.2
- Cloud provider or hardware configuration:
- OS (e.g: cat /etc/os-release): centos linux 7
- Kernel (e.g. uname -a): 4.14.67-2.el7.centos.x86_64
- Install tools:
- Network plugin and version (if this is a network-related bug):
- Others:
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 6
- Comments: 30 (20 by maintainers)
We run a lot of small jobs as part of Argo workflows. When they are done, the workflow and its pods are removed.
kubectl describe node and docker ps both show 12 pods, but kubelet_running_pod_count shows 562.
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.1", GitCommit:"7879fc12a63337efff607952a323df90cdc7a335", GitTreeState:"clean", BuildDate:"2020-04-08T17:30:47Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
#85983 was merged on 31 Jul 2020 for the v1.19 release. #92180 was closed because #85983 fixed the issue.
/close Feel free to reopen this if you think this is not resolved.
We run a CronJob (https://github.com/kubernetes-sigs/descheduler) and noticed that the “CompletedJobs” it periodically produces are included in the reported metric kubelet_running_pod_count. Ideally they shouldn’t be. We are using AWS EKS and kubelet version v1.16.8-eks-e16311.
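To see how many completed pods are still attached to a node, one option is to tally the pods by phase. The sketch below is a hedged illustration only: it assumes a pod list saved from kubectl get pods --all-namespaces -o json, and the function name, pod names, and node name are hypothetical.

```python
from collections import Counter


def pods_by_phase(pod_list: dict, node: str) -> Counter:
    """Tally the pods scheduled on `node` by their status.phase value."""
    return Counter(
        pod.get("status", {}).get("phase", "Unknown")
        for pod in pod_list.get("items", [])
        if pod.get("spec", {}).get("nodeName") == node
    )


# Hypothetical PodList fragment: one running pod plus two completed pods
# left behind by a CronJob, all on the same (made-up) node.
SAMPLE_PODS = {
    "items": [
        {"metadata": {"name": "app-0"},
         "spec": {"nodeName": "node-a"}, "status": {"phase": "Running"}},
        {"metadata": {"name": "descheduler-1"},
         "spec": {"nodeName": "node-a"}, "status": {"phase": "Succeeded"}},
        {"metadata": {"name": "descheduler-2"},
         "spec": {"nodeName": "node-a"}, "status": {"phase": "Succeeded"}},
    ]
}

if __name__ == "__main__":
    for phase, count in sorted(pods_by_phase(SAMPLE_PODS, "node-a").items()):
        print(f"{phase}: {count}")
```

A large Succeeded count next to a small Running count would be consistent with completed Job pods being included in the reported value.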