kube-state-metrics: "Evicted" pods don't register metrics

What happened: I have many pods with Evicted state:

kubectl get pod -A  | grep Evicted | wc -l
     117

But there is no metric with reason=Evicted. The following query has returned nothing for the last week:

{job="kube-state-metrics", reason="Evicted"} == 1

What you expected to happen: I expected the above query to return metrics for the evicted pods.

How to reproduce it (as minimally and precisely as possible): Check that evicted pods exist, then query Prometheus as described above.
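The reproduction step can also be scripted against the Prometheus HTTP API. The following is a minimal sketch and not part of the original report: the helper names `build_query_url` and `matching_series` are illustrative, the base URL `http://localhost:9090` in the usage note is an assumption, and the sample response bodies are made up to mirror the shape of an instant-vector result.

```python
import json
from urllib.parse import urlencode

# Selector from the report; "== 1" keeps only series whose current value is 1.
QUERY = '{job="kube-state-metrics", reason="Evicted"} == 1'


def build_query_url(base_url: str, query: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API (/api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': query})}"


def matching_series(response_body: str) -> list[dict]:
    """Return the label sets of all series in an instant-vector response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return [sample["metric"] for sample in payload["data"]["result"]]


# Hypothetical response bodies (not captured from a real cluster).
EMPTY = '{"status":"success","data":{"resultType":"vector","result":[]}}'
ONE_HIT = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{
            "metric": {"namespace": "test", "pod": "web-1", "reason": "Evicted"},
            "value": [1600000000, "1"],
        }],
    },
})
```

To run it against a live server, the body could be fetched with `urllib.request.urlopen(build_query_url("http://localhost:9090", QUERY)).read().decode()` and passed to `matching_series`. An empty list corresponds to the behavior described in this report.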

Environment:

  • kube-state-metrics version: 1.9.7
  • Kubernetes version (use kubectl version): 1.18
  • Cloud provider or hardware configuration: EKS on AWS

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 13
  • Comments: 26 (10 by maintainers)

Most upvoted comments

Hi @brancz, it looks like this is affecting many users. Did you get a chance to look into this?

Good evening. I got this working on kube-state-metrics version 2.0.0 with the following query: sum by (namespace) (kube_pod_status_reason{reason="Evicted"}) > 0. I hope it helps. I am also including the alerting rule in case anyone wants to use it:

- name: pods-evicted
  rules:
  - alert: PodsEvicted
    annotations:
      description: Pods with evicted status detected.
      summary: Pods with evicted status detected.
    expr: |
      sum by (namespace) (kube_pod_status_reason{reason="Evicted"}) > 0
    for: 15m
    labels:
      severity: warning

@yfried This is a bug with the version you are running where KSM conflates pod and container states. This should be fixed in versions 2.0.0 and later. The metric containing this information is kube_pod_status_reason

Since 1.x.x has other issues according to the compatibility matrix, are you able to upgrade KSM to one of the 2.x.x versions?

Also keep in mind that only pods can have the Evicted reason in their status. Metrics about pod containers will likely not reflect this information.
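To illustrate the pod-versus-container distinction, here is a minimal sketch that counts evicted pods from the JSON printed by `kubectl get pod -A -o json`. It relies on eviction being recorded on the pod's own `status.reason` field rather than on any entry in `containerStatuses`. The function name `evicted_pods` and the sample data are hypothetical.

```python
def evicted_pods(pod_list: dict) -> list[str]:
    """Return "namespace/name" for each pod whose pod-level status.reason is Evicted."""
    names = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        # Only the pod-level reason is checked; container states are ignored.
        if status.get("reason") == "Evicted":
            meta = pod.get("metadata", {})
            names.append(f'{meta.get("namespace", "")}/{meta.get("name", "")}')
    return names


# Hypothetical sample shaped like `kubectl get pod -A -o json` output.
SAMPLE = {
    "items": [
        {   # Evicted pod: phase is Failed and the reason is set on the pod status.
            "metadata": {"namespace": "test", "name": "web-1"},
            "status": {"phase": "Failed", "reason": "Evicted"},
        },
        {   # Healthy pod.
            "metadata": {"namespace": "test", "name": "web-2"},
            "status": {"phase": "Running"},
        },
        {   # Failed pod whose only reason lives on a container, not on the pod.
            "metadata": {"namespace": "test", "name": "job-1"},
            "status": {
                "phase": "Failed",
                "containerStatuses": [
                    {"name": "main", "state": {"terminated": {"reason": "Error"}}}
                ],
            },
        },
    ]
}

print(evicted_pods(SAMPLE))
```

The third sample pod also shows why a filter on the Failed phase alone can match more than evictions: a pod can be in the Failed phase without carrying the Evicted reason.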

+1

any progress on this issue?

Can confirm this is happening in our cluster right now too

➜ (⎈ gke:test) tmp  ✗ k get po | grep -i evicted | wc -l
     996

But the metrics never show a value of 1, as far back as we can look. Our current workaround is to use

kube_pod_status_phase{namespace="test",phase="Failed"}

Which metric did you expect to be there but wasn’t?

@brancz All {job="kube-state-metrics", reason="Evicted"} metrics are 0 even though there are eviction events