kube-state-metrics: Missing metrics about pods in status failed and the reason

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug

/kind feature

What happened: I have a few pods in a failed state due to running out of CPU, for example:

Status:             Failed
Reason:             OutOfcpu

I couldn’t find a metric I can use to monitor pods in this state. Looking at the code, it seems that this state is not collected, only waiting or terminated.

What you expected to happen: Be able to monitor how many pods are in this state and why (by the reason).

How to reproduce it (as minimally and precisely as possible): Create a cluster with pods in failed status

Anything else we need to know?:

Environment:

Kubernetes version (use kubectl version): Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.3", GitCommit:"a4529464e4629c21224b3d52edfe0ea91b072862", GitTreeState:"clean", BuildDate:"2018-09-09T17:53:03Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Kube-state-metrics image version: `quay.io/coreos/kube-state-metrics:v1.4.0`

About this issue

State: closed
Created 6 years ago
Comments: 17 (8 by maintainers)

Most upvoted comments

That way there would be no timeseries for those states, even though they are possible states. Generally speaking, you want every timeseries that could exist to actually exist, for discoverability and aggregations.

brancz on Dec 14, 2018
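To make the point about always exposing every possible timeseries concrete, here is a minimal Go sketch of the enum approach. It is not the kube-state-metrics implementation: the reason list, the `sample` type, the `reasonSamples` helper, and the metric name in the output are all hypothetical and chosen only for illustration. For each pod it emits one series per known reason, set to 1 for the reason that applies and 0 for the rest, so every series exists even when no pod currently has that reason.

```go
package main

import "fmt"

// podStatusReasons is a hypothetical, fixed enum of reasons.
// The real list used by kube-state-metrics may differ.
var podStatusReasons = []string{"Evicted", "NodeLost", "OutOfcpu"}

// sample is an illustrative stand-in for one exported timeseries value.
type sample struct {
	Pod    string
	Reason string
	Value  float64
}

// reasonSamples returns one sample per known reason for the given pod.
// The sample matching the pod's actual reason is 1, all others are 0.
// A reason that is not in the enum yields only zeros.
func reasonSamples(pod, actual string) []sample {
	out := make([]sample, 0, len(podStatusReasons))
	for _, r := range podStatusReasons {
		v := 0.0
		if r == actual {
			v = 1
		}
		out = append(out, sample{Pod: pod, Reason: r, Value: v})
	}
	return out
}

func main() {
	// Metric name below is an assumption made for this sketch.
	for _, s := range reasonSamples("web-0", "OutOfcpu") {
		fmt.Printf("example_pod_status_reason{pod=%q,reason=%q} %g\n",
			s.Pod, s.Reason, s.Value)
	}
}
```

The trade-off raised in the thread shows up in the last case: passing the reason field through as-is would pick up new reasons automatically, but series would appear and disappear with the pods that carry them, whereas the enum keeps the set of series stable at the cost of listing each reason explicitly.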

I think we treat the reason as an enum, so we need to specifically list each reason.

@brancz is there a reason for an enum instead of just using the reason field as-is?

hairyhenderson on Dec 14, 2018