kube-state-metrics: Missing metrics about pods in status failed and the reason
Is this a BUG REPORT or FEATURE REQUEST?:
Uncomment only one, leave it on its own line:
/kind bug
/kind feature
What happened: I have a few pods in the Failed state because the node ran out of CPU, for example:
Status: Failed
Reason: OutOfcpu
I couldn’t find a metric I can use to monitor pods in this state. Looking at the code, it seems that this state is not collected; only the waiting and terminated states are.
What you expected to happen: Be able to monitor how many pods are in this state and why (by the reason).
How to reproduce it (as minimally and precisely as possible): Create a cluster with pods in failed status
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`): Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.3", GitCommit:"a4529464e4629c21224b3d52edfe0ea91b072862", GitTreeState:"clean", BuildDate:"2018-09-09T17:53:03Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
- Kube-state-metrics image version: `quay.io/coreos/kube-state-metrics:v1.4.0z`
About this issue
- State: closed
- Created 6 years ago
- Comments: 17 (8 by maintainers)
That way there would be no timeseries for those states, even though they are possible states. Generally speaking, you want every timeseries that could exist to actually exist, for discoverability and aggregations.
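To illustrate the point about always-present timeseries, here is a minimal sketch of the enum approach in plain Python. It is not kube-state-metrics code: the metric name `kube_pod_status_reason`, the `KNOWN_REASONS` list, and the helpers `enum_samples` and `render` are all assumptions made up for this example. The idea is that each pod emits one sample per known reason, with value 1 for the matching reason and 0 for the rest.

```python
# Hypothetical, fixed set of pod failure reasons chosen for illustration;
# it is not the list used by kube-state-metrics.
KNOWN_REASONS = ("Evicted", "NodeLost", "OutOfcpu")


def enum_samples(pod, reason, known_reasons=KNOWN_REASONS):
    """Return one (labels, value) sample per known reason for a single pod.

    The matching reason gets 1 and every other known reason gets 0,
    so each series exists even when no pod is currently in that state.
    """
    return [
        ({"pod": pod, "reason": known}, 1 if known == reason else 0)
        for known in known_reasons
    ]


def render(metric, samples):
    """Render samples as Prometheus text exposition lines."""
    lines = []
    for labels, value in samples:
        label_text = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{metric}{{{label_text}}} {value}")
    return lines


if __name__ == "__main__":
    # One failed pod and one healthy pod (reason is None for the healthy one).
    for pod, reason in [("web-1", "OutOfcpu"), ("web-2", None)]:
        for line in render("kube_pod_status_reason", enum_samples(pod, reason)):
            print(line)
```

With the raw reason used as the label value, a series would only appear while some pod is actually in that state. With the enum, a query that sums by reason can return 0 instead of no data. The trade-off in this sketch is that a reason missing from the fixed list produces no series with value 1.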
@brancz is there a reason for an enum instead of just using the `reason` field as-is?