kube-state-metrics: Repeated OOM'ing (perhaps due to a large number of namespaces)
/kind bug
What happened:
I’m running kube-state-metrics as part of kube-prometheus, but it is repeatedly being OOMKilled.
I suspect this is because of the large number of namespaces we have. Some bits of information:
$ kubectl get ns | wc -l
238
$ kubectl get nodes | wc -l
47
$ kubectl get pods --all-namespaces | wc -l
4008
$ kubectl get secrets --all-namespaces | wc -l
8313
The resource requests and limits are: { "cpu": "188m", "memory": "5290Mi" }. (Unfortunately, I’m having trouble capturing resource utilization right before the OOM.)
What you expected to happen:
Not OOM
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.7", GitCommit:"dd5e1a2978fd0b97d9b78e1564398aeea7e7fe92", GitTreeState:"clean", BuildDate:"2018-04-19T00:05:56Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"10+", GitVersion:"v1.10.4-gke.2", GitCommit:"eb2e43842aaa21d6f0bb65d6adf5a84bbdc62eaf", GitTreeState:"clean", BuildDate:"2018-06-15T21:48:39Z", GoVersion:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}
- Kube-state-metrics image version: "quay.io/coreos/kube-state-metrics:v1.3.1"
About this issue
- State: closed
- Created 6 years ago
- Comments: 17 (7 by maintainers)
For anyone else who lands here investigating a similar issue: a large aggregate number of any/all resources tracked by this exporter can cause it to use a fair bit of memory. The simplest way to check is to query Prometheus for counts of the kube_* metrics; if those have fallen out of your retention window, you can also bump up the memory limit on the exporter and then query it directly (see the examples after this comment). In my case I learned that Helm doesn’t necessarily clean up old release revisions, and I had 3600+ ConfigMaps cluttering up the cluster.
Once you get your house in order you can restart the exporter to check that its memory usage is within reason, and then bump the limit back down.
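For reference, here is roughly how that counting could look. These are only sketches: the PromQL metric names are kube-state-metrics defaults, the port-forward assumes the exporter is exposed as the kube-state-metrics service on port 8080 in the monitoring namespace, and the last command assumes Helm 2, which stores release revisions as ConfigMaps labeled OWNER=TILLER in kube-system.

Per-resource series counts, run against Prometheus:
count(kube_pod_info)
count(kube_secret_info)
count(kube_configmap_info)
count by (__name__) ({__name__=~"kube_.*"})

Or query the exporter directly after raising its memory limit:
$ kubectl -n monitoring port-forward svc/kube-state-metrics 8080:8080
$ curl -s localhost:8080/metrics | grep -c '^kube_configmap_info'

Counting leftover Helm 2 release revisions:
$ kubectl -n kube-system get configmaps -l OWNER=TILLER | wc -l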
Could you try removing the addon-resizer and removing all resource limits and requests entirely? I have a feeling that the resource recommendations we currently ship are off; they come from scalability tests run about a year ago.
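A minimal sketch of what that could look like with kubectl, assuming the deployment lives in the monitoring namespace and the addon-resizer is the second container in the pod spec (with kube-prometheus you would normally make the same change in the generated manifests instead):

Drop the addon-resizer sidecar (container index is an assumption, check your spec first):
$ kubectl -n monitoring patch deployment kube-state-metrics --type=json -p='[{"op": "remove", "path": "/spec/template/spec/containers/1"}]'

Clear requests and limits on the kube-state-metrics container itself:
$ kubectl -n monitoring patch deployment kube-state-metrics --type=json -p='[{"op": "remove", "path": "/spec/template/spec/containers/0/resources"}]'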