prometheus-operator: prometheus-config-reloader, rules-configmap-reloader, and alertmanager config-reloader have insufficient CPU resources
Moving issue https://github.com/helm/charts/issues/9540 here.
What did you do?
helm install stable/prometheus-operator
with the https://github.com/helm/charts/pull/9516 patch applied (actual alert state sync), on a GKE cluster.
What did you expect to see? No alerts on prometheus-operator itself.
What did you see instead? Under which circumstances? The following alerts in Prometheus:
message 41% throttling of CPU in namespace monitoring for container prometheus-config-reloader in pod prometheus-operator-prometheus-0.
message 33% throttling of CPU in namespace monitoring for container rules-configmap-reloader in pod prometheus-operator-prometheus-0.
message 39% throttling of CPU in namespace monitoring for container config-reloader in pod alertmanager-prometheus-operator-alertmanager-0.
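For context, these messages come from the CPUThrottlingHigh alert that ships with the chart (via the kubernetes-mixin). Below is a rough sketch of what that rule looks like; the exact label names (container_name/pod_name vs. container/pod), the threshold, and the "for" duration vary between versions, so treat it as illustrative rather than the chart's literal rule:
# Illustrative sketch only -- the cAdvisor metric names are real, but labels,
# threshold, and durations differ across kubernetes-mixin / chart versions.
- alert: CPUThrottlingHigh
  expr: |
    100 * sum(increase(container_cpu_cfs_throttled_periods_total{container_name!=""}[5m])) by (container_name, pod_name, namespace)
      /
    sum(increase(container_cpu_cfs_periods_total{}[5m])) by (container_name, pod_name, namespace)
      > 25
  for: 15m
  annotations:
    message: '{{ printf "%0.0f" $value }}% throttling of CPU in namespace {{ $labels.namespace }} for container {{ $labels.container_name }} in pod {{ $labels.pod_name }}.'
In other words, the percentage is the fraction of CFS scheduling periods in which the container wanted CPU but was throttled by its cpu limit.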
Environment
- Prometheus Operator version:
v0.25.0
- Kubernetes version information:
➜ ~ kubectl version
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.2", GitCommit:"17c77c7898218073f14c8d573582e8d2313dc740", GitTreeState:"clean", BuildDate:"2018-10-30T21:39:16Z", GoVersion:"go1.11.1", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"10+", GitVersion:"v1.10.6-gke.11", GitCommit:"42df8ec7aef509caba40b6178616dcffca9d7355", GitTreeState:"clean", BuildDate:"2018-11-08T20:06:00Z", GoVersion:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}
- Kubernetes cluster kind:
GKE
Anything else we need to know: It seems like the limits for these containers are not enough and should be increased:
name: rules-configmap-reloader
resources:
  limits:
    cpu: 5m
    memory: 10Mi
  requests:
    cpu: 5m
    memory: 10Mi
---
name: prometheus-config-reloader
resources:
  limits:
    cpu: 10m
    memory: 50Mi
  requests:
    cpu: 10m
    memory: 50Mi
---
name: config-reloader
resources:
  limits:
    cpu: 5m
    memory: 10Mi
  requests:
    cpu: 5m
    memory: 10Mi
Another possible cause is a busy loop, in which case no amount of resources will be enough. But these limits are really low, and I suspect simply increasing them should help.
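If your operator/chart version exposes knobs for the reloader sidecar resources, bumping them via values is the cleanest fix. The key names below are a sketch from memory and may not exist in your chart version, so verify them against the chart's values.yaml before relying on them:
# values.yaml sketch -- key names are assumptions, check your chart version:
prometheusOperator:
  configReloaderCpu: 100m      # CPU request/limit the operator sets on the reloader sidecars
  configReloaderMemory: 50Mi   # memory request/limit the operator sets on the reloader sidecars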
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 2
- Comments: 21 (11 by maintainers)
Also hitting intermittent (but quite frequent) alerts for config-reloader. Upgraded to Helm chart v4.0.0 (using 0.29.0), but it doesn’t seem to help.
I’m using the prometheus-operator helm chart v3.0.0, and the CPUThrottlingHigh alert is intermittently entering PENDING state for the reloader containers. It lasts about 5m, so it never actually fires.
Off-topic: I’m having trouble with other containers too; is there a way to silence notifications for this alert using the config files from the prometheus-operator chart? If so, please help me here 😃
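One way to do that (a minimal sketch, assuming your chart version renders the Alertmanager configuration from an alertmanager.config value) is to route the alert to a no-op receiver:
alertmanager:
  config:
    route:
      receiver: default          # keep your existing top-level receiver here
      routes:
        - match:
            alertname: CPUThrottlingHigh
          receiver: 'null'       # swallow notifications for this alert
    receivers:
      - name: default            # placeholder -- use your real receiver
      - name: 'null'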
I see this often in my clusters running the helm version of prometheus-operator.
I was under the impression that https://github.com/helm/charts/issues/9540 and https://github.com/coreos/prometheus-operator/pull/2144 should have upped the limits for at least the alertmanager config-reloader, but here’s what my cluster shows.
My cluster isn’t even half utilized according to https://github.com/etopeter/kubectl-view-utilization
Here’s how often it happens. My current “solution” is what @metalmatze recommended: silencing those specific alerts in alertmanager.