prometheus-operator: prometheus-config-reloader, rules-configmap-reloader, and alertmanager config-reloader have insufficient CPU resources

Moving issue https://github.com/helm/charts/issues/9540 here.

What did you do? helm install stable/prometheus-operator with the https://github.com/helm/charts/pull/9516 patch applied (actual alert state sync) on a GKE cluster.

What did you expect to see? No alerts on prometheus-operator itself.

What did you see instead? Under which circumstances? The following alerts in Prometheus:

message 41% throttling of CPU in namespace monitoring for container prometheus-config-reloader in pod prometheus-operator-prometheus-0.

message 33% throttling of CPU in namespace monitoring for container rules-configmap-reloader in pod prometheus-operator-prometheus-0.

message 39% throttling of CPU in namespace monitoring for container config-reloader in pod alertmanager-prometheus-operator-alertmanager-0.
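
For reference, these messages come from the CPUThrottlingHigh alert in the kubernetes-mixin rules bundled with the chart. Below is a minimal sketch of an equivalent rule, reusing the expression quoted later in this thread; the exact threshold, for: duration, and label names vary between rule versions, so treat it as illustrative only.

groups:
  - name: kubernetes-resources
    rules:
      - alert: CPUThrottlingHigh
        # Same expression as quoted later in this thread; threshold and duration differ per mixin version.
        expr: |
          100 * sum by(container_name, pod_name, namespace) (increase(container_cpu_cfs_throttled_periods_total{container_name!=""}[5m]))
            / sum by(container_name, pod_name, namespace) (increase(container_cpu_cfs_periods_total[5m])) > 30
        for: 15m
        labels:
          severity: warning
        annotations:
          message: '{{ printf "%0.0f" $value }}% throttling of CPU in namespace {{ $labels.namespace }} for container {{ $labels.container_name }} in pod {{ $labels.pod_name }}.'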

Environment

  • Prometheus Operator version:

v0.25.0

  • Kubernetes version information:
➜  ~  kubectl version
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.2", GitCommit:"17c77c7898218073f14c8d573582e8d2313dc740", GitTreeState:"clean", BuildDate:"2018-10-30T21:39:16Z", GoVersion:"go1.11.1", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"10+", GitVersion:"v1.10.6-gke.11", GitCommit:"42df8ec7aef509caba40b6178616dcffca9d7355", GitTreeState:"clean", BuildDate:"2018-11-08T20:06:00Z", GoVersion:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind:

GKE

Anything else we need to know: It seems the limits for these containers are not enough and should be increased:

    name: rules-configmap-reloader
    resources:
      limits:
        cpu: 5m
        memory: 10Mi
      requests:
        cpu: 5m
        memory: 10Mi
---
    name: prometheus-config-reloader
    resources:
      limits:
        cpu: 10m
        memory: 50Mi
      requests:
        cpu: 10m
        memory: 50Mi
---
    name: config-reloader
    resources:
      limits:
        cpu: 5m
        memory: 10Mi
      requests:
        cpu: 5m
        memory: 10Mi
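
Note that these reloader sidecars are injected by the operator, so patching the generated StatefulSet by hand tends to get reverted. As far as I can tell the operator exposes flags to override the reloader resources; treat the flag names and values below as an assumption to verify against operator --help for your version. A sketch of the override on the operator container:

    name: prometheus-operator
    image: quay.io/coreos/prometheus-operator:v0.29.0
    args:
      # Assumed flags; confirm they exist in your operator version before relying on them.
      - --config-reloader-cpu=100m
      - --config-reloader-memory=25Mi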

Another possible cause is a busy loop, in which case no amount of resources will be enough. But these limits are really low, and I guess simply increasing the values should help.
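
One way to tell the two cases apart is to compare the reloader's actual CPU usage with its limit: if usage sits pinned at the limit for long stretches, a busy loop is more likely; if it only spikes briefly during reloads, the limit is simply too tight. A rough check follows; the metric and label names assume the cAdvisor and kube-state-metrics versions of that era and may differ in yours.

# CPU cores actually consumed by the reloader sidecars, averaged over 5m
sum by(namespace, pod_name, container_name) (
  rate(container_cpu_usage_seconds_total{container_name=~"prometheus-config-reloader|rules-configmap-reloader|config-reloader"}[5m])
)

# Configured CPU limits for the same containers, in cores (via kube-state-metrics)
kube_pod_container_resource_limits_cpu_cores{container=~"prometheus-config-reloader|rules-configmap-reloader|config-reloader"}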

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 2
  • Comments: 21 (11 by maintainers)

Most upvoted comments

Also hitting intermittent (but quite frequent) alerts for config-reloader. Upgraded to Helm chart v4.0.0 (using operator 0.29.0), but it doesn’t seem to help anything.

I’m using the prometheus-operator helm chart v3.0.0, and the CPUThrottlingHigh alert intermittently enters the PENDING state for the reloader containers. It lasts about 5m, so the alerts never actually fire.

Off-topic: I’m having trouble with other containers as well. Is there a way to silence notifications for this alert using the config files from the prometheus-operator chart? If so, please help me out here 😃

I see this often in my clusters running the helm version of prometheus-operator.

100 * sum by(container_name, pod_name, namespace) (increase(container_cpu_cfs_throttled_periods_total{container_name!=""}[5m])) / sum by(container_name, pod_name, namespace) (increase(container_cpu_cfs_periods_total[5m])) > 30

{container_name="rules-configmap-reloader",namespace="monitoring",pod_name="prometheus-prometheus-operator-prometheus-0"}	33.33333333333333
{container_name="config-reloader",namespace="monitoring",pod_name="alertmanager-prometheus-operator-alertmanager-0"}	57.142857142857146
$ helm ls
NAME               	REVISION	UPDATED                 	STATUS  	CHART                    	APP VERSION	NAMESPACE
prometheus-operator	38      	Thu Jan 31 18:39:28 2019	DEPLOYED	prometheus-operator-2.1.2	0.26.0     	monitoring
...

I was under the impression that https://github.com/helm/charts/issues/9540 and https://github.com/coreos/prometheus-operator/pull/2144 should have upped the limits for at least the alertmanager config-reloader, but here’s what my cluster shows.

$ kubectl describe pod prometheus-prometheus-operator-prometheus-0 -n monitoring
...
  prometheus-config-reloader:
    Image:         quay.io/coreos/prometheus-config-reloader:v0.25.0
    Limits:
      cpu:     10m
      memory:  50Mi
    Requests:
      cpu:     10m
      memory:  50Mi
...
  rules-configmap-reloader:
    Image:         quay.io/coreos/configmap-reload:v0.0.1
    Limits:
      cpu:     5m
      memory:  10Mi
    Requests:
      cpu:        5m
      memory:     10Mi
...

$ kubectl describe pod alertmanager-prometheus-operator-alertmanager-0 -n monitoring
...
  config-reloader:
    Image:         quay.io/coreos/configmap-reload:v0.0.1
    Limits:
      cpu:     5m
      memory:  10Mi
    Requests:
      cpu:        5m
      memory:     10Mi
...

My cluster isn’t even half utilized according to https://github.com/etopeter/kubectl-view-utilization

$ ./kubectl-view-utilization.sh
cores     11 / 24    (46%)
memory   13G / 45G   (28%)

Here’s how often it happens. My current “solution” is what @metalmatze recommended: silencing those specific alerts in Alertmanager.

[screenshot: CPUThrottlingHigh alert history]
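
For anyone looking for that silencing approach, here is a minimal sketch of an Alertmanager route that drops CPUThrottlingHigh only for the reloader sidecars. The receiver names are placeholders, and the label names must match whatever your rule's by() clause actually produces.

route:
  receiver: default
  routes:
    # Send throttling alerts for the tiny reloader sidecars to a no-op receiver.
    - receiver: "null"
      match:
        alertname: CPUThrottlingHigh
      match_re:
        container_name: "prometheus-config-reloader|rules-configmap-reloader|config-reloader"
receivers:
  - name: default
    # ... real notification integrations go here ...
  - name: "null"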