prometheus-operator: prometheus-operator high CPU usage

After changing some ServiceMonitor and Service objects, the operator started syncing very frequently (4-6 times per second) and consuming a lot of CPU.

Server top:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                       
 4055 root      20   0 1289564 780892  24920 S  44.1 10.0 657:10.56 /hyperkube apiserver --advertise-address=123.123.123.123 --etcd-servers=https://123.123.123.123:2379 --etcd-cafile=/etc/ssl/etcd/ss+
11007 root      20   0   57964  30048   5560 S  41.2  0.4 111:27.01 /bin/operator --kubelet-service=kube-system/kubelet --prometheus-config-reloader=quay.io/coreos/prometheus-config-reloader:v0+
 2934 root      20   0   10.8g 570904  28152 S  29.4  7.3 254:45.36 /usr/local/bin/etcd                                                                                                           

operator logs:

level=info ts=2018-07-22T13:42:19.488531966Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:19.673520342Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:19.926435905Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:20.217087316Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:20.377151554Z caller=operator.go:400 component=alertmanageroperator msg="sync alertmanager" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:20.532518265Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:20.703567261Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:20.907221323Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:21.580574057Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:21.76928422Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:21.984846144Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:22.162221857Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:22.352269176Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:22.493064909Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:22.7352912Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:22.983950194Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:23.175734439Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:23.335383162Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:23.468575212Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:23.656169945Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:23.988038254Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:24.142248531Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:24.35380114Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:24.538068047Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:24.784743416Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:25.039094124Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:25.181452826Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:25.320558779Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:25.452268213Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:25.653005215Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:25.864405214Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:26.130965577Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:26.372177244Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:26.527136827Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:26.698098587Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:26.915618298Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:27.170208284Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:27.356296116Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:27.583503043Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
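To put a number on the sync rate, the sync lines can be counted over a fixed window, for example (assuming the operator runs as a Deployment named prometheus-operator in the monitoring namespace; adjust to your setup):

# deployment name and namespace are assumptions, adjust to your cluster
kubectl -n monitoring logs deploy/prometheus-operator --since=60s \
  | grep -c 'msg="sync prometheus"'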

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 29 (22 by maintainers)

Most upvoted comments

@mxinden yep, I’ve confirmed that the issue is fixed using the high-cpu-usage-v1 image! Thanks for the quick fix - looking forward to the next release 😄

I am able to reproduce the issue now. https://github.com/coreos/prometheus-operator/pull/1785 contains both a test to prevent regressions as well as a fix.

@Capitrium would you mind testing this out once again? Let me know if you want me to push a docker image. Thanks a lot for making this reproducible! Great job.

@mxinden @brancz After playing around with every combination of ServiceMonitor / PrometheusRule change I could imagine, I was finally able to reproduce this in minikube by doing something pretty much completely unrelated:

  1. Deploy the kube-prometheus stack to minikube
  2. Annotate the prometheus k8s object:
     kubectl annotate prometheus -n monitoring k8s test-annotation=test-value
  3. prometheus-operator syncs the k8s object multiple times per second (see the sketch after this list)
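A quick way to confirm step 3 is to follow the operator logs while applying the annotation; the deployment name below is an assumption (kube-prometheus usually runs the operator as deploy/prometheus-operator in the monitoring namespace):

# deployment name and namespace are assumptions, adjust to your cluster
kubectl -n monitoring logs deploy/prometheus-operator -f \
  | grep 'msg="sync prometheus"'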

It seems like even the existence of the annotations metadata field breaks the hashing; running kubectl annotate prometheus -n monitoring k8s test-annotation- results in a blank annotations field (annotations: {}) and the operator still syncs excessively. Removing the annotations field from the prometheus object entirely stops the excessive syncing.
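For anyone who needs a workaround before upgrading, the annotations field can be dropped entirely with a JSON patch; this is only a sketch of the workaround described above (object name k8s and namespace monitoring are taken from the minikube repro), and note that it removes every annotation on the object, including last-applied-configuration:

# removes all annotations on the object, including ones you may want to keep
kubectl -n monitoring patch prometheus k8s --type=json \
  -p='[{"op": "remove", "path": "/metadata/annotations"}]'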

This came up since I frequently use kubectl apply, which adds the last-applied-configuration annotation; hopefully you get the same results in your tests… 😆

Cool. Closing here then. Thanks @Paxa and @Capitrium for the great help.

@Paxa feel free to reopen in case you are still facing this issue.