prometheus-operator: prometheus-operator high CPU usage
After changing some ServiceMonitor and Service objects, the operator started syncing very frequently (4-6 times per second) and consuming a lot of CPU.
Server `top`:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4055 root 20 0 1289564 780892 24920 S 44.1 10.0 657:10.56 /hyperkube apiserver --advertise-address=123.123.123.123 --etcd-servers=https://123.123.123.123:2379 --etcd-cafile=/etc/ssl/etcd/ss+
11007 root 20 0 57964 30048 5560 S 41.2 0.4 111:27.01 /bin/operator --kubelet-service=kube-system/kubelet --prometheus-config-reloader=quay.io/coreos/prometheus-config-reloader:v0+
2934 root 20 0 10.8g 570904 28152 S 29.4 7.3 254:45.36 /usr/local/bin/etcd
Operator logs:
level=info ts=2018-07-22T13:42:19.488531966Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:19.673520342Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:19.926435905Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:20.217087316Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:20.377151554Z caller=operator.go:400 component=alertmanageroperator msg="sync alertmanager" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:20.532518265Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:20.703567261Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:20.907221323Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:21.580574057Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:21.76928422Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:21.984846144Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:22.162221857Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:22.352269176Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:22.493064909Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:22.7352912Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:22.983950194Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:23.175734439Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:23.335383162Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:23.468575212Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:23.656169945Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:23.988038254Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:24.142248531Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:24.35380114Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:24.538068047Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:24.784743416Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:25.039094124Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:25.181452826Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:25.320558779Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:25.452268213Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:25.653005215Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:25.864405214Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:26.130965577Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:26.372177244Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:26.527136827Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:26.698098587Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:26.915618298Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:27.170208284Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:27.356296116Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-07-22T13:42:27.583503043Z caller=operator.go:737 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
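
For reference, the reported resync rate can be checked directly against log lines like these. Below is a minimal sketch, assuming the operator log is piped in on stdin in the logfmt format shown above, that counts `msg="sync prometheus"` lines per second:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"sort"
	"strings"
)

func main() {
	// Extract the timestamp, truncated to whole seconds, from the logfmt
	// ts= field, e.g. ts=2018-07-22T13:42:19.488531966Z -> 2018-07-22T13:42:19.
	tsRe := regexp.MustCompile(`ts=(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})`)

	counts := map[string]int{} // syncs per second, keyed by truncated timestamp
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.Contains(line, `msg="sync prometheus"`) {
			continue
		}
		if m := tsRe.FindStringSubmatch(line); m != nil {
			counts[m[1]]++
		}
	}

	// Print the per-second sync counts in chronological order.
	seconds := make([]string, 0, len(counts))
	for s := range counts {
		seconds = append(seconds, s)
	}
	sort.Strings(seconds)
	for _, s := range seconds {
		fmt.Printf("%s  %d syncs\n", s, counts[s])
	}
}
```

Fed the excerpt above, it reports roughly 3-6 syncs every second, consistent with the 4-6 per second described in the report.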
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Comments: 29 (22 by maintainers)

@mxinden yep, I’ve confirmed that the issue is fixed using the `high-cpu-usage-v1` image! Thanks for the quick fix - looking forward to the next release 😄

I am able to reproduce the issue now. https://github.com/coreos/prometheus-operator/pull/1785 contains both a test to prevent regressions as well as a fix.
@Capitrium would you mind testing this out once again? Let me know if you want me to push a docker image. Thanks a lot for making this reproducible! Great job.

@mxinden @brancz After playing around with every combination of ServiceMonitor / PrometheusRule change I could imagine, I was finally able to reproduce this in minikube by doing something pretty much completely unrelated:

It seems like even the existence of the `annotations` metadata field breaks the hashing; running `kubectl annotate prometheus -n monitoring k8s test-annotation-` results in a blank annotations field (`annotations: {}`) and the operator still syncs excessively. Removing the annotations field from the prometheus object entirely stops the excessive syncing.

This came up since I frequently use `kubectl apply`, which adds the `last-applied-configuration` annotation; hopefully you get the same results in your tests… 😆
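
The comment above points at the hashing as the culprit, and the effect is easy to demonstrate in isolation: a naive deep hash over an object's metadata treats a nil annotations map and a present-but-empty `annotations: {}` map as different inputs, even though they are semantically the same. Here is a minimal, self-contained sketch of that behaviour (the `objectMeta` struct and the `%#v`-based hash below are illustrative assumptions, not the operator's actual code):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// objectMeta is a stand-in for Kubernetes object metadata; it is not the real
// k8s.io/apimachinery type, just enough fields to show the effect.
type objectMeta struct {
	Name        string
	Annotations map[string]string
}

// hashOf deep-prints the value with %#v and hashes the text, roughly what a
// generic "hash the whole object" helper does.
func hashOf(v interface{}) uint64 {
	h := fnv.New64a()
	fmt.Fprintf(h, "%#v", v)
	return h.Sum64()
}

func main() {
	noAnnotations := objectMeta{Name: "k8s"}                                      // field absent -> nil map
	emptyAnnotations := objectMeta{Name: "k8s", Annotations: map[string]string{}} // annotations: {}

	fmt.Printf("nil annotations:   %x\n", hashOf(noAnnotations))
	fmt.Printf("empty annotations: %x\n", hashOf(emptyAnnotations))
	// The two hashes differ because %#v renders a nil map and an empty map
	// differently, so a controller comparing such hashes would keep seeing a
	// "change" and resync on every pass.
}
```

Normalizing the metadata before hashing (treating a nil map and an empty map identically) would make the two cases compare equal, which is presumably the kind of case the fix and regression test in the PR above are meant to cover.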

Cool. Closing here then. Thanks @Paxa and @Capitrium for the great help. @Paxa feel free to reopen in case you are still facing this issue.