coroot: High memory usage and OOM loop
Hi,
we are experiencing high memory usage and an OOM-kill loop while running Coroot in our GKE cluster. The Coroot container tries to allocate up to 14GiB of memory before it is killed. Here is the complete log of the container before it gets killed:
```
I0126 13:44:42.589518 1 main.go:45] version: 0.12.1, url-base-path: /, read-only: false
I0126 13:44:42.589639 1 db.go:38] using postgres database
I0126 13:44:43.715195 1 cache.go:130] cache loaded from disk in 1.106531315s
I0126 13:44:43.715499 1 compaction.go:81] compaction worker started
I0126 13:44:43.716125 1 main.go:142] listening on :8080
I0126 13:44:44.716784 1 updater.go:54] worker iteration for krxa44eq
I0126 13:44:53.716464 1 compaction.go:92] compaction iteration started
```
Here is the graph of memory usage:
We set an 8GiB memory limit on the Coroot container.
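For anyone reproducing this setup, the limit can be set on the Coroot container like so; a minimal sketch assuming a plain Deployment rather than the helm chart (the image name and request value are illustrative assumptions):

```yaml
# Illustrative fragment of a Coroot Deployment spec; the 8Gi limit
# mirrors the one described above, the request value is an assumption.
containers:
  - name: coroot
    image: ghcr.io/coroot/coroot
    resources:
      requests:
        memory: "4Gi"
      limits:
        memory: "8Gi"   # the kubelet OOM-kills the container above this
```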
Before we set the memory limit, the container allocated up to 24GiB of memory.
We tried with both SQLite and PostgreSQL and there was no difference in behavior.
Our GKE cluster version is v1.24.5-gke.600.
We have 22 Nodes, 154 Deployments, 25 DaemonSets, and 12 StatefulSets which in total have 857 Pods.
About this issue
- Original URL
- State: open
- Created a year ago
- Comments: 25 (13 by maintainers)
Commits related to this issue
- constructor: fix RDS instances lookup (#18) — committed to coroot/coroot by apetruhin a year ago
Hi @YoranSys, please update to version 0.14.9. We are expecting a noticeable reduction in memory consumption and improved UI responsiveness.
Coroot v0.22+ should work much better on large clusters. Thank you, @wenhuwang, for the assistance.
Related releases:
- https://github.com/coroot/coroot/releases/tag/v0.22.0
- https://github.com/coroot/coroot/releases/tag/v0.22.1
@wenhuwang, please upgrade Coroot using the latest helm chart, which no longer gathers the `container_net_tcp_*` metrics. We expect much lower CPU and memory consumption within an hour or two after the upgrade. As Coroot updates metrics from Prometheus using a 1-hour time window, these changes will take effect once the "old" metrics fall out of this window. Alternatively, you can delete the historical data from Prometheus after upgrading the chart, if that is acceptable in your case.
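If deleting the old data is the route you take, Prometheus exposes TSDB admin endpoints for this. A sketch, assuming Prometheus was started with `--web.enable-admin-api` and is reachable at the hypothetical address below:

```shell
# Delete all container_net_tcp_* series from the Prometheus TSDB.
# The URL is a placeholder; use your Prometheus server address.
curl -s -X POST \
  'http://prometheus-server:9090/api/v1/admin/tsdb/delete_series' \
  --data-urlencode 'match[]={__name__=~"container_net_tcp_.*"}'

# Then reclaim the disk space occupied by the deleted series:
curl -s -X POST 'http://prometheus-server:9090/api/v1/admin/tsdb/clean_tombstones'
```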
@apetruhin Hi, I installed Coroot version 0.21.0; my cluster has 44 nodes and 6,610 pods. Coroot used 15 CPU cores and 50GB of memory.
More importantly, the Coroot UI shows no data at all.
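When the UI stays empty, one quick sanity check is to query Prometheus directly from inside the cluster; a sketch, where the address is a placeholder for whatever is configured in Coroot's project settings:

```shell
# Can Prometheus answer queries at all? A non-empty "result" list
# for the built-in `up` metric means scraping and querying both work.
curl -s 'http://prometheus-server:9090/api/v1/query?query=up'
```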
The status of the Coroot-related pods is normal, and no error logs are seen in the coroot and prometheus pods. The Prometheus configuration is also right, so I'm not sure what the "An error has been occurred while querying Prometheus" problem is.

We're continuing to work on reducing memory consumption. Please update to version 0.14.7 to get the *Postgres* tab fixed.

Hi @YoranSys, please try version 0.14.6.
Thank you. We have more optimizations coming soon.