risingwave: OOM for sysbench select random limits (Hummock read duration seems abnormal)
Describe the bug
https://buildkite.com/risingwave-test/sysbench/builds/554#018be963-e823-4393-aad6-38255d87bcdd/1121
The image is 20231119.
Dashboard:
The read duration seems abnormal.
The previous tests were all good, e.g. the latest successful one used image 20231116:
https://buildkite.com/risingwave-test/sysbench/builds/553#018bd9f0-9c26-4472-a129-e0cb237ecf39
According to https://github.com/risingwavelabs/rw-commits-history#nightly-20231119, it does not seem to be affected by a PR between 20231116 and 20231119.
Additional context
IIUC, we have implemented a mechanism at the executor level that kills a batch query when it risks going OOM. Therefore, an OOM is unexpected here, although the root cause of the OOM is not necessarily this mechanism.
I saw that https://github.com/risingwavelabs/risingwave/pull/13132 was merged in 20231115. I wonder whether it has some impact?
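To make the point above concrete, here is a minimal sketch of what an executor-level memory guard could look like. It is not RisingWave's actual implementation: `QueryMemoryContext`, `MemoryLimitExceeded`, and `run_batch_query` are hypothetical names, and the byte accounting is simplified to the length of each chunk. The sketch only illustrates why such a guard can be bypassed: it counts what the executor reports to it, so memory held elsewhere (for example by a storage-layer iterator, which is an assumption here) would never trip the limit.

```python
class MemoryLimitExceeded(Exception):
    """Raised when a query's tracked allocations exceed its budget."""


class QueryMemoryContext:
    """Hypothetical per-query memory accounting (illustration only)."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def add(self, nbytes):
        # Only allocations reported through add() are visible to the guard.
        self.used_bytes += nbytes
        if self.used_bytes > self.limit_bytes:
            raise MemoryLimitExceeded(
                f"query used {self.used_bytes} bytes, limit is {self.limit_bytes}"
            )


def run_batch_query(chunks, ctx):
    """Buffer chunks for a query, aborting once the tracked budget is exceeded."""
    buffered = []
    for chunk in chunks:
        ctx.add(len(chunk))  # report the allocation before holding the chunk
        buffered.append(chunk)
    return buffered


if __name__ == "__main__":
    ctx = QueryMemoryContext(limit_bytes=10)
    run_batch_query([b"abcd", b"efgh"], ctx)  # 8 bytes tracked, within budget
    try:
        run_batch_query([b"abcdef"], ctx)  # would reach 14 bytes
    except MemoryLimitExceeded as err:
        print("query killed:", err)
```

Under this model, a query is killed only when its own tracked usage crosses the limit, which is consistent with an OOM whose root cause lies outside the mechanism.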
About this issue
- State: closed
- Created 7 months ago
- Reactions: 1
- Comments: 23 (21 by maintainers)
It seems that v1.4.0 does not have this issue. https://buildkite.com/risingwave-test/sysbench/builds/569
nightly-20231121
The sysbench 32c64g all-pods affinity configuration is as follows:

The CN still OOMs! However, we did not see the memory metric exceed the limit on Grafana. I guess the CN's memory grew so rapidly that we could not collect useful metrics.
https://grafana.test.risingwave-cloud.xyz/d/EpkBw5W4k/risingwave-dev-dashboard?orgId=1&var-datasource=P2453400D1763B4D9&var-namespace=jianwei-sysbench-20231121&var-instance=benchmark-risingwave&var-pod=All&var-component=All&var-table=All&from=1700639051172&to=1700640236739
Drop the .collapsed file into https://www.speedscope.app/ and click the top-left “Left Heavy” button.
Lots of memory is held by Hummock iteration.
1700467603-2023-11-20-08-06-42.auto.heap.collapsed.zip
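As a complement to speedscope, the same kind of file can be summarized from the command line. The sketch below assumes the profile uses the usual collapsed-stack layout (one line per stack, frames separated by `;`, followed by a space and a numeric weight, taken here to be bytes). The frame names in `SAMPLE` are made up for illustration and are not taken from the attached profile.

```python
from collections import Counter


def parse_collapsed(lines):
    """Yield (frames, weight) pairs from collapsed-stack lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        stack, _, weight = line.rpartition(" ")
        if not stack or not weight.isdigit():
            continue  # skip malformed lines instead of guessing
        yield stack.split(";"), int(weight)


def weight_by_frame(lines):
    """Total weight of all stacks in which each frame appears."""
    totals = Counter()
    for frames, weight in parse_collapsed(lines):
        for frame in set(frames):  # count a recursive frame once per stack
            totals[frame] += weight
    return totals


def share_matching(lines, keyword):
    """Fraction of total weight held by stacks containing `keyword`."""
    total = matched = 0
    for frames, weight in parse_collapsed(lines):
        total += weight
        if any(keyword in frame for frame in frames):
            matched += weight
    return matched / total if total else 0.0


# Hypothetical stacks, only to show the input format.
SAMPLE = [
    "main;batch::scan;hummock::iter 300",
    "main;batch::scan;hummock::iter;block_cache 100",
    "main;frontend::plan 100",
]

if __name__ == "__main__":
    print(weight_by_frame(SAMPLE).most_common(3))
    print(f"share under hummock: {share_matching(SAMPLE, 'hummock'):.0%}")
```

To run it on a real profile, pass the lines of the unzipped file (for example `open(path)`) in place of `SAMPLE`.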
There seems to be a leak somewhere. This is critical, so let’s prioritize it.
Might be related to #9732.