risingwave: Performance degradation for nightly-20240109
Describe the bug
| BENCHMARK NAME | EXECUTION ID | STATUS | FLUCTUATION OF BEST | FLUCTUATION OF LAST 10 DAYS |
| --- | --- | --- | --- | --- |
| nexmark-q4-blackhole-medium-1cn-affinity | 18144 | Negative | -26.91% | -12.18% |
| nexmark-q20-blackhole-medium-1cn-affinity | 18156 | Negative | -19.34% | -11.74% |
| nexmark-q9-blackhole-medium-1cn-affinity | 18161 | Negative | -30.70% | -15.74% |
| sysbench-oltp-point-select-medium-3cn | 18165 | Negative | -39.56% | -21.76% |
| sysbench-select-random-points-medium-3cn | 18166 | Negative | -32.88% | -14.88% |
| sysbench-select-random-ranges-medium-3cn | 18168 | Negative | -38.10% | -18.27% |
| sysbench-oltp-insert-medium-3cn | 18169 | Negative | -33.20% | -22.09% |
| sysbench-select-random-limits-medium-3cn | 18171 | Negative | -58.32% | -27.49% |
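For reference on how to read the fluctuation columns (an interpretation, not stated verbatim in the report): the values appear to be relative changes, roughly (current − baseline) / baseline × 100%, where FLUCTUATION OF BEST compares against the best historical run and FLUCTUATION OF LAST 10 DAYS against the average of the last 10 days. Under that reading, the -27.49% for sysbench-select-random-limits-medium-3cn means this run's key metric was about 27% below its 10-day average; as noted in the comments below, only executions below -10% versus the 10-day average are flagged as negative.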
http://metabase.risingwave-cloud.xyz/dashboard/278-sysbench-rw-qps
http://metabase.risingwave-cloud.xyz/dashboard/277-nexmark-blackhole-affinity-1cn-rw-avg-source-throughput?namespace=daily
Error message/log
No response
To Reproduce
No response
Expected behavior
No response
How did you deploy RisingWave?
No response
The version of RisingWave
nightly-20240109
Additional context
https://github.com/risingwavelabs/rw-commits-history?tab=readme-ov-file#nightly-20240109
About this issue
- State: closed
- Created 6 months ago
- Reactions: 2
- Comments: 27 (27 by maintainers)
I believe the degradation of sysbench has been resolved by #14528.
Since #14528 got merged, distributed tracing is now off by default. Let’s see the result of sysbench tonight.
Today's bench result looks good… let's wait and see tomorrow's result.
Aren’t we already sure it’s caused by tracing (and earlier regression)? 👀 Which query’s regression are we talking about now?
Currently, the only way to do it quickly is adding the env var to kube-bench (https://github.com/risingwavelabs/kube-bench/blob/main/manifests/risingwave/risingwave.template.yaml#L151) in a branch, and then we can run the pipeline by specifying KUBEBENCH_BRANCH when we trigger the job (see the sketch below).

So the degradation is not because we have disabled fast compaction. We still need to find which PR caused it…
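Regarding the kube-bench override suggested above, here is a rough sketch of what the env-var addition to risingwave.template.yaml could look like. The surrounding container layout and the "true" value are assumptions for illustration; only the RW_DISABLE_EMBEDDED_TRACING variable name and the template path come from this thread.

```yaml
# Hypothetical excerpt of manifests/risingwave/risingwave.template.yaml in a test branch.
# Structure is illustrative only; the actual template may nest the container spec differently.
spec:
  containers:
    - name: compute-node
      image: risingwavelabs/risingwave:nightly-20240109
      env:
        - name: RW_DISABLE_EMBEDDED_TRACING   # assumed to accept a boolean-like string
          value: "true"
```

The branch carrying this change would then be selected via KUBEBENCH_BRANCH when triggering the benchmark job, as described in the comment above.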
Q20’s perf even decreased in your test
Just to be safe, maybe let fast compaction be tested for a longer period of time before enabling it by default?

edit: We do enable it in the daily longevity tests.

edit: actually, it is enabled in all the tests right now.
Benchmark is done. 😕 It could possibly be related to the traces, but the impact does not seem significant.
nightly-20240107 vs nightly-20240109 vs nightly-20240109 (disable embedded tracing)

In our daily performance test notification, we only consider executions with a < -10% perf fluctuation compared to the average key metrics of the last 10 days as negative executions. I only pasted the negative ones in this issue. If you want to go through the overall daily perf brief, please check the #perf-notify channel for messages like this: https://risingwave-labs.slack.com/archives/C04R6R5236C/p1704844802151659. You will get a brief table as below.

Does the benchmark pipeline accept env vars? Setting RW_DISABLE_EMBEDDED_TRACING would be good.

This is also the most confusing fact to me.
Let’s try to run the benchmarks with https://github.com/risingwavelabs/risingwave/pull/14220 reverted to confirm it first?