risingwave: Performance degradation for nightly-20240109
Describe the bug
| BENCHMARK NAME | EXECUTION ID | STATUS | FLUCTUATION OF BEST | FLUCTUATION OF LAST 10 DAYS |
| --- | --- | --- | --- | --- |
| nexmark-q4-blackhole-medium-1cn-affinity | 18144 | Negative | -26.91% | -12.18% |
| nexmark-q20-blackhole-medium-1cn-affinity | 18156 | Negative | -19.34% | -11.74% |
| nexmark-q9-blackhole-medium-1cn-affinity | 18161 | Negative | -30.70% | -15.74% |
| sysbench-oltp-point-select-medium-3cn | 18165 | Negative | -39.56% | -21.76% |
| sysbench-select-random-points-medium-3cn | 18166 | Negative | -32.88% | -14.88% |
| sysbench-select-random-ranges-medium-3cn | 18168 | Negative | -38.10% | -18.27% |
| sysbench-oltp-insert-medium-3cn | 18169 | Negative | -33.20% | -22.09% |
| sysbench-select-random-limits-medium-3cn | 18171 | Negative | -58.32% | -27.49% |
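For reference on how to read the fluctuation columns (an interpretation, not stated verbatim in the report): the values appear to be relative changes, roughly (current − baseline) / baseline × 100%, where FLUCTUATION OF BEST compares against the best historical run and FLUCTUATION OF LAST 10 DAYS against the average of the last 10 days. Under that reading, the -27.49% for sysbench-select-random-limits-medium-3cn means this run's key metric was about 27% below its 10-day average; as noted in the comments below, only executions below -10% versus the 10-day average are flagged as negative.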
http://metabase.risingwave-cloud.xyz/dashboard/278-sysbench-rw-qps
http://metabase.risingwave-cloud.xyz/dashboard/277-nexmark-blackhole-affinity-1cn-rw-avg-source-throughput?namespace=daily
Error message/log
No response
To Reproduce
No response
Expected behavior
No response
How did you deploy RisingWave?
No response
The version of RisingWave
nightly-20240109
Additional context
https://github.com/risingwavelabs/rw-commits-history?tab=readme-ov-file#nightly-20240109
About this issue
- State: closed
- Created 6 months ago
- Reactions: 2
- Comments: 27 (27 by maintainers)
I believe the degradation of sysbench has been resolved by #14528.
Since #14528 got merged, distributed tracing is now off by default. Let’s see the result of sysbench tonight.
Today's bench result looks good… let's wait and see tomorrow's result.
Aren’t we already sure it’s caused by tracing (and earlier regression)? 👀 Which query’s regression are we talking about now?
Currently, the only way to do it quickly is adding the env var to kube-bench (https://github.com/risingwavelabs/kube-bench/blob/main/manifests/risingwave/risingwave.template.yaml#L151) in a branch, and then we can run the pipeline by specifying KUBEBENCH_BRANCH when we trigger the job (see the sketch below).

So the degradation is not because we have disabled fast compaction. We still need to find which PR caused it…
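Regarding the kube-bench override suggested above, here is a rough sketch of what the env-var addition to risingwave.template.yaml could look like. The surrounding container layout and the "true" value are assumptions for illustration; only the RW_DISABLE_EMBEDDED_TRACING variable name and the template path come from this thread.

```yaml
# Hypothetical excerpt of manifests/risingwave/risingwave.template.yaml in a test branch.
# Structure is illustrative only; the actual template may nest the container spec differently.
spec:
  containers:
    - name: compute-node
      image: risingwavelabs/risingwave:nightly-20240109
      env:
        - name: RW_DISABLE_EMBEDDED_TRACING   # assumed to accept a boolean-like string
          value: "true"
```

The branch carrying this change would then be selected via KUBEBENCH_BRANCH when triggering the benchmark job, as described in the comment above.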
Q20’s perf even decreased in your test
Just to be safe, maybe let fast compaction be tested for a longer period of time before enabling it by default?

edit: We do enable it in the daily longevity tests.

edit: actually, it is enabled in all the tests right now.
Benchmark is done. 😕 It could possibly be related to the traces, but the impact does not seem significant.
nightly-20240107 vs nightly-20240109 vs nightly-20240109 (disable embedded tracing)

In our daily performance test notification, we only consider executions with a < -10% perf fluctuation compared to the average key metrics of the last 10 days as negative executions. I only pasted the negative ones in this issue. If you want to go through the overall daily perf brief, please check the #perf-notify channel for messages like this: https://risingwave-labs.slack.com/archives/C04R6R5236C/p1704844802151659. You will get a brief table as below.

Does the benchmark pipeline accept env vars? Setting RW_DISABLE_EMBEDDED_TRACING would be good.

This is also the most confusing fact to me.
Let’s try to run the benchmarks with https://github.com/risingwavelabs/risingwave/pull/14220 reverted to confirm it first?