risingwave: Performance degradation for nightly-20240109

Describe the bug

+-----------------------------------------------------------------+--------------+------------+---------------------+-----------------------------+
| BENCHMARK NAME                                                  | EXECUTION ID | STATUS     | FLUCTUATION OF BEST | FLUCTUATION OF LAST 10 DAYS |
+-----------------------------------------------------------------+--------------+------------+---------------------+-----------------------------+
| nexmark-q4-blackhole-medium-1cn-affinity                        |        18144 | Negative   | -26.91%             | -12.18%                     |
| nexmark-q20-blackhole-medium-1cn-affinity                       |        18156 | Negative   | -19.34%             | -11.74%                     |
| nexmark-q9-blackhole-medium-1cn-affinity                        |        18161 | Negative   | -30.70%             | -15.74%                     |
| sysbench-oltp-point-select-medium-3cn                           |        18165 | Negative   | -39.56%             | -21.76%                     |
| sysbench-select-random-points-medium-3cn                        |        18166 | Negative   | -32.88%             | -14.88%                     |
| sysbench-select-random-ranges-medium-3cn                        |        18168 | Negative   | -38.10%             | -18.27%                     |
| sysbench-oltp-insert-medium-3cn                                 |        18169 | Negative   | -33.20%             | -22.09%                     |
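| sysbench-select-random-limits-medium-3cn                        |        18171 | Negative   | -58.32%             | -27.49%                     |
+-----------------------------------------------------------------+--------------+------------+---------------------+-----------------------------+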

http://metabase.risingwave-cloud.xyz/dashboard/278-sysbench-rw-qps
http://metabase.risingwave-cloud.xyz/dashboard/277-nexmark-blackhole-affinity-1cn-rw-avg-source-throughput?namespace=daily

Error message/log

No response

To Reproduce

No response

Expected behavior

No response

How did you deploy RisingWave?

No response

The version of RisingWave

nightly-20240109

Additional context

https://github.com/risingwavelabs/rw-commits-history?tab=readme-ov-file#nightly-20240109

About this issue

  • State: closed
  • Created 6 months ago
  • Reactions: 2
  • Comments: 27 (27 by maintainers)

Most upvoted comments

I believe the degradation of sysbench has been resolved by #14528.

Can you bench the sysbench test without traces? I am not sure whether the degradations of streaming nexmark and serving sysbench have the same cause.

Since #14528 got merged, distributed tracing is now off by default. Let’s see the result of sysbench tonight.

Aren’t we already sure it’s caused by tracing (and earlier regression)? 👀 Which query’s regression are we talking about now?

Today’s bench result looks good… let’s watch and wait for tomorrow’s result.

Currently, the only quick way to do it is to add the env var to kube-bench (https://github.com/risingwavelabs/kube-bench/blob/main/manifests/risingwave/risingwave.template.yaml#L151) in a branch. Then we can run the pipeline by specifying KUBEBENCH_BRANCH when we trigger the job.

I think it may be because risingwavelabs/kube-bench#349 enables fast compaction in all of the testing pipelines instead of just the perf test one… let me correct it.

edit: or do you want to keep the status quo?

So the degradation is not because we turned off fast compaction. We still need to find which PR caused it…

Benchmark is done. 😕 It could possibly be related to the traces, but the impact does not seem significant.

nightly-20240107 vs nightly-20240109 vs nightly-20240109 (disable embedded tracing)

Q20’s perf even decreased in your test

Just to be safe, maybe let fast compaction be tested for a longer period of time before enabling it by default?

edit: We do enable it in the daily longevity tests

edit: actually, it is enabled in all the tests right now

Let’s try to run the benchmarks with #14220 reverted to confirm it first?

Does the benchmark pipeline accept env vars? Setting RW_DISABLE_EMBEDDED_TRACING would be good.

In our daily performance test notification, we only mark an execution as negative when its key metric fluctuates by less than -10% (that is, drops by more than 10%) compared to the average of the last 10 days. I only pasted the negative ones in this issue. If you want to go through the overall daily perf brief, please check the #perf-notify channel for a message like https://risingwave-labs.slack.com/archives/C04R6R5236C/p1704844802151659. You will get a brief table like the one below.

+-----------------------------------------------------------------+--------------+------------+---------------------+-----------------------------+
| BENCHMARK NAME                                                  | EXECUTION ID | STATUS     | FLUCTUATION OF BEST | FLUCTUATION OF LAST 10 DAYS |
+-----------------------------------------------------------------+--------------+------------+---------------------+-----------------------------+
| nexmark-q4-blackhole-medium-1cn-affinity                        |        18144 | Negative   | -26.91%             | -12.18%                     |
| nexmark-q20-blackhole-medium-1cn-affinity                       |        18156 | Negative   | -19.34%             | -11.74%                     |
| nexmark-q9-blackhole-medium-1cn-affinity                        |        18161 | Negative   | -30.70%             | -15.74%                     |
| sysbench-oltp-point-select-medium-3cn                           |        18165 | Negative   | -39.56%             | -21.76%                     |
| sysbench-select-random-points-medium-3cn                        |        18166 | Negative   | -32.88%             | -14.88%                     |
| sysbench-select-random-ranges-medium-3cn                        |        18168 | Negative   | -38.10%             | -18.27%                     |
| sysbench-oltp-insert-medium-3cn                                 |        18169 | Negative   | -33.20%             | -22.09%                     |
| sysbench-select-random-limits-medium-3cn                        |        18171 | Negative   | -58.32%             | -27.49%                     |
| nexmark-q0-blackhole-medium-1cn-affinity                        |        18139 | Reasonable | -23.55%             | -1.64%                      |
| nexmark-q15-blackhole-medium-1cn-affinity                       |        18140 | Reasonable | -8.33%              | -2.05%                      |
| nexmark-q0-medium-1cn-affinity                                  |        18141 | Reasonable | -10.31%             | -5.38%                      |
| nexmark-q3-blackhole-medium-1cn-affinity                        |        18142 | Reasonable | -17.37%             | -3.59%                      |
| nexmark-q16-blackhole-medium-1cn-affinity                       |        18143 | Reasonable | -25.81%             | -6.47%                      |
| nexmark-q5-blackhole-watermark-medium-1cn-affinity              |        18145 | Reasonable | -20.23%             | -6.85%                      |
| nexmark-q17-blackhole-medium-1cn-affinity                       |        18146 | Reasonable | -12.62%             | -1.59%                      |
| nexmark-q5-blackhole-medium-1cn-affinity                        |        18147 | Reasonable | -24.76%             | -9.84%                      |
| nexmark-q7-blackhole-watermark-medium-1cn-affinity              |        18148 | Reasonable | -30.26%             | -6.04%                      |
| nexmark-q18-blackhole-medium-1cn-affinity                       |        18149 | Reasonable | -12.10%             | -6.73%                      |
| nexmark-q5-rewrite-blackhole-medium-1cn-affinity                |        18150 | Reasonable | -20.95%             | -2.35%                      |
| nexmark-q8-blackhole-watermark-medium-1cn-affinity              |        18151 | Reasonable | -8.02%              | -1.49%                      |
| nexmark-q19-blackhole-medium-1cn-affinity                       |        18152 | Reasonable | -16.29%             | -8.54%                      |
| nexmark-q7-blackhole-medium-1cn-affinity                        |        18153 | Reasonable | -38.94%             | -6.33%                      |
| nexmark-q9-blackhole-watermark-medium-1cn-affinity              |        18154 | Reasonable | -14.38%             | -3.48%                      |
| nexmark-q7-rewrite-blackhole-medium-1cn-affinity                |        18155 | Reasonable | -17.70%             | -1.66%                      |
| nexmark-q18-blackhole-watermark-medium-1cn-affinity             |        18157 | Reasonable | -9.42%              | -4.15%                      |
| nexmark-q8-blackhole-medium-1cn-affinity                        |        18158 | Reasonable | -18.75%             | -6.29%                      |
| nexmark-q101-blackhole-medium-1cn-affinity                      |        18159 | Reasonable | -13.54%             | -5.92%                      |
| nexmark-q6-group-top1-blackhole-watermark-medium-1cn-affinity   |        18160 | Reasonable | -18.32%             | -5.72%                      |
| nexmark-q102-blackhole-medium-1cn-affinity                      |        18162 | Reasonable | -17.45%             | -5.55%                      |
| nexmark-q5-many-windows-blackhole-watermark-medium-1cn-affinity |        18163 | Reasonable | -18.42%             | -1.02%                      |
| nexmark-q12-blackhole-medium-1cn-affinity                       |        18164 | Reasonable | -10.66%             | -2.75%                      |
| nexmark-q103-blackhole-medium-1cn-affinity                      |        18167 | Reasonable | -11.94%             | -5.70%                      |
| nexmark-q13-blackhole-medium-1cn-affinity                       |        18170 | Reasonable | -20.21%             | -3.55%                      |
| sysbench-bulk-insert-medium-3cn                                 |        18172 | Reasonable | -7.17%              | -0.45%                      |
| nexmark-q104-blackhole-medium-1cn-affinity                      |        18173 | Reasonable | -13.03%             | -6.40%                      |
| nexmark-q3-no-condition-blackhole-medium-1cn-affinity           |        18174 | Reasonable | -16.80%             | -6.71%                      |
| nexmark-q105-blackhole-medium-1cn-affinity                      |        18175 | Reasonable | -16.72%             | -5.78%                      |
| nexmark-q5-many-windows-blackhole-medium-1cn-affinity           |        18176 | Reasonable | -48.81%             | -1.96%                      |
+-----------------------------------------------------------------+--------------+------------+---------------------+-----------------------------+
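The negative/reasonable split described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the notification pipeline's actual code: the function names, the relative-change formula, and the throughput numbers are assumptions, and only the two statuses that appear in the table are modeled.

```python
from statistics import mean
from typing import Sequence

# From the comment above: below -10% against the 10-day average is negative.
NEGATIVE_THRESHOLD_PCT = -10.0


def fluctuation_pct(current: float, baseline: float) -> float:
    """Relative change of `current` versus `baseline`, in percent (assumed formula)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline * 100.0


def classify(current: float, last_10_days: Sequence[float]) -> str:
    """Label an execution by its fluctuation against the average of recent runs."""
    change = fluctuation_pct(current, mean(last_10_days))
    return "Negative" if change < NEGATIVE_THRESHOLD_PCT else "Reasonable"


if __name__ == "__main__":
    # Hypothetical throughput history; real values live in the dashboards.
    history = [1000.0] * 10
    for today in (878.2, 901.6):
        print(today, f"{fluctuation_pct(today, mean(history)):.2f}%", classify(today, history))
```

With this rule, a run at -9.84% against the 10-day average (like nexmark-q5-blackhole) stays "Reasonable", while -12.18% (like nexmark-q4-blackhole) is flagged "Negative", which matches the STATUS column in the table.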

It’s still interesting that only 3 nexmark queries are affected.

This is also the most confusing fact to me.