OpenSearch: OpenSearch 1.2.0 Performance regression of 10% compared to 1.1.0

We are seeing a 10%+ regression across the board compared to the OpenSearch 1.1.0 release.

| Area | Movement | Additional Reference |
| --- | --- | --- |
| Indexing Requests Per Second | Down 10% | |
| Mean Index Latency | Up 11% | p50 9%, p90 11%, p99 37%, p100 46% |
| Mean Query Latency | Up 116% | p50 116%, p90 108%, p99 106%, p100 118% |

Thanks to @mch2 for these numbers

Performance test data is available at https://github.com/opensearch-project/opensearch-build/issues/963; please review and create issues if follow-up is needed.

Builds under test:

"min x64 url": "https://ci.opensearch.org/ci/dbc/bundle-build/1.2.0/982/linux/x64/builds/dist/opensearch-min-1.2.0-linux-x64.tar.gz",
"full distro x64 url": "https://ci.opensearch.org/ci/dbc/bundle-build/1.2.0/982/linux/x64/dist/opensearch-1.2.0-linux-x64.tar.gz",

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 38 (32 by maintainers)

Most upvoted comments

Excellent news @peternied!!!

So to summarize, the initially reported 20% indexing degradation was the result of a difference in the benchmarking configuration between the 1.1 and 1.2 builds? The initial issue above indicates the 1.2.0 distro was built from CI. Were we comparing that distro build against a 1.1.0 built from a local branch checkout, thereby leading to different JVM and machine configs?

Many thanks to all that put a huge effort into verifying this performance! @peternied @mch2 @getsaurabh02 @Bukhtawar Looking forward to getting ongoing benchmarking stood up to catch any potential issues early!

OpenSearch 1.2 Indexing Backpressure Removed vs Normal Distribution

Looking over these results, they are within the 5% error margin, with no 20% outliers. This aligns with the earlier findings indicating that the presence of Indexing Backpressure does not have a significant impact on overall performance. 🍾 @getsaurabh02 @nknize

Indexing Latency delta (milliseconds)

| | P50 | P90 | P99 |
| --- | --- | --- | --- |
| x64-disable | 8.5 | 14.3 | -5.1 |
| x64-enabled | 4.5 | 23.1 | 17.7 |
| arm64-enable | 20.7 | 29.1 | 45 |
| arm64-disable | -3.3 | -13 | 72.8 |

Indexing Latency delta (%)

| | P50 | P90 | P99 |
| --- | --- | --- | --- |
| x64-disable | 2% | 2% | 0% |
| x64-enabled | 1% | 3% | 1% |
| arm64-enable | 4% | 4% | 3% |
| arm64-disable | -1% | -2% | 6% |

Source Data table (Excel Online)

We (@mch2 and I) reran the Rally tests today to compare the 1.2 changes with and without the shard indexing pressure commit. This was done using the same branch, isolating just the one ShardIndexingPressure commit.

Since the previous tests were run against different distributions, the setups were not equivalent. Here the numbers are very similar, and nowhere near the alarming 20% called out in the earlier comment. The worst-case increase is around 2% for p100; since p99 is essentially unchanged, this mitigates any risk.

Indexing Operation Latencies (ms) Comparison

| Metric | Full 1.2 | With Revert Commit in 1.2 | Difference (%) |
| --- | --- | --- | --- |
| P50 | 2,587.4 | 2,569.2 | -0.70839 |
| P90 | 3,773.4 | 3,686 | -2.37113 |
| P99 | 5,083.5 | 5,077 | -0.12803 |
| P100 | 8,736.7 | 8,564 | -2.01658 |

Throughput (req/s) Comparison

| Metric | Full 1.2 | With Revert Commit in 1.2 | Difference (%) |
| --- | --- | --- | --- |
| P0 | 14,516.1 | 14,644.4 | 0.8761 |
| P50 | 15,647.6 | 15,893.2 | 1.54531 |
| P100 | 18,931.2 | 19,152.6 | 1.15598 |
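For reference, the Difference (%) columns above appear to be computed relative to the revert-commit build, i.e. (revert − full) / revert × 100; the reported values match that formula up to rounding. A minimal sketch (not from the original runs; the choice of denominator is an inference from the numbers):

```python
# Sketch: reproduce the "Difference (%)" columns above.
# Assumption: difference = (with_revert - full_1_2) / with_revert * 100,
# which matches the reported values up to rounding.

def diff_pct(full_1_2: float, with_revert: float) -> float:
    return (with_revert - full_1_2) / with_revert * 100

# Indexing operation latencies (ms): (full 1.2, with revert commit)
latency = {"P50": (2587.4, 2569.2), "P90": (3773.4, 3686.0),
           "P99": (5083.5, 5077.0), "P100": (8736.7, 8564.0)}

# Throughput (req/s): (full 1.2, with revert commit)
throughput = {"P0": (14516.1, 14644.4), "P50": (15647.6, 15893.2),
              "P100": (18931.2, 19152.6)}

for name, table in (("latency", latency), ("throughput", throughput)):
    for metric, (full, revert) in table.items():
        print(f"{name} {metric}: {diff_pct(full, revert):+.5f}%")
# e.g. latency P50 -> -0.70839%, throughput P0 -> +0.87610%
```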

The server-side metrics for these tests, such as CPU, memory, thread pool utilization, and garbage collections, are identical in trend and similar in magnitude. Some of the differences can be attributed to client-side metric collection. Please let me know if there are any questions about these runs.

Separately, I have already raised a PR for one possible optimization that was found (covered in comment #1589 above); it removes one more CPU computation from the indexing flow. It does not look like a blocker, but we can get that change in as well if needed.

I compared the 1.1 and 1.2 performance runs with respect to queries being slower on 1.2: the performance test created the nyc-taxis index with 5 shards on the 1.1 run but with 1 shard on the 1.2 run. This index has ~165 million docs, so on 1.1 the queries ran against 5 shards (~4.5 GB each) and were faster than on 1.2 with a single ~23 GB shard. We need to rerun the 1.2 performance test with 5 shards and compare again.
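One quick way to confirm the shard layout on each cluster before rerunning is the standard _cat/shards API. A minimal sketch, assuming an unauthenticated local cluster on localhost:9200 (the host and timeout are illustrative, not from the original test setup):

```python
# Sketch: confirm the nyc-taxis primary shard count/size before comparing clusters.
# Assumes an unauthenticated cluster reachable at http://localhost:9200.
import requests

resp = requests.get(
    "http://localhost:9200/_cat/shards/nyc-taxis",
    params={"format": "json", "bytes": "gb", "h": "index,shard,prirep,store"},
    timeout=10,
)
resp.raise_for_status()

primaries = [s for s in resp.json() if s["prirep"] == "p"]
print(f"primary shards: {len(primaries)}")
for shard in primaries:
    print(f"  shard {shard['shard']}: {shard['store']} gb")
# Both the 1.1 and 1.2 clusters should report the same primary shard count
# (e.g. 5) before query latencies are compared.
```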

This issue was ultimately determined to be a configuration mismatch between the 1.1.0 and 1.2.0 performance testing setups. Closing this as we have shipped 1.2.0.

@dblock My understanding of the previous setup from @mch2, which reported the 20% degradation, is that one test was run by checking out the branch locally and building the distribution after reverting the commit, while the other used a distribution downloaded directly from the archive.

So this time we used the same branch, checked out locally for both tests, with the commit reverted in only one of them.

To have a basis of comparison against 1.1 and rule out test configuration differences, I've triggered 4 new tests against OpenSearch 1.1; they should be complete in ~8 hours:

  • efe7ef60-4a6c-498e-a50e-78da62adaf0a OpenSearch-1-1-0–RC-x64-disable
  • 50243fa9-f4e6-4886-9cd8-5201e8b3852d OpenSearch-1-1-0–RC-x64-enable
  • d6bc9610-2c1e-474e-80e4-44f3820a461b OpenSearch-1-1-0–RC-arm64-disable
  • 63f7f5e8-3e12-45cf-86d5-09100dbcf6ef OpenSearch-1-1-0–RC-arm64-enable

Working on getting Rally set up to run the min-distribution comparison locally; will report back.
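For anyone reproducing this locally, one option is to drive Rally in benchmark-only mode against an already-running min distribution. A rough sketch, assuming esrally is installed and the cluster is up on localhost:9200; the track parameter and report path are illustrative, not the exact commands used for the runs above:

```python
# Sketch: kick off a local esrally run against a running OpenSearch min distribution.
# The flags below are standard esrally CLI options; the number_of_shards track
# parameter mirrors the 5-shard setup discussed above, and the report path is
# a hypothetical choice.
import subprocess

cmd = [
    "esrally",
    "--track=nyc_taxis",
    "--pipeline=benchmark-only",           # use the already-running cluster, don't provision one
    "--target-hosts=localhost:9200",
    "--track-params=number_of_shards:5",   # keep the shard layout identical across 1.1 and 1.2 runs
    "--report-file=/tmp/rally-report.md",  # hypothetical output location
]
subprocess.run(cmd, check=True)
```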