OpenSearch: [BUG] Performance regression in `top_hits` aggregation

I believe I have isolated a performance regression in the top_hits aggregation between ES 7.9 and ES 7.10.2 (which means the regression is present in OpenSearch 1.0 as well).

To Reproduce: I added the following operation to the default nyc_taxis track in esrally:

```json
{
  "name": "top_hits_agg",
  "warmup-iterations": 10,
  "iterations": 50,
  "clients": 2,
  "operation": {
    "operation-type": "search",
    "body": {
      "size": 0,
      "query": {
        "range": {
          "dropoff_datetime": {
            "from": "2015-01-01 00:00:00",
            "to": "2015-01-01 12:00:00"
          }
        }
      },
      "aggs": {
        "2": {
          "terms": {
            "field": "payment_type",
            "size": 1000
          },
          "aggregations": {
            "3": {
              "date_histogram": {
                "field": "dropoff_datetime",
                "fixed_interval": "5s"
              },
              "aggregations": {
                "1": {
                  "top_hits": {
                    "size": 1,
                    "sort": [
                      {
                        "dropoff_datetime": {
                          "order": "desc"
                        }
                      }
                    ]
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
```

After populating both a 7.9 and 7.10.2 cluster with the nyc_taxis dataset, I then ran a race with the following commands:

```sh
esrally race --pipeline=benchmark-only --distribution-version=7.9.0 --track=nyc_taxis --target-host=127.0.0.1:9790 --report-file=es_790_report_$(date +%s) --user-tag='version:7.9,race:top_hits_and_date' --include-tasks=top_hits_agg,date_histogram_agg

esrally race --pipeline=benchmark-only --distribution-version=7.10.2 --track=nyc_taxis --target-host=127.0.0.1:9710 --report-file=es_710_report_$(date +%s) --user-tag='version:7.10.2,race:top_hits_and_date' --include-tasks=top_hits_agg,date_histogram_agg
```

Expected behavior: performance of the top_hits aggregation should be very similar between the two versions. The following data compares ES 7.9 as the baseline and ES 7.10.2 as the contender. As expected, the date_histogram_agg operation performs more or less at parity between the two. However, top_hits aggregation performance is consistently 20-30% worse in 7.10.2 than in 7.9:

| Metric | Task | Baseline | Contender | Diff | Unit |
| --- | --- | --- | --- | --- | --- |
| Cumulative indexing time of primary shards | | 125.509 | 110.375 | -15.1333 | min |
| Min cumulative indexing time across primary shard | | 125.509 | 110.375 | -15.1333 | min |
| Median cumulative indexing time across primary shard | | 125.509 | 110.375 | -15.1333 | min |
| Max cumulative indexing time across primary shard | | 125.509 | 110.375 | -15.1333 | min |
| Cumulative indexing throttle time of primary shards | | 0 | 0 | 0 | min |
| Min cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Max cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Cumulative merge time of primary shards | | 41.7306 | 47.5379 | 5.80733 | min |
| Cumulative merge count of primary shards | | 93 | 89 | -4 | |
| Min cumulative merge time across primary shard | | 41.7306 | 47.5379 | 5.80733 | min |
| Median cumulative merge time across primary shard | | 41.7306 | 47.5379 | 5.80733 | min |
| Max cumulative merge time across primary shard | | 41.7306 | 47.5379 | 5.80733 | min |
| Cumulative merge throttle time of primary shards | | 4.12925 | 4.25008 | 0.12083 | min |
| Min cumulative merge throttle time across primary shard | | 4.12925 | 4.25008 | 0.12083 | min |
| Median cumulative merge throttle time across primary shard | | 4.12925 | 4.25008 | 0.12083 | min |
| Max cumulative merge throttle time across primary shard | | 4.12925 | 4.25008 | 0.12083 | min |
| Cumulative refresh time of primary shards | | 2.60575 | 2.36 | -0.24575 | min |
| Cumulative refresh count of primary shards | | 82 | 79 | -3 | |
| Min cumulative refresh time across primary shard | | 2.60575 | 2.36 | -0.24575 | min |
| Median cumulative refresh time across primary shard | | 2.60575 | 2.36 | -0.24575 | min |
| Max cumulative refresh time across primary shard | | 2.60575 | 2.36 | -0.24575 | min |
| Cumulative flush time of primary shards | | 5.45558 | 5.48922 | 0.03363 | min |
| Cumulative flush count of primary shards | | 35 | 34 | -1 | |
| Min cumulative flush time across primary shard | | 5.45558 | 5.48922 | 0.03363 | min |
| Median cumulative flush time across primary shard | | 5.45558 | 5.48922 | 0.03363 | min |
| Max cumulative flush time across primary shard | | 5.45558 | 5.48922 | 0.03363 | min |
| Total Young Gen GC time | | 0.258 | 1.24 | 0.982 | s |
| Total Young Gen GC count | | 32 | 136 | 104 | |
| Total Old Gen GC time | | 0 | 0 | 0 | s |
| Total Old Gen GC count | | 0 | 0 | 0 | |
| Store size | | 25.0414 | 24.3655 | -0.67586 | GB |
| Translog size | | 5.12227e-08 | 5.12227e-08 | 0 | GB |
| Heap used for segments | | 0.193222 | 0.0988235 | -0.0944 | MB |
| Heap used for doc values | | 0.0354309 | 0.034874 | -0.00056 | MB |
| Heap used for terms | | 0.0408325 | 0.0370789 | -0.00375 | MB |
| Heap used for norms | | 0 | 0 | 0 | MB |
| Heap used for points | | 0 | 0 | 0 | MB |
| Heap used for stored fields | | 0.116959 | 0.0268707 | -0.09009 | MB |
| Segment count | | 32 | 30 | -2 | |
| Min Throughput | date_histogram_agg | 1.50023 | 1.50023 | -1e-05 | ops/s |
| Median Throughput | date_histogram_agg | 1.50035 | 1.50035 | -0 | ops/s |
| Max Throughput | date_histogram_agg | 1.50054 | 1.50054 | 0 | ops/s |
| 50th percentile latency | date_histogram_agg | 548.563 | 549.301 | 0.73863 | ms |
| 90th percentile latency | date_histogram_agg | 557.053 | 559.417 | 2.36497 | ms |
| 99th percentile latency | date_histogram_agg | 571.816 | 569.581 | -2.23542 | ms |
| 100th percentile latency | date_histogram_agg | 578.69 | 575.319 | -3.37139 | ms |
| 50th percentile service time | date_histogram_agg | 547.429 | 548.28 | 0.85141 | ms |
| 90th percentile service time | date_histogram_agg | 555.608 | 558.266 | 2.65809 | ms |
| 99th percentile service time | date_histogram_agg | 570.955 | 568.385 | -2.57038 | ms |
| 100th percentile service time | date_histogram_agg | 577.466 | 573.309 | -4.1576 | ms |
| error rate | date_histogram_agg | 0 | 0 | 0 | % |
| Min Throughput | top_hits_agg | 0.720015 | 0.559013 | -0.161 | ops/s |
| Median Throughput | top_hits_agg | 0.747501 | 0.577278 | -0.17022 | ops/s |
| Max Throughput | top_hits_agg | 0.759315 | 0.582273 | -0.17704 | ops/s |
| 50th percentile latency | top_hits_agg | 2569.6 | 3373.35 | 803.747 | ms |
| 90th percentile latency | top_hits_agg | 2609.98 | 3455.39 | 845.411 | ms |
| 99th percentile latency | top_hits_agg | 2622.45 | 3502.29 | 879.836 | ms |
| 100th percentile latency | top_hits_agg | 2694.13 | 3503.48 | 809.352 | ms |
| 50th percentile service time | top_hits_agg | 2569.6 | 3373.35 | 803.747 | ms |
| 90th percentile service time | top_hits_agg | 2609.98 | 3455.39 | 845.411 | ms |
| 99th percentile service time | top_hits_agg | 2622.45 | 3502.29 | 879.836 | ms |
| 100th percentile service time | top_hits_agg | 2694.13 | 3503.48 | 809.352 | ms |
| error rate | top_hits_agg | 0 | 0 | 0 | % |

Plugins: None

Host/Environment:

  • OS: Ubuntu 20.04
  • Version: ES 7.9 and ES 7.10.2
  • Host type: c6i.8xlarge

Additional context: The Elasticsearch distributions are the stock versions, downloaded as tar files, extracted, and started with the following command:

```sh
ES_JAVA_OPTS="-Xms8g -Xmx8g" ./bin/elasticsearch
```

The only non-default setting in the yml configuration file is the port number. Both instances run on the same machine, though load is driven to only one at a time (not concurrently). The service is bound to localhost, and esrally runs on the same machine.
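For completeness, the port change amounts to something like the following (the directory names assume the stock tarball layout, and the ports match the --target-host values in the commands above):

```sh
# append the only non-default setting to each node's config
echo "http.port: 9790" >> elasticsearch-7.9.0/config/elasticsearch.yml
echo "http.port: 9710" >> elasticsearch-7.10.2/config/elasticsearch.yml
```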

Most upvoted comments

I believe this Lucene issue actually does fix the regression. I didn’t realize that the testing tool was setting BEST_COMPRESSION by default for the given workload, but when I run the test against an index that has been configured with index.codec=default, the regression nearly completely disappears.
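For anyone reproducing this: index.codec is a static index setting, so it has to be applied at index-creation time, before the dataset is loaded (the track normally creates the index itself with best_compression, per the above). A minimal sketch, assuming a node on localhost:9200:

```sh
# create the index with the default (BEST_SPEED) codec instead of best_compression
curl -s -X PUT "localhost:9200/nyc_taxis" \
  -H 'Content-Type: application/json' \
  -d '{"settings": {"index": {"codec": "default"}}}'
```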

Test script:

```
$ ./top-hits-test.sh
es7.9      "took" : 737,
os1.0      "took" : 2707,
os1.2      "took" : 961,
------
es7.9      "took" : 744,
os1.0      "took" : 2700,
os1.2      "took" : 933,
------
es7.9      "took" : 742,
os1.0      "took" : 2703,
os1.2      "took" : 915,
------
```

Note that BEST_COMPRESSION is in fact quite a bit slower on the new versions compared to ES 7.9, but that’s because the new Lucene codec was changed to significantly improve compression at the cost of slower reads. The performance difference is minimal when comparing the versions using the default codec (BEST_SPEED).
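For anyone checking their own indices, the effective codec can be read back from the index settings; a minimal sketch, assuming the same local node (the filter_path just trims the response):

```sh
# prints index.codec for nyc_taxis (the explicit setting, or the default if unset)
curl -s "localhost:9200/nyc_taxis/_settings?include_defaults=true&filter_path=*.settings.index.codec,*.defaults.index.codec"
```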

@andrross so the change came with https://github.com/opensearch-project/OpenSearch/commit/31c026f25ccbbc02bb643351338786297a2dff6a, where the upgrade to Lucene 8.7.0 happened and effectively switched to the new compression scheme.

As an experiment, I rebuilt an OpenSearch 2.0 SNAPSHOT with Lucene86Codec, and the latencies went back to Elasticsearch 7.9.x levels:

Elasticsearch 7.9.x

| Metric | Task | Run #1 | Run #2 | Unit |
| --- | --- | --- | --- | --- |
| Min Throughput | top_hits_agg | 0.36 | 0.36 | ops/s |
| Mean Throughput | top_hits_agg | 0.38 | 0.38 | ops/s |
| Median Throughput | top_hits_agg | 0.38 | 0.39 | ops/s |
| Max Throughput | top_hits_agg | 0.38 | 0.39 | ops/s |
| 50th percentile latency | top_hits_agg | 69333.1 | 67921.7 | ms |
| 90th percentile latency | top_hits_agg | 107202 | 105145 | ms |
| 100th percentile latency | top_hits_agg | 116503 | 114215 | ms |
| 50th percentile service time | top_hits_agg | 2507.79 | 2469.64 | ms |
| 90th percentile service time | top_hits_agg | 2541.37 | 2520.94 | ms |
| 100th percentile service time | top_hits_agg | 2661.41 | 2541.38 | ms |
| error rate | top_hits_agg | 0 | 0 | % |

OpenSearch 2.0 (SNAPSHOT)

| Metric | Task | Run #1 | Run #2 | Unit |
| --- | --- | --- | --- | --- |
| Min Throughput | top_hits_agg | 0.27 | 0.28 | ops/s |
| Mean Throughput | top_hits_agg | 0.29 | 0.29 | ops/s |
| Median Throughput | top_hits_agg | 0.29 | 0.29 | ops/s |
| Max Throughput | top_hits_agg | 0.29 | 0.3 | ops/s |
| 50th percentile latency | top_hits_agg | 97998.2 | 96045.2 | ms |
| 90th percentile latency | top_hits_agg | 150750 | 151238 | ms |
| 100th percentile latency | top_hits_agg | 164118 | 164771 | ms |
| 50th percentile service time | top_hits_agg | 3238.32 | 3251.69 | ms |
| 90th percentile service time | top_hits_agg | 3284.75 | 3449.24 | ms |
| 100th percentile service time | top_hits_agg | 3309.06 | 3542.48 | ms |
| error rate | top_hits_agg | 0 | 0 | % |

OpenSearch 2.0 (SNAPSHOT) with Lucene86Codec

| Metric | Task | Run #1 | Run #2 | Unit |
| --- | --- | --- | --- | --- |
| Min Throughput | top_hits_agg | 0.33 | 0.29 | ops/s |
| Mean Throughput | top_hits_agg | 0.35 | 0.33 | ops/s |
| Median Throughput | top_hits_agg | 0.35 | 0.34 | ops/s |
| Max Throughput | top_hits_agg | 0.35 | 0.34 | ops/s |
| 50th percentile latency | top_hits_agg | 77534.8 | 79022.5 | ms |
| 90th percentile latency | top_hits_agg | 118762 | 120599 | ms |
| 100th percentile latency | top_hits_agg | 129576 | 131161 | ms |
| 50th percentile service time | top_hits_agg | 2679.91 | 2672.98 | ms |
| 90th percentile service time | top_hits_agg | 2759 | 2714.66 | ms |
| 100th percentile service time | top_hits_agg | 2869.71 | 2828.59 | ms |
| error rate | top_hits_agg | 0 | 0 | % |

I actually don’t know what we should be doing in this case; the change seems to be related to the Lucene87Codec implementation.

@reta I’ve been benchmarking against OpenSearch 1.2, which uses Lucene 8.10.1, and it shows performance comparable to all the ES versions newer than 7.9.

Here’s the quick test I’m running against the 4 versions on my machine (all populated with the nyc_taxis dataset but otherwise idle). It consistently produces results like the following:

```
es7.9      "took" : 2396,
es7.10.2   "took" : 3013,
es7.15.2   "took" : 3068,
os1.2      "took" : 3007,
------
es7.9      "took" : 2393,
es7.10.2   "took" : 3012,
es7.15.2   "took" : 3049,
os1.2      "took" : 2999,
------
es7.9      "took" : 2386,
es7.10.2   "took" : 3009,
es7.15.2   "took" : 3055,
os1.2      "took" : 3009,
------
```
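For reference, a minimal sketch of such a test script, assuming one cluster per local port (the ports and labels are assumptions; the query body is the one from the reproduction steps above):

```sh
#!/bin/bash
# Run the top_hits query against each cluster and print the reported "took" time.
query='{"size":0,"query":{"range":{"dropoff_datetime":{"from":"2015-01-01 00:00:00","to":"2015-01-01 12:00:00"}}},"aggs":{"2":{"terms":{"field":"payment_type","size":1000},"aggregations":{"3":{"date_histogram":{"field":"dropoff_datetime","fixed_interval":"5s"},"aggregations":{"1":{"top_hits":{"size":1,"sort":[{"dropoff_datetime":{"order":"desc"}}]}}}}}}}}'

while true; do
  for target in es7.9:9790 es7.10.2:9710 es7.15.2:9715 os1.2:9712; do
    label="${target%%:*}"
    port="${target##*:}"
    # keep only the "took" line from the pretty-printed response
    took=$(curl -s -X POST "localhost:${port}/nyc_taxis/_search?pretty" \
      -H 'Content-Type: application/json' -d "$query" | grep '"took"')
    printf '%-10s%s\n' "$label" "$took"
  done
  echo '------'
done
```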

Also, I’ve tested this “top hits” query against ES7.11 and ES7.15, and both appear to exhibit the regression relative to ES7.9.

I profiled ES7.9 versus OpenSearch 1.2. It looks like OS1.2 is possibly spending more time doing decompression.

[Flame graphs: ES7.9 vs OpenSearch 1.2 (screenshots omitted)]
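For context, profiles like these can be captured with any sampling JVM profiler; one possible sketch, assuming async-profiler is installed (the PID placeholder is the node’s java process, left as-is):

```sh
# hypothetical: sample the running node for 60s and write a flame graph
./profiler.sh -d 60 -f /tmp/top_hits_profile.html <node-pid>
```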

I do indeed get different results for the query in question here, so it looks like the referenced bug may be a factor.