OpenSearch: [BUG] Performance regression in `top_hits` aggregation
I believe I have isolated a performance regression in the `top_hits` aggregation between ES 7.9 and ES 7.10.2 (which means the regression is present in OpenSearch 1.0 as well).
To Reproduce
I added the following operation to the default nyc_taxis track in Rally:
```json
{
  "name": "top_hits_agg",
  "warmup-iterations": 10,
  "iterations": 50,
  "clients": 2,
  "operation": {
    "operation-type": "search",
    "body": {
      "size": 0,
      "query": {
        "range": {
          "dropoff_datetime": {
            "from": "2015-01-01 00:00:00",
            "to": "2015-01-01 12:00:00"
          }
        }
      },
      "aggs": {
        "2": {
          "terms": {
            "field": "payment_type",
            "size": 1000
          },
          "aggregations": {
            "3": {
              "date_histogram": {
                "field": "dropoff_datetime",
                "fixed_interval": "5s"
              },
              "aggregations": {
                "1": {
                  "top_hits": {
                    "size": 1,
                    "sort": [
                      {
                        "dropoff_datetime": {
                          "order": "desc"
                        }
                      }
                    ]
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
```
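As a quick sanity check outside of Rally, the inner `body` object above can also be sent directly to the cluster. A minimal sketch, assuming it has been saved to a file named `top_hits_query.json` and that `jq` is available to extract the server-side timing:

```sh
# Illustrative only: run the aggregation once against the 7.9 instance and
# print the server-reported "took" time in milliseconds.
curl -s -H 'Content-Type: application/json' \
  --data-binary @top_hits_query.json \
  'http://127.0.0.1:9790/nyc_taxis/_search' | jq '.took'
```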
After populating both a 7.9.0 and a 7.10.2 cluster with the nyc_taxis dataset, I ran a race with the following commands:

```sh
esrally race --pipeline=benchmark-only --distribution-version=7.9.0 --track=nyc_taxis --target-host=127.0.0.1:9790 --report-file=es_790_report_$(date +%s) --user-tag='version:7.9,race:top_hits_and_date' --include-tasks=top_hits_agg,date_histogram_agg
esrally race --pipeline=benchmark-only --distribution-version=7.10.2 --track=nyc_taxis --target-host=127.0.0.1:9710 --report-file=es_710_report_$(date +%s) --user-tag='version:7.10.2,race:top_hits_and_date' --include-tasks=top_hits_agg,date_histogram_agg
```
Expected behavior
Performance of the `top_hits` aggregation should be very similar between the two versions. The data below compares ES 7.9 as the baseline and ES 7.10.2 as the contender. As expected, the `date_histogram_agg` operation performs more or less at parity between the two. However, the `top_hits_agg` operation is consistently 20-30% slower on 7.10.2 than on 7.9:
| Metric | Task | Baseline | Contender | Diff | Unit |
|---|---|---|---|---|---|
| Cumulative indexing time of primary shards | | 125.509 | 110.375 | -15.1333 | min |
| Min cumulative indexing time across primary shard | | 125.509 | 110.375 | -15.1333 | min |
| Median cumulative indexing time across primary shard | | 125.509 | 110.375 | -15.1333 | min |
| Max cumulative indexing time across primary shard | | 125.509 | 110.375 | -15.1333 | min |
| Cumulative indexing throttle time of primary shards | | 0 | 0 | 0 | min |
| Min cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Max cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Cumulative merge time of primary shards | | 41.7306 | 47.5379 | 5.80733 | min |
| Cumulative merge count of primary shards | | 93 | 89 | -4 | |
| Min cumulative merge time across primary shard | | 41.7306 | 47.5379 | 5.80733 | min |
| Median cumulative merge time across primary shard | | 41.7306 | 47.5379 | 5.80733 | min |
| Max cumulative merge time across primary shard | | 41.7306 | 47.5379 | 5.80733 | min |
| Cumulative merge throttle time of primary shards | | 4.12925 | 4.25008 | 0.12083 | min |
| Min cumulative merge throttle time across primary shard | | 4.12925 | 4.25008 | 0.12083 | min |
| Median cumulative merge throttle time across primary shard | | 4.12925 | 4.25008 | 0.12083 | min |
| Max cumulative merge throttle time across primary shard | | 4.12925 | 4.25008 | 0.12083 | min |
| Cumulative refresh time of primary shards | | 2.60575 | 2.36 | -0.24575 | min |
| Cumulative refresh count of primary shards | | 82 | 79 | -3 | |
| Min cumulative refresh time across primary shard | | 2.60575 | 2.36 | -0.24575 | min |
| Median cumulative refresh time across primary shard | | 2.60575 | 2.36 | -0.24575 | min |
| Max cumulative refresh time across primary shard | | 2.60575 | 2.36 | -0.24575 | min |
| Cumulative flush time of primary shards | | 5.45558 | 5.48922 | 0.03363 | min |
| Cumulative flush count of primary shards | | 35 | 34 | -1 | |
| Min cumulative flush time across primary shard | | 5.45558 | 5.48922 | 0.03363 | min |
| Median cumulative flush time across primary shard | | 5.45558 | 5.48922 | 0.03363 | min |
| Max cumulative flush time across primary shard | | 5.45558 | 5.48922 | 0.03363 | min |
| Total Young Gen GC time | | 0.258 | 1.24 | 0.982 | s |
| Total Young Gen GC count | | 32 | 136 | 104 | |
| Total Old Gen GC time | | 0 | 0 | 0 | s |
| Total Old Gen GC count | | 0 | 0 | 0 | |
| Store size | | 25.0414 | 24.3655 | -0.67586 | GB |
| Translog size | | 5.12227e-08 | 5.12227e-08 | 0 | GB |
| Heap used for segments | | 0.193222 | 0.0988235 | -0.0944 | MB |
| Heap used for doc values | | 0.0354309 | 0.034874 | -0.00056 | MB |
| Heap used for terms | | 0.0408325 | 0.0370789 | -0.00375 | MB |
| Heap used for norms | | 0 | 0 | 0 | MB |
| Heap used for points | | 0 | 0 | 0 | MB |
| Heap used for stored fields | | 0.116959 | 0.0268707 | -0.09009 | MB |
| Segment count | | 32 | 30 | -2 | |
| Min Throughput | date_histogram_agg | 1.50023 | 1.50023 | -1e-05 | ops/s |
| Median Throughput | date_histogram_agg | 1.50035 | 1.50035 | -0 | ops/s |
| Max Throughput | date_histogram_agg | 1.50054 | 1.50054 | 0 | ops/s |
| 50th percentile latency | date_histogram_agg | 548.563 | 549.301 | 0.73863 | ms |
| 90th percentile latency | date_histogram_agg | 557.053 | 559.417 | 2.36497 | ms |
| 99th percentile latency | date_histogram_agg | 571.816 | 569.581 | -2.23542 | ms |
| 100th percentile latency | date_histogram_agg | 578.69 | 575.319 | -3.37139 | ms |
| 50th percentile service time | date_histogram_agg | 547.429 | 548.28 | 0.85141 | ms |
| 90th percentile service time | date_histogram_agg | 555.608 | 558.266 | 2.65809 | ms |
| 99th percentile service time | date_histogram_agg | 570.955 | 568.385 | -2.57038 | ms |
| 100th percentile service time | date_histogram_agg | 577.466 | 573.309 | -4.1576 | ms |
| error rate | date_histogram_agg | 0 | 0 | 0 | % |
| Min Throughput | top_hits_agg | 0.720015 | 0.559013 | -0.161 | ops/s |
| Median Throughput | top_hits_agg | 0.747501 | 0.577278 | -0.17022 | ops/s |
| Max Throughput | top_hits_agg | 0.759315 | 0.582273 | -0.17704 | ops/s |
| 50th percentile latency | top_hits_agg | 2569.6 | 3373.35 | 803.747 | ms |
| 90th percentile latency | top_hits_agg | 2609.98 | 3455.39 | 845.411 | ms |
| 99th percentile latency | top_hits_agg | 2622.45 | 3502.29 | 879.836 | ms |
| 100th percentile latency | top_hits_agg | 2694.13 | 3503.48 | 809.352 | ms |
| 50th percentile service time | top_hits_agg | 2569.6 | 3373.35 | 803.747 | ms |
| 90th percentile service time | top_hits_agg | 2609.98 | 3455.39 | 845.411 | ms |
| 99th percentile service time | top_hits_agg | 2622.45 | 3502.29 | 879.836 | ms |
| 100th percentile service time | top_hits_agg | 2694.13 | 3503.48 | 809.352 | ms |
| error rate | top_hits_agg | 0 | 0 | 0 | % |
Plugins: None
Host/Environment:
- OS: Ubuntu 20.04
- Version: ES 7.9 and ES 7.10.2
- Host type: c6i.8xlarge
Additional context
The Elasticsearch distributions are the stock versions downloaded as tar files, extracted, and started with the following command:

```sh
ES_JAVA_OPTS="-Xms8g -Xmx8g" ./bin/elasticsearch
```

The only non-default setting in the yml configuration file is the port number. Both instances run on the same machine, though load is driven to only one of them at a time (not concurrently). The service is bound to localhost and Rally runs on the same machine.
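For completeness, a sketch of that per-instance configuration, assuming the two tarballs are extracted side by side (the directory names are illustrative; the ports match the `--target-host` values in the Rally commands above):

```sh
# Illustrative only: give each instance its own HTTP port so both can be
# installed on one host (only one is ever under load at a time).
echo 'http.port: 9790' >> elasticsearch-7.9.0/config/elasticsearch.yml
echo 'http.port: 9710' >> elasticsearch-7.10.2/config/elasticsearch.yml
```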
About this issue
- State: closed
- Created 3 years ago
- Comments: 21 (10 by maintainers)
I believe this Lucene issue actually does fix the regression. I didn't realize that the testing tool was setting `BEST_COMPRESSION` by default for the given workload, but when I run the test against an index that has been configured with `index.codec=default`, the regression almost completely disappears.

Test script:
Note that `BEST_COMPRESSION` is in fact quite a bit slower on the new versions compared to ES 7.9, but that's because the new Lucene codec was changed to significantly improve compression at the cost of slower performance. The performance change is minimal when comparing the versions with the default codec (`BEST_SPEED`).

@andrross so the change came with https://github.com/opensearch-project/OpenSearch/commit/31c026f25ccbbc02bb643351338786297a2dff6a, where the upgrade to Lucene 8.7.0 happened and basically switched to the new compression.
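Relatedly, for anyone reproducing the default-codec vs. `BEST_COMPRESSION` comparison described above, the codec can be chosen explicitly at index-creation time via the standard `index.codec` setting; a sketch (the index names and port are illustrative):

```sh
# Stored fields compressed with the default codec (BEST_SPEED).
curl -s -X PUT 'http://127.0.0.1:9710/nyc_taxis_default_codec' \
  -H 'Content-Type: application/json' \
  -d '{"settings": {"index": {"codec": "default"}}}'

# Stored fields compressed with BEST_COMPRESSION.
curl -s -X PUT 'http://127.0.0.1:9710/nyc_taxis_best_compression' \
  -H 'Content-Type: application/json' \
  -d '{"settings": {"index": {"codec": "best_compression"}}}'
```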
I experimented and rebuilt an OpenSearch 2.0 SNAPSHOT with `Lucene86Codec`; the latencies went back to Elasticsearch 7.9.x levels:
Elasticsearch 7.9.x
OpenSearch 2.0 (SNAPSHOT)
OpenSearch 2.0 (SNAPSHOT) with Lucene86Codec
I actually don't know what we should be doing in this case; the change seems to be related to the `Lucene87Codec` implementation.

@reta I've been benchmarking against OpenSearch 1.2, which uses Lucene 8.10.1, and it shows performance comparable to all the ES versions greater than 7.9.
Here's the quick test I'm running against the four versions on my machine (all populated with the nyc_taxis dataset but otherwise idle). It consistently produces results like the following:
Also, I've tested this "top hits" query against ES 7.11 and ES 7.15, and both appear to exhibit the regression relative to ES 7.9.
I profiled ES 7.9 versus OpenSearch 1.2. It looks like OpenSearch 1.2 may be spending more time doing decompression.
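The thread doesn't say which profiler was used; as one way to run a similar comparison, async-profiler can capture a CPU flame graph of each node while the query loop is being driven against it (the paths, duration, and process-name match below are illustrative):

```sh
# Illustrative only: find the node's JVM pid and record a 60-second CPU
# flame graph while the top_hits query is running; repeat per version and
# compare time spent in stored-fields decompression.
PID=$(jps | awk '/Elasticsearch|OpenSearch/ {print $1}')
./profiler.sh -e cpu -d 60 -f /tmp/top_hits_flamegraph.html "$PID"
```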
I do indeed get different results for the query in question here, so it looks like the referenced bug may be a factor.