OpenSearch: Performance degradation in OpenSearch 1.1.0 due to Lucene 8.9.0

Is your feature request related to a problem? Please describe.

We are planning to upgrade from Elasticsearch 7.7.1 to the OpenSearch 1.2.4 release, and we have compared the performance of OpenSearch 1.2.4 with Elasticsearch 7.7.1. For cardinality queries (on keyword fields), performance degrades by 50%, so we cannot upgrade to OpenSearch.

The performance degradation is observed from the OpenSearch 1.1.0 release onwards.

Below is the code snippet that runs slowly on OpenSearch 1.1.0.

    // with opensearch 1.0.1: 240 requests/second
    // with opensearch 1.1.0:  97 requests/second
    import org.opensearch.action.search.SearchRequestBuilder;
    import org.opensearch.client.transport.TransportClient;
    import org.opensearch.index.query.QueryBuilder;
    import org.opensearch.index.query.QueryBuilders;
    import org.opensearch.search.aggregations.AggregationBuilders;
    import org.opensearch.search.aggregations.metrics.CardinalityAggregationBuilder;

    // Cardinality aggregation on a keyword field, excluding one random value.
    public static SearchRequestBuilder getSearchRequest1(TransportClient client, String index, String randomValue) {
        QueryBuilder qb = QueryBuilders.boolQuery().mustNot(QueryBuilders.termQuery("__id.keyword", randomValue));
        CardinalityAggregationBuilder agg = AggregationBuilders
                .cardinality("somename")
                .field("__id.keyword");
        return client.prepareSearch(index).setQuery(qb).addAggregation(agg);
    }

This degradation is caused by the Lucene upgrade from 8.8.2 to 8.9.0. The commit is https://github.com/opensearch-project/OpenSearch/commit/e153629871f9eaa1ba6c0e4bc9143d27ec4ef96c

A Lucene developer said this is caused by an enhancement in Lucene and has to be fixed in OpenSearch.

Lucene ticket that I have filed: https://issues.apache.org/jira/browse/LUCENE-10509

Describe the solution you’d like

As per the Lucene developer (Adrien Grand):

The cardinality aggregation performs value lookups on each document. OpenSearch should change the way cardinality aggregations run to collect matching ordinals into a bitset first, and only look up values once the entire segment has been collected. This should address the performance problem and will likely make the cardinality aggregation faster than it was before Lucene 8.9.
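
To make this concrete, here is a minimal, hypothetical sketch of that approach against the Lucene doc-values API. It is not the actual OpenSearch CardinalityAggregator code; the class name and the Consumer-based hand-off to the cardinality sketch are illustrative assumptions:

    import java.io.IOException;
    import java.util.function.Consumer;

    import org.apache.lucene.index.SortedSetDocValues;
    import org.apache.lucene.util.BytesRef;
    import org.apache.lucene.util.LongBitSet;

    // Hypothetical sketch: defer value lookups until the segment is collected.
    final class OrdinalBitsetCollector {
        private final SortedSetDocValues values; // per-segment doc values
        private final LongBitSet seenOrds;       // one bit per distinct segment ordinal

        OrdinalBitsetCollector(SortedSetDocValues values) {
            this.values = values;
            this.seenOrds = new LongBitSet(values.getValueCount());
        }

        // Called once per matching document: only sets bits, no value lookup.
        void collect(int doc) throws IOException {
            if (values.advanceExact(doc)) {
                long ord;
                while ((ord = values.nextOrd()) != SortedSetDocValues.NO_MORE_ORDS) {
                    seenOrds.set(ord);
                }
            }
        }

        // Called after the entire segment has been collected: each distinct
        // ordinal is looked up at most once and fed into the cardinality
        // sketch (e.g. HyperLogLog++), represented here by a Consumer.
        void postCollect(Consumer<BytesRef> sketch) throws IOException {
            long numOrds = values.getValueCount();
            for (long ord = numOrds == 0 ? -1 : seenOrds.nextSetBit(0);
                    ord != -1;
                    ord = ord + 1 < numOrds ? seenOrds.nextSetBit(ord + 1) : -1) {
                // note: Lucene may reuse the returned BytesRef across calls
                sketch.accept(values.lookupOrd(ord));
            }
        }
    }

Looking up each distinct ordinal at most once per segment, rather than once per matching document, is what should make the aggregation cheap again on high-cardinality keyword fields.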

Describe alternatives you’ve considered

None

Additional context

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 17 (11 by maintainers)

Most upvoted comments

We were able to achieve a performance improvement by using the murmur plugin (mapper-murmur3) to precompute the hash value of a field in the index.

Scenario: We are migrating from Elasticsearch v7.10 to OpenSearch v1.2. Post migration, we observed that a search query with a cardinality aggregation had a performance degradation of 186%. A sample query is given below.

    GET sample-index-20220721/_search?timeout=10m
    {
      "aggregations": {
        "user_q": {
          "aggregations": {
            "account_name_total": {
              "cardinality": {
                "field": "account_name"
              }
            }
          },
          "terms": {
            "field": "user",
            "size": 10000
          }
        }
      },
      "query": {
        "query_string": {
          "query": "timestamp:[1657828498000||/m TO 1657914898000||/m] AND storage_id:302391 AND type:1",
          "allow_leading_wildcard": false
        }
      },
      "size": 0,
      "track_total_hits": true
    }

Steps Performed for Triage with Observations: We executed the long-running query in asynchronous mode with the “Profile API” option and observed the following points (a sketch of a profiled request is shown after this list):

  1. Identified that the “account_name” field is of type “keyword” and has high cardinality.
  2. The aggregations section of the Profile API output showed that the “CardinalityAggregator” type was taking more time than expected across multiple shards within the index where the data was located.
  3. Under the debug section of the aggregation, we were able to see that no global ordinals were used (see ordinals_collectors_used = 0 and ordinals_collectors_overhead_too_high = 1):

         "debug" : {
           "ordinals_collectors_used" : 0,
           "ordinals_collectors_overhead_too_high" : 1,
           "string_hashing_collectors_used" : 1,
           "numeric_collectors_used" : 0,
           "empty_collectors_used" : 0
         }

  4. As per the recommendation from this article, if a string field has high cardinality, it is recommended to store the hash value of the field in the index. This can be done using the murmur plugin.
  5. We also observed that the query execution takes time only on indices that have data matching the query.
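
For reference, the profile output quoted above can be obtained by adding "profile": true to the search body. We ran the query through asynchronous search; the plain _search form is sketched here for brevity, with “...” standing for the query and aggregations of the sample request above:

    GET sample-index-20220721/_search?timeout=10m
    {
      "profile": true,
      "size": 0,
      "track_total_hits": true,
      "query": { ... },
      "aggregations": { ... }
    }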

Performance Tuning Changes: Based on these observations, we performed the following steps (a sketch of the new index mapping and the reindex call is shown after the list):

  1. Created a new index “sample-index-20220721-updated” with the following two changes:
     a. Used the murmur plugin to store the hash value of the “account_name” field in the index. Follow the steps in this link.
     b. Enabled “eager_global_ordinals”: true for the “user” field. We did this because we use the user field under “terms” within the query.
  2. Re-indexed the source index “sample-index-20220721” into “sample-index-20220721-updated” for the changes to take effect.
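
Below is a minimal sketch of those two changes and the reindex step, assuming the mapper-murmur3 plugin is installed and that “user” is a keyword field; the “account_name.hash” sub-field name is our own choice for illustration:

    PUT sample-index-20220721-updated
    {
      "mappings": {
        "properties": {
          "account_name": {
            "type": "keyword",
            "fields": {
              "hash": { "type": "murmur3" }
            }
          },
          "user": {
            "type": "keyword",
            "eager_global_ordinals": true
          }
        }
      }
    }

    POST _reindex
    {
      "source": { "index": "sample-index-20220721" },
      "dest": { "index": "sample-index-20220721-updated" }
    }

The cardinality aggregation is then pointed at the hashed sub-field (“account_name.hash”) instead of “account_name”, which matches the updated profile below reporting numeric collectors instead of string-hashing collectors.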

Once the above changes were implemented, we saw a good decrease in the execution time of the query. Although the response time did not fall back to pre-Lucene-8.9.0 levels with the murmur plugin, this solution gives good response times on OpenSearch v1.2.

Updated debug section for the index in the Profile API output:

"debug" : { "ordinals_collectors_used" : 0, "ordinals_collectors_overhead_too_high" : 0, "string_hashing_collectors_used" : 0, "numeric_collectors_used" : 34, "empty_collectors_used" : 0 }