OpenSearch: [BUG] Search requests with size=0 & terminate_after sometimes return an incorrect hit count

Describe the bug When both the size=0 and terminate_after parameters are provided to a search request, an incorrect total hit count is sometimes returned. This was discovered as part of the investigation of a flaky test; see https://github.com/opensearch-project/OpenSearch/issues/9946#issuecomment-1749266330

To Reproduce Set size=0 in the search request in the test below, then run:

./gradlew ':server:internalClusterTest' --tests "org.opensearch.search.simple.SimpleSearchIT" -Dtests.method="testSimpleTerminateAfterCount {p0={"search.concurrent_segment_search.enabled":"true"}}" -Dtests.seed=C37FD3C5A66D02BC -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=he-IL -Dtests.timezone=Asia/Qostanay -Druntime.java=17
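
For illustration, a minimal sketch of the kind of request involved (this is not the actual SimpleSearchIT code; the index name, field name, values, and class name here are made up, and indexing of the matching documents is omitted):

import org.opensearch.action.search.SearchResponse;
import org.opensearch.index.query.QueryBuilders;
import org.opensearch.test.OpenSearchIntegTestCase;

public class TerminateAfterSizeZeroSketchIT extends OpenSearchIntegTestCase {
    public void testCountCanExceedTerminateAfter() {
        // Assumes an index "test" with more than 5 documents matching the range
        // has already been created and refreshed (omitted for brevity).
        SearchResponse response = client().prepareSearch("test")
            .setQuery(QueryBuilders.rangeQuery("field").gte(1).lte(100))
            .setTerminateAfter(5)     // ask to stop after ~5 documents
            .setSize(0)               // return no hits, only the total count
            .setTrackTotalHits(true)  // request an exact total hit count
            .get();

        // terminated_early comes back true...
        assertTrue(response.isTerminatedEarly());
        // ...but the total can exceed 5 when the segment-level counting shortcut
        // described later in this thread is taken.
        long totalHits = response.getHits().getTotalHits().value;
    }
}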

Expected behavior The above test should always pass.

About this issue

  • State: open
  • Created 9 months ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

@austintlee Thanks for looking into this!

To quote your conclusion: "But it is not a bug. I would say it's just incompatible with what you might expect when you set terminate_after, but I think it makes use of the intended optimization without actually violating the 'contract' of the terminate_after flag (since it does terminate early and terminatedEarly is set to true in the response as well)."

I think there’s some more discussion needed on what the correct “contract” for terminate_after actually looks like. The docs define it as “The maximum number of documents OpenSearch should process before terminating the request”, which to me sounds like: if I pass a specific terminate_after value in a search request, I would expect the total hit count to have an upper bound of the terminate_after value whenever terminated_early is true. That is not necessarily the case in the edge case you’ve identified here.

This is something that we’ve struggled with for concurrent search as well – it’s not straightforward to have concurrent search terminate early at exactly the terminate_after count either. See https://github.com/opensearch-project/OpenSearch/issues/8371

Curious to hear your thoughts on this too @sohami

I have narrowed down the cause of this “issue” to a change introduced in this PR - #4043. (cc @reta)

Specifically, this change: https://github.com/opensearch-project/OpenSearch/blob/9b7a9d0026aa379537804fd24b95619abe88e1c0/server/src/main/java/org/opensearch/search/internal/ContextIndexSearcher.java#L306-L308

That PR brought in a Lucene feature (https://issues.apache.org/jira/browse/LUCENE-10620) that passes the query’s Weight to Collectors so they can use it to speed up their work during search.

Now, what is really interesting about this change is that when it is combined with these three search conditions (I believe all three are needed) - 1) size=0, 2) terminate_after is set, and 3) track_total_hits set to true - it hits the following code path, where weight.count() returns the total count for a Lucene segment (it’s an optimization).

https://github.com/apache/lucene/blob/bbf197fdc2822e9f1ba1767f83ae23037336ed27/lucene/core/src/java/org/apache/lucene/search/TotalHitCountCollector.java#L47-L52

public LeafCollector getLeafCollector(LeafReaderContext context) throws IOException {
    // Ask the Weight (if one was set) for this segment's exact hit count.
    int leafCount = weight == null ? -1 : weight.count(context);
    if (leafCount != -1) {
      // The count was cheap to compute: add the whole segment's hits in one go and
      // tell IndexSearcher to skip per-document collection for this segment.
      totalHits += leafCount;
      throw new CollectionTerminatedException();
    }
    // (otherwise the method goes on to return a normal per-document LeafCollector)

And, of course, that’s exactly the setup for this test. But it is not a bug. I would say it’s just incompatible with what you might expect when you set terminate_after, but I think it makes use of the intended optimization without actually violating the “contract” of the terminate_after flag (since it does terminate early and terminatedEarly is set to true in the response as well).
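
To see the shortcut in isolation, here is a self-contained Lucene sketch (my own code, assuming Lucene 9.x where Collector#setWeight and Weight#count exist; class and variable names are mine). Once a Weight whose per-segment count is cheap has been set on TotalHitCountCollector, the collector adds the whole segment’s count and throws CollectionTerminatedException before any document is collected, so a wrapping collector that enforces terminate_after per document never gets a chance to stop it for that segment:

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.CollectionTerminatedException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.TotalHitCountCollector;
import org.apache.lucene.search.Weight;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class SegmentCountShortcutDemo {
  public static void main(String[] args) throws IOException {
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
        for (int i = 0; i < 100; i++) {
          writer.addDocument(new Document());
        }
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        Query query = new MatchAllDocsQuery(); // a query whose per-segment count is cheap
        Weight weight =
            searcher.createWeight(searcher.rewrite(query), ScoreMode.COMPLETE_NO_SCORES, 1f);

        TotalHitCountCollector collector = new TotalHitCountCollector();
        collector.setWeight(weight); // roughly what ContextIndexSearcher does since #4043

        for (LeafReaderContext leaf : reader.leaves()) {
          try {
            collector.getLeafCollector(leaf);
            // If we get here, documents would be counted one by one via collect().
          } catch (CollectionTerminatedException e) {
            // The whole segment was counted via weight.count(leaf) in one shot;
            // no individual document was ever handed to a wrapping collector.
          }
        }
        System.out.println("total hits: " + collector.getTotalHits()); // 100
      }
    }
  }
}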

As for why this issue manifests non-deterministically, two things seem to contribute. One is that the test uses indexRandom to randomly distribute documents, which (by design) leads to a different layout of Lucene segments on each run. The other is that the Weight objects are stored in an LRU cache. In addition, the test issues a range query, and how its Weight can be used depends on the layout and sequence of document IDs within a segment.
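
To make the segment-layout dependence concrete, here is a small sketch (again my own code, assuming Lucene 9.x; the exact conditions under which a range query’s Weight can answer cheaply are an implementation detail of PointRangeQuery). It prints the per-segment answer of Weight#count for a range query: -1 means “not cheap to compute”, so the collector falls back to per-document counting (which is where a terminate_after wrapper can actually stop it), while any other value is the exact per-segment count that TotalHitCountCollector adds wholesale. Which of the two you see depends on how the documents happen to be laid out across segments - exactly what indexRandom randomizes:

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.Weight;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class RangeQueryLeafCountDemo {
  public static void main(String[] args) throws IOException {
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
        // Two commits -> (usually) two segments, so the same query can get a
        // different answer from Weight#count for each segment.
        for (int i = 0; i < 50; i++) {
          Document doc = new Document();
          doc.add(new IntPoint("value", i));   // all inside the range below
          writer.addDocument(doc);
        }
        writer.commit();
        for (int i = 80; i < 130; i++) {
          Document doc = new Document();
          doc.add(new IntPoint("value", i));   // straddles the upper bound
          writer.addDocument(doc);
        }
        writer.commit();
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        Query query = IntPoint.newRangeQuery("value", 0, 100);
        Weight weight =
            searcher.createWeight(searcher.rewrite(query), ScoreMode.COMPLETE_NO_SCORES, 1f);
        for (LeafReaderContext leaf : reader.leaves()) {
          // -1 = not cheap (per-document collection, so terminate_after applies);
          // otherwise the exact per-segment count that gets added in one shot.
          // The result depends on segment layout, deletions, and where the values
          // fall relative to the range bounds.
          System.out.println("segment " + leaf.ord + ": count = " + weight.count(leaf));
        }
      }
    }
  }
}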