OpenSearch: [BUG] Uppercase Regex in String queries search

Regex search on analyzed fields no longer works with capital letters. It looks as if the analyzer stores the terms lowercased at index time, or the search does not use the standard analyzer for some reason.

To Reproduce

Steps to reproduce the behavior:

  1. Create an index and add a doc containing uppercase letters to it (e.g. message: this is a TLS handshake)
  2. Perform this search request: {"query": {"bool": {"must": [{"query_string": {"query": "message:/TLS/"}}]}}}
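The two steps above can be sketched as a small Python script that builds the request bodies. This is a minimal sketch, not part of the original report: the index name `regex-test` and the helper names are hypothetical, and the script only prints the requests rather than sending them, so it runs without a cluster.

```python
import json
from typing import Optional

# Hypothetical index name; the report does not name the index.
INDEX_NAME = "regex-test"


def build_document() -> dict:
    """Document from step 1: a message containing an uppercase token."""
    return {"message": "this is a TLS handshake"}


def build_regex_query(pattern: str, analyzer: Optional[str] = None) -> dict:
    """Search body from step 2, with an optional query_string analyzer."""
    query_string = {"query": f"message:/{pattern}/"}
    if analyzer is not None:
        query_string["analyzer"] = analyzer
    return {"query": {"bool": {"must": [{"query_string": query_string}]}}}


if __name__ == "__main__":
    # Print the requests instead of sending them, so no cluster is needed.
    print(f"PUT /{INDEX_NAME}/_doc/1")
    print(json.dumps(build_document()))
    print(f"POST /{INDEX_NAME}/_search")
    print(json.dumps(build_regex_query("TLS")))
```

The optional `analyzer` argument maps to the `analyzer` field of `query_string`, so the same helper can build a request with an explicit analyzer for comparison.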

Expected behavior

We should find the doc we added above since it’s an exact match. However, we got 0 results.

Plugins

None

Host/Environment

I used a single-node OpenSearch container (version 1.2.3), running on iOS 11.6.4.

Additional context

If we specify the standard analyzer in the search request, we get the expected results: {"query": {"bool": {"must": [{"query_string": {"query": "message:/TLS/", "analyzer":"standard"}}]}}}

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 24 (11 by maintainers)

Most upvoted comments

@reta @nknize we believe that reverting the change (aligning the behavior with the way it worked before the fix) should be the way to go in this case because:

  1. The initial fix was needed only for fields of the wildcard type, which OpenSearch doesn’t support. There is no real reason for this code to exist in OpenSearch.
  2. The fix makes regex search case sensitive in a surprising way: type-specific analyzers are no longer applied automatically to regexes at search time, although they are still applied at index time by default.
  3. The fix makes it impossible to apply type-specific analyzers during query string regex searches (even if they are set at the index or type mapping level), which reduces the options for influencing query parsing and the search flow in general.
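The mismatch described in point 2 can be illustrated with a toy model. This is a simplified sketch, not OpenSearch's actual code path: `standard_analyze` is a rough stand-in for the standard analyzer (split on non-word characters, lowercase), `regex_term_search` is a hypothetical helper that matches a regex against whole indexed terms, and lowercasing the pattern is only a crude approximation of applying an analyzer to it.

```python
import re
from typing import List


def standard_analyze(text: str) -> List[str]:
    """Rough stand-in for the standard analyzer: tokenize and lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def regex_term_search(indexed_terms: List[str], pattern: str,
                      normalize_pattern: bool) -> List[str]:
    """Match a regex against whole indexed terms.

    normalize_pattern=True lowercases the pattern first, as a crude model of
    analysis at search time. This is only safe for plain literals like "TLS";
    it would corrupt escapes such as \\S.
    """
    if normalize_pattern:
        pattern = pattern.lower()
    compiled = re.compile(pattern)
    return [term for term in indexed_terms if compiled.fullmatch(term)]


if __name__ == "__main__":
    # Index time: the document text is analyzed, so "TLS" is stored as "tls".
    terms = standard_analyze("this is a TLS handshake")
    # Search time without normalization: /TLS/ finds no indexed term.
    print(regex_term_search(terms, "TLS", normalize_pattern=False))
    # Search time with normalization: the pattern matches the stored term.
    print(regex_term_search(terms, "TLS", normalize_pattern=True))
```

In this model the document is only found when the pattern goes through the same normalization as the indexed text, which is the asymmetry the comment objects to.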

Please let me know what you think about it. We are, of course, happy to contribute if that’s the way to go here.

@AmiStrn FYI

@reta I agree with @alexgnatyuk here. This seems like a really easy fix if the code introduced in Elasticsearch 7.9 is reverted, since it is an x-pack feature that uses it. It is really a bug that has gone unnoticed for a while. Users should not have to be explicit about this for the regexes in their queries: many of them use OpenSearch Dashboards and will not know that they also need to specify the analyzer somehow (that would be really bad UX).

The discussion is about the proposed solution. What do you think? Should we make this change as proposed?