OpenSearch: [BUG] OpenSearch is exposed to ReDoS attack

Describe the bug
By using a specific regexp query, I am able to cause 100% CPU usage for a long time.

See also discussion here: https://discuss.opendistrocommunity.dev/t/is-opendistro-opensearch-exposed-to-redos-attack/5898

To Reproduce

POST regexp-test1/_doc/test01
{
    "stringvalue" : "aaaaaaaaaaaaaaaaaaaaaaaaaasssssssssssssssssssssssssssssss"
}


GET regexp-test1/_search
{
  "query": {
    "regexp": {
      "stringvalue": {
        "value": "(.*a){2000}"
      }
    }
  }
}

GET /_nodes/stats/process

Expected behavior
I would expect the internal Lucene regexp engine to limit the execution after a short period. Unfortunately, it doesn't.

Host/Environment (please complete the following information):

  • OS: CentOS Linux 7.x
  • Version: latest OpenSearch built from source (2021/05/11)

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 3
  • Comments: 26 (9 by maintainers)

Most upvoted comments

Worst of all, the problem comes from this getCommonSuffixBytesRef, which should be a silly optimization; it's actually totally optional. I can confirm that after 268s on my 2-core laptop it eventually hits TooComplexToDeterminizeException.

But I don't know why it takes so long 😃 If I only let this optimization use up to 1000 states, it terminates in 4 seconds instead.
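
For reference, here's a minimal standalone sketch of that experiment (assuming the Lucene 8.x-era automaton API, where toAutomaton and getCommonSuffixBytesRef still take a state limit; the class name and the timing printout are purely illustrative):

import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.Operations;
import org.apache.lucene.util.automaton.RegExp;
import org.apache.lucene.util.automaton.TooComplexToDeterminizeException;
import org.apache.lucene.util.automaton.UTF32ToUTF8;

public class CommonSuffixBlowup {
  public static void main(String[] args) {
    // Build the automaton the way a regexp query would, with the default 10,000-state
    // determinization limit. This part completes fine; the pain is in the suffix step below.
    Automaton a = new RegExp("(.*a){2000}").toAutomaton(10000);
    Automaton binary = new UTF32ToUTF8().convert(a);

    long start = System.nanoTime();
    try {
      // With the full 10,000-state budget this burns CPU for minutes before giving up;
      // capping it (e.g. at 1,000 states) makes it bail out after a few seconds instead.
      BytesRef suffix = Operations.getCommonSuffixBytesRef(binary, 1000);
      System.out.println("common suffix: " + suffix);
    } catch (TooComplexToDeterminizeException notWorthIt) {
      System.out.println("gave up after " + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
  }
}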

To get the common suffix (which is important for infinite DFAs like this, e.g. “leading wildcard”), we probably shouldn't pass through the original maxDeterminizedStates anyway, IMO, because we then do some evil stuff: reverse the entire DFA, det() it again, and then compute the common prefix.
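
In other words, that helper does roughly the following under the hood (a paraphrase of the steps just described, not the verbatim Lucene source, again assuming the 8.x Operations API):

import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.Operations;

class CommonSuffixSteps {
  // Sketch of the common-suffix computation: the whole determinization budget is
  // spent on the reversed automaton, which is exactly where "(.*a){2000}" explodes.
  static BytesRef commonSuffixSketch(Automaton binary, int maxDeterminizedStates) {
    // Reverse the automaton, so its common suffix becomes a common prefix.
    Automaton reversed = Operations.reverse(binary);
    // Determinize the reversed automaton; this is the expensive det() step.
    Automaton det = Operations.determinize(reversed, maxDeterminizedStates);
    // Take the common prefix of the reversed DFA; the real code then reverses
    // these bytes to turn the prefix back into a suffix.
    return Operations.getCommonPrefixBytesRef(det);
  }
}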

So in my opinion the fix would be in CompiledAutomaton; at a high level, we'd change this chunk of code: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/automaton/CompiledAutomaton.java#L243-L247

into something like this:

try {
   // this is slow, and just an opto anyway, so don't burn cycles on it for some crazy worst-case.
   // if we don't set this common suffix, the query will just run a bit slower, that's all.
   int limit = Math.min(1000, maxDeterminizedStates);
   BytesRef suffix = Operations.getCommonSuffixBytesRef(binary, limit);
   commonSuffixRef = suffix;
} catch (TooComplexToDeterminizeException notWorthIt) {
  commonSuffixRef = null;
}

It at least works around the issue, unless @mikemccand has a better idea (he knows det() better than I do).

I will open a Lucene issue, with a Lucene test boiled down from this already-wonderful test case 😃

It may give some more visibility, as we have some automata nerds there to look at it.

I think that, independent of this issue, something similar to my proposed change is a good idea, maybe even with a smaller limit. This is just an optimization, so we should never make things worse or spend a terrible amount of time computing it; we can just give up.

But I would rather understand why it blows up with the current maxDeterminizedStates limit that we have than just shove the problem under the rug.