OpenSearch: [BUG] 429 too many requests

Describe the bug
We're using the AWS-hosted OpenSearch service. Starting about 10 days ago we began getting 429 Too Many Requests responses from the Elasticsearch API (from what I can tell, only from the search endpoint), even though we haven't seen any increase in the number of requests. We have been working since then to reduce the number of requests. The search request queue is steady at 0, with occasional peaks around 10 or 20.
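A minimal sketch of how that queue can be checked directly on the cluster, assuming the managed domain exposes the _cat APIs (the endpoint is a placeholder):

const { Client } = require('@elastic/elasticsearch');

// Minimal sketch: sample the per-node search thread pool counters behind the
// "search request queued" graph. The endpoint is a placeholder.
const client = new Client({ node: 'https://example-domain.region.es.amazonaws.com' });

async function sampleSearchThreadPool() {
  const { body } = await client.cat.threadPool({
    thread_pool_patterns: 'search',          // GET _cat/thread_pool/search
    h: 'node_name,name,active,queue,rejected,completed',
    format: 'json',
  });
  console.table(body);                       // one row per node
}

sampleSearchThreadPool().catch(console.error);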

Expected behavior
Why did we start getting these 429s, which look like an API rate limit, when the request count did not increase from our usual workload and all the critical metrics are green (as before)?

Plugins
None

Host/Environment (please complete the following information)
ECS / Fargate / Elasticsearch hosted by AWS / Graviton-powered containers, running the latest supported Elasticsearch version and querying with the latest compatible Elasticsearch Node.js client.

Additional context

{"name":"ResponseError","meta":{"body":"429 Too Many Requests /****/_search","statusCode":429,"headers":{"date":"Thu, 18 Nov 2021 18:35:30 GMT","content-type":"text/plain;charset=ISO-8859-1","content-length":"54","connection":"keep-alive","server":"Jetty(8.1.12.v20130726)"},"meta":{"context":null,"request":{"params":{"method":"POST","path":"/***/_search","body":{"type":"Buffer","data":[***]},"querystring":"size=100&from=0&_source=id about 10 fields","headers":{"user-agent":"elasticsearch-js/7.10.0 (linux 4.14.248-189.473.amzn2.x86_64-x64; Node.js v16.13.0)","accept-encoding":"gzip,deflate","content-type":"application/json","content-encoding":"gzip","content-length":"294"},"timeout":30000},"options":{},"id":5379},"name":"elasticsearch-js","connection":{"url":"https://***/","id":"https://***/","headers":{},"deadCount":0,"resurrectTimeout":0,"_openRequests":0,"status":"alive","roles":{"master":true,"data":true,"ingest":true,"ml":false}},"attempts":0,"aborted":false}}}

[Screenshots: Screen Shot 2564-11-19 at 01 43 22, 01 43 27, 01 43 33, 01 43 40 (https://user-images.githubusercontent.com/22284209/142477316-e24a8a44-1e6f-4a08-95f0-65abc4c4a3e1.png)]

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 2
  • Comments: 19 (5 by maintainers)

Most upvoted comments

We had a call with AWS support today. The solution they offered was for us to raise a support request to have the JVM utilization threshold increased from 85% to 95% after creating a cluster on Graviton instance types. We're not going to make use of this, because we're operating fine on m5 instance types now and have a fully automated infrastructure-as-code deployment process.

I sent a message to our account manager requesting a feature to improve OpenSearch support on newer instance types.

Looking into this.

We’ve only just raised a support case with Amazon. No resolution yet.

@amitmun Switching to a non-Graviton instance type resolved it. We have Auto-Tune enabled and use the managed OpenSearch service, so we do not have the ability to set any GC settings.

@dblock We reached out to AWS support; they mentioned that the GC behavior for the domain is different once G1GC is enabled with the Graviton instance type. AWS support did not recommend switching to non-Graviton, but based on their statement about GC we tried it, and it resolved our 429 issue. We never fully rolled customers out to the Graviton cluster, because it hit the issue with half the normal load. Data node memory pressure with Graviton went above the 429 threshold of 85% (screenshot), while non-Graviton with more load does not have the memory pressure issue (screenshot).
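A minimal sketch of how heap pressure and the active collectors can be checked from the client, assuming the managed domain exposes the _nodes info/stats APIs (the endpoint is a placeholder):

const { Client } = require('@elastic/elasticsearch');

// Minimal sketch: report per-node JVM heap usage and the configured GC
// collectors, to confirm whether G1GC is in use and how close the heap sits
// to the 85% memory-pressure threshold. The endpoint is a placeholder.
const client = new Client({ node: 'https://example-domain.region.es.amazonaws.com' });

async function checkJvm() {
  const info = await client.nodes.info({ metric: 'jvm' });   // collectors in use
  const stats = await client.nodes.stats({ metric: 'jvm' }); // current heap usage

  for (const [id, node] of Object.entries(stats.body.nodes)) {
    const heapPct = node.jvm.mem.heap_used_percent;
    const collectors = info.body.nodes[id].jvm.gc_collectors.join(', ');
    console.log(`${node.name}: heap ${heapPct}% used, GC: ${collectors}`);
  }
}

checkJvm().catch(console.error);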

Something changed because of OpenSearch; with the same metrics and instances, we never had this issue before.