dapr: Resiliency policies do not seem to take HTTP 429 errors into account
Resiliency Policies
What version of Dapr?
1.1.x
Expected Behavior
I configured an app (the service) to be rate limited to 1 call per second and confirmed from the logs that this policy is applied. The app correctly returns HTTP 429 when it receives more than 1 request per second.
I then configured a client app that invokes the above service with the standard retry and resiliency policies applied. If the client app makes more than 1 call per second, it receives an HTTP 429 from the service. The failed call should then be retried according to the configured retry policy and eventually succeed.
Note both apps are written in Java.
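To make the expected behavior concrete, here is a minimal Java sketch of the retry semantics I was expecting the resiliency policy to provide for HTTP 429. It is not Dapr code: the class `RetryOn429`, the method `invokeWithRetry`, and the simulated service are hypothetical names used only for illustration, and the constant backoff is an assumption rather than the policy from the QuickStart.

```java
import java.util.function.IntSupplier;

// Hypothetical helper for illustration only; not part of the Dapr SDK or runtime.
final class RetryOn429 {
    static final int TOO_MANY_REQUESTS = 429;

    /**
     * Calls `invoke` once, then retries while it keeps returning HTTP 429,
     * up to `maxRetries` extra attempts with a constant backoff between them.
     * Returns the last status code observed.
     */
    static int invokeWithRetry(IntSupplier invoke, int maxRetries, long backoffMillis) {
        int status = invoke.getAsInt();
        for (int attempt = 0; attempt < maxRetries && status == TOO_MANY_REQUESTS; attempt++) {
            try {
                Thread.sleep(backoffMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop retrying.
                Thread.currentThread().interrupt();
                return status;
            }
            status = invoke.getAsInt();
        }
        return status;
    }

    public static void main(String[] args) {
        // Simulated rate-limited service: rejects the first two calls, then succeeds.
        int[] calls = {0};
        IntSupplier fakeService = () -> ++calls[0] <= 2 ? TOO_MANY_REQUESTS : 200;

        int status = invokeWithRetry(fakeService, 3, 10);
        System.out.println("final status=" + status + " after " + calls[0] + " calls");
    }
}
```

With the simulated service, the call is rejected twice and succeeds on the third attempt. That is the outcome I expected from the sidecar when the target app answers with HTTP 429.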
Actual Behavior
The client app behaves as if all the service invocations were successful, when in fact many of them failed with HTTP 429. None of the failed requests are retried.
It looks like the runtime is ignoring HTTP 429 errors.
Steps to Reproduce the Problem
- Set up the sample from the Java QuickStart: https://docs.dapr.io/getting-started/quickstarts/resiliency/resiliency-serviceinvo-quickstart/
- Change line 38 of CheckoutServiceApplication.java in the checkout folder as follows to reduce the time between invocations to 100ms:
TimeUnit.MILLISECONDS.sleep(100);
- Add a ratelimit middleware component as follows in the components folder:
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: ratelimit
spec:
  type: middleware.http.ratelimit
  version: v1
  metadata:
  - name: maxRequestsPerSecond
    value: 1
- Add the ratelimit to the pipeline by modifying the config.yaml as follows:
apiVersion: dapr.io/v1alpha1
kind: Configuration
metadata:
  name: config
spec:
  features:
    - name: Resiliency
      enabled: true
  httpPipeline:
    handlers:
      - name: ratelimit
        type: middleware.http.ratelimit
        
- Rebuild the apps and run as per the QuickStart docs. The checkout app just runs through the 20 invocations as if all of them succeeded, but in fact only 3 of the 20 calls succeeded.
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 15 (10 by maintainers)
Thanks @artursouza for the deep dive investigation here and finding that the situation described by @sujitdmello shouldn’t have worked in 1.9.
To recap:
appHttpPipeline and NOT httpPipeline
Thanks for reporting this @sujitdmello
We discussed this during the call this morning: