dapr: Resiliency policies do not seem to take HTTP 429 errors into account

Resiliency Policies

/area runtime

/area operator

/area placement

/area docs

/area test-and-release

What version of Dapr?

1.1.x

Expected Behavior

Configure an app (service) to be rate limited to 1 call per second and confirm from the logs that the policy is applied. The app correctly returns HTTP 429 when it receives too many requests in the given timeframe (more than 1 request per second).

Configure a second (client) app that makes service invocations to the service app above, with standard retry and resiliency policies applied. If the client app makes too many calls (more than 1 per second), it receives an HTTP 429 from the service. The call should then be retried according to the configured retry policies and eventually succeed.
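To make the expected behavior concrete, here is a minimal Java sketch of a constant-delay retry loop. It is not how Dapr implements retries (the sidecar applies them, not the app); `ConstantRetrySketch`, `invokeWithRetry` and the simulated callee are hypothetical names used for illustration only, and treating every non-2xx status as retryable is a simplifying assumption.

```java
import java.time.Duration;
import java.util.function.IntSupplier;

// Hypothetical illustration only; not part of Dapr or its SDKs.
class ConstantRetrySketch {

    // Assumption: any non-2xx status (including 429) counts as a failure worth retrying.
    static boolean isSuccess(int status) {
        return status >= 200 && status <= 299;
    }

    // Calls once, then retries up to maxRetries times with a constant delay.
    // Returns the last status observed.
    static int invokeWithRetry(IntSupplier call, int maxRetries, Duration delay) {
        int status = call.getAsInt();
        for (int attempt = 0; attempt < maxRetries && !isSuccess(status); attempt++) {
            try {
                Thread.sleep(delay.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and give up
                return status;
            }
            status = call.getAsInt();
        }
        return status;
    }

    public static void main(String[] args) {
        // Simulated callee: rejects the first two attempts with 429, then accepts.
        int[] responses = {429, 429, 200};
        int[] next = {0};
        int status = invokeWithRetry(() -> responses[next[0]++], 5, Duration.ofMillis(10));
        System.out.println("final status: " + status);
    }
}
```

With a policy like this in effect, the caller would only see a 429 once the retries are exhausted.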

Note both apps are written in Java.

Actual Behavior

The client app behaves as if all the service invocations were successful, when in fact many of them failed with HTTP 429. None of the failed requests are retried.

It looks like the runtime is ignoring HTTP 429 errors.
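Independently of what the runtime does, the client can make these failures visible by counting responses per status code instead of assuming success. The sketch below is one hedged way to do that; `StatusTallySketch` and `invokeAndTally` are hypothetical names, and the supplier stands in for the real HTTP call (for example, something that returns `HttpResponse.statusCode()` if the client uses `java.net.http.HttpClient`, which returns non-2xx responses normally rather than throwing).

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.IntSupplier;

// Hypothetical illustration only; not taken from the quickstart code.
class StatusTallySketch {

    // Runs the call the given number of times and counts responses by HTTP status.
    static Map<Integer, Integer> invokeAndTally(IntSupplier call, int invocations) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int i = 0; i < invocations; i++) {
            counts.merge(call.getAsInt(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Simulated callee (an assumption for the demo): accepts 3 calls, rate-limits the rest.
        int[] calls = {0};
        Map<Integer, Integer> counts = invokeAndTally(() -> calls[0]++ < 3 ? 200 : 429, 20);
        counts.forEach((status, n) -> System.out.println("HTTP " + status + ": " + n));
    }
}
```

A tally like this shows at a glance how many invocations were rate limited rather than served.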

Steps to Reproduce the Problem

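  • In the checkout (client) app, set the delay between invocations to 100 ms so that it exceeds the rate limit:
TimeUnit.MILLISECONDS.sleep(100);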
  • Add a ratelimit middleware component as follows in the components folder:
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: ratelimit
spec:
  type: middleware.http.ratelimit
  version: v1
  metadata:
  - name: maxRequestsPerSecond
    value: 1
  • Add the ratelimit to the pipeline by modifying the config.yaml as follows:
apiVersion: dapr.io/v1alpha1
kind: Configuration
metadata:
  name: config
spec:
  features:
    - name: Resiliency
      enabled: true
  httpPipeline:
    handlers:
      - name: ratelimit
        type: middleware.http.ratelimit
        
  • Rebuild the apps and run as per the QuickStart docs. The checkout app just runs through the 20 invocations as if all of them succeeded, but in fact only 3 of the 20 calls succeeded.

Release Note

RELEASE NOTE:

About this issue

  • State: closed
  • Created a year ago
  • Comments: 15 (10 by maintainers)

Most upvoted comments

Thanks @artursouza for the deep dive investigation here and finding that the situation described by @sujitdmello shouldn’t have worked in 1.9.

To recap:

  • For this scenario to work, the middleware should be set in the appHttpPipeline and NOT httpPipeline
  • The rate limiter middleware is broken in Dapr 1.9 and was fixed in Dapr 1.10, so this scenario won’t work with Dapr 1.9 anyway. See: dapr/components-contrib#2349
  • I’ve opened #5872 to track respecting the value of Retry-After set in 429 responses as a follow-up for Dapr 1.11

Thanks for reporting this @sujitdmello

We discussed this during the call this morning:

  • Retries should work in your case even with 429 errors. However, it seems that you don’t have a Resiliency policy configured (built-in retries only apply to errors that map to Unavailable or Unauthenticated). In your case, adding a Resiliency policy should cause the request to be retried.
  • That said, we currently do not handle 429 responses any differently from, for example, 500 ones. We should probably implement special logic for 429 errors that takes the value of the Retry-After header into account, but this won’t happen for Dapr 1.10 since we’re already well into the endgame.
  • We also noticed a bug in how we handle 2xx status codes besides 200, and we’ll fix that as P0 for Dapr 1.10. It is unrelated to your issue, but you helped us discover it, so thank you 😃
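As an illustration of what "taking Retry-After into account" could mean, here is a minimal Java sketch that derives the wait time from the header when the response is a 429 and otherwise falls back to the policy's own delay. This is not the Dapr implementation; `RetryAfterSketch`, `parse` and `delayFor` are hypothetical names. The header may carry either a number of seconds or an HTTP date, so both forms are handled.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Hypothetical illustration only; not Dapr's retry logic.
class RetryAfterSketch {

    // Parses a Retry-After value: either delta-seconds ("120") or an HTTP date.
    // Returns empty if the header is missing or malformed.
    static Optional<Duration> parse(String headerValue, Instant now) {
        if (headerValue == null) {
            return Optional.empty();
        }
        String v = headerValue.trim();
        if (v.matches("\\d+")) {
            try {
                return Optional.of(Duration.ofSeconds(Long.parseLong(v)));
            } catch (NumberFormatException e) { // too many digits for a long
                return Optional.empty();
            }
        }
        try {
            Instant when = ZonedDateTime.parse(v, DateTimeFormatter.RFC_1123_DATE_TIME).toInstant();
            Duration wait = Duration.between(now, when);
            return Optional.of(wait.isNegative() ? Duration.ZERO : wait); // date already passed
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    // Uses Retry-After only for 429 responses; everything else keeps the policy delay.
    static Duration delayFor(int status, String retryAfter, Duration policyDelay, Instant now) {
        if (status == 429) {
            return parse(retryAfter, now).orElse(policyDelay);
        }
        return policyDelay;
    }

    public static void main(String[] args) {
        Duration policyDelay = Duration.ofSeconds(5);
        System.out.println(delayFor(429, "2", policyDelay, Instant.now()));
        System.out.println(delayFor(500, "2", policyDelay, Instant.now()));
    }
}
```

Whether the header value should override the policy delay, or only raise it to a minimum, is a design choice left open here.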