scrapy: Adjust throttling for 429 response codes

An HTTP 429 response is returned when we hit an API’s rate limit. Usually, it is enough to wait a bit before sending new requests. The “problem” is that, if the concurrency settings allow more requests than the API permits, we’ll keep getting 429s.

A solution would be to tune throttling so that it delays requests based on 429s. It could be an extension or a middleware, as AutoThrottle seems quite specific to throttling based on latency.
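As a rough illustration of the idea, here is a minimal sketch of a downloader middleware that raises the delay of the request's download slot each time a 429 comes back. It is not an existing Scrapy component: the class name, the `THROTTLE_429_*` setting names, and the doubling policy are all assumptions made for this example. It reaches the slot through `crawler.engine.downloader.slots`, the same way AutoThrottle does.

```python
class TooManyRequestsThrottle:
    """Hypothetical downloader middleware: back off per download slot on HTTP 429."""

    def __init__(self, crawler, min_delay=1.0, max_delay=60.0, factor=2.0):
        self.crawler = crawler
        self.min_delay = min_delay  # first delay applied after a 429 (seconds)
        self.max_delay = max_delay  # upper bound for the slot delay (seconds)
        self.factor = factor        # multiplier applied on each further 429

    @classmethod
    def from_crawler(cls, crawler):
        # The setting names below are made up for this sketch.
        settings = crawler.settings
        return cls(
            crawler,
            min_delay=settings.getfloat("THROTTLE_429_MIN_DELAY", 1.0),
            max_delay=settings.getfloat("THROTTLE_429_MAX_DELAY", 60.0),
            factor=settings.getfloat("THROTTLE_429_FACTOR", 2.0),
        )

    def process_response(self, request, response, spider):
        if response.status != 429:
            return response
        # Each request is assigned to a download slot (per domain by default).
        key = request.meta.get("download_slot")
        slot = self.crawler.engine.downloader.slots.get(key)
        if slot is not None:
            increased = max(slot.delay * self.factor, self.min_delay)
            slot.delay = min(increased, self.max_delay)
        # Pass the response on unchanged so that retrying stays with the
        # retry middleware (429 is in the default RETRY_HTTP_CODES).
        return response
```

It would be enabled through `DOWNLOADER_MIDDLEWARES` like any other downloader middleware. Note that this sketch never lowers the delay again; a real implementation would probably decay it after a run of successful responses.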

Also, it could be worth considering that some APIs return the waiting time: https://github.com/scrapy/scrapy/issues/3849
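Assuming the waiting time arrives in the standard `Retry-After` header (which may hold either a number of seconds or an HTTP date), a small helper along these lines could turn it into a delay. The function name `retry_after_seconds` and its `now` parameter are hypothetical, chosen for this sketch only.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Return the delay in seconds requested by a Retry-After value, or None."""
    if value is None:
        return None
    value = value.strip()
    # Form 1: delay-seconds, e.g. "120"
    if value.isdigit():
        return float(value)
    # Form 2: HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # unparseable header: let the caller fall back
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means there is nothing left to wait for.
    return max(0.0, (when - now).total_seconds())
```

In Scrapy, header values are bytes, so something like `response.headers.get("Retry-After")` would need to be decoded before being passed in.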

Here is a previous PR for this: https://github.com/scrapy/scrapy/pull/3061

About this issue

  • State: open
  • Created 4 years ago
  • Comments: 15 (7 by maintainers)

Most upvoted comments

@Gallaecio I think a per-domain basis would be more fitting for my use case. However, with #5015 close to being approved (I’ve been eagerly following that discussion), I believe that functionality will make it easier to account for status codes when throttling, by subclassing the existing AutoThrottle middleware. That said, I feel this functionality addresses a common problem, which calls for an official middleware.