scrapy: Adjust throttling for 429 response codes
The HTTP 429 (Too Many Requests) response code is returned when we hit an API's rate limit. Usually, it is just a matter of waiting a bit before sending new requests. The “problem” is that, if the concurrency settings allow more requests than the API permits, we’ll keep getting 429s.
A solution would be to tune throttling so it delays requests based on 429s.
It could be an extension or a middleware, since AutoThrottle seems quite specific to throttling based on latency.
Also, it could be worth considering that some APIs return the waiting time: https://github.com/scrapy/scrapy/issues/3849
Here is a previous PR for this: https://github.com/scrapy/scrapy/pull/3061
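To make the idea concrete, here is a rough sketch of the delay policy such an extension/middleware might apply. It is not taken from Scrapy or from the linked PR. It assumes the waiting time arrives in a standard `Retry-After` header, and the names `parse_retry_after` and `next_delay`, as well as the default values for `min_delay`, `max_delay`, `backoff` and `recovery`, are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str],
                      now: Optional[datetime] = None) -> Optional[float]:
    """Return the wait in seconds encoded in a Retry-After value, or None."""
    if not value:
        return None
    value = value.strip()
    if value.isdigit():  # delay-seconds form, e.g. "120"
        return float(value)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):  # unparseable value: no usable hint
        return None
    if when.tzinfo is None:  # treat a naive date as UTC
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def next_delay(current: float, status: int, retry_after: Optional[str] = None,
               min_delay: float = 0.5, max_delay: float = 60.0,
               backoff: float = 2.0, recovery: float = 0.9) -> float:
    """Hypothetical policy: back off on 429, slowly speed up otherwise."""
    if status == 429:
        hinted = parse_retry_after(retry_after)
        if hinted is not None:
            target = hinted  # the server told us how long to wait
        else:
            target = max(current, min_delay) * backoff  # no hint: multiply
    else:
        target = current * recovery  # other responses: shrink the delay a bit
    return min(max_delay, max(min_delay, target))  # clamp to the allowed range


if __name__ == "__main__":
    delay = 1.0
    for status, header in [(429, None), (429, "10"), (200, None)]:
        delay = next_delay(delay, status, header)
        print(status, header, delay)
```

Wiring this into Scrapy (for example, from a downloader middleware that sees the 429 response and adjusts the delay for that domain) is left out on purpose, since that part depends on how the throttling hooks end up being exposed.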
About this issue
- State: open
- Created 4 years ago
- Comments: 15 (7 by maintainers)
@Gallaecio I think a per-domain basis would be more fitting for my use case. However, with #5015 close to being approved (I’ve been eagerly following that discussion), I believe it will become easier to account for status codes when throttling by subclassing the existing AutoThrottle middleware. That said, I feel this is a common enough problem to call for an official middleware that addresses it.