requests: Proposal: potential way to follow bad url shorteners
I’ve ran into an issue that runs tangent to issue #441.
We have a content indexer that is powered by requests. Some of the larger URL shorteners exhibit some weird behaviors depending on the user agent string.
-
(variant of #441) depending on the user-agent header, a particular shortener will either send a proper 301 response OR a HTTP 200 containing a meta-refresh value of 0.
<head><meta name="referrer" content="always"><noscript><META http-equiv="refresh" content="0;URL={URL}"></noscript><title>{URL}</title></head><script>window.opener = null; location.replace("{URL}")</script> -
(new?) a handful of url shorteners may send a HTTP-200 response with a
locationheader to the redirect. (yes I know, it breaks spec and makes no sense.
To handle both of these (and several other scenarios) I have a novel suggestion. The callback hooks could be used to allow developers to catch and follow these types of redirect oddities within a single configured request.
A developer would be able to handle edge cases like the above by with a hook:
def bad_shortener_callback(r):
r_location = r.headers.get('location')
if r.status_code == 200 and r_location!= r.url:
r.is_redirect = True
r.redirect_location_override = r_location
return r
In order to make this work, a slight change would be needed:
The redirect url is pulled via this line:
url = resp.headers['location']
I propose this change to sessions.py
-url = resp.headers['location']
+url = resp.redirect_location
and then extending models.py accordingly
+ redirect_location_override = None
@property
def redirect_location(self):
if self.redirect_location_override is not None:
return self.redirect_location_override
return self.headers['location']
if this is acceptable, I would be happy to issue a PR that includes tests.
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Comments: 16 (16 by maintainers)
I have no problem with achieving pluggability via a refactor of the redirect following logic. For example, it could be enough to update
SessionRedirectMixinwith a methodget_redirect_targetwhich we use to determine if a response needs us to redirect, and then call that.