requests: requests can't properly handle redirects if the response body is encoded in something other than 'utf8'
As the title says: the response body is encoded in iso-8859-2 and the Location header happens to contain a non-ASCII character, which results in a UnicodeDecodeError being thrown.
Expected Result
Flawless execution of the code.
Actual Result
UnicodeDecodeError
Reproduction Steps
import requests
requests.get("http://www.biblia.deon.pl/ksiega.php?id=3")
System Information
$ python -m requests.help
{
"chardet": {
"version": "3.0.4"
},
"cryptography": {
"version": "2.3"
},
"idna": {
"version": "2.7"
},
"implementation": {
"name": "CPython",
"version": "2.7.15+"
},
"platform": {
"release": "4.18.0-13-generic",
"system": "Linux"
},
"pyOpenSSL": {
"openssl_version": "1010100f",
"version": "18.0.0"
},
"requests": {
"version": "2.19.0"
},
"system_ssl": {
"version": "1010100f"
},
"urllib3": {
"version": "1.23"
},
"using_pyopenssl": true
}
About this issue
- State: open
- Created 5 years ago
- Comments: 17 (5 by maintainers)
I have also recently run into this issue and would like to see #4933 merged.
@tomchristie Thank you for the answer. Technically speaking it might not be a bug, but I still maintain that handling this is expected behaviour from a library which advertises itself as “HTTP for Humans”.
The following Python 3 code works as expected
The following Go code works as expected
Both of them use only standard library.
The encoding of the response body is irrelevant here. The Location header should be strictly ASCII encoded. (See e.g. https://stackoverflow.com/questions/7654207/what-charset-should-be-used-for-a-location-header-in-a-301-response.)
Requests will (reasonably enough) decode it as utf8, since utf8 is ASCII-compatible and ends up being more robust in practice.
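To make the failure mode concrete, here is a minimal sketch of what happens when header bytes in ISO-8859-2 are decoded as UTF-8. The path below is a made-up example, not the actual Location value returned by the site, and try_decode is a hypothetical helper for illustration, not Requests' internal code.

```python
# Hypothetical Location value containing a non-ASCII character (U+0119),
# sent by a server as raw ISO-8859-2 bytes instead of plain ASCII.
raw_location = u"/ksi\u0119ga.php?id=3".encode("iso-8859-2")


def try_decode(raw, encoding):
    """Return the decoded text, or None if the bytes are invalid in `encoding`."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# 0xEA followed by an ASCII letter is not a valid UTF-8 sequence.
as_utf8 = try_decode(raw_location, "utf-8")
# Decoding succeeds only because we already know (or guessed) the charset.
as_latin2 = try_decode(raw_location, "iso-8859-2")

print(as_utf8 is None)                            # True
print(as_latin2 == u"/ksi\u0119ga.php?id=3")      # True
```

Nothing in the header itself tells the client which charset was used, so the second decode only succeeds because the charset was supplied by hand.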
In short: The http://www.biblia.deon.pl/ksiega.php?id=3 address is serving an invalid HTTP response. (As an aside it also doesn’t include ‘iso-8859-2’ in the content-type, so there’s really no way to determine what the intended content type of the byte sequence might be)
Requests could decode the header with errors="ignore" or something like that, in order to be more robust against malformed headers, but it’d just be masking the issue that the response header is malformed.
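As a rough illustration of that trade-off (Python 3, standard library only), the sketch below compares lenient decoding with percent-encoding the raw bytes. The header bytes are a hypothetical example, and this is not a description of how Requests behaves or should behave.

```python
from urllib.parse import quote

# Hypothetical malformed Location header: ISO-8859-2 bytes, not ASCII.
raw_location = b"/ksi\xeaga.php?id=3"

# errors="ignore" silently drops the undecodable byte, changing the path.
ignored = raw_location.decode("utf-8", errors="ignore")

# errors="replace" substitutes U+FFFD, which is visible but still lossy.
replaced = raw_location.decode("utf-8", errors="replace")

# Percent-encoding the raw bytes keeps every byte and yields pure ASCII.
escaped = quote(raw_location, safe="/:?=&%")

print(ignored)          # /ksiga.php?id=3
print(ascii(replaced))  # '/ksi\ufffdga.php?id=3'
print(escaped)          # /ksi%EAga.php?id=3
```

Both lenient decodes produce a URL that differs from what the server sent, which is the "masking" concern raised above. Percent-encoding preserves the original bytes, but whether the server accepts the escaped form on the follow-up request is an assumption that would need checking.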