requests: utils.get_encoding_from_headers returns ISO-8859-1 incorrectly
When I call get_encoding_from_headers on this url:
http://thelastpsychiatrist.com/2012/02/my_fiancee_is_pushing_me_away.html
The returned encoding is ISO-8859-1:
(Pdb) get_encoding_from_headers(self.response.headers)
'ISO-8859-1'
Even though the headers don't declare that character set (the Content-Type has no charset parameter at all):
(Pdb) self.response.headers
{'date': 'Sun, 11 Mar 2012 21:10:40 GMT', 'transfer-encoding': 'chunked', 'content-type': 'text/html', 'server': 'Apache/2.2.22'}
This looks like an intentional choice in the source, but it is a problem for me: if I knew the encoding was only a fallback default rather than something the server declared, I would check the HTML meta tag myself, and the page would then parse correctly as UTF-8.
I think the better solution is either to return None explicitly when no charset is declared, or to provide a default keyword argument that callers can set to an encoding of their choice.
I can patch this if it sounds like a good solution.
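To make the proposal concrete, here is a rough sketch of the behaviour I have in mind. It is not the requests implementation: `explicit_charset` and `charset_from_meta` are hypothetical names, the headers are assumed to be a plain dict with lowercase keys (as in the pdb output above), and the meta-tag regex is a simplification rather than a full HTML parser.

```python
import re
from email.message import Message

# Simplified pattern (an assumption, not a spec-complete sniffer): matches both
# <meta charset="..."> and <meta http-equiv=... content="...; charset=...">.
_META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)


def explicit_charset(headers, default=None):
    """Return the charset declared in Content-Type, or `default` if none is declared."""
    content_type = headers.get("content-type")
    if not content_type:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases a declared charset and returns
    # `failobj` unchanged when the header carries no charset parameter.
    return msg.get_content_charset(failobj=default)


def charset_from_meta(body, limit=4096):
    """Look for a charset declaration in the first `limit` bytes of an HTML body."""
    match = _META_CHARSET.search(body[:limit])
    return match.group(1).decode("ascii").lower() if match else None


headers = {"content-type": "text/html"}
body = b'<html><head><meta charset="utf-8"></head><body></body></html>'

# No charset in the headers, so fall back to what the document itself declares.
encoding = explicit_charset(headers) or charset_from_meta(body)
```

With the `default` argument, anyone who wants the current behaviour could still call `explicit_charset(headers, default="ISO-8859-1")`.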
About this issue
- State: closed
- Created 12 years ago
- Comments: 16 (16 by maintainers)
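For future reference, for anyone who stumbles upon this: the relevant spec is RFC 2616, section 3.7.1, which gives "text" media subtypes received over HTTP a default charset of ISO-8859-1 when the sender provides no explicit charset parameter: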
http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1