requests: HeaderParsingError: Failed to parse headers
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): 127.0.0.1
DEBUG:requests.packages.urllib3.connectionpool:"POST /kkblog/ HTTP/1.1" 201 None
WARNING:requests.packages.urllib3.connectionpool:Failed to parse headers (url=http://127.0.0.1:5984/kkblog/): [MissingHeaderBodySeparatorDefect()], unparsed data: '³é\x97\xad\r\nETag: "1-967a00dff5e02add41819138abb3284d"\r\nDate: Fri, 15 Apr 2016 14:45:18 GMT\r\nContent-Type: text/plain; charset=utf-8\r\nContent-Length: 69\r\nCache-Control: must-revalidate\r\n\r\n'
Traceback (most recent call last):
File "/usr/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 390, in _make_request
assert_header_parsing(httplib_response.msg)
File "/usr/lib/python3.5/site-packages/requests/packages/urllib3/util/response.py", line 59, in assert_header_parsing
raise HeaderParsingError(defects=defects, unparsed_data=unparsed_data)
requests.packages.urllib3.exceptions.HeaderParsingError: [MissingHeaderBodySeparatorDefect()], unparsed data: '³é\x97\xad\r\nETag: "1-967a00dff5e02add41819138abb3284d"\r\nDate: Fri, 15 Apr 2016 14:45:18 GMT\r\nContent-Type: text/plain; charset=utf-8\r\nContent-Length: 69\r\nCache-Control: must-revalidate\r\n\r\n'
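For reference, a minimal requests call that should reproduce the failure above (reconstructed from the curl command below; hypothetical, since the thread doesn’t show the Python side):

import requests

# Reconstructed reproduction (assumed from the curl equivalent): POST a
# CouchDB document whose _id contains Chinese characters, so the server
# echoes raw UTF-8 bytes in its Location response header.
resp = requests.post(
    'http://127.0.0.1:5984/kkblog/',
    data='{"_id": "关闭"}'.encode('utf-8'),
    headers={'Content-Type': 'application/json'},
)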
Here is the same request with curl:
curl -v -X POST 127.0.0.1:5984/kkblog/ -H "Content-Type: application/json" -d '{"_id": "关闭"}'
Note: Unnecessary use of -X or --request, POST is already inferred.
* Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 5984 (#0)
> POST /kkblog/ HTTP/1.1
> Host: 127.0.0.1:5984
> User-Agent: curl/7.47.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 17
>
* upload completely sent off: 17 out of 17 bytes
< HTTP/1.1 201 Created
< Server: CouchDB/1.6.1 (Erlang OTP/18)
< Location: http://127.0.0.1:5984/kkblog/关闭
< ETag: "3-bc27b6930ca514527d8954c7c43e6a09"
< Date: Fri, 15 Apr 2016 15:13:14 GMT
< Content-Type: text/plain; charset=utf-8
< Content-Length: 69
< Cache-Control: must-revalidate
<
{"ok":true,"id":"关闭","rev":"3-bc27b6930ca514527d8954c7c43e6a09"}
* Connection #0 to host 127.0.0.1 left intact
The problem is the Location: http://127.0.0.1:5984/kkblog/关闭 line in the response headers. I tried other Chinese characters, but they didn’t cause the exception.
>>> '关闭'.encode('utf-8')
b'\xe5\x85\xb3\xe9\x97\xad'
>>>
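Those bytes are likely the whole story: httplib decodes header bytes as iso-8859-1, so the 0x85 byte inside 关’s UTF-8 encoding becomes U+0085 (NEL), which Python’s str.splitlines() (used by the stdlib email parser that httplib delegates to) treats as a line break. The Location line is therefore split mid-value, the remainder has no "name:" separator, and urllib3 flags MissingHeaderBodySeparatorDefect. Note that the second fragment below is exactly the start of the "unparsed data" in the log above:

>>> '关闭'.encode('utf-8').decode('iso-8859-1').splitlines()
['å', '³é\x97\xad']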
@fake-name I don’t recall being puritanical about anything. Here is, word for word, what I said (literally quoting myself from this thread):
Note the key part of this comment: “Either way, httplib is getting confused here, and we can’t really step in and stop it.”
This is what I mean when I say “it’s not really possible for us to resolve the problem”. The issue here is in a helper library that sits in the Python standard library. Changing the header parsing logic of that standard library module, while possible, is something that needs to be done as part of the standard CPython development process. Requests already carries more subclasses and monkeypatches to httplib than we’re happy with, and we’re strongly disinclined to carry more.
So here are the options for resolving this issue:
Now, you are welcome to pursue any of those options, but I’ve been to this rodeo a few times so I’m pursuing (3), which is the only one that actually makes this problem go away for good. Unfortunately, it turns out that replacing our low-level HTTP stack that we have spent 7 years integrating with takes quite a lot of work, and I can’t just vomit out the code to fix this on demand.
To sum up: I didn’t say I didn’t think this was a problem or a bug, I said it was a problem that the Requests team couldn’t fix, at least not on a timescale that was going to be helpful to this user. If you disagree, by all means, provide a patch to prove me wrong.
And let me make something clear. For the last 9 months or so I have been the most active Requests maintainer by a long margin. Requests is not all I do with my time. I maintain 15 other libraries and actively contribute to more. I have quite a lot of stuff I am supposed to be doing. So I have to prioritise my bug fixing.
Trust me when I say that a bug where the effort required to fix it is extremely high, and where the flaw comes from a server emitting non-RFC-compliant output, is not a bug that screams out “must be fixed this second”. Any time a bug is predicated on the notion that our peer isn’t spec compliant, that bug drops several places down my priority list. Postel was wrong.
Browsers are incentivised to support misbehaving servers because they are in a competitive environment, and users only blame them when things go wrong. If Chrome doesn’t support ${CRAPPY_WEBSITE_X} then Chrome users will just go to a browser that does when they need access.
That’s all fine and good, but the reason that Requests doesn’t do this is because we have two regular developers. That’s it. There are only so many things two developers can do in a day. Neither of us work on just Requests. Compare this to Chrome, which has tens of full-time developers and hundreds of part-time ones. If you want Requests to work on every site where Chrome does, then I have bad news for you my friend because it’s just never going to happen.
I say all of this to say: please don’t berate the Requests team because we didn’t think your particular pet bug was important. We prioritise bugs and close ones we don’t think we’ll fix any time soon. If you would like to see this bug fixed, a much better option is to write the patch yourself. Shouting at me does not make me look fondly on your request for assistance.
I apologize. I assumed you were holding an opinion that you were not, and proceeded to be a complete ass.
In any event, I don’t disagree that this is an issue with the core library, but, well, in my experience complaining about encoding issues in the core library is non-productive (I have an issue with the built-in ftplib where it decodes some messages as iso-8859-1 even when in UTF-8 mode, which I was only able to solve by monkey-patching the stdlib).
Anyways, assuming you’re OK with monkey patching, here’s a simple snippet that monkey-patches http.client to make it much, MUCH more robust to arbitrary header encodings:
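A minimal sketch of such a patch, assuming Python 3’s http.client and preferring cchardet with chardet as a fallback (the snippet actually posted in the thread may differ):

import email.parser
import http.client

try:
    import cchardet as chardet
except ImportError:
    import chardet


def _tolerant_parse_headers(fp, _class=http.client.HTTPMessage):
    # Same line-reading loop as the stdlib version, but decode the raw
    # header block with a guessed encoding instead of assuming iso-8859-1.
    headers = []
    while True:
        line = fp.readline(http.client._MAXLINE + 1)
        if len(line) > http.client._MAXLINE:
            raise http.client.LineTooLong("header line")
        headers.append(line)
        if len(headers) > http.client._MAXHEADERS:
            raise http.client.HTTPException(
                "got more than %d headers" % http.client._MAXHEADERS)
        if line in (b'\r\n', b'\n', b''):
            break
    raw = b''.join(headers)
    encoding = chardet.detect(raw).get('encoding') or 'iso-8859-1'
    try:
        decoded = raw.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        decoded = raw.decode('iso-8859-1')  # the stdlib default
    return email.parser.Parser(_class=_class).parsestr(decoded)


# http.client.HTTPResponse.begin() looks parse_headers up in the module
# namespace at call time, so replacing it here affects later requests.
http.client.parse_headers = _tolerant_parse_headers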
Note: This does require cchardet or chardet. I’m open to better ways to determine the encoding. It simply overrides the http.client.parse_headers() member of the stdlib, which is kind of squicky.
Splatting the above into a file, and then just importing it at the beginning of the requests/__init__.py file seems to solve the problem.
And again, whether it’s compliant is irrelevant. There are servers out there that act like this. And my browser and cURL are completely happy talking to them, yet requests explodes.
Hell, I’m interacting with Cloudflare, and it’s serving UTF-8 headers. So basically Unicode header support is massively, MASSIVELY deployed and available, standards be damned.
If you’re dead set on being puritanical about RFC support, the only people who are harmed are people who want to use the requests library.
So… what about the thousands and thousands of potential servers that I can’t just SSH into and fix?
Basically, “just follow the RFC” is a complete non-answer, because I don’t control the world (I’m taking minion applications, though!).
The fact is, servers out there serve UTF-8 headers. This is not something I can fix, because they’re not my servers. My web browser handles this situation just fine, so it’s clearly possible to make it work.
As it is, requests fails on these servers. This is fixable, because I control the code on my local machine.