h11: Fails to decode a header that requests can handle

I am trying to read this URL:

https://www.bitstamp.net/api/v2/trading-pairs-info/

It fails (see https://github.com/theelous3/asks/issues/60) with an exception:

....
  File "/Users/michael/.pyenv/versions/3.6.3/lib/python3.6/site-packages/h11/_readers.py", line 85, in maybe_read_from_SEND_RESPONSE_server
    return class_(headers=list(_decode_header_lines(lines[1:])),
  File "/Users/michael/.pyenv/versions/3.6.3/lib/python3.6/site-packages/h11/_readers.py", line 55, in _decode_header_lines
    matches = validate(header_field_re, line)
  File "/Users/michael/.pyenv/versions/3.6.3/lib/python3.6/site-packages/h11/_util.py", line 96, in validate
    raise LocalProtocolError(msg)

Looking closer, this is the header that seems to cause trouble:

bytearray(b'Set-Cookie: ___utmvafIumyLc=kUd\x01UpAt; path=/; Max-Age=900')

My guess is that it’s the \x01 byte in the cookie value.
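To check that guess, here is a minimal sketch that tests the header value against a simplified version of the RFC 7230 field-value grammar (visible ASCII or obs-text, with spaces and tabs allowed in the middle). `STRICT_VALUE` is a pattern written for illustration only; it is an assumption and not h11's actual `header_field_re`.

```python
import re

# Simplified RFC 7230 field-value: visible ASCII (0x21-0x7e) or obs-text
# (0x80-0xff), with spaces/tabs allowed only between such bytes.
# Illustrative pattern, NOT the regex h11 uses internally.
VCHAR = rb"[\x21-\x7e\x80-\xff]"
STRICT_VALUE = re.compile(VCHAR + rb"(?:[ \t\x21-\x7e\x80-\xff]*" + VCHAR + rb")?")

raw = b"Set-Cookie: ___utmvafIumyLc=kUd\x01UpAt; path=/; Max-Age=900"

# Split "name: value" and trim optional whitespace around the value.
name, _, value = raw.partition(b":")
value = value.strip(b" \t")

strict_ok = STRICT_VALUE.fullmatch(value) is not None

# Control bytes (other than tab) are what the strict grammar forbids.
bad_bytes = [b for b in value if (b < 0x20 and b != 0x09) or b == 0x7F]

print(strict_ok, bad_bytes)
```

Under this simplified grammar the only offending byte in the value is 0x01, which is consistent with the guess above.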

About this issue

State: closed
Created 6 years ago
Comments: 16 (10 by maintainers)

Commits related to this issue

Temporary fix for header encoding issue. https://github.com/python-hyper/h11/issues/57 — committed to miracle2k/h11 by miracle2k 6 years ago
Allow more characters in header values The RFC says we should reject any header value that contains control characters. But apparently in the real world, you have to both accept and produce these som... — committed to njsmith/h11 by njsmith 6 years ago

Most upvoted comments

Note that curl allows all sorts of things specifically because it is a tool used for pentesting and verification. It would be nice if it had some sort of validation mode that highlighted spec errors.

royfielding on Feb 4, 2021

On further thought, I realized my suggestion above wouldn’t actually handle the case that started this, because there the offending byte is inside a cookie, which means that the client has to be able to send it back to the server 😕. So I guess our options are:

1. Loosen the header-value regex to accept any character except \0, \r, \n.
2. Be spec-compliant about what we send and receive in server mode, but use the loosened rule above in client mode. This would be somewhat annoying to implement, though, because of how h11 is structured: the same h11.Request or h11.Response objects are used by both clients and servers, they validate headers, and they don’t know which role we’re playing. (If we were only changing the rules for parsing from the wire, that would be easier, because that’s done by the connection object itself.)

So I guess I’m now leaning towards the first option.
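As a rough illustration of the first option, a loosened rule could look like the sketch below. `LOOSE_VALUE` and `is_acceptable_value` are hypothetical names made up for this example; this is not the patch that went into h11.

```python
import re

# Hypothetical loosened rule: accept any byte except NUL, CR and LF.
# CR/LF stay forbidden because they would allow header injection.
LOOSE_VALUE = re.compile(rb"[^\x00\r\n]*")


def is_acceptable_value(value: bytes) -> bool:
    """Return True if the whole value passes the loosened rule."""
    return LOOSE_VALUE.fullmatch(value) is not None


cookie = b"___utmvafIumyLc=kUd\x01UpAt; path=/; Max-Age=900"
print(is_acceptable_value(cookie))               # the \x01 byte is tolerated
print(is_acceptable_value(b"a\r\nInjected: 1"))  # CR/LF is still rejected
```

With a rule like this the cookie can be read and echoed back unchanged, while the bytes that can break message framing are still refused.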

njsmith on Apr 2, 2018

@sigmavirus24 I guess a similar option would be to have some sort of configuration you pass when setting up the h11.Connection object to choose the degree of header validation. (This would also be easier to implement and faster, since currently we do the header value validation as a side effect of the regexes we use to pick apart the headers in the first place.) But if the person who has the problem is using, like, someapp → requests → urllib3 → h11, then this kind of config doesn’t help unless either urllib3 sets it to the permissive mode by default, or else everyone in the stack provides their own config option that they then pass through to the next layer down. That’s the motivation for thinking about envvars: they’re gross, but they do provide a way for a user to circumvent that stack.

That said… I guess we always want strict validation on outgoing headers, and we probably always want strict validation as servers parsing incoming requests (because on the server side, you can give a proper error message, plus not-always-but-usually the client will be stuck working around whatever you do rather than vice versa). And if this is breaking in real life and no extant clients actually enforce it, then I guess urllib3 and similar will probably want to disable it always anyway. So one heuristic would be: when parsing the headers from an incoming response, and only in this situation, outlaw \0, \r and \n but let everything else through; otherwise (outgoing headers and incoming requests) apply full RFC validation.
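The heuristic could be sketched roughly as follows. Everything here is an assumption made for illustration: `validate_header_value`, its `incoming_response` flag, and both patterns are hypothetical, and the strict pattern is only a simplified reading of the RFC 7230 field-value grammar, not h11's own regex.

```python
import re

# Simplified strict rule: visible ASCII or obs-text, spaces/tabs only inside.
VCHAR = rb"[\x21-\x7e\x80-\xff]"
STRICT = re.compile(rb"(?:" + VCHAR + rb"(?:[ \t\x21-\x7e\x80-\xff]*" + VCHAR + rb")?)?")
# Lenient rule: anything except NUL, CR and LF.
LENIENT = re.compile(rb"[^\x00\r\n]*")


def validate_header_value(value: bytes, *, incoming_response: bool) -> bytes:
    """Apply the lenient rule only to headers parsed from an incoming response.

    Outgoing headers and incoming requests get the strict rule.
    """
    pattern = LENIENT if incoming_response else STRICT
    if pattern.fullmatch(value) is None:
        raise ValueError(f"illegal header value: {value!r}")
    return value


cookie = b"___utmvafIumyLc=kUd\x01UpAt; path=/; Max-Age=900"
validate_header_value(cookie, incoming_response=True)  # accepted
try:
    validate_header_value(cookie, incoming_response=False)
except ValueError as exc:
    print("rejected:", exc)
```

Note that under this split a client could parse the cookie but would fail strict validation when sending it back, which is the drawback with cookies raised elsewhere in this thread.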

njsmith on Mar 22, 2018