httpx: Client should have more lenient behaviour wrt. spec violations.
Consider, for example, this snippet of code:

```python
import httpx
import trio


def run_sync(url):
    with httpx.Client(verify=False) as client:
        response = client.get(url)
        print(response)


async def run_async(url):
    async with httpx.AsyncClient(verify=False) as client:
        response = await client.get(url)
        print(response)
```
```python
>>> run_sync('http://100.33.56.173')
<Response [200 OK]>
>>> trio.run(run_async, 'http://100.33.56.173')
httpx.exceptions.ProtocolError: multiple Content-Length headers

>>> run_sync('http://220.181.136.243')
<Response [600 ]>
>>> trio.run(run_async, 'http://220.181.136.243')
httpx.exceptions.ProtocolError: Response status_code should be in range [200, 600), not 600

>>> run_sync('http://122.147.128.158')
<Response [200 OK]>
>>> trio.run(run_async, 'http://122.147.128.158')
httpx.exceptions.ProtocolError: malformed data

>>> run_sync('http://217.86.148.177')
<Response [200 OK]>
>>> trio.run(run_async, 'http://217.86.148.177')
httpx.exceptions.ProtocolError: Receive buffer too long
```
These errors seem to come from the underlying `h11`. I get that these are ill-configured servers, sometimes even violating the protocol specs, but requests to them should work nonetheless, as most HTTP clients don't impose these kinds of restrictions.
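A client-side workaround, for what it's worth, is to treat these as per-URL failures rather than letting them abort a whole run. Here is a minimal sketch, written against the `httpx.exceptions.ProtocolError` name shown in the tracebacks above (newer httpx releases expose the same exception at the top level as `httpx.ProtocolError`); the URL is just one of the examples above:

```python
import httpx
import trio


async def fetch_or_skip(url):
    async with httpx.AsyncClient(verify=False) as client:
        try:
            response = await client.get(url)
        except httpx.exceptions.ProtocolError as exc:
            # The server responded in an out-of-spec way; record it and move on.
            print(f"{url}: protocol violation ({exc})")
            return None
        print(response)
        return response


trio.run(fetch_or_skip, 'http://100.33.56.173')
```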
About this issue
- State: closed
- Created 4 years ago
- Reactions: 3
- Comments: 17 (10 by maintainers)
The different cases that were raised in this ticket…

- `httpx.exceptions.ProtocolError: multiple Content-Length headers` - now closed via https://github.com/python-hyper/h11/pull/109
- `httpx.exceptions.ProtocolError: Response status_code should be in range [200, 600), not 600` - now closed via https://github.com/python-hyper/h11/pull/140
- `httpx.exceptions.ProtocolError: malformed data` - tracked in https://github.com/python-hyper/h11/issues/97 (header value validation) and https://github.com/python-hyper/h11/issues/113 (header name validation). Now tracked in this more specific issue: https://github.com/encode/httpx/issues/1363
- `httpx.exceptions.ProtocolError: Receive buffer too long` - defaults to 16kB, or is configured on the `h11` connection instance (see the sketch after this comment). Nathaniel notes that curl defaults to 100kB - see https://curl.se/mail/lib-2019-09/0023.html - perhaps we should do the same? Now closed via https://github.com/encode/httpcore/pull/647

I think if we had a ticket on `httpcore` tracking the last item on this list (eg. support large cookie values on the order of a little under 100kB), then we could close this issue off, as we've got more specific tickets for each case instead.
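For reference, the 16kB limit mentioned in the last item is h11's `max_incomplete_event_size` argument on `h11.Connection`. A minimal sketch of raising it when driving `h11` directly (the 100kB figure mirrors curl's default from the link above; how `httpcore` should expose this is a separate question):

```python
import h11

# h11 buffers incoming bytes until it can parse a complete event (for a response,
# the whole header block). "Receive buffer too long" is raised when that buffer
# exceeds max_incomplete_event_size, which defaults to 16 kB.
conn = h11.Connection(
    our_role=h11.CLIENT,
    max_incomplete_event_size=100 * 1024,  # roughly curl's limit
)

# Serialize a request; these bytes would be written to the socket, and the bytes
# read back would be fed to conn.receive_data() and parsed with conn.next_event().
data = conn.send(h11.Request(method="GET", target="/", headers=[("Host", "example.com")]))
data += conn.send(h11.EndOfMessage())
```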
I broke these cases down in https://github.com/python-hyper/h11/issues/96

They're all cases where `h11` was following the spec correctly, but we'd actually like it to have more lenient behaviour in each case.

I work as a security researcher, and these are cases that I happened to catch while using your (awesome) lib to perform some analysis at scale. From what I can immediately see, these happen a lot. Of course these are edge cases, but there are many, many servers with bad configurations out there. For instance, the response status error is very rare, but the others actually happen quite often.
We probably want to provide as-graceful-as-possible behaviour when protocol errors occur, so these are a really useful set of cases for us to consider, thanks.
There are a few different classes of error present here; it might be that we want to deal with some gracefully and treat others as hard errors, but we'll need to have a look over each case.
For context: how did you dig these cases out? Are you able to gauge how often you're seeing each case?
Marked this as a question because I don’t know if there’s anything we can/should do here. The differences in behavior are expected, and come from us using urllib3 in the sync case, while the async case uses our own dispatcher implementation with h11/h2. This is planned to change for 1.0 when we will be using our own logic across sync and async.
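To make the split concrete, the async path fails because `h11` itself refuses to parse the out-of-spec response. Here is a minimal sketch that drives `h11` directly with a made-up response carrying conflicting `Content-Length` headers (the exact exception message depends on the h11 version):

```python
import h11


def parse_bad_response():
    conn = h11.Connection(our_role=h11.CLIENT)

    # Send a request so h11 expects a response next.
    conn.send(h11.Request(method="GET", target="/", headers=[("Host", "example.com")]))
    conn.send(h11.EndOfMessage())

    # A hypothetical out-of-spec response: two conflicting Content-Length headers.
    conn.receive_data(
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Length: 5\r\n"
        b"Content-Length: 7\r\n"
        b"\r\n"
        b"hello"
    )

    try:
        conn.next_event()
    except h11.ProtocolError as exc:
        # The async dispatcher surfaces this as httpx.exceptions.ProtocolError;
        # urllib3 (the sync path) is more forgiving here.
        print(f"h11 rejected the response: {exc}")


parse_bad_response()
```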
In similar cases of faulty servers we turned it down as « it's a faulty server, we're not handling out-of-spec responses ». I don't know if at some point we'll want to start handling weird edge cases for the sake of allowing users to access data that's there, just presented in a slightly out-of-spec form…