runtime: HttpClient doesn't decompress "deflate" correctly
Description
I am using .NET Core v3.1 to fetch data from a REST API that uses deflate compression. I use the following code:
var httpClientHandler = new HttpClientHandler { AutomaticDecompression = DecompressionMethods.All };
using var httpClient = new HttpClient(httpClientHandler);
var request = new HttpRequestMessage(HttpMethod.Get, "https://api.example.com/api/v1/get");
var response = await httpClient.SendAsync(request);
This request sends out the following headers:
GET https://api.example.com/api/v1/get HTTP/1.1
Host: api.example.com
Accept-Encoding: gzip, deflate, br
The server responds with the following headers (captured with Fiddler):
HTTP/1.1 200 OK
Server: openresty/1.15.8.1
Date: Wed, 17 Jun 2020 12:06:27 GMT
Content-Type: application/json; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
charset: utf-8
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: Content-Type, Authorization, X-Requested-With
Content-Encoding: deflate
Vary: Accept-Encoding
Strict-Transport-Security: max-age=15724800; includeSubDomains
My .NET Core client cannot decompress the data and fails with “The archive entry was compressed using an unsupported compression method.” When I send the same request with curl or Postman, it works fine and the response is valid JSON.
The raw response body (still compressed) looks like this:
00000000: 789c edd4 414b c330 1407 f0af 22ef 9c43 x...AK.0...."..C
00000010: 92a6 8deb 7517 3d74 8808 1ec6 0e61 895d ....u.=t.....a.]
00000020: a14d 244b 6132 f6dd 4daa 42ab 87a5 13d7 .M$Ka2..M.B.....
00000030: 83cd a92f fcf3 dafc 0a6f 7d84 da08 b932 .../.....o}....2
00000040: 9013 042f 4ac9 4a97 9f95 15ae 32fa 5e42 .../J.J.....2.^B
00000050: ce09 43d0 5407 6543 45b2 0481 8f59 1f56 ..C.T.eCE....Y.V
00000060: daed 215f 1f7b 7588 30cc 1608 9cb0 a572 ..!_.{u.0......r
00000070: cfaa 2a77 0ef2 c49f 92f6 ad10 ce29 fba0 ..*w.........)..
00000080: ec16 728c 406c 5d2b ea22 f45e 9aa6 11da ..r.@l]+.".^....
00000090: 9f86 9bb0 309c 504c df8c 47f7 2db0 5f14 ....0.PL..G.-._.
000000a0: 4e1b 04a5 35ed ebc7 a777 8fdd c528 c7df N...5....w...(..
000000b0: bb13 ecb7 764a c8a5 6975 5787 d3ce 3851 ....vJ..iuW...8Q
000000c0: 3ffd 0c76 fb77 fdb4 7713 87c7 4eb2 970c ?..v.w..w...N...
000000d0: 37fb 72a7 b3fb 24ee c9ec 3e89 3b9b dd29 7.r...$...>.;..)
000000e0: 27e7 dcd3 31ec e979 f5f4 8fd5 e371 46a1 '...1..y.....qF.
000000f0: 93f8 9f79 013a 8d33 a797 9167 0372 3a20 ...y.:.3...g.r:
00000100: 67f8 f7e4 fe75 78b4 fab8 5110 3f0b e287 g....ux...Q.?...
00000110: 413a 40e2 0324 f63f a7c1 0453 f876 769f A:@..$.?...S.vv.
00000120: c47d 31bb 5fcf 7df3 0ea6 e930 f2 .}1._.}....0.
After some digging, I found that the first two bytes, 78 9C, are the ZLIB header and the final four bytes, A6 E9 30 F2, are the Adler-32 checksum that is part of ZLIB-compressed data. It seems that .NET uses the standard DeflateStream when it encounters deflate-encoded content, and this stream cannot handle deflate data that is wrapped in this ZLIB header.
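The two header bytes can be decoded field by field following RFC 1950: 0x78 gives CM = 8 (deflate) and CINFO = 7 (a 32 KB window), 0x9C gives FLEVEL = 2 (default level) with FDICT = 0 (no preset dictionary), and 0x789C = 30876 = 31 × 996 satisfies the check-bits rule. A minimal C# sketch of that check follows, assuming .NET Core 2.1 or later for ReadOnlySpan<byte>; ZlibHeader and IsZlibHeader are hypothetical names, not part of .NET.

```csharp
using System;

// Hypothetical helper, not a .NET API.
public static class ZlibHeader
{
    // True when the first two bytes form a valid RFC 1950 header for
    // deflate data (CM = 8) that does not require a preset dictionary.
    public static bool IsZlibHeader(ReadOnlySpan<byte> data)
    {
        if (data.Length < 2) return false;

        byte cmf = data[0]; // CMF: CM in the low nibble, CINFO in the high nibble
        byte flg = data[1]; // FLG: FCHECK (bits 0-4), FDICT (bit 5), FLEVEL (bits 6-7)

        bool isDeflate = (cmf & 0x0F) == 8;          // CM = 8 means deflate
        bool windowOk = (cmf >> 4) <= 7;             // CINFO 7 = 32 KB window; higher is invalid
        bool checkOk = ((cmf << 8) | flg) % 31 == 0; // CMF * 256 + FLG must be a multiple of 31
        bool noDictionary = (flg & 0x20) == 0;       // FDICT = 1 would need a preset dictionary

        return isDeflate && windowOk && checkOk && noDictionary;
    }
}
```

With this check, 78 9C passes, while the next two bytes of the dump, ED D4, fail (their low nibble is D rather than 8), which is what you would expect at the start of the raw deflate data inside the envelope.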
RFC 2616, section 3.5, specifies:
deflate
The “zlib” format defined in RFC 1950 in combination with the “deflate” compression mechanism described in RFC 1951.
.NET Core and .NET Framework don’t implement the RFC 1950 part of this specification, so if the HTTP server uses the ZLIB envelope the data cannot be decompressed and the call fails.
Expected behaviour
The .NET Core implementation should detect that a ZLIB envelope is being used and decompress the data accordingly.
Workaround
Specify only DecompressionMethods.GZip for the HttpClientHandler’s AutomaticDecompression property. A well-behaved server should then use gzip compression instead (or no compression at all). A badly behaved HTTP server may still send deflate-encoded data anyway.
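For servers that send deflate regardless, one option on .NET Core 3.1 is to keep automatic decompression limited to gzip and decode deflate yourself. The sketch below is hypothetical (ZlibAwareDeflateHandler is not a .NET type) and rests on these assumptions: the inner HttpClientHandler leaves a deflate body untouched when Deflate is not enabled in AutomaticDecompression, only responses whose sole Content-Encoding is deflate are handled, the whole body is buffered in memory, and the framing is guessed from the first two bytes, which is a heuristic because raw deflate data can occasionally look like a zlib header.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical workaround handler (not part of .NET): gzip is still decoded by
// HttpClientHandler, while "deflate" responses are decoded here.
public sealed class ZlibAwareDeflateHandler : DelegatingHandler
{
    public ZlibAwareDeflateHandler()
        : base(new HttpClientHandler { AutomaticDecompression = DecompressionMethods.GZip })
    {
    }

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // The inner handler only advertises gzip, so ask for deflate explicitly.
        if (!request.Headers.AcceptEncoding.Any(e => string.Equals(e.Value, "deflate", StringComparison.OrdinalIgnoreCase)))
        {
            request.Headers.AcceptEncoding.Add(new StringWithQualityHeaderValue("deflate"));
        }

        HttpResponseMessage response = await base.SendAsync(request, cancellationToken).ConfigureAwait(false);

        // Only the simple case is handled: "deflate" is the sole content coding.
        var encodings = response.Content?.Headers.ContentEncoding;
        if (encodings == null || encodings.Count != 1 ||
            !string.Equals(encodings.First(), "deflate", StringComparison.OrdinalIgnoreCase))
        {
            return response;
        }

        byte[] body = await response.Content.ReadAsByteArrayAsync().ConfigureAwait(false);
        var decoded = new ByteArrayContent(Inflate(body));
        foreach (var header in response.Content.Headers)
        {
            // The decoded body is no longer deflate-encoded and has a new length.
            if (string.Equals(header.Key, "Content-Encoding", StringComparison.OrdinalIgnoreCase) ||
                string.Equals(header.Key, "Content-Length", StringComparison.OrdinalIgnoreCase))
            {
                continue;
            }
            decoded.Headers.TryAddWithoutValidation(header.Key, header.Value);
        }

        response.Content.Dispose();
        response.Content = decoded;
        return response;
    }

    // Treats the body as zlib (RFC 1950) when the first two bytes form a valid zlib
    // header, otherwise as raw deflate (RFC 1951).
    public static byte[] Inflate(byte[] body)
    {
        bool isZlib = body.Length > 6
            && (body[0] & 0x0F) == 8
            && (body[0] >> 4) <= 7
            && ((body[0] << 8) | body[1]) % 31 == 0;

        // Skip the 2-byte header and the 4-byte Adler-32 trailer (not verified here).
        int offset = isZlib ? 2 : 0;
        int count = isZlib ? body.Length - 6 : body.Length;

        using var input = new MemoryStream(body, offset, count, writable: false);
        using var inflater = new DeflateStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        inflater.CopyTo(output);
        return output.ToArray();
    }
}
```

Usage would then be `using var httpClient = new HttpClient(new ZlibAwareDeflateHandler());`. Accepting both framings mirrors the expected behaviour described above: use the ZLIB envelope when it is present, and fall back to raw deflate for servers that omit it.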
About this issue
- State: closed
- Created 4 years ago
- Reactions: 2
- Comments: 30 (27 by maintainers)
Maybe for .NET 5, since this has existed forever, we could handle what seems like the vast majority of cases just by stripping off the typical two-byte header and the checksum footer.
It’s not 100% robust, but then again, what we’re currently doing is apparently 100% problematic 😃 We could validate that the two-byte header is the exact 78 9C combination we know how to handle, and optionally validate the checksum. (The “right” answer is still a ZlibStream that just lets zlib handle it all. Just exploring alternatives.)
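A minimal C# sketch of that strip-the-envelope idea, assuming .NET Core 3.1 or later. It accepts only the exact 78 9C header, drops the 2-byte header and 4-byte trailer, inflates the middle with DeflateStream, and optionally compares the big-endian Adler-32 trailer with a checksum computed over the output. ZlibEnvelope, Decompress and Adler32 are hypothetical names; this is an illustration, not code taken from dotnet/runtime.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper (not a .NET API): the "strip the envelope" approach.
public static class ZlibEnvelope
{
    public static byte[] Decompress(byte[] zlibData, bool verifyChecksum = true)
    {
        if (zlibData == null) throw new ArgumentNullException(nameof(zlibData));
        if (zlibData.Length < 7) throw new InvalidDataException("Too short for a zlib stream.");

        // Accept only the common 78 9C header (deflate, 32 KB window, default level,
        // no preset dictionary); other valid headers such as 78 01 or 78 DA are rejected.
        if (zlibData[0] != 0x78 || zlibData[1] != 0x9C)
            throw new InvalidDataException("Unsupported zlib header.");

        // Everything between the 2-byte header and the 4-byte trailer is raw deflate.
        byte[] result;
        using (var input = new MemoryStream(zlibData, 2, zlibData.Length - 6, writable: false))
        using (var inflater = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            inflater.CopyTo(output);
            result = output.ToArray();
        }

        if (verifyChecksum)
        {
            // The Adler-32 trailer is stored big-endian.
            int t = zlibData.Length - 4;
            uint expected = ((uint)zlibData[t] << 24) | ((uint)zlibData[t + 1] << 16)
                          | ((uint)zlibData[t + 2] << 8) | zlibData[t + 3];
            if (Adler32(result) != expected)
                throw new InvalidDataException("Adler-32 checksum mismatch.");
        }

        return result;
    }

    // Adler-32 as defined in RFC 1950: two running sums modulo 65521.
    public static uint Adler32(byte[] data)
    {
        const uint Mod = 65521;
        uint a = 1, b = 0;
        foreach (byte value in data)
        {
            a = (a + value) % Mod;
            b = (b + a) % Mod;
        }
        return (b << 16) | a;
    }
}
```

For example, `Decompress(new byte[] { 0x78, 0x9C, 0x03, 0x00, 0x00, 0x00, 0x00, 0x01 })` returns an empty array: 03 00 is an empty final deflate block and 00 00 00 01 is the Adler-32 of empty input.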
Given that this issue has apparently existed forever and we just discovered it, it doesn’t seem like it justifies doing something yucky.
We have tests in .NET Framework and .NET Core for deflate and gzip. But we only test against ourselves, i.e. the .NET DeflateStream classes. We don’t have any ‘interop’ tests with other implementations. That is probably how we missed this case.
DeflateStream is the raw deflate algorithm, with no header and no footer, and can be used when some other format includes “deflate” and provides a header/footer around it. GZipStream does effectively that, wrapping the deflate data with the gzip header and footer. To fix this, we would very likely want to expose a ZlibStream (https://github.com/dotnet/runtime/issues/2236) and use it instead of DeflateStream in SocketsHttpHandler.
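The framing relationship can be observed with ZLibStream, which System.IO.Compression gained in .NET 6, so this sketch assumes .NET 6 or later and does not compile on .NET Core 3.1. It compresses the same payload with GZipStream and ZLibStream, and shows that the bytes between the zlib header and its 4-byte trailer are plain raw deflate that DeflateStream can read. FramingDemo and its methods are hypothetical names.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical demo helpers; requires .NET 6 or later for ZLibStream.
public static class FramingDemo
{
    // Runs data through a compressing stream created by the given factory.
    public static byte[] Compress(byte[] data, Func<Stream, Stream> createCompressor)
    {
        var output = new MemoryStream();
        using (Stream compressor = createCompressor(output))
        {
            compressor.Write(data, 0, data.Length);
        } // disposing the compressor writes the final block and any trailer
        return output.ToArray();
    }

    // Reads data[offset .. offset + count) through a decompressing stream.
    public static byte[] Decompress(byte[] data, int offset, int count, Func<Stream, Stream> createDecompressor)
    {
        using var input = new MemoryStream(data, offset, count, writable: false);
        using Stream decompressor = createDecompressor(input);
        using var output = new MemoryStream();
        decompressor.CopyTo(output);
        return output.ToArray();
    }

    // The same deflate data in two envelopes: gzip (RFC 1952) and zlib (RFC 1950).
    public static (byte[] Gzip, byte[] Zlib) CompressBoth(byte[] payload) =>
        (Compress(payload, s => new GZipStream(s, CompressionLevel.Optimal, leaveOpen: true)),
         Compress(payload, s => new ZLibStream(s, CompressionLevel.Optimal, leaveOpen: true)));

    // True when stripping the 2-byte zlib header and 4-byte trailer leaves raw deflate.
    public static bool MiddleIsRawDeflate(byte[] payload)
    {
        byte[] zlib = CompressBoth(payload).Zlib;
        byte[] viaDeflate = Decompress(zlib, 2, zlib.Length - 6,
            s => new DeflateStream(s, CompressionMode.Decompress));
        return Enumerable.SequenceEqual(viaDeflate, payload);
    }
}
```

The gzip output starts with the magic bytes 1F 8B followed by 08 (deflate), while the zlib output starts with a header that passes the RFC 1950 checks. Handing the whole zlib buffer to a plain DeflateStream is the same mismatch this issue describes.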
It is: HttpClient is handling content-coding and deciding to use DeflateStream to handle “deflate”, which isn’t correct.