http-kit: SSL error on `doRead`

I'm getting a javax.crypto.BadPaddingException, which may be caused by a synchronization issue. I'm not sure whether this can be fixed in http-kit or only by updating the Java version 🤔

I'm using:

  • http-kit 2.5.3
  • Java: openjdk version “11.0.11” 2021-04-20

Exception stack:
javax.crypto.BadPaddingException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than or equal to IV size (8) + tag size (16)
  ?, in sun.security.ssl/decrypt
  ?, in sun.security.ssl/decodeInputRecord
  ?, in sun.security.ssl/decode
  ?, in sun.security.ssl/decode
  ?, in sun.security.ssl/decode
  ?, in sun.security.ssl/decode
  ?, in sun.security.ssl/readRecord
  ?, in sun.security.ssl/unwrap
  ?, in sun.security.ssl/unwrap
  ?, in javax.net.ssl/unwrap
  File "HttpsRequest.java", line 35, in org.httpkit.client/unwrapRead
    while ((res = engine.unwrap(peerNetData, peerAppData)).getStatus() == Status.OK) {
  File "HttpClient.java", line 191, in org.httpkit.client/doRead
    read = httpsReq.unwrapRead(buffer);
  File "HttpClient.java", line 494, in org.httpkit.client/run
    doRead(key, now);
  ?, in java.lang/run

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 15 (2 by maintainers)

Most upvoted comments

@miikka @huima @seancorfield I think I’ve finally figured out what’s going on here, and it’s a nasty (but fixable!) bug in http-kit.

tl;dr

If reusing a kept-alive connection fails for some reason (e.g. the remote side closed it), http-kit incorrectly reuses the old SSL engine when it makes a new connection.

Longer explanation

With the assistance of Wireshark, I found this sequence of events:

  • We successfully make many requests over a long-running kept-alive connection.
  • At some point, the remote end closes this connection.
  • When we try to reuse the connection, we notice that it has closed, so we open a new one. However, we don't start a new TLS handshake; instead we immediately send encrypted data on the new connection, because we're incorrectly reusing the old SSL engine that has already completed its handshake.
  • To the remote (Cloudflare) side, this encrypted data looks like garbage, because we never did a TLS handshake on this connection. So it responds with HTTP/1.1 400 Bad Request and closes the new connection.
  • Our TLS stack interprets those plaintext bytes as a large TLS record, and the request fails with a red-herring SSLProtocolException: Input record too big: max = 16709 len = 20532 (see further down for why the length is 20532).
  • A subsequent request opens a brand-new connection with a new SSL engine, so it succeeds and the system carries on.

This bug probably affects a lot of people, because a kept-alive HTTPS connection being closed by the remote end is a very common occurrence.
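
To make the underlying constraint concrete, here is a minimal JSSE sketch (plain Java, not http-kit's actual code or the code in the PR; the class and method names are made up): an SSLEngine carries the handshake state of exactly one connection, so a new connection must get a freshly created engine rather than one whose handshake already completed elsewhere.

    import java.io.IOException;
    import java.security.NoSuchAlgorithmException;
    import javax.net.ssl.SSLContext;
    import javax.net.ssl.SSLEngine;

    public class FreshEnginePerConnection {
        // Hypothetical helper: whenever a new TCP connection is opened, create a
        // brand-new SSLEngine for it. Reusing an engine whose handshake finished
        // on a previous (now closed) connection means the client never sends a
        // ClientHello and instead starts with application-data ciphertext.
        static SSLEngine engineForNewConnection(String host, int port) throws IOException {
            try {
                SSLEngine engine = SSLContext.getDefault().createSSLEngine(host, port);
                engine.setUseClientMode(true);
                engine.beginHandshake(); // handshake state belongs to this connection only
                return engine;
            } catch (NoSuchAlgorithmException e) {
                throw new IOException(e);
            }
        }
    }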

I’ve created pull request #489 that tries to address the problem.

We're also seeing this from time to time in production: without the SNI-enabled configurer, without a specific client instance, on a plain POST that we dereference “immediately” before continuing on. So it's just reusing the default client singleton.

We're currently on a slightly older JDK 11, so the underlying issue could be on the JDK side for us (we're in the process of updating to the latest JDK 11, but we plan to move to JDK 17 “soon”).

I’m going to update our code to use a fresh client instance for each call site to see if that reduces the occurrences of the error (to rule out some level of concurrency issues – although one call site is pretty high-traffic).

We're seeing a javax.net.ssl.SSLProtocolException (message: Input record too big: max = 16709 len = 20532) when using http-kit as an HTTP client.

Hi @miikka, did you ever figure out what's going on here?

I think I have a partial explanation. A Cloudflare-hosted service we talk to sometimes returns a plain unencrypted HTTP response (e.g. the raw bytes HTTP/1.1 400 Bad Request ...), despite the connection being TLS. The TLS stack interprets these bytes as a TLS record, where bytes 3 & 4 (the P/ in HTTP/1.1) are the record length, plus 5 bytes for the header: 0x502f + 5 = 20532 bytes!
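
Just to make that arithmetic concrete (the class below is only a throwaway check, not anything from http-kit):

    public class RecordLengthCheck {
        public static void main(String[] args) {
            // A TLS record header is 5 bytes: content type (1), version (2), length (2).
            // Parsing the plaintext "HTTP/1.1 400 Bad Request" as a record puts the
            // two length bytes on 'P' and '/'.
            int recordLength = ('P' << 8) | '/';  // 0x502F = 20527
            System.out.println(recordLength + 5); // + 5-byte header = 20532, as in the error
        }
    }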

@huima I can only point out that CloudFront and CloudFlare seem to have several differences in behavior.

@xwang1498, I never figured it out, but your explanation makes a lot of sense! We didn't have Cloudflare, but I can't rule out some other service returning plain HTTP responses in some cases. Personally I've moved on, but @huima, check this out; you might find it interesting.

This reminded me I should have followed up on my report from September 2021: we ended up switching from http-kit to Hato because of this, after trying fresh client instances and also updating our JDK – and, yes, CloudFlare was in the mix for us as well.