openssl: New EOF detection breaks session resumption

We want to do session resumption.

If we follow the documentation of SSL_shutdown:

In case the application wants to be able to resume the session, it is recommended to do a complete shutdown procedure (bidirectional close_notify alerts). […] The shutdown is not yet finished: the close_notify was sent but the peer did not send it back yet. Call SSL_read() to do a bidirectional shutdown.

Consider the following pseudocode:

// obtain SSL_SESSION via callback during SSL_connect
...

// bidirectional shutdown
if (SSL_shutdown(ssl) == 0)
    SSL_read(ssl, NULL, 0);

// try to resume session via new ssl object
...

Due to new EOF detection (db943f43a60d1b5b1277e4b5317e8f288e7a0a3a), the call to SSL_read fails here and invalidates the session (SSLfatal) breaking resumption.

Version: OpenSSL 1.1.1e and current master (7e06a6758bef584deabc9cb4b0d21b3e664b25c9) Proposed fix: If bidirectional shutdown is supposed to work correctly, a zero-length EOF in SSL_read should not be a fatal error.

/cc @kleest

About this issue

Original URL
State: open
Created 4 years ago
Reactions: 1
Comments: 81 (61 by maintainers)

Commits related to this issue

openssl: add patch reverting behaviour change introduced in 1.1.1e Upstream issue: https://github.com/openssl/openssl/issues/11378 Quite a lot of programs needs to be fixed before they work with 1.1... — committed to termux/termux-packages by Grimler91 4 years ago
openssl: add patch reverting behaviour change introduced in 1.1.1e (#5075) * openssl: add patch reverting behaviour change introduced in 1.1.1e Upstream issue: https://github.com/openssl/openssl/i... — committed to termux/termux-packages by Grimler91 4 years ago
openssl: revert EOF detection change in 1.1.1 This adds patches to avoid possible application breakage caused by a change in behavior introduced in 1.1.1e. It affects at least nginx, which logs erro... — committed to openwrt/openwrt by cotequeiroz 4 years ago
openssl: revert EOF detection change in 1.1.1 This adds patches to avoid possible application breakage caused by a change in behavior introduced in 1.1.1e. It affects at least nginx, which logs erro... — committed to immortalwrt/immortalwrt by cotequeiroz 4 years ago
openssl: revert EOF detection change in 1.1.1 This adds patches to avoid possible application breakage caused by a change in behavior introduced in 1.1.1e. It affects at least nginx, which logs erro... — committed to openwrt/openwrt by cotequeiroz 4 years ago
openssl: revert EOF detection change in 1.1.1 This adds patches to avoid possible application breakage caused by a change in behavior introduced in 1.1.1e. It affects at least nginx, which logs erro... — committed to 1715173329/lean-lede by cotequeiroz 4 years ago
openssl: revert EOF detection change in 1.1.1 (#4127) This adds patches to avoid possible application breakage caused by a change in behavior introduced in 1.1.1e. It affects at least nginx, which... — committed to coolsnowwolf/lede by 1715173329 4 years ago
openssl: revert EOF detection change in 1.1.1 (#4127) This adds patches to avoid possible application breakage caused by a change in behavior introduced in 1.1.1e. It affects at least nginx, which... — committed to dexlx/lede by 1715173329 4 years ago
openssl: update to version 1.1.1f This is an important bugfix release that addresses a number of issues with user level applications that were brought about by version 1.1.1e. Users should treat ver... — committed to macports/macports-ports by dbevans 4 years ago
Upgrade to OpenSSL v1.1.1f This fixes a regression with EOF detecting handling: * https://github.com/openssl/openssl/issues/11378 * https://github.com/openssl/openssl/pull/11400 This was causing fai... — committed to gitlabhq/omnibus-gitlab by stanhu 4 years ago
openssl: revert EOF detection change in 1.1.1 This adds patches to avoid possible application breakage caused by a change in behavior introduced in 1.1.1e. It affects at least nginx, which logs erro... — committed to zxlhhyccc/openwrt-sfe by cotequeiroz 4 years ago
p5-IO-Socket-SSL: update to 2.068. 2.068 2020/03/31 - treat OpenSSL 1.1.1e as broken and refuse to build with it in order to prevent follow-up problems in tests and user code https://github.com/n... — committed to NetBSD/pkgsrc by deleted user 4 years ago
openssl: revert EOF detection change in 1.1.1. This adds patches to avoid possible application breakage caused by a change in behavior introduced in 1.1.1e. It affects at least nginx, which logs err... — committed to liji6630/qsdk by cotequeiroz 4 years ago

Most upvoted comments

The server-sides causing these problems will not just fix themselves in the mean-time so they will just show up later rather than sooner if you revert in 1.1.1 and leave enabled in v3. Postponing the decision will not help. It is either a bug or it isn’t.

IMO there is no doubt that this is a bug. However - all bug fixes are behaviour changes. Normally we hope that the behaviour change introduced by a bug fix into a stable branch is desirable because no one wants the old incorrect behaviour. However, every now and then we come across a bug like this one. The old behaviour has been there for so long that other software has been written to expect the incorrect behaviour.

Since 1.1.1 is supposed to be stable, and this has broken stuff, it seems that the correct solution is to revert it. However, 3.0 is a major release. We are trying to keep breaking changes in 3.0 to an absolute minimum, but we do not rule them out entirely. Software authors should reasonably expect to have to test that their software still works when upgrading to 3.0, and might have to make some minor changes in some cases. So, it still seems to me entirely appropriate to fix this bug in that release.

That said this has highlighted a couple of related problems which should be fixed in 3.0:

Session resumption breaking was unexpected - so we should look to fix that
s_server should also be fixed

It’s entirely possible that there are other related fixes that we should do to minimise the impact. So it would be good if we could identify those.

mattcaswell on Mar 25, 2020

I agree that bugfixes are behavior changes, and that 3.0 is a major release. It would be unfortunate if, for example, it wasn’t possible to buid and USE a FIPS-capable curl, for example, or set up a FIPS-capable nginx web server.

Please consider these things when you decide if the EOF detection stays in 3.0

I think the fix should stay in 3.0. Migration to 3.0 requires at least a rebuild/test of the code, and most applications should continue to work, and those that don’t can be remediated to not be oblivious to truncation attacks.

The issue with introducing this into 1.1.1e is that applications tested with previous 1.1.x releases would now without additional build/test cycles be suddenly exposed to a library that presents a potentially unexpected and not entirely uncommon end-of-session behaviour. That’s a higher level of risk than for recertification (build/test) of the application for 3.0, likely on a newer overall OS release, …

Which is not to say that we couldn’t discuss providing backwards-compatibility crutches for 3.0, but I’d like to know more about actual, rather than hypothetical impact. Which applications and which corner cases are running into this?

vdukhovni on Mar 25, 2020

PR for the 1.1.1 revert: #11400

t8m on Mar 25, 2020

I suppose so. A related issue to solve in master is also the #11388

t8m on Mar 25, 2020

Does this imply that we need to do a 1.1.1f soon?

mattcaswell on Mar 25, 2020

I would like to suggest adding an option to SSL_set_options to switch between the old and new EOF handling, which would be set to “old” by default. This would solve all EOF related issues for software relying on the old behavior while keeping the option for anyone concerned about #10880.

roussosalex on Mar 24, 2020

@beldmit This is not just a s_server problem. We found broken resumption using nginx as well.

@kroeckx While you are right that the server should send a close notify, many servers don’t. So “fix the server” is not something we can do. I did a quick test against the Alexa Top 50 and the rate of successful resumptions using bidirectional shutdown went from 70% using 1.1.1d to 30% using 1.1.1e. If SSL_read is going to signal an error, at least don’t invalidate the session while doing so.

As a general note, this kind of breaking change is not something I would have expected in a bugfix release.

roussosalex on Mar 23, 2020

Alright, then the SSL_shutdown documentation should be updated to reflect that in 3.0.

I did that in #11531

I still think there should be an option to ignore EOF for protocols that already handle truncation well.

I think that’s a good suggestion. It’s probably easier for applications to use that than to avoid the SSL_read() call. We should consider adding that in 1.1.1-stable.

kroeckx on Apr 15, 2020

It is a problem that OpenSSL seems to lack a culture of trying to minimize the amount of user visible changes.

I think this claim is ill-informed; the culture is to minimize the amount of user-visible changes. We’re all human, though, and sometimes things slip through. As 1.1.1f has illustrated, we are capable of reacting and fixing it when the wrong thing happened.

kaduk on Apr 15, 2020

I think a couple of things are being conflated here, leading to some confusion.

I think we can all agree that unexpected EOF should be a fatal error in an ideal world, because RFC 8446 is very clear about the close_notify requirement.
The documentation is accurate if the server properly closes the TLS connection.

The main problem is how misbehaving servers are handled. If they are treated as a fatal SSL error, then the session should be invalidated according to the RFC:

Servers and clients MUST forget the secret values and keys established in failed connections, with the exception of the PSKs associated with session tickets, which SHOULD be discarded if possible.

This would unfortunately turn all connections to currently non-conforming servers into failed connections, breaking session resumption and causing lots of errors. Assuming that the server applications cannot be fixed in the near future, there is no perfect solution to the problem but changing this behavior in a major release instead seems reasonable to me.

roussosalex on Apr 13, 2020

The “poorly-implemented HTTP/1.1 servers” are still out there and are being used. How common? Impossible to say.

bagder on Mar 26, 2020

I agree that bugfixes are behavior changes, and that 3.0 is a major release. It would be unfortunate if, for example, it wasn’t possible to buid and USE a FIPS-capable curl, for example, or set up a FIPS-capable nginx web server.

Please consider these things when you decide if the EOF detection stays in 3.0

richsalz on Mar 25, 2020

Can we at least do the revert in 1.1.1 soon? I can prepare PR for that.

t8m on Mar 25, 2020

The server-sides causing these problems will not just fix themselves in the mean-time so they will just show up later rather than sooner if you revert in 1.1.1 and leave enabled in v3. Postponing the decision will not help. It is either a bug or it isn’t.

bagder on Mar 25, 2020

If you keep it in 1.1.1 (no comment on that), please add a big CHANGES item that says this is broken and is changing in the next release, and point to a FAQ entry that explains what to do.

richsalz on Mar 24, 2020

I am afraid that this is the most reasonable way for 1.1.1.

And still in the master it would be worth allowing the resumption after the unexpected EOF unless we have a very good reason not to do that.

t8m on Mar 24, 2020

In practice I’m not sure how many people would actually use that option. I’m wondering whether a better solution is to keep this fix in master, but revert it in 1.1.1 and just document it as a known bug.

mattcaswell on Mar 24, 2020

I’m currently unsure if we should revert this or not.

This seems like the best solution for the near future until we can find a better way to handle the EOF issue.

And it seems non-trivial for code that actually knows all data was received and can’t avoid calling SSL_read() to ignore the error.

We would love to ignore the error, except that SSL_read now calls SSLfatal, which invalidates the session attached to it and makes it not resumable. We have no direct control over the validity of the session and cannot predict if a server will send a close notify or not.

If this problem persists, we can only switch to unidirectional shutdown and hope that the issue gets fixed in a future release.

roussosalex on Mar 23, 2020

We can consider revering it in 1.1.1 but keeping it in master/3.0. But I’m currently still unsure what to do, we want to encourage people to fix this. And it seems non-trivial for code that actually knows all data was received and can’t avoid calling SSL_read() to ignore the error.

kroeckx on Mar 23, 2020