openssl: Out of date/incorrect advice in SSL_get_error docs as of 1.1.0c
This commit changed SSL_read to return -1 rather than 0 when BIO_read returns 0: https://github.com/openssl/openssl/commit/4880672a9b41a09a0984b55e219f02a2de7ab75e, which landed in the 1.1.0c release.
When SSL_get_error is called, it’ll return SSL_ERROR_SYSCALL, and the error stack will be empty. The man page for SSL_get_error says that at that point you should check SSL_read’s return code; a -1 means that there was a read error and to consult the BIO-specific error (e.g. errno), and 0 means that there was an EOF in violation of the protocol, but that doesn’t appear to be correct anymore.
Is this a bug in the implementation or the docs? If the new implementation is correct, I guess the new procedure should be to check the BIO error, and check if an error actually occurred to determine if it’s an EOF?
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Comments: 33 (28 by maintainers)
Commits related to this issue
- Fix EOF detection See https://github.com/openssl/openssl/issues/1903 for details — committed to sfackler/rust-openssl by sfackler 8 years ago
- Make SSL_read and SSL_write return the old behaviour and document it. This reverts commit 4880672a9b41a09a0984b55e219f02a2de7ab75e. Fixes: #1903 This also fixes the documenation for SSL_read_ex() a... — committed to kroeckx/openssl by kroeckx 8 years ago
- Make SSL_read and SSL_write return the old behaviour and document it. Backport of 1b363941f376ee03024c64ad85ada351a05e696e, revert of 122580ef71e4e5f355a1a104c9bfb36feee43759 Fixes: #1903 — committed to kroeckx/openssl by kroeckx 8 years ago
- Make SSL_read and SSL_write return the old behaviour and document it. Backport of 1b363941f376ee03024c64ad85ada351a05e696e, revert of fa4c374572e94f467900f5820cd1d00af2470a17 Fixes: #1903 — committed to kroeckx/openssl by kroeckx 8 years ago
- Make SSL_read and SSL_write return the old behaviour and document it. Backport of beacb0f0c1ae7b0542fe053b95307f515b578eb7, revert of 122580ef71e4e5f355a1a104c9bfb36feee43759 Fixes: #1903 Reviewed-... — committed to openssl/openssl by kroeckx 8 years ago
- Make SSL_read and SSL_write return the old behaviour and document it. Backport of beacb0f0c1ae7b0542fe053b95307f515b578eb7, revert of fa4c374572e94f467900f5820cd1d00af2470a17 Fixes: #1903 Reviewed-... — committed to openssl/openssl by kroeckx 8 years ago
- PORT: Handle EOF portably, making copy_stream_data/2 etc. work in OpenSSL 1.1.0c. This is necessary due to a change in behaviour of OpenSSL 1.1.0c: When encountering EOF, SSL_read() now returns -1 in... — committed to SWI-Prolog/packages-ssl by triska 8 years ago
- PORT: EOF when writing is now correctly handled also in OpenSSL 1.1.0c. This has become necessary because in OpenSSL 1.1.0c, SSL_write() returns -1 on EOF whereas it previously returned 0. In later ... — committed to SWI-Prolog/packages-ssl by triska 8 years ago
- packages/libs/openssl: add patch for broken SSL_read Add a patch from the Debian openssl package to fix SSL_read() behaviour. Fixes a couple packages that rely on specific return codes from that func... — committed to avionic-design/pbs-stage2 by deleted user 7 years ago
It kind of explains the new API.
But I wish that the documentation would state clearly how to detect EOF. For now I detect it by checking the errno on SSL_ERROR_SYSCALL and if it is 0 (i.e. no error) then I assume that it wasn’t actually an error in a system call as one would expect from this error code but that the underlying connection simply got closed (EOF). If this is the correct way it would be good that if it would be explicitly documented this way. Also, currently the documentation of this part only describes the behavior with UNIX sockets which does not help if one needs to detect EOF on different OS.
Still, I find the interface of getting a SSL_ERROR_SYSCALL when actually no error happened in a syscall unexpected and not intuitive. And it does not help that the documentation claims that SSL_ERROR_SYSCALL is thrown on I/O errors because EOF is not an I/O error at all.
FYI: bio_ssl.c does not follow the rule to avoid SSL_shutdown if previously SSL_ERROR_SSL or SSL_ERROR_SYSCALL were encountered, it is almost impossible to avoid the SSL_shutdown:
Could in theory also a fatal error happen in the bidirectional shutdown sequence ? Is there a difference between a “fatal” SSL_ERROR_SYSCALL and a non-fatal SSL_ERROR_SYSCALL from SSL_shutdown?
The manual on SSL_shutdown return value 0 says: "The shutdown is not yet finished. Call SSL_shutdown() for a second time, if a bidirectional shutdown shall be performed. The output of SSL_get_error(3) may be misleading, as an erroneous SSL_ERROR_SYSCALL may be flagged even though no error occurred. "
Is a SSL_ERROR_SSL/SYSCALL with return value -1 more serious than with return value 0 ?
The old behavior was explained in the old documentation already, i.e. it was documented that it returns 0 on EOF with unclean shutdown and it did that too. The new documentation does not make this promise anymore but does also not explain how to detect EOF in this case.
Because in reality I have to deal with peers which don’t behave in the only correct way. These peers simply close the underlying TCP connection and don’t do an SSL shutdown. But reporting an error all this time to the user is not an option because there were no data lost. Thus I need to distinguish the “bad behavior” case from actual errors.
I can understand that it is nice to make the API more consistent. But I want to make the API also cover the use cases it covered before in an obvious or at least documented way.
My reading of the documentation you cite is the following: Return value is 0 either on a clean shutdown or if the peer simply closed the connection, i.e incomplete shutdown. Return value is -1 either if some (unspecified error) occurs or if an action needs to be taken by the caller.
Based on this reading an EOF from the client instead of a clean shutdown perfectly fits the first case, i.e. return value of 0. And this is exactly what OpenSSL did for years. While it might be that of how OpenSSL internally works it never was true that 0 returned only in the documented cases it at least behaved this way in the documented cases. And applications relied on this documented behavior since it always worked as documented.
If the API stays at it is now it will cause unexpected errors in existing applications (it does already, i.e. breaks several tests in IO::Socket::SSL Perl module). And once the source of the error is figured out the applications or libraries will need to changed in that they need to do some new error handling to figure out if the problem happened due to EOF or something else. And of course backwards compatibility with the old behavior must be maintained somehow in many cases in case the application or library is compiled against an older version of OpenSSL.
In my opinion the API should not be simply changed like it is done currently, i.e. suddenly, silently, without updating the API documentation and even inside the stable branch where no such incompatible API changes would be expected. My expectation would be instead that OpenSSL does exactly the same what it now expects from all applications, i.e. figure out if this was EOF or not. This way the necessary change would only be needed in one place (OpenSSL) and the documented API would be kept stable.
So can you please clarify, how to detect the EOF? I would expect the API to allow for that and not to require going down to the underlying transport to find out.