openssl: Out of date/incorrect advice in SSL_get_error docs as of 1.1.0c

This commit changed SSL_read to return -1 rather than 0 when BIO_read returns 0: https://github.com/openssl/openssl/commit/4880672a9b41a09a0984b55e219f02a2de7ab75e, which landed in the 1.1.0c release.

When SSL_get_error is called, it’ll return SSL_ERROR_SYSCALL, and the error stack will be empty. The man page for SSL_get_error says that at that point you should check SSL_read’s return code; a -1 means that there was a read error and to consult the BIO-specific error (e.g. errno), and 0 means that there was an EOF in violation of the protocol, but that doesn’t appear to be correct anymore.

Is this a bug in the implementation or the docs? If the new implementation is correct, I guess the new procedure should be to check the BIO error, and check if an error actually occurred to determine if it’s an EOF?

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Comments: 33 (28 by maintainers)

Commits related to this issue

Most upvoted comments

Can someone take a look at #1931 to see if the new documentation makes sense to them?

It kind of explains the new API.

But I wish that the documentation would state clearly how to detect EOF. For now I detect it by checking the errno on SSL_ERROR_SYSCALL and if it is 0 (i.e. no error) then I assume that it wasn’t actually an error in a system call as one would expect from this error code but that the underlying connection simply got closed (EOF). If this is the correct way it would be good that if it would be explicitly documented this way. Also, currently the documentation of this part only describes the behavior with UNIX sockets which does not help if one needs to detect EOF on different OS.

Still, I find the interface of getting a SSL_ERROR_SYSCALL when actually no error happened in a syscall unexpected and not intuitive. And it does not help that the documentation claims that SSL_ERROR_SYSCALL is thrown on I/O errors because EOF is not an I/O error at all.

FYI: bio_ssl.c does not follow the rule to avoid SSL_shutdown if previously SSL_ERROR_SSL or SSL_ERROR_SYSCALL were encountered, it is almost impossible to avoid the SSL_shutdown:

static int ssl_free(BIO *a)
{
    BIO_SSL *bs;

    if (a == NULL)
        return (0);
    bs = BIO_get_data(a);
    if (bs->ssl != NULL)
        SSL_shutdown(bs->ssl);

Could in theory also a fatal error happen in the bidirectional shutdown sequence ? Is there a difference between a “fatal” SSL_ERROR_SYSCALL and a non-fatal SSL_ERROR_SYSCALL from SSL_shutdown?

The manual on SSL_shutdown return value 0 says: "The shutdown is not yet finished. Call SSL_shutdown() for a second time, if a bidirectional shutdown shall be performed. The output of SSL_get_error(3) may be misleading, as an erroneous SSL_ERROR_SYSCALL may be flagged even though no error occurred. "

Is a SSL_ERROR_SSL/SYSCALL with return value -1 more serious than with return value 0 ?

It’s supposed to explain the old API, in what always worked.

The old behavior was explained in the old documentation already, i.e. it was documented that it returns 0 on EOF with unclean shutdown and it did that too. The new documentation does not make this promise anymore but does also not explain how to detect EOF in this case.

I’m wondering why you want to make a distinction between an unexpected EOF and other errors. The only correct way to close the connection is the SSL_ERROR_ZERO_RETURN.

Because in reality I have to deal with peers which don’t behave in the only correct way. These peers simply close the underlying TCP connection and don’t do an SSL shutdown. But reporting an error all this time to the user is not an option because there were no data lost. Thus I need to distinguish the “bad behavior” case from actual errors.

I can understand that it is nice to make the API more consistent. But I want to make the API also cover the use cases it covered before in an obvious or at least documented way.

My reading of the documentation you cite is the following: Return value is 0 either on a clean shutdown or if the peer simply closed the connection, i.e incomplete shutdown. Return value is -1 either if some (unspecified error) occurs or if an action needs to be taken by the caller.

Based on this reading an EOF from the client instead of a clean shutdown perfectly fits the first case, i.e. return value of 0. And this is exactly what OpenSSL did for years. While it might be that of how OpenSSL internally works it never was true that 0 returned only in the documented cases it at least behaved this way in the documented cases. And applications relied on this documented behavior since it always worked as documented.

If the API stays at it is now it will cause unexpected errors in existing applications (it does already, i.e. breaks several tests in IO::Socket::SSL Perl module). And once the source of the error is figured out the applications or libraries will need to changed in that they need to do some new error handling to figure out if the problem happened due to EOF or something else. And of course backwards compatibility with the old behavior must be maintained somehow in many cases in case the application or library is compiled against an older version of OpenSSL.

In my opinion the API should not be simply changed like it is done currently, i.e. suddenly, silently, without updating the API documentation and even inside the stable branch where no such incompatible API changes would be expected. My expectation would be instead that OpenSSL does exactly the same what it now expects from all applications, i.e. figure out if this was EOF or not. This way the necessary change would only be needed in one place (OpenSSL) and the documented API would be kept stable.

So can you please clarify, how to detect the EOF? I would expect the API to allow for that and not to require going down to the underlying transport to find out.