openssl: Locking API wrappers are broken

In https://github.com/openssl/openssl/issues/20286#issuecomment-1701318469, @nicowilliams points out that the implementation of, at least, the pthread wrappers around pthreads are broken:

121 void ossl_crypto_mutex_lock(CRYPTO_MUTEX *mutex)
122 {
123     pthread_mutex_t *mutex_p;
124
125     mutex_p = (pthread_mutex_t *)mutex;
126     pthread_mutex_lock(mutex_p);
127 }

See that thread for the whole discussion, but here’s a summary: if the only reason pthread can fail is because of openssl library misuse, then the library should assert. At least there should be a debug mode where that can happen. In general, void functions are a horrible idea because they swallow/hide the status of any internal failures. So even after the ossl_assert calls are added, the API must be changed to return a work/errors indicator.

About this issue

Original URL
State: open
Created 10 months ago
Reactions: 1
Comments: 16 (9 by maintainers)

Most upvoted comments

The default for SIGSEGV and SIGBUS is to kill the process, not to continue with UB. If you like that then you should like abort()ing on lock error, and it sounds like you do:

Right, exactly. I support this.

hlandau on Sep 1, 2023

Note that the windows equivalent of this function can’t return an error

Then those wrapper functions will return OK

richsalz on Sep 1, 2023

You are misreading the man page. The table you’re referring to is about what happens if you try to unlock a mutex you don’t hold:

Err, “undefined behaviour” is clearly listed there under “default” for relocking also.

I don’t view the error code approach as sane or safe. It is not too dissimilar from trying to mask SIGSEGV and make things “recoverable” in the event of a protection fault to make a program “resilient” to bugs. SIGSEGV should never happen but if it does, we must terminate the program as it is the only available safe course of action. See also libcs terminating on heap corruption etc. — it is the only safe recourse.

The reality is that if this breaks any attempt to use any part of OpenSSL which touches the affected state is liable to break, or deadlock, or so on, even free operations. What we have with our existing RW lock API is a bunch of false promises that if some OpenSSL operation fails due to a lock failing (which is always a bug anyway) that somehow this is a safely recoverable condition — but it isn’t.

@nicowilliams Speaking more generally — it is interesting that you’ve worked on Solaris/Illumos — I’m reminded of a discussion I listened to once which discussed the different approaches to assert handling in Illumos v. Linux.

As I understand it, Illumos handles assertion failures in the kernel as panics, on the premise that if something has gone wrong with the kernel’s invariants, the only safe thing to do is stop. This is based on Illumos having integrity as a primary value. By comparison Linux has its OOPS macro… which logs an assertion failure and then… continues running. Which is actually a pretty disturbing approach to quality and integrity of critical system software. I strongly support the Illumos approach here.

hlandau on Sep 1, 2023

Quite honestly, code that requires to check locks’ return code has no chance to ever fly, because once you engage into this you generally have to imagine super complexe schemes to allow to roll back in case the impossible error happens and that often makes the whole code impossible. I’d rather suggest that these call an application-registered abort function if it ever happens (and it must not as you cannot recover from this if you have any lock held).

wtarreau on Aug 31, 2023