openssl: Locking API wrappers are broken
In https://github.com/openssl/openssl/issues/20286#issuecomment-1701318469, @nicowilliams points out that the implementation of, at least, the pthread wrappers around pthreads are broken:
121 void ossl_crypto_mutex_lock(CRYPTO_MUTEX *mutex)
122 {
123 pthread_mutex_t *mutex_p;
124
125 mutex_p = (pthread_mutex_t *)mutex;
126 pthread_mutex_lock(mutex_p);
127 }
See that thread for the whole discussion, but here’s a summary: if the only reason pthread can fail is because of openssl library misuse, then the library should assert. At least there should be a debug mode where that can happen. In general, void functions are a horrible idea because they swallow/hide the status of any internal failures. So even after the ossl_assert calls are added, the API must be changed to return a work/errors indicator.
About this issue
- Original URL
- State: open
- Created 10 months ago
- Reactions: 1
- Comments: 16 (9 by maintainers)
Right, exactly. I support this.
Then those wrapper functions will return OK
Err, “undefined behaviour” is clearly listed there under “default” for relocking also.
I don’t view the error code approach as sane or safe. It is not too dissimilar from trying to mask SIGSEGV and make things “recoverable” in the event of a protection fault to make a program “resilient” to bugs. SIGSEGV should never happen but if it does, we must terminate the program as it is the only available safe course of action. See also libcs terminating on heap corruption etc. — it is the only safe recourse.
The reality is that if this breaks any attempt to use any part of OpenSSL which touches the affected state is liable to break, or deadlock, or so on, even free operations. What we have with our existing RW lock API is a bunch of false promises that if some OpenSSL operation fails due to a lock failing (which is always a bug anyway) that somehow this is a safely recoverable condition — but it isn’t.
@nicowilliams Speaking more generally — it is interesting that you’ve worked on Solaris/Illumos — I’m reminded of a discussion I listened to once which discussed the different approaches to assert handling in Illumos v. Linux.
As I understand it, Illumos handles assertion failures in the kernel as panics, on the premise that if something has gone wrong with the kernel’s invariants, the only safe thing to do is stop. This is based on Illumos having integrity as a primary value. By comparison Linux has its
OOPSmacro… which logs an assertion failure and then… continues running. Which is actually a pretty disturbing approach to quality and integrity of critical system software. I strongly support the Illumos approach here.Quite honestly, code that requires to check locks’ return code has no chance to ever fly, because once you engage into this you generally have to imagine super complexe schemes to allow to roll back in case the impossible error happens and that often makes the whole code impossible. I’d rather suggest that these call an application-registered abort function if it ever happens (and it must not as you cannot recover from this if you have any lock held).