openssl: Dynamically loading openssl library with threads running breaks the signal mask on MacOS, is likely broken on other systems
I’ve been chasing a strange issue on MacOS (M2 processor), and I’ve finally traced it down to an issue in openssl. Version 3.3.1, installed with homebrew. I’m running on MacOS 13, not sure about other versions, but likely it’s still an issue.
I have a program that will dynamically load the openssl library after startup if the user requests a certain operation. I then had some threads hang in strange ways. I traced that down to the signal mask being modified dynamically while the thread was sitting in pselect(), and thus didn’t get woken up on a signal.
In the startup code of openssl on an ARM system, specifically in OPENSSL_cpuid_setup(), it calls sigprocmask() to set a signal mask, do some tests, and restore the signal mask. The behavior of sigprocmask() in a threaded system is undefined, it may only set the calling thread’s sigmask (like on Linux) or it may set the sigmask on all threads (like on MacOS). Both are issues.
On Linux, since it only sets the calling threads sigmask, the other threads are free to handle the signal, likely resulting in bad things happening. I haven’t seen this on Linux, but it seems like it could be an issue. Well, it may not be an issue, it looks like the signals involved are ones that will caused by and only delivered to the thread that caused them. So maybe not an issue there.
On MacOS, it modifies all threads’ signal masks, causing the issue I’m seeing.
I can work around this by dynamically linking ssl to the main program, I suppose, but that’s less than optimal.
I can think of two ways to fix the issue:
- It looks like the code is probing for specific instructions and getting signals. You could fork off another process to test it. That’s kind of ugly but should work.
- You can probably use pthread_sigmask() instead of sigprocmask, if it’s available, and assuming I’m right about the signals involved in this check. That might complicate linking, though.
This is likely an issue on PPC, S390, and sparcv9, too.
The following program will reproduce the issue. In my setup, you can compile with:
gcc -g -o ssl_thread_issue -Wall ssl_thread_issue.c
and then run it using:
DYLD_LIBRARY_PATH=/opt/homebrew/lib ./ssl_thread_issue
Here’s the program:
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <pthread.h>
#include <dlfcn.h>
#include <sys/select.h>
struct cross_thread_info {
int waitfd;
int err;
};
static void *
cross_thread(void *data)
{
struct cross_thread_info *info = data;
sigset_t sigmask, omask;
int rv = 0;
fd_set rfds;
rv = pthread_sigmask(SIG_SETMASK, NULL, &sigmask);
if (rv) {
perror("pthread_sigmask 3");
info->err = 1;
goto out_err;
}
if (!sigismember(&sigmask, SIGUSR1)) {
fprintf(stderr, "SIGUSR1 not in sigmask 1\n");
info->err = 1;
goto out_err;
}
FD_ZERO(&rfds);
FD_SET(info->waitfd, &rfds);
sigdelset(&sigmask, SIGUSR1);
rv = pthread_sigmask(SIG_SETMASK, &sigmask, &omask);
if (rv) {
perror("pthread_sigmask 4");
info->err = 1;
goto out_err;
}
rv = pselect(info->waitfd + 1, &rfds, NULL, NULL, NULL, NULL);
rv = pthread_sigmask(SIG_SETMASK, &omask, &sigmask);
if (rv) {
perror("pthread_sigmask 3");
info->err = 1;
goto out_err;
}
if (sigismember(&sigmask, SIGUSR1)) {
fprintf(stderr, "SIGUSR1 in sigmask 1\n");
info->err = 4;
goto out_err;
}
out_err:
return NULL;
}
static int
check_pselect_cross_thread(sigset_t mask)
{
int rv = 1;
int pipefds[2] = { -1, -1 };
struct cross_thread_info info;
pthread_t th;
char dummy = 0;
rv = pipe(pipefds);
if (rv == -1) {
perror("pipe");
return 1;
}
info.err = 0;
info.waitfd = pipefds[0];
rv = pthread_create(&th, NULL, cross_thread, &info);
sleep(2); /* Give the thread time to enter pselect() */
if (dlopen("libssl.dylib", RTLD_LAZY | RTLD_GLOBAL) == NULL) {
fprintf(stderr, "dlopen failed: %s\n", dlerror());
rv = 1;
}
write(pipefds[1], &dummy, 1);
pthread_join(th, NULL);
if (!rv)
rv = info.err;
if (!rv)
printf("dlopen does not affect other threads' sigmasks\n");
close(pipefds[0]);
close(pipefds[1]);
return rv;
}
int
main(int argc, char *argv[])
{
sigset_t mask;
int rv;
/* Start with SIGUSR1 blocked. */
rv = sigprocmask(SIG_SETMASK, NULL, &mask);
if (rv == -1) {
perror("sigprocmask 1");
return 1;
}
sigaddset(&mask, SIGUSR1);
rv = sigprocmask(SIG_SETMASK, &mask, NULL);
if (rv == -1) {
perror("sigprocmask 2");
return 1;
}
rv = check_pselect_cross_thread(mask);
return rv;
}
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 15 (13 by maintainers)
Commits related to this issue
- Backport crypto/armcap.c from master branch at 7b508cd1e1 together with .pl fixes Makes the SIGILL-based code easier to read, and doesn't use it on Apple Silicon or where getauxval() is present, ther... — committed to tom-cosgrove-arm/openssl by tom-cosgrove-arm a year ago
- Backport crypto/armcap.c from master branch This backports 7b508cd1e1 together with .pl fixes Makes the SIGILL-based code easier to read, and doesn't use it on Apple Silicon or where getauxval() is ... — committed to openssl/openssl by tom-cosgrove-arm a year ago
@t8m Having looked at this, there is probably greater risk in trying to make a branch-specific change, especially for 3.1. It’s not just for Darwin-based systems; the multi-threaded issue is present on Linux, too.
I propose (at least for 3.1) to bring in the
armcap.cfrom the master branch. This was thoroughly tested in multiple compile configurations when I first raised in, on both Darwin and Linux systems, and clearly separates out “Apple”, “have getauxval()” and “use SIGILL probing”. It’s been inmasterfor over 4 months now.Some issues were found on ARMv7 systems to do with visibility of
OPENSSL_armcap_Pfollowing its merging, but these were fixed in93370db1fcand7b508cd1e1by making the symbol.externin the assembler (.pl) files, and I propose to bring in those changes too.I wouldn’t bring in the latest performance optimisations for SHA-3, but 3.1 would gain the performance optimisations for AES on Apple Silicon M2 as those happened before the
armcap.crefactor.In terms of reducing risk, having the code as similar as possible in 3.1 and
masterwill mean any future fixes needed toarmcap.ccan be easily applied.Does that sound reasonable?