openssl: AIX: RAND_priv_bytes fails inside drbg initialization
Our framework uses OpenSSL 1.1.1c, and occasionally one of our multithreaded decryption tests fails on AIX. It was quite a struggle to pin down at least one source of this failure, since it doesn’t happen very often. But every now and then the RUN_ONCE() call in RAND_DRBG_get0_private() fails, which causes errors in the calling functions. The following minimal example reproduces this kind of error:
/* example.c */
#include <pthread.h>
#include <openssl/rand.h>

pthread_barrier_t barrier;

void *worker(void *args) {
    unsigned char buffer[10];
    pthread_barrier_wait(&barrier);
    if (RAND_priv_bytes(buffer, 10) != 1) {
        *(int *)0 = 1; // (*) segfault in order to get a core dump
    }
    return NULL;
}

int main() {
    if (pthread_barrier_init(&barrier, NULL, 2) != 0) {
        return 1;
    }
    pthread_t t1;
    pthread_t t2;
    if (pthread_create(&t1, NULL, worker, NULL) != 0) {
        return 1;
    }
    if (pthread_create(&t2, NULL, worker, NULL) != 0) {
        return 1;
    }
    if (pthread_join(t1, NULL) != 0) {
        return 1;
    }
    if (pthread_join(t2, NULL) != 0) {
        return 1;
    }
    return 0;
}
To compile this example, run
$ xlc_r -o example example.c -I <OpenSSL-Include> -L <OpenSSL-Libs> -lcrypto
To run it sufficiently often I use
$ i=0; while true; do ((i++)); echo $i; ./example || break; done
It is important to put the system under some load; the while loop usually trips when that load is released. Unfortunately, I don’t know of a more reliable way to reproduce this error.
The core dump shows that one thread is, of course, at (*), while the other one is inside do_rand_drbg_init(), e.g.
_global_lock_common(??, ??, ??) at 0x90000000124098c
_mutex_lock(??, ??, ??) at 0x90000000124f0f0
pthread_once(??, ??) at 0x90000000124ad10
CRYPTO_THREAD_run_once() at 0x90000000825da6c
OPENSSL_init_crypto() at 0x90000000825ecb4
ERR_get_state() at 0x90000000825bcb8
ERR_set_mark() at 0x90000000825ba4c
IPRA.$syscall_random() at 0x900000008299a00
rand_pool_acquire_entropy() at 0x900000008299e24
rand_drbg_get_entropy() at 0x9000000082987bc
RAND_DRBG_instantiate() at 0x90000000829e9f8
IPRA.$drbg_setup() at 0x90000000829d1ec
do_rand_drbg_init() at 0x90000000829cd60
do_rand_drbg_init_ossl_() at 0x90000000829cccc
pthread_once(??, ??) at 0x90000000124adb4
CRYPTO_THREAD_run_once() at 0x90000000825da6c
RAND_DRBG_get0_private() at 0x90000000829d980
RAND_priv_bytes() at 0x90000000829918c
worker(args = (nil)), line 9 in "example.c"
The error can also be observed with OpenSSL 1.1.1g. I build OpenSSL on AIX with the following configuration:
$ perl Configure aix64-cc --prefix=`pwd`/dist/ no-asm
The configuration dump can be found here.
About this issue
- State: closed
- Created 4 years ago
- Comments: 20 (14 by maintainers)
Sorry, this week I don’t have any time left to investigate this issue further, but next week I’ll return and try to get some more meaningful backtraces.
A quick fix for our framework was to call RAND_priv_bytes() during the initialization of the framework, surrounded by a mutex. This prevents any race condition inside those once functions.
Interesting, I’ll check the master right now!
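The quick fix mentioned above (calling RAND_priv_bytes() once during framework initialization, guarded by a mutex) might look roughly like the sketch below. This is only an illustration under assumptions, not the code from our framework: the names framework_rand_warmup, rand_fill_fn, and demo_fill are hypothetical. The helper takes a function pointer with the same signature as RAND_priv_bytes(), so the sketch builds without OpenSSL; demo_fill is merely a stand-in for the real call.

```c
#include <pthread.h>

/* Same signature as RAND_priv_bytes(unsigned char *buf, int num). */
typedef int (*rand_fill_fn)(unsigned char *buf, int num);

static pthread_mutex_t warmup_lock = PTHREAD_MUTEX_INITIALIZER;
static int warmup_done = 0;

/*
 * Hypothetical framework hook: performs the first random-bytes call
 * while holding a mutex, so the lazy one-time initialization behind
 * that call cannot be entered by two threads at once.
 * Returns 1 on success, 0 on failure.
 */
int framework_rand_warmup(rand_fill_fn fill)
{
    unsigned char scratch[1];
    int ok = 1;

    if (pthread_mutex_lock(&warmup_lock) != 0)
        return 0;
    if (!warmup_done) {
        /* The fill function is expected to return 1 on success. */
        ok = (fill(scratch, (int)sizeof(scratch)) == 1);
        warmup_done = ok;
    }
    pthread_mutex_unlock(&warmup_lock);
    return ok;
}

/* Stand-in for RAND_priv_bytes so this sketch compiles on its own. */
static int demo_fill_calls = 0;

static int demo_fill(unsigned char *buf, int num)
{
    int i;

    for (i = 0; i < num; i++)
        buf[i] = 0;
    demo_fill_calls++;
    return 1;
}
```

In a real build, one would include <openssl/rand.h>, link with -lcrypto, and call framework_rand_warmup(RAND_priv_bytes) before any worker threads are created. Repeated calls are harmless because the warm-up runs only until it has succeeded once.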