openssl: 3.0.12 reveals pkcs11 engine / SoftHSM cleanup issue causing a segfault on exit
Hi,
the testsuite for libp11-0.4.12 breaks in 3.0.12, works in 3.0.11. The testsuite: https://sources.debian.org/src/libp11/0.4.12-1/debian/tests/engine/ The segfault occurs while generating a certificate request doing
openssl req -engine pkcs11 -new -key 'pkcs11:model=SoftHSM%20v2;object=test-key;pin-value=1234' \
-keyform engine -out /tmp/tmp.NmBE19AHMC/req.pem -text -x509 -subj /CN=libp11-DEP8
The segfault:
Engine "pkcs11" set.
Program received signal SIGSEGV, Segmentation fault.
SlotManager::getSlot (this=0x0, slotID=slotID@entry=1769803198) at ./src/lib/slot_mgr/SlotManager.cpp:174
174 ./src/lib/slot_mgr/SlotManager.cpp: No such file or directory.
(gdb) bt
#0 SlotManager::getSlot (this=0x0, slotID=slotID@entry=1769803198) at ./src/lib/slot_mgr/SlotManager.cpp:174
#1 0x00007ffff7783fd9 in SoftHSM::C_CloseAllSessions (this=0x5555556587d0, slotID=slotID@entry=1769803198) at ./src/lib/SoftHSM.cpp:1386
#2 0x00007ffff7764414 in C_CloseAllSessions (slotID=1769803198) at ./src/lib/main.cpp:347
#3 0x00007ffff7f099ba in pkcs11_slot_unref (slot=slot@entry=0x5555556ab730) at ./src/p11_slot.c:433
#4 0x00007ffff7f09a50 in pkcs11_release_slot (slot=0x5555556698d0) at ./src/p11_slot.c:477
#5 pkcs11_release_all_slots (slots=0x5555556698d0, nslots=<optimized out>) at ./src/p11_slot.c:464
#6 0x00007ffff7f0a358 in PKCS11_release_all_slots (pctx=<optimized out>, slots=<optimized out>, nslots=<optimized out>) at ./src/p11_front.c:111
#7 0x00007ffff7f02f4e in ctx_finish (ctx=0x5555556450e0) at ./src/eng_back.c:352
#8 0x00007ffff7f00e38 in engine_finish (engine=<optimized out>) at ./src/eng_front.c:163
#9 0x00007ffff7be3953 in engine_unlocked_finish (e=0x555555657b20, unlock_for_handlers=unlock_for_handlers@entry=0) at ../crypto/engine/eng_init.c:64
#10 0x00007ffff7be5f86 in int_cleanup_cb_doall (p=0x5555556586b0) at ../crypto/engine/eng_table.c:183
#11 int_cleanup_cb_doall (p=0x5555556586b0) at ../crypto/engine/eng_table.c:177
#12 0x00007ffff7c2e10c in doall_util_fn (arg=0x0, func_arg=0x0, func=0x7ffff7be5f60 <int_cleanup_cb_doall>, use_arg=0, lh=0x555555657230) at ../crypto/lhash/lhash.c:197
#13 OPENSSL_LH_doall (lh=0x555555657230, func=func@entry=0x7ffff7be5f60 <int_cleanup_cb_doall>) at ../crypto/lhash/lhash.c:205
#14 0x00007ffff7be6365 in lh_ENGINE_PILE_doall (doall=0x7ffff7be5f60 <int_cleanup_cb_doall>, lh=<optimized out>) at ../crypto/engine/eng_local.h:159
#15 engine_table_cleanup (table=0x7ffff7e836f8 <rsa_table>) at ../crypto/engine/eng_table.c:192
#16 0x00007ffff7be3bda in engine_cleanup_cb_free (item=0x555555658a00) at ../crypto/engine/eng_lib.c:169
#17 0x00007ffff7cb0170 in OPENSSL_sk_pop_free (st=0x555555658ba0, func=0x7ffff7be3bd0 <engine_cleanup_cb_free>) at ../crypto/stack/stack.c:426
#18 0x00007ffff7be3fcd in sk_ENGINE_CLEANUP_ITEM_pop_free (freefunc=0x7ffff7be3bd0 <engine_cleanup_cb_free>, sk=<optimized out>) at ../crypto/engine/eng_local.h:48
#19 engine_cleanup_int () at ../crypto/engine/eng_lib.c:176
#20 0x00007ffff7c329ae in OPENSSL_cleanup () at ../crypto/init.c:418
#21 OPENSSL_cleanup () at ../crypto/init.c:344
#22 0x00007ffff785c955 in __run_exit_handlers (status=0, listp=0x7ffff79f1840 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true)
at ./stdlib/exit.c:111
#23 0x00007ffff785ca8a in __GI_exit (status=<optimized out>) at ./stdlib/exit.c:141
#24 0x00005555555962e9 in ?? ()
#25 0x00007ffff78456ca in __libc_start_call_main (main=main@entry=0x5555555961b0, argc=argc@entry=15, argv=argv@entry=0x7fffffffeb78) at ../sysdeps/nptl/libc_start_call_main.h:58
#26 0x00007ffff7845785 in __libc_start_main_impl (main=0x5555555961b0, argc=15, argv=0x7fffffffeb78, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>,
stack_end=0x7fffffffeb68) at ../csu/libc-start.c:360
#27 0x0000555555596491 in ?? ()
This can bisected to commit 02b87cc189fa8cae8d6f69d68449a9aecc0e34f0
Sebastian
About this issue
- Original URL
- State: open
- Created 8 months ago
- Comments: 119 (114 by maintainers)
Commits related to this issue
- Always call OPENSSL_free prior to exit If an engine is loaded during the course of operations, and if that engine is written in C++, the data in that library will be deleted using an atexit handler p... — committed to nhorman/openssl by nhorman 8 months ago
- Always call OPENSSL_cleanup prior to exit If an engine is loaded during the course of operations, and if that engine is written in C++, the data in that library will be deleted using an atexit handle... — committed to nhorman/openssl by nhorman 8 months ago
- Always call OPENSSL_cleanup prior to exit If an engine is loaded during the course of operations, and if that engine is written in C++, the data in that library will be deleted using an atexit handle... — committed to nhorman/openssl by nhorman 8 months ago
- softhsm: fix segfault during shutdown with openssl 3.0.12 For details see: https://github.com/openssl/openssl/issues/22508 https://github.com/opendnssec/SoftHSMv2/issues/729 Signed-off-by: Michael O... — committed to LeSpocky/ptxdist by michaelolbrich 8 months ago
- ALT: Disable enginepkcs11 tests openssl's engine pkcs11 + softhsm doesn't work with recent openssl: https://github.com/openssl/openssl/issues/22508 — committed to stanislavlevin/bind9 by stanislavlevin 6 months ago
- 343320: libp11-0.4.12.63-alt0.2 +1 more - Reenabled testing (closes: #48229). openssl3-3.1.5-alt2 - Backported upstream fix for https://github.com/openssl/openssl/issues/22508. — committed to altlinux/specs by stanislavlevin 3 months ago
- ALT: tests: reenable enginepkcs11 tests Fix for https://github.com/openssl/openssl/issues/22508 landed in Sisyphus. TODO: convert this to the patch — committed to stanislavlevin/bind9 by stanislavlevin 3 months ago
- ALT: tests: reenable enginepkcs11 tests Fix for https://github.com/openssl/openssl/issues/22508 landed in Sisyphus. TODO: convert this to the patch — committed to stanislavlevin/bind9 by stanislavlevin 3 months ago
- ALT: tests: reenable enginepkcs11 tests Fix for https://github.com/openssl/openssl/issues/22508 landed in Sisyphus. TODO: convert this to the patch — committed to stanislavlevin/bind9 by stanislavlevin 3 months ago
- ALT: tests: reenable enginepkcs11 tests Fix for https://github.com/openssl/openssl/issues/22508 landed in Sisyphus. TODO: convert this to the patch — committed to stanislavlevin/bind9 by stanislavlevin 3 months ago
- spec: reenable enginepkcs11 tests Fix for https://github.com/openssl/openssl/issues/22508 landed in Sisyphus. — committed to stanislavlevin/bind9 by stanislavlevin 3 months ago
- spec: reenable enginepkcs11 tests Fix for https://github.com/openssl/openssl/issues/22508 landed in Sisyphus. — committed to stanislavlevin/bind9 by stanislavlevin 3 months ago
- spec: reenable enginepkcs11 tests Fix for https://github.com/openssl/openssl/issues/22508 landed in Sisyphus. — committed to stanislavlevin/bind9 by stanislavlevin 3 months ago
The #23063 PR was merged, this should resolve this problem as a side effect. However the underlying issue is still there.
I think this is the only reasonable approach, the fact SoftHSM uses atexit handlers in a loadable module is broken for any application using it directly, not just openssl.
The PKCS#11 API contract is that it is the application that controls the module shutdown via the finalizer function, using an atexit handler violates the contract.
An application that installs an exit handler to free its resources on exit will face the same segfault issue when using the PKCS#11 API directly as well.
So this specific segfault is effectively a SoftHSM issue and not an OpenSSL issue.
I would agree with that.
That said, given the fact that (1) softHSM has completely ignored the issue, despite several people raising it and (2) the fact that any other C++ library would be susceptible to this issues, that would seem to me to constrain our forward looking actions to:
I don’t see why any of this needs to wait for 4.0. I think 4.0 could be where we force to use the libctx. We could deprecate the non-libctx APIs to encourage the new APIs.
@mattcaswell in most cases libraries should be mandated to use their own openssl ctx, or provide the application with a call to set the openssl ctx to use (or a way top tell the app they are done with the default context).
If libraries were to use their own context then they could simply free their own context without conflict with the main application.
Of course Openssl needs to be able to handle the case where some contexts are still out and about when OPENSSL_cleanup() is called, and defer some un-initialization to when the latest context is actually freed by the latest library.
There are issues with mandating this.
We cannot mandate that libraries using OpenSSL should call
OPENSSL_cleanup. If the application and a library it is using are both using OpenSSL then the library must not callOPENSSL_cleanupin case the application is still using it. We could recommend that applications should callOPENSSL_cleanupand libraries should not. But then we still have problems where an application does not use OpenSSL directly at all, but a library that the application depends on does.I am suggesting softhsm is changed to be made out of 2 dsos, where the C wrapper loads the C++ one. This problem is not unique to libp11.
Any application using the pkcs11 API directly can trigger this same behavior (in fact I experienced it already early on with the pkcs11-provider and softhsm) if they decide to unload the pkcs11 module in an atexit() handler (which I think is perfectly fine if the application is in control of it).
It is a softhsm issue and should be directed to that project for resolution.
To expand on that, we should not be shutting down an engine/provider in atexit, that should have been done before that. There are basically 2 phases to the cleanup.
What is not clear to me currently is at what time and by who SoftHSM gets unloaded, and why p11 still has a reference to it’s data.
Isn’t this a libp11 problem to solve?
@nhorman said:
and
I’m not seeing an increase in refcnt.
The pattern for each legacy key type is:
Lets assume the RSA object starts with a ref count of 1. After the
EVP_PKEY_get1_RSAcall it now has a recnt of 2. TheEVP_PKEY_set1_RSAcall does multiple things:Finally
RSA_freeis called which decrements the refcnt to 1.So we start with a refcnt of 1, and end with a refcnt of 1.
@rsbeckerca atexit should guarantee that registered function order is exactly the reverse of the registration order (ie… calling atexit(a), atexit(b), atexit© implies that when exit is called, the registered functions should be called explicitly in c, b, a order.
That said, I’m struggling to see how this can ever work. I say that because OPENSSL_init_crypto should always be called prior to any dynamic engine being loaded (at least, I think it should). Given the OPENSSL_init_crypto is where the OPENSSL_cleanup handler is atexit registered, we should be guararnted that it will run after any subsequent atexit calls are made.
Assuming that softhsm registers an atexit handler (be it via explicit call, which I can’t find), or via some wrapper magic in the c++ runtime, we should be guaranteed that the softhsm atexit handler will run prior to the openssl atexit handler. The implication here being that we should be guaranteed that the softhsm singleton object gets freed prior to the pcks11 library shutting down, causing the error the use after free error we are seeing.
I’m sure I’m missing something, but right now this seems very broken.
There are 2 issues here.
Issue 1, softhsm should really , really , really, use their own libctx,a nd stop stomping on the default context that should be reserved for the main application.
Issue 2 atexit() handler, which is often called before the engine tries to close all sessions, one could argue that openssl is wrong in installing an atexit() handler w/o being directed by the application which is the only entity that has a chance of knowin in which order atexit() handlers should be installed (as it influences the order in which they are run.
But really softhsm should handle this with fixing issue 1, then it can use its own context and not get messed up when it is requested to close sessions after openssl already destroyed the default library context.
Note that softhsm is not the only pkcs11 module that has these issues, so much so I had to add this PR to the pkcs11-provider which experienced the same problems as the engine early on: https://github.com/latchset/pkcs11-provider/pull/225
I’ll try to look at it.