openssl: We shouldn't use atexit() in our libraries (or modules, engines and providers alike)

The general recommendation is that atexit() should never be used in shared objects, because the processing of the handlers is done at process exit, and might therefore happen after the shared object is unloaded; this is especially true when said shared object is loaded dynamically.

Linux solves that by making atexit() a per “image” (executable program or shared object) list of handlers, and processes each list at each “image” unload (for an executable program, that’s exit), which means that we don’t really see the issue there… or presumably on any platform that uses GNU libc.

Here are a few cases when we can find ourselves in this sort of situation:

  • A program linked statically with libcrypto loads a provider that’s linked with another libcrypto copy (statically or dynamically, that doesn’t really matter), all linked with the same shared libc.
  • A program linked with the standard libcrypto loads a provider that’s linked with a local libcrypto variant (built with a config target that has a shlib_variant attribute), all linked with the same shared libc.

An ugly workaround is to link each “image” with a static libc. Modules (loadable engines and providers) will not get cleaned up properly, but at least, the atexit handler list (which resides in libc or crt0) in each “image” will not contain handlers that live in another “image”.

A more robust solution is to duplicate what Linux does, i.e. have our own “atexit” in libcrypto, with its own handler list, and ensure that this handler list is processed on share object unload. We already have stubs to place such processing in for certain platforms (notably WIndows, see https://github.com/openssl/openssl/blob/master/crypto/dllmain.c) or already know how to that sort of thing (see https://github.com/openssl/openssl/blob/7ee992a5d931ab5ad9df00d2d8e47e1b7a72d7ac/providers/fips/self_test.c#L77-L155).


Related to #17059, where we have one of the example situations mentioned above

About this issue

  • Original URL
  • State: open
  • Created 2 years ago
  • Comments: 37 (37 by maintainers)

Most upvoted comments

Is there any chance that https://github.com/openssl/openssl/pull/23394 could be backported to 3.2.2, so we don’t have to wait years to resolve this? It’s an ongoing source of issues.

In addition to issues with dynamic libraries, atexit is hazardous in multi-threaded applications. atexit runs when the main thread returns from main, but there’s no guarantee that other threads have stopped by this point. This seems to be the cause of a shutdown crash in Python here: https://github.com/python/cpython/issues/114653

So the point is really that for 4.0, we would envisage making the OPENSSL_cleanup() call “mandatory”. What it does under the hood is not really relevant for the application.

@levitte can you explain why an atexit handler would be needed in the first place?

Back in the 1.0.x days, OpenSSL library init and deinit was a mess. There was a whole set of different functions that you had to call to initialise the library, and whole set of other ones required to deinitialise it. Depending on what your application was doing you may or may not need to call certain functions. It was very confusing, poorly documented and applications almost always got it wrong (even OpenSSL devs had a hard time keeping it straight what you were supposed to call when, and there were many inconsistencies within our own code base).

The concept was to get away from all of that and simplify things to remove the need for all that complexity and to try and do so in the least breaking way that we could. The solution was to to make most of those init/deinit functions no-ops and to automatically init and deinit the library. This way, applications that “inherited” the old way of init/deinit would continue to work but we would recommend the new simpler of way of requiring no initialisation.

Unfortunately while solving one major problem we introduced a whole new one. Probably we should have taken the hit of having a “breaking” change of requiring a single de-init call (OPENSSL_cleanup). This is only actually breaking if you care about mem leaks on exit.

They were added to not need OPENSSL_cleanup(). I don’t know the why of that.

I’m of the opinion that OPENSSL_cleanup() is the way to extricate ourselves from the current mess. However, stopping a library globally cleaning up is probably required.

Well, another effect of the duplicated static variable would be, if an engine pushes something on the error stack, that won’t be accessible by the caller, but probably the providers don’t have the same issue here?