rust_libloading: SEGFAULT sometimes occurs when testing libloading code
Operating system is Fedora 27.
uname -r
4.15.4-300.fc27.x86_64
When running cargo test
multiple times, sometimes test fails raising a segmentation fault.
#[test]
fn libloading() {
let lib = libloading::Library::new("libdltest.so").unwrap();
unsafe {
let test_fn: libloading::Symbol<unsafe extern fn(i32) -> i32> = lib.get(b"test").unwrap();
assert!(test_fn(100) == 200);
}
}
test test::libloading ... error: process didn't exit successfully: `/home/User/test_libloading/target/debug/deps/test_libloading-b3311c80df7834fd libloading` (signal: 11, SIGSEGV: invalid memory reference)
But when this code runs in an executable crate, It seems to run without the segfault even when I execute it multiple times.
fn main() {
let lib = libloading::Library::new("libdltest.so").unwrap();
unsafe {
let test_fn: libloading::Symbol<unsafe extern fn(i32) -> i32> = lib.get(b"test").unwrap();
assert!(test_fn(100) == 200);
}
}
About this issue
- Original URL
- State: open
- Created 6 years ago
- Comments: 16 (6 by maintainers)
Ok, the workaround with
RTLD_NODELETE
works (the values are from the libc-crate):I… don’t know. Not unloading the libraries will break a fairly common use-case of hot reloading, as
glibc
’s loader will not attempt to (re-)load a library that shares the same name with an already loaded DSO, even if the file contents have changed.With a recent update of
libloading
theLibrary::new
and the OS-specific counterparts already becameunsafe
for fairly similar reasons, so I’m kind of inclined to just consider library loading (and unloading) extremely unsafe and try document all of the cases people need to consider when they load libraries. And leave it up to them to figure out what makes most sense for their situations.@jackcmay This is literally the TLS issue, similar one to the issue which OS X fixed by explicitly not unloading the dynamic libraries if they have any thread locals, implicitly applying
RTLD_NODELETE
to them. This is the first time I see it being reproduced on Linux though.I’m fairly confident that you can construct a similarly problematic sample when the loader is C/C++ code (i.e. taking
libloading
out of the equation), as well as without a dynamic library written in Rust (by, for example, using the C++’sthread
variables, takingRust
’s libstd out of equation as well). I think you’re best off reporting this against glibc itself.Just like the OS X counterpart https://github.com/nagisa/rust_libloading/issues/5, I’m inclined to call this issue not-a-bug in
libloading
. It is most likely not a bug in rust’s libstd as well.In your case a viable workaround, if
RTLD_NODELETE
is not appropriate, could be ensuring that you do not use anything that creates thread-local variables (such as stdio from libstd).I know that macOS does employ this technique – they don’t unload the machine code for libraries that contain thread-local variables, but they are both in a position to selectively apply this, and they are able to make sure this approach of theirs does not end up having unfortunate side effects (unsuccessfully).
We’re not in that kind of position I fear. If we end up doing this, I would be inclined to adjust the library so that we don’t implicitly implement
Drop
at all – we do have close after all. This would help somewhat with issues like #46, which is effectively the same issue in a different context, too.… says @nagisa while looking at “Safer bindings around system dynamic library loading primitives” and wondering if they should remove the first word out of that sentence…
I may be wrong, but I would think that hot reloading would be less common than other uses cases (e.g. opening a typical shared library and calling a function). If that’s true, then I think it would make more sense for the (less-typical) hot-reloading code to opt into automatically unloading, instead of (more-typical) code needing to opt out of automatically unloading.
Looking at the issues that are linked here, it looks like severe; different projects have run into this. I think the fact that the issue occurs in the
Drop
impl makes this particularly subtle - while unsafe code authors should be aware of the need to audit surrounding safe code, I think it’s particularly easy to forget to consider what happens when a (fully initialized) struct is dropped, sometimes behind multiple layers of indirection.While this not a bug in
libloading
, I think it’s a major footgun for anyone using this library. As https://github.com/Aaron1011/pthread_dlopen demonstrates, it’s not possible to determine ahead of time if a shared library has registered TLS destructors (if it’s even possible at all), since they may be registered long after the library is loaded. If the usage oflibloading
is encapsulated by a safe API, this can easily lead to confusing segfaults without any obvious cause.In light of this, I think it might be a good idea of libloading to change its default behavior to not unload shared libraries on Linux when a
Library
is dropped. While this is obviously a hack, the only alternative is for every single consumer of the crate to add in its own special handling for Linux. Even if glibc released a fix tomorrow, older versions of glibc will still be in use for years. Since glibc is almost always dynamically linked, individual crate authors are generally stuck with whatever glibc is in use on the machine the crate ends up running on.Concretely, I think that
Library
’sDrop
implementation should not unload libraries on Linux, unless the user explicitly requests it (e.g. via a new methodlibrary.unload_on_drop()
). While this would have side effects (increased memory usage, and skipping of__attribute__((destructor))
functions), I think preventing crashes by default is more important.@nagisa What are your thoughts?
I’ve created a standalone C reproducer, using
pthread_key_create
: https://github.com/Aaron1011/pthread_dlopenIt does the following:
main()
, and block on it usingpthread_join
pthread_key_create
with a destrutor function, and callpthread_setspecific
with a non-NULL value to force the destrutor to actually run on thread exit.dlclose
on the shared library.This causes the following segfault:
It appears that calling
dlclose
on a shared library does not unregister any pthreads destructors (and presumably, any destructors registered using the same underlying mechanism that pthreads uses).