rust_libloading: SEGFAULT sometimes occurs when testing libloading code

Operating system is Fedora 27.

uname -r
4.15.4-300.fc27.x86_64

When running cargo test multiple times, sometimes test fails raising a segmentation fault.

#[test]
fn libloading() {
    let lib = libloading::Library::new("libdltest.so").unwrap();
    unsafe {
        let test_fn: libloading::Symbol<unsafe extern fn(i32) -> i32> = lib.get(b"test").unwrap();
        assert!(test_fn(100) == 200);
    }
}
test test::libloading ... error: process didn't exit successfully: `/home/User/test_libloading/target/debug/deps/test_libloading-b3311c80df7834fd libloading` (signal: 11, SIGSEGV: invalid memory reference)

But when this code runs in an executable crate, It seems to run without the segfault even when I execute it multiple times.

fn main() {
    let lib = libloading::Library::new("libdltest.so").unwrap();
    unsafe {
        let test_fn: libloading::Symbol<unsafe extern fn(i32) -> i32> = lib.get(b"test").unwrap();
        assert!(test_fn(100) == 200);
    }
}

About this issue

  • Original URL
  • State: open
  • Created 6 years ago
  • Comments: 16 (6 by maintainers)

Commits related to this issue

Most upvoted comments

Ok, the workaround with RTLD_NODELETE works (the values are from the libc-crate):

// Load and initialize library
#[cfg(target_os = "linux")]
let library: Library = {
    // Load library with `RTLD_NOW | RTLD_NODELETE` to fix a SIGSEGV
    ::libloading::os::unix::Library::open(Some(path.as_ref()), 0x2 | 0x1000)?.into()
};
#[cfg(not(target_os = "linux"))]
let library = Library::new(path.as_ref())?;

I… don’t know. Not unloading the libraries will break a fairly common use-case of hot reloading, as glibc’s loader will not attempt to (re-)load a library that shares the same name with an already loaded DSO, even if the file contents have changed.

With a recent update of libloading the Library::new and the OS-specific counterparts already became unsafe for fairly similar reasons, so I’m kind of inclined to just consider library loading (and unloading) extremely unsafe and try document all of the cases people need to consider when they load libraries. And leave it up to them to figure out what makes most sense for their situations.

@jackcmay This is literally the TLS issue, similar one to the issue which OS X fixed by explicitly not unloading the dynamic libraries if they have any thread locals, implicitly applying RTLD_NODELETE to them. This is the first time I see it being reproduced on Linux though.

I’m fairly confident that you can construct a similarly problematic sample when the loader is C/C++ code (i.e. taking libloading out of the equation), as well as without a dynamic library written in Rust (by, for example, using the C++’s thread variables, taking Rust’s libstd out of equation as well). I think you’re best off reporting this against glibc itself.

Just like the OS X counterpart https://github.com/nagisa/rust_libloading/issues/5, I’m inclined to call this issue not-a-bug in libloading. It is most likely not a bug in rust’s libstd as well.

In your case a viable workaround, if RTLD_NODELETE is not appropriate, could be ensuring that you do not use anything that creates thread-local variables (such as stdio from libstd).

I know that macOS does employ this technique – they don’t unload the machine code for libraries that contain thread-local variables, but they are both in a position to selectively apply this, and they are able to make sure this approach of theirs does not end up having unfortunate side effects (unsuccessfully).

We’re not in that kind of position I fear. If we end up doing this, I would be inclined to adjust the library so that we don’t implicitly implement Drop at all – we do have close after all. This would help somewhat with issues like #46, which is effectively the same issue in a different context, too.

… says @nagisa while looking at “Safer bindings around system dynamic library loading primitives” and wondering if they should remove the first word out of that sentence…

Not unloading the libraries will break a fairly common use-case of hot reloading

I may be wrong, but I would think that hot reloading would be less common than other uses cases (e.g. opening a typical shared library and calling a function). If that’s true, then I think it would make more sense for the (less-typical) hot-reloading code to opt into automatically unloading, instead of (more-typical) code needing to opt out of automatically unloading.

Looking at the issues that are linked here, it looks like severe; different projects have run into this. I think the fact that the issue occurs in the Drop impl makes this particularly subtle - while unsafe code authors should be aware of the need to audit surrounding safe code, I think it’s particularly easy to forget to consider what happens when a (fully initialized) struct is dropped, sometimes behind multiple layers of indirection.

While this not a bug in libloading, I think it’s a major footgun for anyone using this library. As https://github.com/Aaron1011/pthread_dlopen demonstrates, it’s not possible to determine ahead of time if a shared library has registered TLS destructors (if it’s even possible at all), since they may be registered long after the library is loaded. If the usage of libloading is encapsulated by a safe API, this can easily lead to confusing segfaults without any obvious cause.

In light of this, I think it might be a good idea of libloading to change its default behavior to not unload shared libraries on Linux when a Library is dropped. While this is obviously a hack, the only alternative is for every single consumer of the crate to add in its own special handling for Linux. Even if glibc released a fix tomorrow, older versions of glibc will still be in use for years. Since glibc is almost always dynamically linked, individual crate authors are generally stuck with whatever glibc is in use on the machine the crate ends up running on.

Concretely, I think that Library’s Drop implementation should not unload libraries on Linux, unless the user explicitly requests it (e.g. via a new method library.unload_on_drop()). While this would have side effects (increased memory usage, and skipping of __attribute__((destructor)) functions), I think preventing crashes by default is more important.

@nagisa What are your thoughts?

I’ve created a standalone C reproducer, using pthread_key_create: https://github.com/Aaron1011/pthread_dlopen

It does the following:

  1. Spawn a new thread from main(), and block on it using pthread_join
  2. From the new thread, load a simple shared library, and call a function in it.
  3. In the shared library, call pthread_key_create with a destrutor function, and call pthread_setspecific with a non-NULL value to force the destrutor to actually run on thread exit.
  4. Back in the main program (on the thread), call dlclose on the shared library.
  5. Return from the thread function

This causes the following segfault:

[Current thread is 1 (Thread 0x7facc3ff4640 (LWP 701108))]
gef➤  bt
#0  0x00007facc4229129 in ?? ()
#1  0x00007facc41cd411 in __nptl_deallocate_tsd.part.0 () from /usr/lib/libpthread.so.0
#2  0x00007facc41ce2ba in start_thread () from /usr/lib/libpthread.so.0
#3  0x00007facc40f7053 in clone () from /usr/lib/libc.so.6
gef➤  q

It appears that calling dlclose on a shared library does not unregister any pthreads destructors (and presumably, any destructors registered using the same underlying mechanism that pthreads uses).