may: Undefined behavior invoked by moving stacks between threads

A coroutine’s stack can be moved from one thread to another. However, a stack may always contain things that are not Send, breaking things.

This is one example:

extern crate may;

use std::cell::RefCell;
use std::rc::Rc;
use std::thread::ThreadId;

use may::coroutine;
use coroutine::yield_now;

thread_local!(static ID: RefCell<Option<Rc<ThreadId>>> = RefCell::new(None));

fn main() {
    may::config().set_io_workers(60);
    may::config().set_workers(60);
    let h = coroutine::spawn(move || {
        let v = (0..10000)
            .map(|i| {
                coroutine::spawn(|| {
                    let handle = Rc::new(std::thread::current().id());
                    ID.with(|id| {
                        *id.borrow_mut() = Some(Rc::clone(&handle));
                    });
                    for _ in 0..10000 {
                        if *handle != std::thread::current().id() {
                            println!("Access to Rc content without a mutex from a different thread, {:?} vs {:?}",
                                     *handle, std::thread::current().id());
                        }
                        yield_now();
                    }
                })
            })
            .collect::<Vec<_>>();
        for i in v {
            i.join().unwrap();
        }
    });
    h.join().unwrap();
}

If something else was accessing the Rc (like making copis of it) from the original thread (which it could, because it is accessible in the thread local storage), it would be undefined behaviour ‒ both the coroutine, that moved, and the thing in that thread (possibly other coroutine) could be accessing the counters in the Rc at the same time, or the data inside, which could be for example a RefCell.

Now, suggesting not to use thread local storage doesn’t solve anything, because:

  • I might not use thread local storage myself, but I can’t certainly be expected to audit all the libraries I use not to use thread local storage. Even the standard library uses thread local storage internally.
  • There are things that are not Send for other reasons. One example might be Zero-MQ sockets, which explode the whole application if ever touched from a different thread then they were created in.

I believe this problem is fundamental to any attempt to move stacks between threads in Rust. Such thing just breaks the Rust contract.

So my only suggestion is to create the coroutine in one thread (it is possible to check nothing Send crosses a closure boundary, so the closure can be safely sent to another thread) and then pin it there to that thread.

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Reactions: 1
  • Comments: 18 (10 by maintainers)

Most upvoted comments

thanks all, I’m considering change the api spec to add unsafe declaration for spawning a coroutine, since it’s in an early stage, it’s a good time to do that. Apply the rust unsafe rules is important for users.

  • very annoying to use the API

It’s also very annoying to encounter UB in ‘safe’ code…

Your API already is unsafe. Pretending it’s safe won’t make it so, it’ll only hurt users.

Very good explanation about unsafe in rust was publish just recently: https://manishearth.github.io/blog/2017/12/24/undefined-vs-unsafe-in-rust/.

To cite:

Basically, in Rust a bit of code is “safe” if it cannot exhibit undefined behavior under all circumstances of that code being used.

I’d like to point out that this is not about just libstd (but I’ll try to reproduce it with libstd).

Anyway, if you really want to go the way to tell user „don’t do this“ and „do this“, that’s fine, but you should make corresponding methods unsafe, to ensure they read and follow the instructions. That’s the purpose of unsafe, it tells people they need to be careful. Unlike C++ people don’t expect to have to watch out for things.

What do you mean, by „seeing“ UB?

No, this code doesn’t contain UB itself, only points to a place where it could happen. But if I, for example, added ID.with(|id| Rc::clone(id.borrow())); Rc::clone(handle) into the inner loop, just above the conditional, that would already be UB. I’d would modifying the reference counts inside the Rc from multiple threads without synchronisation, possibly making it wrong and freeing the insides too early or something.

Note that UB can do anything. That includes pretending to work correctly when you watch. That’s the tricky part about UB ‒ you don’t have to see it, it can be lurk in the program and do arbitrarily bad things only from time to time (when the customer watches).

And I know I don’t want to use TLS from coroutines. The problem is, a blog post is not the appropriate solution. TLS isn’t the only problem, there are others ‒ system calls that are expected to happen on specific thread, FFI libraries, etc. You never know if a library you use might be using TLS and not telling you. You are not supposed to have to watch for these things in Rust if you’re not using unsafe.

If your library exposes a „safe“ interface which allows your user to invoke UB by it, the library is broken and dangerous to use. So you can either ensure the UB can never happen (by eg. not moving coroutines between threads after they are created), or mark the spawn method as unsafe, because it’s the exactly correct mechanism to tell the user of the library he has to check, that there’s something not covered by the compiler/library.