evmap: Current implementation is unsound. Segfault and double free are possible with out-of-sync maps from a bad `PartialEq` implementation.

So basically, the current implementation seems to rely on the PartialEq implementation of a value for removing it. This is totally unsafe for the very same reasons why retain is unsafe: You cannot guarantee that the PartialEq implementation gives the same result on two seperate calls with the same arguments.

use std::sync::atomic::{AtomicBool, Ordering::SeqCst};
static SNEAKY: AtomicBool = AtomicBool::new(false);

#[derive(Eq, Hash)]
struct Sneaky(Vec<u8>);
impl PartialEq for Sneaky {
    fn eq(&self, other: &Self) -> bool {
        if SNEAKY.swap(false, SeqCst) {
            false
        } else {
            self.0 == other.0
        }
    }
}

fn main() {
    let (_, mut w) = evmap::new::<u8, Box<Sneaky>>();
    w.insert(0, Box::new(Sneaky(vec![0])));
    w.refresh();
    w.remove(0, Box::new(Sneaky(vec![0])));
    SNEAKY.store(true, SeqCst);
    w.refresh();
    w.refresh();
    w.refresh();
}
$ cargo run
   Compiling evmaptest v0.1.0 (/home/frank/[...]/evmaptest)
    Finished dev [unoptimized + debuginfo] target(s) in 0.78s
     Running `target/debug/evmaptest`
[1]    30645 segmentation fault (core dumped)  cargo run

Change to struct Sneaky(u8); and Box::new(Sneaky(0)) and you get

free(): double free detected in tcache 2
[1]    31123 abort (core dumped)  cargo run

instead.


Since Rust doesn’t have (and won’t have anytime soon) an effects system, we cannot guarantee “pure” implementations for e.g. PartialEq on the values type. I guess the only way to fix this is to make sure that the equality check happens only once, and then e.g. some kind of raw index into the bag is determined and stored into the op-log. That index only is then used for removing the same value from the other map.

I’m not familiar with the implementation details of this crate as to assess if this is possible and how hard of a change this would be but it seems like some nontrivial changes are needed. I’d suspect that PartialEq implementations (and perhaps also the Hash implementation?) of both the keys and the value type might be able to cause problems (I haven’t tested the behavior of the map for bad/sneaky key types yet though).

Also, @jonhoo I wanna say thanks for the great coding streams.


A bit of testing shows that this exact code can produce a segfault on every version starting from version 5. I only tested the latest minor versions, i.e.: 5.1.1, 6.0.1, 7.1.3, 8.0.1, 9.0.1, 10.0.2, 11.0.0-alpha.1; as well as 5.0.0 and the current master and left-right branches.


This code variation
use std::sync::atomic::{AtomicBool, Ordering::SeqCst};
static SNEAKY: AtomicBool = AtomicBool::new(false);

#[derive(Eq, Hash, Debug)]
struct Sneaky(Vec<u8>);
impl PartialEq for Sneaky {
    fn eq(&self, other: &Self) -> bool {
        if SNEAKY.swap(false, SeqCst) {
            false
        } else {
            self.0 == other.0
        }
    }
}

fn main() {
    let (r, mut w) = evmap::new::<u8, Box<Sneaky>>();
    let dbg = || r.get_and(&0, |vs| {
        println!("{:?}", vs);
    });
    w.insert(0, Box::new(Sneaky(vec![0])));
    w.refresh();
    w.remove(0, Box::new(Sneaky(vec![0])));
    SNEAKY.store(true, SeqCst);
    w.refresh();
    r.get_and(&0, |vs| {
        println!("{:?}", vs);
    });
    w.refresh();
    w.refresh();
    r.get_and(&0, |vs| {
        println!("{:?}", vs);
    });
}

leads to segfault on 2.0.0, 2.0.2, 3.0.1, 4.2.3. (and probably every other version 2-4 as well).

Version 1.* didn’t have ShallowCopy yet, so it probably should be fine.

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 17 (10 by maintainers)

Commits related to this issue

Most upvoted comments

And if they want to use third-party types, they’ll have to use the unsafe constructor to acknowledge that they’ve checked the type. How does that sound?

I feel like the option to derive TrustedHashEq for user-defined structs that e.g. only consist of standard-library types has its value too. Having the trait sealed instead of unsafe does take this option away.

But: A sealed trait can always be upgraded into an unsafe trait later (without any breaking changes), so I guess it could be reasonable to just start with the sealed version and see how big the need for some safe derive macros really is.

if someone implements ConsistentHashEq maliciously in their crate, and then someone else relies on that crate, they now have unsound behavior with no unsafe code in their crate. Maybe that’s fine, but it feels weird. The precedent from the standard library is good though.

That’s how unsafe code and inparticular unsafe traits work. If a library author creates an unsafe impl that violates the contract of ConsistentHashEq (i.e. Hash or Eq is not deterministic) then that unsafe code is “unsound”. It is not undefined behavior in and by itself but it allows for other, unsafe-free code to trigger undefined behavior. Some also call this “library-UB”, as in, the rust-evmap library defines that non-conforming ConsistentHashEq implementations can lead to undefined behavior. “library-UB” in contrast to “compiler-UB” which is when some code directly violates safety contracts that the compiler imposes (e.g. dereferencing an invalid memory address or reading a u8 from uninitialized memory).

if someone wants to have their keys or values be bytes::Bytes, they now need to create a newtype and derive the trait for that. It’s ShallowCopy all over again, except also for keys.

True. That’s a general problem with new traits from not super-popular crates (like e.g. serde).

A derive helps, but not a lot

I’m no expert in derive macros, but I guess a derive for ConsistentHashEq would need to both make sure that the same type also derives Eq and Ord (if something like this is possible) and it has to ensure that all the type’s fields are ConsistentHashEq as well (e.g. with generic bounds on generic type parameters). This can mostly only help users of evmap to safely derive ConsistentHashEq for their custom structs made up of standard library types (and perhaps newtypes with unsafe implementations as above). The newtypes are not very ergonomic, as you said:

Ultimately, if users have to either unsafely implement TrustedHashEq or the call to evmap::new is unsafe, I’m inclined to say that having new be unsafe is more ergonomic. It means you only have to check the safety in the place where you instantiate the map, and you don’t need any newtypes or additional trait implementations in evmap itself.

But do note that the ConsistentHashEq solution does not preclude the “unsafe new” solution. One can simply offer two functions for constructing, a safe new with a ConsistentHashEq bound and using the DefaultHasher hasher, and another (yet to be named better) new_asserting_consistency function that’s unsafe and only needs Hash + Eq. Furthermore, the with_hasher construction has to be unsafe anyways, since otherwise the Hasher itself (or the BuildHasher) could be non-deterministic. (Unless we want marker traits for something like ConsistentBuildHasher, too.)

I also think providing a safe interface would be nice, but only if it is actually safe to use, not if it just seems safe to use. If that makes sense?

Your argument almost sounds like a slippery slope argument, because if you can’t trust unsafe code to do the right thing, what can you trust?

Yup, you’re completely right, it is a slippery slope argument. Or rather, it stems from a deep-seated desire to avoid making people write unsafe wherever possible, unless they absolute have to, because I know how easy it is to get wrong. The temptation to just “throw an unsafe impl TrustedHashEq for Foo {} in there” is quite high. And I feel like we’re less likely to have users get into a bad spot if we instead say that the unsafety happens when you use a type than when you produce a type. My argument isn’t so much that “unsafe traits are bad”, as it is “only make people use unsafe if there isn’t another way”. But I also acknowledge that my argument is more “it feels like the wrong trade-off to me” than “it’s objectively better”.

I don’t think you need to implement this for all std types, only the common ones, integers, slices, references, smart pointers, and applicable collections. Beyond that people can use unsafe new. You only need to cover the common case with the safe version.

Actually, this gave me an idea that I think I would be comfortable with: to make TrustedHashEq (maybe under the name KnownStableHashEq) a sealed trait. We would only implement it inside evmap for common types that we’ve checked ourselves, and it would enable users that want to use any of those types to use a completely safe interface without any chance of foot-shooting. And if they want to use third-party types, they’ll have to use the unsafe constructor to acknowledge that they’ve checked the type. How does that sound?