miri: `_mm_movemask_epi8` unexpectedly triggers Undefined Behavior

playground

use core::arch::x86_64::*;

#[target_feature(enable = "sse2")]
pub unsafe fn is_ascii_sse2(a: &[u8; 16]) -> bool {
    let a = _mm_loadu_si128(a.as_ptr().cast());
    let m = _mm_movemask_epi8(a);
    m == 0
}

fn main() {
    assert!(cfg!(target_feature = "sse2"));
    let s = b"helloworld123456";
    assert!(unsafe { is_ascii_sse2(s) });
}

error: Undefined Behavior: each element of a SIMD mask must be all-0-bits or all-1-bits
    --> /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/../../stdarch/crates/core_arch/src/x86/sse2.rs:1381:5
     |
1381 |     simd_bitmask::<_, u16>(a.as_i8x16()) as u32 as i32
     |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ each element of a SIMD mask must be all-0-bits or all-1-bits
     |
     = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
     = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
     = note: BACKTRACE:
     = note: inside `std::arch::x86_64::_mm_movemask_epi8` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/../../stdarch/crates/core_arch/src/x86/sse2.rs:1381:5
note: inside `is_ascii_sse2` at src/main.rs:6:13

About this issue

Original URL
State: closed
Created 2 years ago
Comments: 42 (34 by maintainers)

Most upvoted comments

The simd_foo intrinsics are definitely not intended to behave in all cases like the underlying platform intrinsics. There are plenty of situations where doing so would be undesirable or impossible, and has caused problems in the past.

thomcc on Oct 27, 2022

The documentation in rustc’s codegen says that simd_bitmask only looks at the MSB:

https://github.com/rust-lang/rust/blob/bed4ad65bf7a1cef39e3d66b3670189581b3b073/compiler/rustc_codegen_llvm/src/intrinsic.rs#L1081-L1089

Amanieu on Oct 26, 2022

I’d like us to have codegen tests that ensure this is optimized appropriately if we’re going to go this route. I wouldn’t feel good about shipping a perf regression to all _mm_movemask_epi8 users, since there are a lot (that’s quite an important function).

thomcc on Oct 28, 2022

It looks like the pattern simd_lt then simd_bitmask can be correctly optimized to one instruction. I won’t be surprised if _mm_movemask_epi8 is implemented by this way. But I’m not sure whether it will affect other optimizations.

https://godbolt.org/z/7KKrcojY9

#![feature(portable_simd)]

extern crate core;

use core::arch::x86_64::*;
use core::simd::*;

#[inline(always)]
fn i8xn_bitmask<const N: usize>(x: Simd<i8, N>) -> <Mask<i8, N> as ToBitMask>::BitMask
where
    LaneCount<N>: SupportedLaneCount,
    Mask<i8, N>: ToBitMask,
{
    x.simd_lt(Simd::splat(0)).to_bitmask()
}

#[target_feature(enable = "sse2")]
pub unsafe fn is_ascii_sse2(a: &[u8; 16]) -> bool {
    let a = _mm_loadu_si128(a.as_ptr().cast());
    let m = i8xn_bitmask(a.into());
    m == 0
}

#[target_feature(enable = "avx2")]
pub unsafe fn is_ascii_avx2(a: &[u8; 32]) -> bool {
    let a = _mm256_loadu_si256(a.as_ptr().cast());
    let m = i8xn_bitmask(a.into());
    m == 0
}

Nugine on Oct 28, 2022

private internals APIs

I think these are not quite as private as you might like – any tool that wants to analyze or verify Rust code for correctness (e.g. Miri but also all the static analysis and verification tools being built) need to have proper models of these functions. So they are about as private as the rest of MIR I would say. (Intrinsics are part of the MIR syntax for all intents and purposes.) We are certainly not making stability promises, but we should keep in mind that not all consumers of this API are in-tree.

For SIMD specifically, I don’t know for sure that there is an out-of-tree consumer, but I know for sure that there are people that would like to run their static analysis and verification tools on SIMD code and they asked about how to best handle that. Given that modeling all of stdarch is not feasible, the portable-simd intrinsics are a great way to increase the amount of code these tools can handle. That’s why Miri supports them.

Anyway, two intrinsics seem like a reasonable compromise to me for this particular case.

RalfJung on Oct 27, 2022

Would it make sense to split simd_bitmask into one intrinsic which only looks at the MSB of every lane and one which makes it UB to have lanes that are not all zero or all one? If not as I already said I’m fine with reverting this specific stdarch change. For most simd_* cases in stdarch there is only one possible behavior of the simd_* intrinsic that makes sense. For example for simd_add anything other than a regular twos complement lanewise addition for integer vector types should in all likelyhood use a different name like simd_add_unchecked. For those cases IMO stdarch should keep using them. I have tried to use simd_* intrinsics for float ops in stdarch, but there behavior is more ambiguous and in fact tests failed around NaN. As such I think the float operations in stdarch should keep using LLVM intrinsics. They aren’t all that important for cg_clif anyway as integer and bitwise operations are much more common.

bjorn3 on Oct 27, 2022

To clarify, I mean the simd_* compiler intrinsics, not the vendor intrinsics. So I don’t think it’s appropriate to use it here.

calebzulawski on Oct 26, 2022