miri: `_mm_movemask_epi8` unexpectedly triggers Undefined Behavior
use core::arch::x86_64::*;
#[target_feature(enable = "sse2")]
pub unsafe fn is_ascii_sse2(a: &[u8; 16]) -> bool {
let a = _mm_loadu_si128(a.as_ptr().cast());
let m = _mm_movemask_epi8(a);
m == 0
}
fn main() {
assert!(cfg!(target_feature = "sse2"));
let s = b"helloworld123456";
assert!(unsafe { is_ascii_sse2(s) });
}
error: Undefined Behavior: each element of a SIMD mask must be all-0-bits or all-1-bits
--> /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/../../stdarch/crates/core_arch/src/x86/sse2.rs:1381:5
|
1381 | simd_bitmask::<_, u16>(a.as_i8x16()) as u32 as i32
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ each element of a SIMD mask must be all-0-bits or all-1-bits
|
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
= note: BACKTRACE:
= note: inside `std::arch::x86_64::_mm_movemask_epi8` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/../../stdarch/crates/core_arch/src/x86/sse2.rs:1381:5
note: inside `is_ascii_sse2` at src/main.rs:6:13
See also:
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 42 (34 by maintainers)
The
simd_foointrinsics are definitely not intended to behave in all cases like the underlying platform intrinsics. There are plenty of situations where doing so would be undesirable or impossible, and has caused problems in the past.The documentation in rustc’s codegen says that
simd_bitmaskonly looks at the MSB:https://github.com/rust-lang/rust/blob/bed4ad65bf7a1cef39e3d66b3670189581b3b073/compiler/rustc_codegen_llvm/src/intrinsic.rs#L1081-L1089
I’d like us to have codegen tests that ensure this is optimized appropriately if we’re going to go this route. I wouldn’t feel good about shipping a perf regression to all
_mm_movemask_epi8users, since there are a lot (that’s quite an important function).It looks like the pattern
simd_lt then simd_bitmaskcan be correctly optimized to one instruction. I won’t be surprised if_mm_movemask_epi8is implemented by this way. But I’m not sure whether it will affect other optimizations.https://godbolt.org/z/7KKrcojY9
I think these are not quite as private as you might like – any tool that wants to analyze or verify Rust code for correctness (e.g. Miri but also all the static analysis and verification tools being built) need to have proper models of these functions. So they are about as private as the rest of MIR I would say. (Intrinsics are part of the MIR syntax for all intents and purposes.) We are certainly not making stability promises, but we should keep in mind that not all consumers of this API are in-tree.
For SIMD specifically, I don’t know for sure that there is an out-of-tree consumer, but I know for sure that there are people that would like to run their static analysis and verification tools on SIMD code and they asked about how to best handle that. Given that modeling all of stdarch is not feasible, the portable-simd intrinsics are a great way to increase the amount of code these tools can handle. That’s why Miri supports them.
Anyway, two intrinsics seem like a reasonable compromise to me for this particular case.
Would it make sense to split simd_bitmask into one intrinsic which only looks at the MSB of every lane and one which makes it UB to have lanes that are not all zero or all one? If not as I already said I’m fine with reverting this specific stdarch change. For most simd_* cases in stdarch there is only one possible behavior of the simd_* intrinsic that makes sense. For example for simd_add anything other than a regular twos complement lanewise addition for integer vector types should in all likelyhood use a different name like simd_add_unchecked. For those cases IMO stdarch should keep using them. I have tried to use simd_* intrinsics for float ops in stdarch, but there behavior is more ambiguous and in fact tests failed around NaN. As such I think the float operations in stdarch should keep using LLVM intrinsics. They aren’t all that important for cg_clif anyway as integer and bitwise operations are much more common.
To clarify, I mean the
simd_*compiler intrinsics, not the vendor intrinsics. So I don’t think it’s appropriate to use it here.