portable-simd: non-power-of-2 Simd types have wrong size

I tried this code: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=f4ee5c373508bcb37be6cda3bb627b87

#![feature(repr_simd)]
use core::alloc::Layout;
#[repr(simd)]
struct Simd<T, const N: usize>(pub [T; N]);

fn main() {
    assert_eq!(
        Layout::new::<Simd<i8, 3>>(),
        Layout::from_size_align(3, 1).unwrap()
    );
}

I expected to see this happen: run with no errors because Simd<T, N> types should be the same size as the corresponding array type [T; N] (something we unofficially agreed on iirc, but is not formally decided)

Instead, this happened:

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `Layout { size: 4, align: 4 (1 << 2) }`,
 right: `Layout { size: 3, align: 1 (1 << 0) }`', src/main.rs:7:5

Meta

rustc --version --verbose:

<current nightly as of issue creation -- probably 2022-11-29 nightly>

crate version in Cargo.toml: N/A

About this issue

  • Original URL
  • State: open
  • Created 2 years ago
  • Comments: 42 (42 by maintainers)

Commits related to this issue

Most upvoted comments

Only the types in core::arch need to be abi compatible right? There are From impls to convert between the core::arch types and core::simd::Simd, but nothing that converts references for one type to references for the other type.

My commit does not change layout at all, it just changes the layout consistency check, assuming that the existing layout code is right. If the existing layout code is buggy then yeah I also ported the bug to the layout checker now. Feel free to fix both at the same time. 😃 The important part is the the size and align of the type must be uniquely determined by the size and align of the vector elements and the number of elements.

Considering this practically rather than idealistically for a moment, it’s questionable to me whether it’s really worth fixing the layout vs considering other options. bool is perhaps the most frequently used primitive, whereas std::simd is niche, and non-PoT vectors considerably more niche. Embedding a split memory/register repr throughout the compiler seems impractical and even if we decided that was the best option, I genuinely doubt anyone would ever implement it.

That said, not supporting non-PoT vectors could excessively complicate our API and perhaps delay stabilization while we try to sort out a simpler alternative to SupportedLaneCount (see #364). I think it might be worth exploring other options that would give us non-PoT vectors in a more practical way. I’ve thought of 3 alternatives:

  • Leave things as they are. The current implementation of non-PoT vectors works. We could leave them be and use either an assert or non-local error in functions like [T]::as_simd. This would leave us with the option of reducing the alignment in the future (maybe across an edition change).
  • Add a new CompactSimd or UnalignedSimd type. This type could implement all of the same functions and traits as Simd, but have a lesser alignment, either repr(Rust) or repr(compact_simd). To avoid a split memory/register repr, I would not make the type a vector type, but instead convert it to a repr(simd) type before passing it to the existing simd_* intrinsics. We could easily implement Into<Simd> for CompactSimd and perhaps Deref<Target = UnalignedSimd> for Simd. [T]::as_simd could still error, but have a matching [T]::as_compact_simd that works for any number of elements.
  • Make Simd repr(compact_simd) and convert before using. Like the previous example, but wrapped into one type. We could add new intrinsics simd_align and simd_unalign that convert between repr(simd) and repr(compact_simd), a no-op when alignments match. In this case, repr(compact_simd) would be a glorified repr(align(N)).

Personally, I like the first or second alternative best. LLVM seems to be able to generate pretty good code for non-PoT vectors, and I think messing with the alignment could cause problems. The second option would allow the user to choose which is more important–having a flexible memory layout or reliable codegen that doesn’t juggle layouts.

cc @programmerjake @scottmcm

Alright this was a bit painful to encode in LLVM IR but here’s an experiment so we can see what’s going on:

define ptr @add_one_i32x3(ptr %p) {
    %p_plus_one_i32x3 = getelementptr <3 x i32>, ptr %p, i32 1
    ret ptr %p_plus_one_i32x3
}

define <3 x i32> @load_i32x3_unaligned(ptr %p) {
    %v = load <3 x i32>, ptr %p, align 1
    ret <3 x i32> %v
}
add_one_i32x3:                          # @add_one_i32x3
        lea     rax, [rdi + 16]
        ret
load_i32x3_unaligned:                   # @load_i32x3_unaligned
        movsd   xmm0, qword ptr [rdi]           # xmm0 = mem[0],zero
        movss   xmm1, dword ptr [rdi + 8]       # xmm1 = mem[0],zero,zero,zero
        shufps  xmm1, xmm0, 48                  # xmm1 = xmm1[0,0],xmm0[3,0]
        shufps  xmm0, xmm1, 132                 # xmm0 = xmm0[0,1],xmm1[0,2]
        ret

Good news: load_i32x3_unaligned behaves as you’d expect, only reading 12 bytes, as qword(8)+dword(4). This would suggest <3 x i32> has an ā€œinherent/dataā€ size of 12 bytes (similar to how i96 works?).

Bad news: add_one_i32x3 adds 16 bytes to the pointer, which is not what you want. This would suggest <3 x i32> has a ā€œstrideā€ of 16 bytes (i.e. rounded up to the next power of 2).

Given this I don’t think I can in good conscience recommend messing around with Abi::Vector.

What you could do is add a #[repr(compact_simd)] (ā€œpackedā€ seems ambiguous but I basically mean ā€œdecrease alignment to the largest value that does not result in paddingā€), and then have layout logic pick:

  • Abi::Vector wherever possible
  • Abi::CompactedVector (a new variant which would have to be added)
    • for LLVM, this would need to use the memory/immediate split that exists for bool (note: ā€œimmediateā€ below basically means ā€œSSA valueā€, i.e. ā€œin a registerā€ except more abstract)
      • when in memory: bool is i8 and i32x3 would have to be [3 x i32] (if we need the type at all)
      • as ā€œimmediateā€: bool is i1 and i32x3 could be <3 x i32> (allowing all vector opts etc.)
      • memory->immediate (reading): the memory type is loaded before being converted
        • this is fine for bool (where the conversion is a i8->i1 truncation), but loading a [3 x i32] seems like a bad time
      • immediate->memory (writing): the immediate type is converted and then stored
        • again, this would require creating an array value (which likely lowers to 3 i32 stores)

We could/should even make the normal #[repr(simd)] disallow or warn when it encounters cases involving padding, instead of giving the user the padded type.

Now, it might be easier than the complication above, if I go back to (sorry it got delayed so much…):

However, I do not have strong confidence that nothing anywhere would ever do a GEPi on a vector type like <3 x i32> - while we can theoretically remove most uses of LLVM struct/array types, vector types like <3 x i32> will fundamentally have to stick around (for the ā€œimmediateā€ aka SSA aka ā€œin registersā€ side), and LLVM does not have a good way to guarantee that you’re not using types anywhere near memory.

Or in other words, you want rustc to use the <3 x i32> LLVM type but never let it compute its stride of 16 bytes (to use in e.g. GEPi).

An easy example (even after https://github.com/rust-lang/rust/pull/98615, the PR linked above), is that a Rust slice type like [i32x3], or even just calling .add(1) on a *const i32x3, will currently let LLVM use the stride of the type. At least struct fields only have constant offsets, but .add(N) would require us to generate a multiplication of the form N * 12, but we don’t do anything like that.

At least the Abi::CompactedVector idea allows you to have a ā€œsplit memory-vs-register semanticsā€ type, without impacting anything that works with #[repr(simd)] today (which does need the larger alignment to be compatible with LLVM, at least for now).

Reading more of the discussion above, you really cannot talk about llvm_alignof(Simd<T, N>) as if it’s different than the Rust alignment.

At the point where you start diverging in alignments from LLVM, you can no longer use primitive vector types and have to wrap them in packed structs to decrease alignment… except the vector is already rounded up to a ā€œplausible hardware sizeā€ by that point so the packed struct can no longer decrease the size!

You would probably need to ā€œjustā€ not use non-pow2 LLVM vector types, i.e. do not have Abi::Vector for such malformed #[repr(simd)] types. Or maybe they can still be used when by-value but great care must be taken around loading and storing - you effectively would need to generate e.g. 3 loads/stores to read/write to/from <3 x T> values (or weird FCA loads/stores of [3 x T] which LLVM would desugar).

Frankly, after what I’ve seen from GLSL (which has e.g. size=12 align=16 types that ā€œeatā€ padding when used as fields!), my humble suggestion might be to ban these types instead of giving them magical semantics, since you generally can’t ā€œtake backā€ the magic, even if you manage to make it work initially.