portable-simd: non-power-of-2 Simd types have wrong size

I tried this code: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=f4ee5c373508bcb37be6cda3bb627b87

#![feature(repr_simd)]
use core::alloc::Layout;
#[repr(simd)]
struct Simd<T, const N: usize>(pub [T; N]);

fn main() {
    assert_eq!(
        Layout::new::<Simd<i8, 3>>(),
        Layout::from_size_align(3, 1).unwrap()
    );
}

I expected to see this happen: run with no errors because Simd<T, N> types should be the same size as the corresponding array type [T; N] (something we unofficially agreed on iirc, but is not formally decided)

Instead, this happened:

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `Layout { size: 4, align: 4 (1 << 2) }`,
 right: `Layout { size: 3, align: 1 (1 << 0) }`', src/main.rs:7:5

About this issue

Original URL
State: open
Created 2 years ago
Comments: 42 (42 by maintainers)

Commits related to this issue

Auto merge of #117116 - calebzulawski:repr-simd-packed, r=workingjubilee Implement repr(packed) for repr(simd) This allows creating vectors with non-power-of-2 lengths that do not have padding. See... — committed to rust-lang-ci/rust by bors 7 months ago
Rollup merge of #117116 - calebzulawski:repr-simd-packed, r=workingjubilee Implement repr(packed) for repr(simd) This allows creating vectors with non-power-of-2 lengths that do not have padding. S... — committed to GuillaumeGomez/rust by GuillaumeGomez 7 months ago
Auto merge of #117116 - calebzulawski:repr-simd-packed, r=workingjubilee Implement repr(packed) for repr(simd) This allows creating vectors with non-power-of-2 lengths that do not have padding. See... — committed to rust-lang-ci/rust by bors 7 months ago
Auto merge of #117116 - calebzulawski:repr-simd-packed, r=workingjubilee Implement repr(packed) for repr(simd) This allows creating vectors with non-power-of-2 lengths that do not have padding. See... — committed to rust-lang-ci/rust by bors 7 months ago
Auto merge of #117116 - calebzulawski:repr-simd-packed, r=workingjubilee Implement repr(packed) for repr(simd) This allows creating vectors with non-power-of-2 lengths that do not have padding. See... — committed to rust-lang/miri by bors 7 months ago
add all_lane_counts feature to enable non-power-of-2 lane counts <= 64 — committed to rust-lang/portable-simd by programmerjake 2 years ago
Auto merge of #117116 - calebzulawski:repr-simd-packed, r=workingjubilee Implement repr(packed) for repr(simd) This allows creating vectors with non-power-of-2 lengths that do not have padding. See... — committed to lnicola/rust-analyzer by bors 7 months ago
Auto merge of #117116 - calebzulawski:repr-simd-packed, r=workingjubilee Implement repr(packed) for repr(simd) This allows creating vectors with non-power-of-2 lengths that do not have padding. See... — committed to RalfJung/rust-analyzer by bors 7 months ago

Most upvoted comments

Only the types in core::arch need to be abi compatible right? There are From impls to convert between the core::arch types and core::simd::Simd, but nothing that converts references for one type to references for the other type.

bjorn3 on Dec 4, 2022

My commit does not change layout at all, it just changes the layout consistency check, assuming that the existing layout code is right. If the existing layout code is buggy then yeah I also ported the bug to the layout checker now. Feel free to fix both at the same time. 😃 The important part is the the size and align of the type must be uniquely determined by the size and align of the vector elements and the number of elements.

RalfJung on Dec 2, 2022

Considering this practically rather than idealistically for a moment, it’s questionable to me whether it’s really worth fixing the layout vs considering other options. bool is perhaps the most frequently used primitive, whereas std::simd is niche, and non-PoT vectors considerably more niche. Embedding a split memory/register repr throughout the compiler seems impractical and even if we decided that was the best option, I genuinely doubt anyone would ever implement it.

That said, not supporting non-PoT vectors could excessively complicate our API and perhaps delay stabilization while we try to sort out a simpler alternative to SupportedLaneCount (see #364). I think it might be worth exploring other options that would give us non-PoT vectors in a more practical way. I’ve thought of 3 alternatives:

Leave things as they are. The current implementation of non-PoT vectors works. We could leave them be and use either an assert or non-local error in functions like [T]::as_simd. This would leave us with the option of reducing the alignment in the future (maybe across an edition change).
Add a new CompactSimd or UnalignedSimd type. This type could implement all of the same functions and traits as Simd, but have a lesser alignment, either repr(Rust) or repr(compact_simd). To avoid a split memory/register repr, I would not make the type a vector type, but instead convert it to a repr(simd) type before passing it to the existing simd_* intrinsics. We could easily implement Into<Simd> for CompactSimd and perhaps Deref<Target = UnalignedSimd> for Simd. [T]::as_simd could still error, but have a matching [T]::as_compact_simd that works for any number of elements.
Make Simd repr(compact_simd) and convert before using. Like the previous example, but wrapped into one type. We could add new intrinsics simd_align and simd_unalign that convert between repr(simd) and repr(compact_simd), a no-op when alignments match. In this case, repr(compact_simd) would be a glorified repr(align(N)).

Personally, I like the first or second alternative best. LLVM seems to be able to generate pretty good code for non-PoT vectors, and I think messing with the alignment could cause problems. The second option would allow the user to choose which is more important–having a flexible memory layout or reliable codegen that doesn’t juggle layouts.

cc @programmerjake @scottmcm

calebzulawski on Sep 24, 2023

Alright this was a bit painful to encode in LLVM IR but here’s an experiment so we can see what’s going on:

define ptr @add_one_i32x3(ptr %p) {
    %p_plus_one_i32x3 = getelementptr <3 x i32>, ptr %p, i32 1
    ret ptr %p_plus_one_i32x3
}

define <3 x i32> @load_i32x3_unaligned(ptr %p) {
    %v = load <3 x i32>, ptr %p, align 1
    ret <3 x i32> %v
}

add_one_i32x3:                          # @add_one_i32x3
        lea     rax, [rdi + 16]
        ret
load_i32x3_unaligned:                   # @load_i32x3_unaligned
        movsd   xmm0, qword ptr [rdi]           # xmm0 = mem[0],zero
        movss   xmm1, dword ptr [rdi + 8]       # xmm1 = mem[0],zero,zero,zero
        shufps  xmm1, xmm0, 48                  # xmm1 = xmm1[0,0],xmm0[3,0]
        shufps  xmm0, xmm1, 132                 # xmm0 = xmm0[0,1],xmm1[0,2]
        ret

Good news: load_i32x3_unaligned behaves as you’d expect, only reading 12 bytes, as qword(8)+dword(4). This would suggest <3 x i32> has an “inherent/data” size of 12 bytes (similar to how i96 works?).

Bad news: add_one_i32x3 adds 16 bytes to the pointer, which is not what you want. This would suggest <3 x i32> has a “stride” of 16 bytes (i.e. rounded up to the next power of 2).

Given this I don’t think I can in good conscience recommend messing around with Abi::Vector.

What you could do is add a #[repr(compact_simd)] (“packed” seems ambiguous but I basically mean “decrease alignment to the largest value that does not result in padding”), and then have layout logic pick:

Abi::Vector wherever possible
Abi::CompactedVector (a new variant which would have to be added)
- for LLVM, this would need to use the memory/immediate split that exists for bool (note: “immediate” below basically means “SSA value”, i.e. “in a register” except more abstract)
  - when in memory: bool is i8 and i32x3 would have to be [3 x i32] (if we need the type at all)
  - as “immediate”: bool is i1 and i32x3 could be <3 x i32> (allowing all vector opts etc.)
  - memory->immediate (reading): the memory type is loaded before being converted
    - this is fine for bool (where the conversion is a i8->i1 truncation), but loading a [3 x i32] seems like a bad time
  - immediate->memory (writing): the immediate type is converted and then stored
    - again, this would require creating an array value (which likely lowers to 3 i32 stores)

We could/should even make the normal #[repr(simd)] disallow or warn when it encounters cases involving padding, instead of giving the user the padded type.

Now, it might be easier than the complication above, if I go back to (sorry it got delayed so much…):

https://github.com/rust-lang/rust/pull/98615

However, I do not have strong confidence that nothing anywhere would ever do a GEPi on a vector type like <3 x i32> - while we can theoretically remove most uses of LLVM struct/array types, vector types like <3 x i32> will fundamentally have to stick around (for the “immediate” aka SSA aka “in registers” side), and LLVM does not have a good way to guarantee that you’re not using types anywhere near memory.

Or in other words, you want rustc to use the <3 x i32> LLVM type but never let it compute its stride of 16 bytes (to use in e.g. GEPi).

An easy example (even after https://github.com/rust-lang/rust/pull/98615, the PR linked above), is that a Rust slice type like [i32x3], or even just calling .add(1) on a *const i32x3, will currently let LLVM use the stride of the type. At least struct fields only have constant offsets, but .add(N) would require us to generate a multiplication of the form N * 12, but we don’t do anything like that.

At least the Abi::CompactedVector idea allows you to have a “split memory-vs-register semantics” type, without impacting anything that works with #[repr(simd)] today (which does need the larger alignment to be compatible with LLVM, at least for now).

eddyb on Dec 4, 2022

Reading more of the discussion above, you really cannot talk about llvm_alignof(Simd<T, N>) as if it’s different than the Rust alignment.

At the point where you start diverging in alignments from LLVM, you can no longer use primitive vector types and ~~have to wrap them in packed structs to decrease alignment~~… except the vector is already rounded up to a “plausible hardware size” by that point so the packed struct can no longer decrease the size!

You would probably need to “just” not use non-pow2 LLVM vector types, i.e. do not have Abi::Vector for such malformed #[repr(simd)] types. Or maybe they can still be used when by-value but great care must be taken around loading and storing - you effectively would need to generate e.g. 3 loads/stores to read/write to/from <3 x T> values (or weird FCA loads/stores of [3 x T] which LLVM would desugar).

Frankly, after what I’ve seen from GLSL (which has e.g. size=12 align=16 types that “eat” padding when used as fields!), my humble suggestion might be to ban these types instead of giving them magical semantics, since you generally can’t “take back” the magic, even if you manage to make it work initially.

eddyb on Dec 2, 2022

portable-simd: non-power-of-2 Simd types have wrong size

Meta

About this issue

Commits related to this issue

Most upvoted comments