portable-simd: non-power-of-2 Simd types have wrong size
I tried this code: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=f4ee5c373508bcb37be6cda3bb627b87
#![feature(repr_simd)]
use core::alloc::Layout;
#[repr(simd)]
struct Simd<T, const N: usize>(pub [T; N]);
fn main() {
assert_eq!(
Layout::new::<Simd<i8, 3>>(),
Layout::from_size_align(3, 1).unwrap()
);
}
I expected to see this happen: run with no errors because Simd<T, N> types should be the same size as the corresponding array type [T; N] (something we unofficially agreed on iirc, but is not formally decided)
Instead, this happened:
thread 'main' panicked at 'assertion failed: `(left == right)`
left: `Layout { size: 4, align: 4 (1 << 2) }`,
right: `Layout { size: 3, align: 1 (1 << 0) }`', src/main.rs:7:5
Meta
rustc --version --verbose:
<current nightly as of issue creation -- probably 2022-11-29 nightly>
crate version in Cargo.toml:
N/A
About this issue
- Original URL
- State: open
- Created 2 years ago
- Comments: 42 (42 by maintainers)
Commits related to this issue
- Auto merge of #117116 - calebzulawski:repr-simd-packed, r=workingjubilee Implement repr(packed) for repr(simd) This allows creating vectors with non-power-of-2 lengths that do not have padding. See... — committed to rust-lang-ci/rust by bors 7 months ago
- Rollup merge of #117116 - calebzulawski:repr-simd-packed, r=workingjubilee Implement repr(packed) for repr(simd) This allows creating vectors with non-power-of-2 lengths that do not have padding. S... — committed to GuillaumeGomez/rust by GuillaumeGomez 7 months ago
- Auto merge of #117116 - calebzulawski:repr-simd-packed, r=workingjubilee Implement repr(packed) for repr(simd) This allows creating vectors with non-power-of-2 lengths that do not have padding. See... — committed to rust-lang-ci/rust by bors 7 months ago
- Auto merge of #117116 - calebzulawski:repr-simd-packed, r=workingjubilee Implement repr(packed) for repr(simd) This allows creating vectors with non-power-of-2 lengths that do not have padding. See... — committed to rust-lang-ci/rust by bors 7 months ago
- Auto merge of #117116 - calebzulawski:repr-simd-packed, r=workingjubilee Implement repr(packed) for repr(simd) This allows creating vectors with non-power-of-2 lengths that do not have padding. See... — committed to rust-lang/miri by bors 7 months ago
- add all_lane_counts feature to enable non-power-of-2 lane counts <= 64 — committed to rust-lang/portable-simd by programmerjake 2 years ago
- Auto merge of #117116 - calebzulawski:repr-simd-packed, r=workingjubilee Implement repr(packed) for repr(simd) This allows creating vectors with non-power-of-2 lengths that do not have padding. See... — committed to lnicola/rust-analyzer by bors 7 months ago
- Auto merge of #117116 - calebzulawski:repr-simd-packed, r=workingjubilee Implement repr(packed) for repr(simd) This allows creating vectors with non-power-of-2 lengths that do not have padding. See... — committed to RalfJung/rust-analyzer by bors 7 months ago
Only the types in
core::archneed to be abi compatible right? There areFromimpls to convert between thecore::archtypes andcore::simd::Simd, but nothing that converts references for one type to references for the other type.My commit does not change layout at all, it just changes the layout consistency check, assuming that the existing layout code is right. If the existing layout code is buggy then yeah I also ported the bug to the layout checker now. Feel free to fix both at the same time. š The important part is the the size and align of the type must be uniquely determined by the size and align of the vector elements and the number of elements.
Considering this practically rather than idealistically for a moment, itās questionable to me whether itās really worth fixing the layout vs considering other options.
boolis perhaps the most frequently used primitive, whereasstd::simdis niche, and non-PoT vectors considerably more niche. Embedding a split memory/register repr throughout the compiler seems impractical and even if we decided that was the best option, I genuinely doubt anyone would ever implement it.That said, not supporting non-PoT vectors could excessively complicate our API and perhaps delay stabilization while we try to sort out a simpler alternative to
SupportedLaneCount(see #364). I think it might be worth exploring other options that would give us non-PoT vectors in a more practical way. Iāve thought of 3 alternatives:[T]::as_simd. This would leave us with the option of reducing the alignment in the future (maybe across an edition change).CompactSimdorUnalignedSimdtype. This type could implement all of the same functions and traits asSimd, but have a lesser alignment, eitherrepr(Rust)orrepr(compact_simd). To avoid a split memory/register repr, I would not make the type a vector type, but instead convert it to arepr(simd)type before passing it to the existingsimd_*intrinsics. We could easily implementInto<Simd> for CompactSimdand perhapsDeref<Target = UnalignedSimd> for Simd.[T]::as_simdcould still error, but have a matching[T]::as_compact_simdthat works for any number of elements.repr(compact_simd)and convert before using. Like the previous example, but wrapped into one type. We could add new intrinsicssimd_alignandsimd_unalignthat convert betweenrepr(simd)andrepr(compact_simd), a no-op when alignments match. In this case,repr(compact_simd)would be a glorifiedrepr(align(N)).Personally, I like the first or second alternative best. LLVM seems to be able to generate pretty good code for non-PoT vectors, and I think messing with the alignment could cause problems. The second option would allow the user to choose which is more importantāhaving a flexible memory layout or reliable codegen that doesnāt juggle layouts.
cc @programmerjake @scottmcm
Alright this was a bit painful to encode in LLVM IR but hereās an experiment so we can see whatās going on:
Good news:
load_i32x3_unalignedbehaves as youād expect, only reading 12 bytes, asqword(8)+dword(4). This would suggest<3 x i32>has an āinherent/dataā size of 12 bytes (similar to howi96works?).Bad news:
add_one_i32x3adds 16 bytes to the pointer, which is not what you want. This would suggest<3 x i32>has a āstrideā of 16 bytes (i.e. rounded up to the next power of 2).Given this I donāt think I can in good conscience recommend messing around with
Abi::Vector.What you could do is add a
#[repr(compact_simd)](āpackedā seems ambiguous but I basically mean ādecrease alignment to the largest value that does not result in paddingā), and then have layout logic pick:Abi::Vectorwherever possibleAbi::CompactedVector(a new variant which would have to be added)bool(note: āimmediateā below basically means āSSA valueā, i.e. āin a registerā except more abstract)boolisi8andi32x3would have to be[3 x i32](if we need the type at all)boolisi1andi32x3could be<3 x i32>(allowing all vector opts etc.)loaded before being convertedbool(where the conversion is ai8->i1truncation), but loading a[3 x i32]seems like a bad timestoredi32stores)We could/should even make the normal
#[repr(simd)]disallow or warn when it encounters cases involving padding, instead of giving the user the padded type.Now, it might be easier than the complication above, if I go back to (sorry it got delayed so muchā¦):
However, I do not have strong confidence that nothing anywhere would ever do a GEPi on a vector type like
<3 x i32>- while we can theoretically remove most uses of LLVM struct/array types, vector types like<3 x i32>will fundamentally have to stick around (for the āimmediateā aka SSA aka āin registersā side), and LLVM does not have a good way to guarantee that youāre not using types anywhere near memory.Or in other words, you want
rustcto use the<3 x i32>LLVM type but never let it compute its stride of 16 bytes (to use in e.g. GEPi).An easy example (even after https://github.com/rust-lang/rust/pull/98615, the PR linked above), is that a Rust slice type like
[i32x3], or even just calling.add(1)on a*const i32x3, will currently let LLVM use the stride of the type. At leaststructfields only have constant offsets, but.add(N)would require us to generate a multiplication of the formN * 12, but we donāt do anything like that.At least the
Abi::CompactedVectoridea allows you to have a āsplit memory-vs-register semanticsā type, without impacting anything that works with#[repr(simd)]today (which does need the larger alignment to be compatible with LLVM, at least for now).Reading more of the discussion above, you really cannot talk about
llvm_alignof(Simd<T, N>)as if itās different than the Rust alignment.At the point where you start diverging in alignments from LLVM, you can no longer use primitive vector types and
have to wrap them in packed structs to decrease alignment⦠except the vector is already rounded up to a āplausible hardware sizeā by that point so the packed struct can no longer decrease the size!You would probably need to ājustā not use non-pow2 LLVM vector types, i.e. do not have
Abi::Vectorfor such malformed#[repr(simd)]types. Or maybe they can still be used when by-value but great care must be taken around loading and storing - you effectively would need to generate e.g. 3 loads/stores to read/write to/from<3 x T>values (or weird FCA loads/stores of[3 x T]which LLVM would desugar).Frankly, after what Iāve seen from GLSL (which has e.g. size=12 align=16 types that āeatā padding when used as fields!), my humble suggestion might be to ban these types instead of giving them magical semantics, since you generally canāt ātake backā the magic, even if you manage to make it work initially.