runtime: BroadcastScalarToVector256(byte*) does not broadcast the correct value?
From https://twitter.com/HaroldAptroot/status/1099389327245828096
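using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class Program
{
    public static void Main(string[] args)
    {
        Console.WriteLine(test(128));
    }

    // Broadcasts the byte stored at &v to all 32 lanes via the pointer overload.
    static unsafe Vector256<byte> test(byte v)
    {
        Vector256<byte> x = Avx2.BroadcastScalarToVector256(&v);
        return x;
    }
}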
Output in Debug:
<224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224, 224>
Output in Release:
<32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32>
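The test method compiles to the following asm. Note the vpbroadcastb ymm0, yrax instruction: the address of v is computed into rax, but it appears to be encoded as a register operand instead of being dereferenced as a memory operand, so the broadcast reads the low byte of a vector register rather than the value of v: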
; Lcl frame size = 0
G_M25726_IG01:
vzeroupper
nop
mov dword ptr [rsp+10H], edx
G_M25726_IG02:
lea rax, bword ptr [rsp+10H]
vpbroadcastb ymm0, yrax
vmovupd ymmword ptr[rcx], ymm0
mov rax, rcx
G_M25726_IG03:
vzeroupper
ret
; Total bytes of code 30, prolog size 5 for method Program:test(ubyte):struct
When test is inlined into Main, the JIT produces the following asm:
G_M42296_IG01:
sub rsp, 88
vzeroupper
vmovaps qword ptr [rsp+40H], xmm6
vmovaps qword ptr [rsp+30H], xmm7
G_M42296_IG02:
mov dword ptr [rsp+2CH], 128
lea rcx, bword ptr [rsp+2CH]
vpbroadcastb ymm6, yrcx
mov rcx, 0xD1FFAB1E
vextractf128 ymm7, ymm6, 1
call CORINFO_HELP_NEWSFAST
vinsertf128 ymm6, ymm7, 1
vmovupd ymmword ptr[rax+8], ymm6
mov rcx, rax
call Console:WriteLine(ref)
nop
G_M42296_IG03:
vmovaps xmm6, qword ptr [rsp+40H]
vmovaps xmm7, qword ptr [rsp+30H]
vzeroupper
add rsp, 88
ret
; Total bytes of code 98, prolog size 19 for method Program:Main()
Contrast this with a version that uses Vector256.Create:
static unsafe Vector256<byte> test(byte v)
{
    Vector256<byte> x = Vector256.Create(v);
    return x;
}
It produces:
G_M25727_IG02:
movzx rax, dl
vmovd xmm0, eax
vpbroadcastb ymm0, ymm0
vmovupd ymmword ptr[rcx], ymm0
mov rax, rcx
This version correctly outputs:
<128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128>
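As a hedged sketch, the repro can be made self-checking by comparing every lane against the input byte instead of eyeballing the printed vectors. It assumes .NET Core 3.0 or later with AllowUnsafeBlocks enabled; the names BroadcastCheck, ViaPointer, ViaCreate and AllLanesEqual are hypothetical helpers introduced here, not part of the original report.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class BroadcastCheck
{
    // Hypothetical helper: broadcast through the pointer overload from this issue.
    static unsafe Vector256<byte> ViaPointer(byte v) => Avx2.BroadcastScalarToVector256(&v);

    // Hypothetical helper: broadcast through Vector256.Create, shown above to be correct.
    static Vector256<byte> ViaCreate(byte v) => Vector256.Create(v);

    // Returns true only when all 32 byte lanes equal the expected value.
    public static bool AllLanesEqual(Vector256<byte> vec, byte expected)
    {
        for (int i = 0; i < Vector256<byte>.Count; i++)
        {
            if (vec.GetElement(i) != expected) return false;
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine($"Create:  {AllLanesEqual(ViaCreate(128), 128)}");

        // The pointer overload throws PlatformNotSupportedException without AVX2.
        if (!Avx2.IsSupported)
        {
            Console.WriteLine("AVX2 not supported; pointer overload not exercised.");
            return;
        }

        // On a JIT affected by this issue, this line is expected to print False.
        Console.WriteLine($"Pointer: {AllLanesEqual(ViaPointer(128), 128)}");
    }
}
```

On a runtime with the fix, both lines should print True.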
About this issue
- State: closed
- Created 5 years ago
- Comments: 18 (18 by maintainers)
Not exactly an elegant implementation, but it gets the job done as a prototype: https://github.com/mikedn/coreclr/commit/a00b42e9442ceeeacda7981bbb462bb8ec7a8a21
generates
Uh oh, didn’t notice that. I think those should be removed.
Yeah, but it’s also HW_Category_IMM, so it can’t be HW_Category_MemoryLoad as well. That’s fishy, but I don’t know if there’s a better way. It seems that HW_Category_MemoryLoad should perhaps be a flag and not a category.