runtime: Vector{128,256}.ToScalar suboptimal codegen (except double)
Vector128<long>.ToScalar() spills the xmm register to the stack, then loads the low 64 bits from there into a general-purpose register (r64) with a mov:
vmovapd xmmword ptr [rsp], xmm0
mov rax, qword ptr [rsp]
Ideally this would use movq (C++ intrinsic: _mm_cvtsi128_si64), so the asm becomes:
- vmovapd xmmword ptr [rsp], xmm0
- mov rax, qword ptr [rsp]
+ movq rax, xmm0
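For comparison, here is a minimal C++ sketch of the intrinsic mentioned above. It assumes an x86-64 target with SSE2. The helper names `low_lane` and `make_vec` are hypothetical and only illustrate the call. Optimizing compilers typically lower `_mm_cvtsi128_si64` to a single movq (or vmovq) with no stack round trip, though the exact output depends on compiler and flags.

```cpp
#include <cstdint>
#include <emmintrin.h>  // SSE2: __m128i, _mm_set_epi64x, _mm_cvtsi128_si64

// Hypothetical helper: returns the low 64-bit lane of a 128-bit integer vector.
// This is the register-to-register move the issue asks ToScalar() to emit.
inline std::int64_t low_lane(__m128i v) {
    return _mm_cvtsi128_si64(v);
}

// Hypothetical helper: builds a vector whose low lane is `lo` and high lane is `hi`.
// Note that _mm_set_epi64x takes the high lane first.
inline __m128i make_vec(std::int64_t lo, std::int64_t hi) {
    return _mm_set_epi64x(hi, lo);
}
```

Compiling a caller of `low_lane` with optimizations enabled and inspecting the disassembly gives a reference point for what the JIT could emit for Vector128<long>.ToScalar().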
Vector128<double>.ToScalar() produces the expected code (vmovsd); no issue there.
The same code-quality (CQ) issue occurs for int, and for Vector256<T>.
I didn't check types other than those noted here.
category:cq theme:vector-codegen skill-level:intermediate cost:medium
About this issue
- State: open
- Created 5 years ago
- Reactions: 3
- Comments: 15 (14 by maintainers)
It is a bit complex; please see https://github.com/dotnet/coreclr/issues/21062. But I am sure there is something wrong with the codegen.