runtime: Vector{128,256}.ToScalar suboptimal codegen \ { double }

Vector128<long>.ToScalar() stores the XMM register to the stack, then reloads the low 64 bits into a general-purpose register (r64) with a mov:

vmovapd  xmmword ptr [rsp], xmm0
mov      rax, qword ptr [rsp]
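The following C++ sketch makes the round trip concrete. It assumes x86-64 with SSE2, and to_scalar_via_stack is a hypothetical name. It performs the same data movement as the sequence above: it stores all 16 bytes of the vector to an aligned stack slot, then loads the low 8 bytes. An optimizing C++ compiler is free to fold this into a single register move, so the sketch models the semantics rather than reproducing the JIT's instructions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>      // std::memcpy
#include <emmintrin.h>  // SSE2: __m128i, _mm_set_epi64x, _mm_store_si128

// Models the data movement of the JIT's current sequence for
// Vector128<long>.ToScalar():
//   vmovapd xmmword ptr [rsp], xmm0   ; spill all 16 bytes to an aligned slot
//   mov     rax, qword ptr [rsp]      ; reload the low 8 bytes into a GPR
std::int64_t to_scalar_via_stack(__m128i v) {
    alignas(16) unsigned char slot[16];                    // stands in for [rsp]
    _mm_store_si128(reinterpret_cast<__m128i*>(slot), v);  // 16-byte aligned store
    std::int64_t result;
    std::memcpy(&result, slot, sizeof result);             // 8-byte load of lane 0 (little-endian)
    return result;
}
```

For example, to_scalar_via_stack(_mm_set_epi64x(7, 42)) returns 42, because _mm_set_epi64x takes the high lane first.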

Ideally this would use a single movq (C++ intrinsic: _mm_cvtsi128_si64), so the asm becomes:

-   vmovapd  xmmword ptr [rsp], xmm0
-   mov      rax, qword ptr [rsp]
+   movq     rax, xmm0
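Here is a minimal C++ counterpart of the desired lowering. It assumes x86-64, because _mm_cvtsi128_si64 is not available on 32-bit x86, and to_scalar_via_movq is a hypothetical name. With optimization enabled, mainstream C++ compilers typically emit a single movq rax, xmm0 for this function (vmovq when VEX encoding is in use, as with the vmovapd in the current sequence).

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2; _mm_cvtsi128_si64 is x86-64 only

// Desired lowering of Vector128<long>.ToScalar(): one register-to-register
// move of the low 64-bit lane into a GPR, with no stack round trip.
std::int64_t to_scalar_via_movq(__m128i v) {
    return _mm_cvtsi128_si64(v);
}
```

Passing _mm_set_epi64x(7, 42) returns 42, the low lane, without any memory access in the optimized code.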

Vector128<double>.ToScalar() produces the expected code (vmovsd), so there is no issue there. The same CQ issue occurs for int and for Vector256<T>. I didn't check types other than those noted here.
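The other cases can be sketched the same way in C++. The helper names are hypothetical and x86-64 is assumed. The Vector256 helper also uses the GCC/Clang-specific target("avx") attribute and needs an AVX-capable CPU at run time. For int, the analogous instruction is movd eax, xmm0 (via _mm_cvtsi128_si32). For a 256-bit vector, the low lane lives in the xmm half of the ymm register, so _mm256_castsi256_si128 generates no instruction and only the movq should remain. For double, the scalar already sits in the low lane of an xmm register, which is consistent with the report that this case is fine.

```cpp
#include <cassert>
#include <cstdint>
#include <immintrin.h>  // SSE2 + AVX intrinsics

// Vector128<int>.ToScalar(): ideal lowering is movd eax, xmm0.
std::int32_t to_scalar_i32(__m128i v) {
    return _mm_cvtsi128_si32(v);
}

// Vector128<double>.ToScalar(): the scalar already occupies the low lane of
// the xmm register (the issue reports vmovsd here, which is fine).
double to_scalar_f64(__m128d v) {
    return _mm_cvtsd_f64(v);
}

// Vector256<long>.ToScalar(): xmm0 is the low half of ymm0, so the cast emits
// no instruction and only a movq rax, xmm0 should remain.
// Lanes are passed as scalars so that callers need not be compiled for AVX.
__attribute__((target("avx")))
std::int64_t to_scalar_256_i64(std::int64_t e3, std::int64_t e2,
                               std::int64_t e1, std::int64_t e0) {
    __m256i v = _mm256_set_epi64x(e3, e2, e1, e0);  // e0 becomes lane 0
    return _mm_cvtsi128_si64(_mm256_castsi256_si128(v));
}
```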

category:cq theme:vector-codegen skill-level:intermediate cost:medium

About this issue

  • State: open
  • Created 5 years ago
  • Reactions: 3
  • Comments: 15 (14 by maintainers)

Most upvoted comments

It is a bit complex; please see https://github.com/dotnet/coreclr/issues/21062. But I am sure that something is wrong with the codegen.