runtime: Support emitting a constant for the Vector64/128/256.Create hardware intrinsic methods

When the Vector64.Create, Vector128.Create, and Vector256.Create helper functions are called with all-constant arguments, we should support emitting a constant that can be loaded from memory, rather than emitting a chain of shuffle or insert instructions.
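Below is a minimal Python sketch of the folding check. It assumes a `Vector128<float>` created from four arguments, and it models an argument known only at run time as `None`. The name `try_fold_create` is hypothetical, and this is not how RyuJIT represents its IR. The sketch only shows that an all-constant call reduces to a 16-byte little-endian image, which a single `vmovups` could then load.

```python
import struct
from typing import Optional, Sequence


def try_fold_create(args: Sequence[Optional[float]]) -> Optional[bytes]:
    """Hypothetical folding step for Vector128.Create(float, float, float, float).

    Each argument is a float constant, or None when its value is only known
    at run time. Returns the 16-byte little-endian image to place in the
    data section, or None if the call cannot be folded to one constant.
    """
    if len(args) != 4:
        raise ValueError("Vector128<float> has exactly four lanes")
    if any(a is None for a in args):
        return None  # at least one lane is not a compile-time constant
    # All lanes are known: pack them as four little-endian 32-bit floats.
    return struct.pack("<4f", *args)
```

The same idea applies to `Vector64` and `Vector256`, which need 8-byte and 32-byte images.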

We may also benefit from doing the same for partially constant arguments. Loading the constant lanes from memory and then inserting the few remaining lanes can still be faster than treating the whole vector as non-constant.
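Here is one possible lowering for the partial-constant case, sketched in Python. It is hypothetical and not RyuJIT's actual strategy. It loads a constant whose unknown lanes are zero placeholders, then patches only those lanes with `vinsertps`. The sketch makes two illustrative assumptions: the run-time value for lane `i` sits in element 0 of register `xmm{i+1}`, and `const_0` stands for the data-section label.

```python
import struct
from typing import List, Optional, Sequence, Tuple


def lower_partial_create(
    args: Sequence[Optional[float]],
) -> Tuple[List[str], Optional[bytes]]:
    """Hypothetical lowering of Vector128.Create(float x4) with some constant lanes.

    A lane is a float constant, or None if it is only known at run time; lane i's
    run-time value is assumed to be in element 0 of xmm{i+1}. Returns the emitted
    pseudo-assembly and the 16-byte constant image (None if nothing is constant).
    """
    if len(args) != 4:
        raise ValueError("Vector128<float> has exactly four lanes")

    def insert(lane: int) -> str:
        # vinsertps imm8: bits 5:4 pick the destination lane; bits 7:6 (source lane) stay 0.
        return f"vinsertps xmm0, xmm0, xmm{lane + 1}, {lane << 4:#04x}"

    if all(a is None for a in args):
        # Nothing to fold: start from lane 0 and insert the other three lanes.
        return ["vmovaps xmm0, xmm1"] + [insert(i) for i in range(1, 4)], None

    # Constant lanes come from memory; unknown lanes are zero placeholders.
    image = struct.pack("<4f", *(0.0 if a is None else a for a in args))
    code = ["vmovups xmm0, xmmword ptr [rip+const_0]"]
    code += [insert(i) for i, a in enumerate(args) if a is None]
    return code, image
```

Whether the load-then-insert form actually beats the all-register chain depends on how many lanes are unknown. The sketch does not model that cost.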

category:cq theme:vector-codegen skill-level:intermediate cost:medium

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 4
  • Comments: 18 (18 by maintainers)

Most upvoted comments

Hmm, right, SetVector tends to generate so much code that things aren’t great in terms of size either way. The concerns about data size would be more relevant to SetAll, which generates much less code, and yet native compilers tend to use memory constants for it as well. But then, native compilers have the luxury of deduplicating constants…
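To illustrate the deduplication that native compilers get, here is a minimal sketch of a data-section builder that stores each distinct vector image only once. A typical case is a SetAll-style broadcast constant reused across a method. `ConstantPool` and `broadcast_f32` are hypothetical names, and the 16-byte alignment is an assumption.

```python
import struct
from typing import Dict


class ConstantPool:
    """Hypothetical data-section builder that stores each distinct vector image once."""

    def __init__(self, alignment: int = 16) -> None:
        self.alignment = alignment
        self.data = bytearray()
        self._offsets: Dict[bytes, int] = {}

    def add(self, image: bytes) -> int:
        """Return the offset of `image`, appending it only if it is new."""
        if image in self._offsets:
            return self._offsets[image]
        # Pad so that each constant starts on an aligned boundary.
        pad = -len(self.data) % self.alignment
        self.data.extend(b"\x00" * pad)
        offset = len(self.data)
        self.data.extend(image)
        self._offsets[image] = offset
        return offset


def broadcast_f32(value: float) -> bytes:
    """16-byte image of a four-lane float vector with every lane set to `value`."""
    return struct.pack("<4f", value, value, value, value)
```

The pool keys on raw bytes rather than float values on purpose. `0.0 == -0.0` compares equal as floats, but the two constants have different bit patterns and must not be merged.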

> If a program has many distinct constant vectors, it would use a lot of memory.

I’m not so sure about this part. The insert/shift sequence generally seems to take more bytes of code than storing the raw bytes and reading them from memory.

; This takes ~38 bytes of code, plus 16 bytes of storage
; (displacements are shown as offsets into the constant data)
vmovss    xmm0, dword ptr [rip+0x00]
vinsertps xmm0, xmm0, dword ptr [rip+0x04], 0x10
vinsertps xmm0, xmm0, dword ptr [rip+0x08], 0x20
vinsertps xmm0, xmm0, dword ptr [rip+0x0C], 0x30
; This takes ~8 bytes of code, plus 16 bytes of storage
vmovups   xmm0, xmmword ptr [rip+0x00]

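So, even with “perfect” deduping of float constants (which we don’t have), the single load still comes out ahead. The insert chain takes ~38 bytes of code, while the load takes ~8 bytes of code plus 16 bytes of storage, a savings of ~14 bytes.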
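The byte counts quoted in the comment can be reproduced by summing encoded instruction sizes, as in the sketch below. It assumes VEX encodings with a RIP-relative disp32 operand. Under that assumption, `vmovss` and `vmovups` fit the 2-byte VEX prefix, while `vinsertps` lives in the 0F3A opcode map and needs the 3-byte one. The sketch models “perfect” deduplication of the scalar floats as zero bytes of storage.

```python
# Encoded sizes, assuming VEX encodings and a RIP-relative disp32 memory operand.
VMOVSS_LOAD = 2 + 1 + 1 + 4         # 2-byte VEX, opcode, ModRM, disp32
VINSERTPS_LOAD = 3 + 1 + 1 + 4 + 1  # 3-byte VEX (0F3A map), opcode, ModRM, disp32, imm8
VMOVUPS_LOAD = 2 + 1 + 1 + 4        # 2-byte VEX, opcode, ModRM, disp32


def insert_chain_bytes(scalar_storage: int = 16) -> int:
    """Code for vmovss + 3x vinsertps, plus whatever storage the four floats need."""
    return VMOVSS_LOAD + 3 * VINSERTPS_LOAD + scalar_storage


def vector_load_bytes(vector_storage: int = 16) -> int:
    """Code for a single vmovups, plus the 16-byte vector constant."""
    return VMOVUPS_LOAD + vector_storage
```

Under these assumptions, the insert chain is 38 bytes even with no storage of its own (`insert_chain_bytes(0)`). The single load is 24 bytes (`vector_load_bytes()`).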