runtime: Several x86 intrinsic APIs are "missing"
In the process of adding the APIs for dotnet/runtime#23315 (see https://github.com/dotnet/coreclr/pull/15341), I found several non-scalar x86 intrinsic APIs that are “missing”.
We should determine which of these need to be added and which were explicitly excluded.
int _mm_movemask_ps (__m128 a) // movmskps
__m128d _mm_loadh_pd (__m128d a, double const* mem_addr) // movhpd
__m128d _mm_loadl_pd (__m128d a, double const* mem_addr) // movlpd
__m128i _mm_loadl_epi64 (__m128i const* mem_addr) // movq
void _mm_stream_si32 (int* mem_addr, int a) // movnti
void _mm_stream_si64 (__int64* mem_addr, __int64 a) // movnti
__m256i _mm256_permute2f128_si256 (__m256i a, __m256i b, int imm8) // vperm2f128
__m256i _mm256_stream_load_si256 (__m256i const* mem_addr) // vmovntdqa
// The following 8 have intrinsics which take an imm8 and emit the same underlying instruction
__m128i _mm_sll_epi16 (__m128i a, __m128i count) // psllw
__m128i _mm_sll_epi32 (__m128i a, __m128i count) // pslld
__m128i _mm_sll_epi64 (__m128i a, __m128i count) // psllq
__m128i _mm_sra_epi16 (__m128i a, __m128i count) // psraw
__m128i _mm_sra_epi32 (__m128i a, __m128i count) // psrad
__m128i _mm_srl_epi16 (__m128i a, __m128i count) // psrlw
__m128i _mm_srl_epi32 (__m128i a, __m128i count) // psrld
__m128i _mm_srl_epi64 (__m128i a, __m128i count) // psrlq
// The following 5 have the corresponding _mm256 forms exposed
__m128i _mm_sllv_epi32 (__m128i a, __m128i count) // vpsllvd
__m128i _mm_sllv_epi64 (__m128i a, __m128i count) // vpsllvq
__m128i _mm_srav_epi32 (__m128i a, __m128i count) // vpsravd
__m128i _mm_srlv_epi32 (__m128i a, __m128i count) // vpsrlvd
__m128i _mm_srlv_epi64 (__m128i a, __m128i count) // vpsrlvq
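To illustrate the note on the eight shift intrinsics, here is a minimal C sketch of how the count-in-register form relates to the imm8 form. It assumes an x86-64 compiler that provides `<emmintrin.h>` (SSE2). The helper names `shift_left_variable`, `shift_left_by_3`, and `shift_forms_agree` are hypothetical and exist only for this example.

```c
#include <emmintrin.h>  /* SSE2: _mm_sll_epi32, _mm_slli_epi32 */
#include <string.h>     /* memcmp */

/* Hypothetical helper: shift each 32-bit lane left by a count that is
   only known at run time. _mm_sll_epi32 reads the count from the low
   64 bits of its second operand, so the scalar is moved into a vector. */
static __m128i shift_left_variable(__m128i v, int count)
{
    return _mm_sll_epi32(v, _mm_cvtsi32_si128(count));
}

/* Hypothetical helper: the same shift using the imm8 form, where the
   count is a compile-time constant. */
static __m128i shift_left_by_3(__m128i v)
{
    return _mm_slli_epi32(v, 3);
}

/* Hypothetical helper: returns 1 when both forms produce identical
   results for a count of 3, and 0 otherwise. */
static int shift_forms_agree(void)
{
    __m128i v = _mm_set_epi32(4, 3, 2, 1);
    __m128i a = shift_left_variable(v, 3);
    __m128i b = shift_left_by_3(v);
    return memcmp(&a, &b, sizeof a) == 0;
}
```

Both forms compile to `pslld`. They differ only in whether the count comes from a register or from an immediate, so the register form matters mainly when the count is not a compile-time constant.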
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Reactions: 2
- Comments: 34 (34 by maintainers)
@4creators, there are a number of other hardware instructions for which exposing intrinsics may be useful.
However, I don’t think we want to get too far ahead of ourselves right now.
We already have enough intrinsics to cover the next 2 point releases (2.1 and whatever the next version is) and they will take some time to implement, test, and tune.
Not to mention that the ARM work will be going on in parallel.
It might be useful to have an issue listing the hardware instructions we should investigate in the future, but I don’t think we should be adding any more at this time (and likely not until our current backlog clears more).
This issue was specifically tracking the intrinsics that were part of the approved list of x86 ISAs, but which were missed in the initial PR (it was also a useful place to mention the ShuffleParam idea, since an equivalent is traditionally part of the intrinsics/macros provided for the approved ISAs).
@4creators Thank you! Actually, I already have a fix on my local machine. Will upload tomorrow.