go: internal/cpu: detect OS support for AVX-512 and CPU support for VAES and VPCLMULQDQ instructions for performance optimization of crypto ciphers
What version of Go are you using (go version)?
$ go version 1.15.5 linux/amd64
Does this issue reproduce with the latest release?
yes
What operating system and processor architecture are you using (go env)?
$ go env
What did you do?
What did you expect to see?
Detect OS support for AVX-512 registers, detect CPU support for AVX-512 VAES and VPCLMULQDQ instructions.
What did you see instead?
Go does not currently detect OS support for AVX-512 or CPU support for the VAES and VPCLMULQDQ instructions. We have developed proposed patches against Go 1.15.5 for internal/cpu that check OS support for the AVX-512 registers, check the CPUID registers for the presence of VAES and VPCLMULQDQ, and set flags accordingly. The patches will be submitted to the Go Gerrit for review.
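As a rough illustration of the two checks described above, here is a minimal sketch that decodes raw register values rather than executing CPUID or XGETBV itself (pure Go cannot issue those instructions without assembly). The bit positions follow the Intel SDM; the function and constant names are hypothetical and are not taken from the proposed patches.

```go
package main

import "fmt"

// Feature bits in CPUID.(EAX=7,ECX=0):ECX, per the Intel SDM.
const (
	leaf7GFNI       = 1 << 8
	leaf7VAES       = 1 << 9
	leaf7VPCLMULQDQ = 1 << 10
)

// State-component bits in XCR0, as returned by XGETBV with ECX=0.
// XGETBV may only be executed after CPUID reports OSXSAVE.
const (
	xcr0SSE      = 1 << 1 // XMM registers
	xcr0AVX      = 1 << 2 // upper halves of YMM registers
	xcr0Opmask   = 1 << 5 // k0-k7 mask registers
	xcr0ZMMHi256 = 1 << 6 // upper halves of ZMM0-ZMM15
	xcr0Hi16ZMM  = 1 << 7 // ZMM16-ZMM31
)

// osSupportsAVX512 reports whether the OS saves and restores all
// register state that AVX-512 code touches (hypothetical helper).
func osSupportsAVX512(xcr0 uint64) bool {
	const need = xcr0SSE | xcr0AVX | xcr0Opmask | xcr0ZMMHi256 | xcr0Hi16ZMM
	return xcr0&need == need
}

// decodeLeaf7ECX extracts the three feature bits discussed in this issue.
func decodeLeaf7ECX(ecx uint32) (gfni, vaes, vpclmulqdq bool) {
	return ecx&leaf7GFNI != 0, ecx&leaf7VAES != 0, ecx&leaf7VPCLMULQDQ != 0
}

func main() {
	// 0xe7: x87, SSE, AVX, opmask, ZMM_Hi256 and Hi16_ZMM all enabled.
	fmt.Println("OS AVX-512 state enabled:", osSupportsAVX512(0xe7))
	// 0x07: the OS only manages x87, SSE and AVX state.
	fmt.Println("OS AVX-512 state enabled:", osSupportsAVX512(0x07))

	gfni, vaes, vpclmulqdq := decodeLeaf7ECX(0x600) // bits 9 and 10 set
	fmt.Println("GFNI:", gfni, "VAES:", vaes, "VPCLMULQDQ:", vpclmulqdq)
}
```

In a real implementation the two inputs would come from assembly stubs for CPUID and XGETBV.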
References:
About this issue
- State: open
- Created 3 years ago
- Comments: 32 (19 by maintainers)
It sounds like the proposal is to add a few more names in the CPU info:
X86.HasVPCLMULQDQ
X86.HasGFNI
X86.HasVAES
I don’t think we necessarily need to deprecate the existing names, since they can still be used to check for AVX512 and the specific feature.
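To make the relationship between the two sets of names concrete, here is a small sketch. It assumes the existing names follow the AVX512-prefixed style (for example HasAVX512VAES) and treats each one as "AVX512 usable and the specific feature present". The struct and method are hypothetical and do not mirror the real internal/cpu layout.

```go
package main

import "fmt"

// x86Features is a hypothetical stand-in for the CPU info struct.
type x86Features struct {
	HasAVX512 bool // CPU and OS support for AVX-512

	// Proposed standalone names: set from the CPUID feature bits alone.
	HasVAES       bool
	HasVPCLMULQDQ bool
	HasGFNI       bool

	// Existing-style combined names (assumed): AVX-512 plus the feature.
	HasAVX512VAES       bool
	HasAVX512VPCLMULQDQ bool
	HasAVX512GFNI       bool
}

// deriveCombined fills the combined flags from the standalone ones,
// so both naming schemes can coexist without deprecating either.
func (f *x86Features) deriveCombined() {
	f.HasAVX512VAES = f.HasAVX512 && f.HasVAES
	f.HasAVX512VPCLMULQDQ = f.HasAVX512 && f.HasVPCLMULQDQ
	f.HasAVX512GFNI = f.HasAVX512 && f.HasGFNI
}

func main() {
	// A CPU with VAES and VPCLMULQDQ but no usable AVX-512.
	f := x86Features{HasVAES: true, HasVPCLMULQDQ: true}
	f.deriveCombined()
	fmt.Println("HasVAES:", f.HasVAES, "HasAVX512VAES:", f.HasAVX512VAES)
}
```

With this split, callers that only need the 256-bit forms can test the standalone flag, while existing callers of the combined names see no change in behavior.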
To be clear this proposal is not about using AVX-512 or even these not-quite-AVX-512 things in any specific package.
VAES & VPCLMULQDQ also work with the 256-bit YMM registers from AVX2, so AVX512 is not a hard prerequisite for using them.
This also shows in the AMD Ryzen Zen 3 CPUs, which don’t support AVX512 but do support VAES & VPCLMULQDQ which can be used on 256 bit registers there.
I think implementing that would avoid the downsides of AVX512, such as its potential throttling effects, while in theory still providing a significant speedup of AES operations. It would also avoid the issues with ZMM registers on Darwin.
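A minimal sketch of how a dispatcher could pick between these code paths, assuming the feature flags are already populated. The type, field, and path names are hypothetical, and gating the 256-bit path on AVX2 follows the framing in the comment above rather than any actual crypto/aes code.

```go
package main

import "fmt"

// cpuFeatures is a hypothetical stand-in for the flags internal/cpu would expose.
type cpuFeatures struct {
	HasAES    bool // AES-NI on 128-bit XMM registers
	HasAVX2   bool // CPU and OS support for 256-bit YMM state
	HasAVX512 bool // CPU and OS support for 512-bit ZMM state
	HasVAES   bool // vector AES feature bit
}

type aesPath string

const (
	pathGeneric aesPath = "generic"
	pathAESNI   aesPath = "aesni-128"
	pathVAES256 aesPath = "vaes-256"
	pathVAES512 aesPath = "vaes-512"
)

// selectAESPath picks the widest usable path. allowZMM lets a caller
// opt out of 512-bit code, for example because of throttling concerns.
func selectAESPath(f cpuFeatures, allowZMM bool) aesPath {
	switch {
	case allowZMM && f.HasVAES && f.HasAVX512:
		return pathVAES512
	case f.HasVAES && f.HasAVX2:
		return pathVAES256
	case f.HasAES:
		return pathAESNI
	default:
		return pathGeneric
	}
}

func main() {
	// Zen 3-like: VAES without AVX512.
	zen3 := cpuFeatures{HasAES: true, HasAVX2: true, HasVAES: true}
	// Ice Lake-like: VAES with AVX512.
	iceLake := cpuFeatures{HasAES: true, HasAVX2: true, HasAVX512: true, HasVAES: true}

	fmt.Println(selectAESPath(zen3, true))
	fmt.Println(selectAESPath(iceLake, true))
	fmt.Println(selectAESPath(iceLake, false))
}
```

Keeping the ZMM decision as an explicit parameter means the 256-bit path can be adopted first, independently of any later policy on AVX512 use.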
Agreed. The worst case is using an AVX512 “heavy” instruction once per second or something.
On Ice Lake the risk and potential impact of such worst cases is massively reduced.
On the darwin/amd64 fix, the code I submitted does a kernel version check (only on Darwin when AVX512 is present) to ensure that Apple’s patch is present.
Here’s the spot in the issue thread where I note the properly patched versions of macOS/Darwin and discuss the Go fix I developed: https://github.com/golang/go/issues/49233#issuecomment-1023529992
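The shape of such a kernel version gate can be sketched as follows. This is not the submitted code: the helper names are hypothetical, the release string is passed in rather than read from the kernel, and the minimum version in main is only an illustrative placeholder (the linked comment lists the releases that actually carry Apple's patch).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRelease parses a Darwin kernel release string of the form
// "major.minor.patch" (hypothetical helper).
func parseRelease(s string) (v [3]int, ok bool) {
	parts := strings.SplitN(s, ".", 3)
	if len(parts) != 3 {
		return v, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return [3]int{}, false
		}
		v[i] = n
	}
	return v, true
}

// atLeast reports whether have >= want, comparing major, minor, patch in order.
func atLeast(have, want [3]int) bool {
	for i := range have {
		if have[i] != want[i] {
			return have[i] > want[i]
		}
	}
	return true
}

// darwinAllowsAVX512 only consults the kernel version when the CPU
// reports AVX512, mirroring the "only when AVX512 is present" condition.
func darwinAllowsAVX512(cpuHasAVX512 bool, release string, minRelease [3]int) bool {
	if !cpuHasAVX512 {
		return false
	}
	v, ok := parseRelease(release)
	return ok && atLeast(v, minRelease)
}

func main() {
	minRelease := [3]int{21, 3, 0} // placeholder threshold, an assumption

	fmt.Println(darwinAllowsAVX512(true, "21.4.0", minRelease))
	fmt.Println(darwinAllowsAVX512(true, "20.6.0", minRelease))
	fmt.Println(darwinAllowsAVX512(true, "garbage", minRelease))
}
```

Failing closed on an unparseable release string keeps AVX512 disabled whenever the kernel version cannot be established.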
I think using VAES and VPCLMULQDQ without AVX512 is fine, provided that we are actually able to test them and no other policy (e.g. for crypto assembly) restricts their use. If they are actually going to be used, we need to add the corresponding feature bit checks to internal/cpu, and the specific code paths must also check for any other AVX features they need. VPCLMULQDQ might have a caveat: I think some CPUs implement it as slow microcode. We could cover that with benchmarks at CL submission to figure out whether it could make things worse and, if so, whether that is bad enough to warrant not using it or adding more detection.
As AVX512 came up in https://go-review.googlesource.com/c/go/+/379394 internal/cpu,internal/bytealg: add SIMD prefix match for Index/amd64:
Before using AVX512 in runtime/std I think we should