go: internal/cpu: detect OS support for AVX-512 and CPU support for VAES and VPCLMULQDQ instructions for performance optimization of crypto ciphers

What version of Go are you using (go version)?

$ go version 1.15.5 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

$ go env

What did you do?

What did you expect to see?

Detect OS support for AVX-512 registers, detect CPU support for AVX-512 VAES and VPCLMULQDQ instructions.

What did you see instead?

Go does not currently detect OS support for AVX-512 or CPU support for the VAES crypto instructions. We have developed patches against Go 1.15.5 for internal/cpu that check whether the OS supports the AVX-512 registers, check the CPU's feature registers for VAES and VPCLMULQDQ, and set the corresponding flags. The patches will be submitted to the Go Gerrit for review.
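To illustrate the two checks described above, here is a minimal sketch, not the proposed patch itself. It assumes the raw values have already been read elsewhere (ECX from CPUID leaf 7, sub-leaf 0, and XCR0 from XGETBV, both of which need assembly in real code) and only shows how the bits would be decoded. The names `features` and `decode` are made up for this example; the bit positions are the ones documented in the Intel SDM.

```go
package main

import "fmt"

// Feature bits in ECX of CPUID leaf 7, sub-leaf 0 (Intel SDM).
const (
	cpuidGFNI       = 1 << 8
	cpuidVAES       = 1 << 9
	cpuidVPCLMULQDQ = 1 << 10
)

// State-component bits in XCR0, as returned by XGETBV with ECX=0.
const (
	xcr0SSE      = 1 << 1 // XMM registers
	xcr0AVX      = 1 << 2 // upper halves of YMM registers
	xcr0Opmask   = 1 << 5 // k0-k7 mask registers
	xcr0ZMMHi256 = 1 << 6 // upper halves of ZMM0-ZMM15
	xcr0Hi16ZMM  = 1 << 7 // ZMM16-ZMM31
)

// features is a hypothetical result type used only in this sketch.
type features struct {
	OSAVX      bool // OS saves and restores XMM and YMM state
	OSAVX512   bool // OS additionally saves opmask and ZMM state
	GFNI       bool
	VAES       bool
	VPCLMULQDQ bool
}

// decode interprets already-fetched register values. It does not execute
// CPUID or XGETBV itself.
func decode(ecx7 uint32, xcr0 uint64) features {
	const avxMask = xcr0SSE | xcr0AVX
	const avx512Mask = avxMask | xcr0Opmask | xcr0ZMMHi256 | xcr0Hi16ZMM
	return features{
		OSAVX:      xcr0&avxMask == avxMask,
		OSAVX512:   xcr0&avx512Mask == avx512Mask,
		GFNI:       ecx7&cpuidGFNI != 0,
		VAES:       ecx7&cpuidVAES != 0,
		VPCLMULQDQ: ecx7&cpuidVPCLMULQDQ != 0,
	}
}

func main() {
	// Example inputs chosen by hand: all three CPUID bits set, and an
	// XCR0 value with x87, SSE, AVX, opmask and both ZMM components enabled.
	fmt.Printf("%+v\n", decode(0x700, 0xE7))
}
```

Keeping the OS check separate from the CPUID check matters because a CPU can advertise an instruction while the OS has not enabled saving the wider register state.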

References:

  1. https://www.tomshardware.com/news/intel-10nm-xeon-ice-lake-sp-sunny-cove-core-architecture
  2. https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf

About this issue

  • State: open
  • Created 3 years ago
  • Comments: 32 (19 by maintainers)

Most upvoted comments

It sounds like the proposal is to add a few more names in the CPU info:

  • X86.HasVPCLMULQDQ
  • X86.HasGFNI
  • X86.HasVAES

I don’t think we necessarily need to deprecate the existing names, since they can still be used to check for AVX512 and the specific feature.

To be clear this proposal is not about using AVX-512 or even these not-quite-AVX-512 things in any specific package.

VAES & VPCLMULQDQ can also operate on the 256-bit YMM registers from AVX2, so AVX512 is not a hard prerequisite for using them.

This also shows in the AMD Ryzen Zen 3 CPUs, which don’t support AVX512 but do support VAES & VPCLMULQDQ on 256-bit registers.

I think implementing that would not incur any of the downsides of AVX512, such as its potential throttling effects, but in theory it would still provide a significant speedup of AES operations. It also avoids the issues with ZMM registers on Darwin.

The worry here is that very rare uses cause down-clocking for large periods of time.

Agreed. The worst case is using an AVX512 “heavy” instruction once per second or something.

On Ice Lake the risk and potential impact of such worst cases is massively reduced.

On the darwin/amd64 fix, the code I submitted does a kernel version check (only on Darwin when AVX512 is present) to ensure that Apple’s patch is present.

Here’s the spot in the issue thread where I note the properly patched versions of macOS/Darwin and discuss the Go fix I developed: https://github.com/golang/go/issues/49233#issuecomment-1023529992
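As a rough illustration of the kind of kernel version check described above (not the submitted code), the sketch below compares a Darwin kernel release string against a minimum version. The helper names `parseRelease` and `atLeast` are hypothetical, and the threshold in `main` is a placeholder assumption; the actual patched versions are the ones listed in the linked comment. Real code would obtain the release string from the kernel (for example the `kern.osrelease` sysctl), which is not shown here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRelease splits a release string such as "21.3.0" into up to three
// numeric components. Missing components are treated as zero.
func parseRelease(s string) ([3]int, bool) {
	var v [3]int
	parts := strings.SplitN(s, ".", 4)
	for i := 0; i < 3 && i < len(parts); i++ {
		n, err := strconv.Atoi(parts[i])
		if err != nil {
			return v, false
		}
		v[i] = n
	}
	return v, true
}

// atLeast reports whether release is greater than or equal to min.
// An unparsable release string is treated as "not new enough".
func atLeast(release string, min [3]int) bool {
	v, ok := parseRelease(release)
	if !ok {
		return false
	}
	for i := range v {
		if v[i] != min[i] {
			return v[i] > min[i]
		}
	}
	return true
}

func main() {
	// Placeholder threshold for illustration only (an assumption).
	min := [3]int{21, 3, 0}
	for _, rel := range []string{"20.6.0", "21.3.0", "22.1.0", "garbage"} {
		fmt.Println(rel, atLeast(rel, min))
	}
}
```

Failing closed on an unparsable string means AVX512 simply stays disabled when the version cannot be determined.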

I think using VAES and VPCLMULQDQ without AVX512 is fine, provided that we are actually able to test them and that no other policy (e.g. the one for crypto assembly) restricts their use. If they are actually going to be used, we need to add the corresponding feature bit checks to internal/cpu, and the specific code paths also need to check for any other AVX features they require.

VPCLMULQDQ might have a caveat: I think some CPUs implement it as slow microcode. We could cover that with benchmarks at CL submission, to figure out whether it could make things worse and, if so, whether that is bad enough to warrant not using it or adding more detection.
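A minimal sketch of what such a per-code-path check could look like, assuming a hypothetical `x86Features` struct that stands in for the flags in internal/cpu. The implementation names, the `allowZMM` switch, and the choice to require AVX2 for the 256-bit path are all assumptions made for illustration; nothing here is taken from an actual CL.

```go
package main

import "fmt"

// x86Features is a hypothetical stand-in for the flags in internal/cpu.
type x86Features struct {
	HasAES     bool
	HasAVX2    bool
	HasAVX512F bool
	HasVAES    bool
}

// aesImpl names the code path that would be selected.
type aesImpl string

const (
	implGeneric aesImpl = "generic"
	implAESNI   aesImpl = "aesni"
	implVAES256 aesImpl = "vaes-ymm"
	implVAES512 aesImpl = "vaes-zmm"
)

// pickAES selects the widest usable code path. Each case checks the VAES
// bit together with the vector extension that path depends on. allowZMM
// lets a caller rule out 512-bit code, for example over throttling concerns.
func pickAES(f x86Features, allowZMM bool) aesImpl {
	switch {
	case allowZMM && f.HasVAES && f.HasAVX512F:
		return implVAES512
	case f.HasVAES && f.HasAVX2:
		return implVAES256
	case f.HasAES:
		return implAESNI
	default:
		return implGeneric
	}
}

func main() {
	// A CPU shaped like the Zen 3 case mentioned in the thread:
	// VAES and AVX2 present, AVX512 absent.
	zen3Like := x86Features{HasAES: true, HasAVX2: true, HasVAES: true}
	fmt.Println(pickAES(zen3Like, true))
}
```

With this layout the 256-bit path is reachable on hardware without AVX512, and on AVX512 hardware when ZMM use is switched off.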

As AVX512 came up in https://go-review.googlesource.com/c/go/+/379394 ("internal/cpu,internal/bytealg: add SIMD prefix match for Index/amd64"):

Before using AVX512 in runtime/std, I think we should:

  • understand and differentiate between AVX512 uses that can and that can’t cause downclocking, and thereby performance regressions for other code
  • have builders that specifically run on AVX512 hardware to test it (AMD and newer Intel desktop CPUs don’t support it)
  • consider whether the added maintenance burden is worth it, especially since support in OSes doesn’t seem well tested and can cause subtle bugs with signals/preemption: https://github.com/golang/go/issues/49233