go: x/sys/cpu: cpu.X86.HasAVX512 is incorrectly always false on darwin

What version of Go are you using (go version)?

$ go version
go version go1.15.5 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/vsi/Library/Caches/go-build"
GOENV="/Users/vsi/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/vsi/go/pkg/mod"
GONOPROXY="github.com/vsivsi"
GONOSUMDB="github.com/vsivsi"
GOOS="darwin"
GOPATH="/Users/vsi/go"
GOPRIVATE="github.com/vsivsi"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.15.5/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.15.5/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/kp/kjdr0ytx5z9djnq4ysl15x0h0000gn/T/go-build186752670=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

Test for AVX512 support using cpu.X86.HasAVX512

main.go

package main

import (
	"fmt"

	"golang.org/x/sys/cpu"
)

func main() {
	fmt.Println(cpu.X86.HasAVX512)
}

What did you expect to see?

The program above should print true on any OS/hardware combination that is capable of running AVX512 instructions.

What did you see instead?

This program prints false on every Mac, including those that are perfectly capable of running AVX512 instructions generated by the Go assembler.

The reason is complicated. It appears to stem from how recent versions of the darwin kernel (those released since AVX512-enabled processors began appearing in Mac hardware) support the greatly expanded AVX512 thread state.

In summary, darwin implements a two-tier, “promotion”-based scheme to economize on saving thread state when AVX512-specific registers are not in use. It does this by initially disabling AVX512 support for new threads. When a thread first executes an AVX512 instruction, the kernel traps the resulting undefined-instruction fault, enables AVX512 support for that thread, and restarts execution at the faulting instruction. This scheme has the advantage of maintaining pre-AVX512 efficiency when preempting threads that haven’t used any AVX512 extensions. The cost appears to be that testing for AVX512 support is more complex.

Specifically, this code assumes that disabled AVX512 OS support is permanent:

https://github.com/golang/sys/blob/master/cpu/cpu_x86.go#L90

The test in the code above is performed at init time, before any AVX512 instructions have run, so the bits read from xgetbv() indicate at that point that the OS has AVX512 support disabled. Once that test fails (leaving cpu.X86.HasAVX512 false), the CPUID bits indicating that the hardware is AVX512 capable are simply ignored.
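To make the failure mode concrete, here is a minimal sketch of the XCR0 bit test described above, written as pure functions so it runs anywhere. The bit positions (1 and 2 for XMM/YMM state; 5, 6 and 7 for opmask and ZMM state) are the ones the x/sys/cpu check uses. The function names and the two sample XCR0 values are assumptions chosen for illustration, not values captured from a real Mac.

```go
package main

import "fmt"

// isSet reports whether bit bitpos is set in value.
func isSet(bitpos uint, value uint32) bool {
	return value&(1<<bitpos) != 0
}

// osSupportsAVX checks XCR0 bits 1 (XMM state) and 2 (YMM state).
func osSupportsAVX(xcr0 uint32) bool {
	return isSet(1, xcr0) && isSet(2, xcr0)
}

// osSupportsAVX512 additionally checks XCR0 bits 5 (opmask),
// 6 (upper halves of ZMM0-ZMM15) and 7 (ZMM16-ZMM31).
func osSupportsAVX512(xcr0 uint32) bool {
	return osSupportsAVX(xcr0) && isSet(5, xcr0) && isSet(6, xcr0) && isSet(7, xcr0)
}

func main() {
	// Hypothetical XCR0 values, for illustration only.
	beforePromotion := uint32(0x07) // x87, SSE and AVX state enabled
	afterPromotion := uint32(0xE7)  // opmask and ZMM state enabled as well

	fmt.Println(osSupportsAVX512(beforePromotion)) // false
	fmt.Println(osSupportsAVX512(afterPromotion))  // true
}
```

If darwin reports something like the first value to a thread that has not yet been promoted, an init-time check of this shape will always conclude that AVX512 is unavailable.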

Given darwin’s two-tier thread state scheme, clearly something more sophisticated is needed here to properly detect whether AVX512 instructions can be run.

Here is a reference to the darwin code implementing these checks: https://github.com/apple/darwin-xnu/blob/0a798f6738bc1db01281fc08ae024145e84df927/osfmk/i386/fpu.c#L176

And here is an issue on an Intel compiler project raising the same problem: https://github.com/ispc/ispc/issues/1854

There is also a known issue with darwin where threads executing unsupported AVX512 instructions get stuck in a tight loop of some kind, so properly detecting AVX512 support and the CPUID flags for specific extensions is critical. See:

https://github.com/golang/go/issues/42649

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 20 (10 by maintainers)

Most upvoted comments

Before I go to the work of putting together a CL, here is my proposed implementation of the darwin commpage check:

In cpu_gc_x86.go add:

// darwinHasAVX512 is implemented in cpu_x86.s for gc compiler
// and in cpu_gccgo_x86.go for gccgo.
func darwinHasAVX512() bool

In cpu_gccgo_x86.go add:

// gccgo doesn't build on Darwin, per:
// https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/gcc.rb#L7
func darwinHasAVX512() bool {
	return false
}

In cpu_x86.go (note addition of conditional darwin codepath):

if X86.HasOSXSAVE {
	eax, _ := xgetbv()
	// Check if XMM and YMM registers have OS support.
	osSupportsAVX = isSet(1, eax) && isSet(2, eax)

	if runtime.GOOS == "darwin" {
		// Check darwin commpage for AVX512 support
		osSupportsAVX512 = osSupportsAVX && darwinHasAVX512()
	} else {
		// Check if OPMASK and ZMM registers have OS support.
		osSupportsAVX512 = osSupportsAVX && isSet(5, eax) && isSet(6, eax) && isSet(7, eax)
	}
}

In cpu_x86.s add this code:

// func darwinHasAVX512() bool
TEXT ·darwinHasAVX512(SB), NOSPLIT, $0-1

    MOVB    $0, ret+0(FP) // default to false

#ifdef GOOS_darwin   // commpage check is compiled only for darwin
#ifdef GOARCH_amd64  // ... and only for amd64; otherwise the default (false) is returned

// These values from:
// https://github.com/apple/darwin-xnu/blob/xnu-4570.1.46/osfmk/i386/cpu_capabilities.h 
#define commpage64_base_address         0x00007fffffe00000
#define commpage64_cpu_capabilities64   (commpage64_base_address+0x010)
#define commpage64_version              (commpage64_base_address+0x01E) 
#define hasAVX512F                      0x0000004000000000

    MOVQ    $commpage64_version, BX
    MOVW    (BX), AX
    CMPW    AX, $13  // versions < 13 do not support AVX512
    JL      no_avx512
    MOVQ    $commpage64_cpu_capabilities64, BX
    MOVQ    (BX), AX
    MOVQ    $hasAVX512F, CX
    ANDQ    CX, AX
    JZ      no_avx512
    MOVB    $1, ret+0(FP) 
no_avx512:

#endif
#endif

    RET

Note that the cpu_capabilities64 field sits at a lower offset (0x010) than the version field (0x01E) in the commpage layout. It seems very unlikely that these fields will ever move or change.
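For readers who do not read Go assembly, the decision the routine above makes can be sketched in plain Go. This is only a model of the logic: it takes the two commpage values as arguments rather than reading the fixed commpage address, so it runs on any platform. The function and constant names are hypothetical; the version threshold and the capability bit are the ones used in the assembly.

```go
package main

import "fmt"

const (
	// Minimum commpage version that can advertise AVX512 (from the assembly above).
	minAVX512CommpageVersion = 13
	// Capability bit for AVX512F (the hasAVX512F value in the assembly above).
	hasAVX512F = uint64(0x0000004000000000)
)

// commpageHasAVX512 models the assembly's decision on plain values.
// version stands in for the 16-bit field at commpage offset 0x01E and
// capabilities for the 64-bit field at offset 0x010.
// Note: the assembly uses a signed compare (JL); this sketch treats the
// version as unsigned, which only differs for versions >= 0x8000.
func commpageHasAVX512(version uint16, capabilities uint64) bool {
	if version < minAVX512CommpageVersion {
		return false
	}
	return capabilities&hasAVX512F != 0
}

func main() {
	fmt.Println(commpageHasAVX512(12, hasAVX512F)) // false: commpage too old
	fmt.Println(commpageHasAVX512(13, 0))          // false: bit not advertised
	fmt.Println(commpageHasAVX512(13, hasAVX512F)) // true
}
```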

Also note the use of conditional compilation above to render the darwinHasAVX512 function a stub that always returns false in all cases except GOOS=="darwin" && GOARCH=="amd64". This seems preferable to the alternative, which would be to create two new separate assembly files for the sole purpose of providing the above function and its stub in those two cases.

Based on my understanding of everything, I think if the commpage advertises kHasAVX512F then it is safe to assume that darwin will promote threads attempting to use AVX512 instructions.

But I think that is different from what you are suggesting, if I’m understanding correctly, which is that if the CPUID bit for AVX512F is set then we can probably safely make that same assumption.

As I say above, I think that is almost certainly true for unmodified Apple hardware running unpatched MacOS.

Outside of that ideal walled garden, all bets are off. So one question is, does golang as a project care about this case? I assume it does, otherwise why call the OS target “darwin” and not “macos” or whatever? But I honestly don’t know the history of this.

Virtualization keeps coming up, and I think it also raises important potential cases. To my knowledge it is certainly plausible that an older (pre-AVX512) version of MacOS could be run on newer hardware that supports AVX512 under virtualization. If in that case the hypervisor chooses to enable AVX512 within that VM, then yes, it would be incorrect for golang to rely exclusively on the CPUID bits to determine support. If this scenario is indeed possible, it is another argument for checking the commpage for kHasAVX512F (which an older version of darwin shouldn’t set, regardless of what the CPUID says). What I don’t know is if the checks MacOS makes under virtualization to ensure it is running on genuine Mac hardware obey the same constraints as when it is running on bare metal.
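The difference between the two policies under discussion can be laid out as a small table-driven sketch. Everything here is hypothetical: the host descriptions are the scenarios speculated about above, not measurements, and the type and function names are invented for illustration.

```go
package main

import "fmt"

// host describes what a (hypothetical) system reports.
type host struct {
	name            string
	cpuidAVX512F    bool // CPUID leaf 7, EBX bit 16
	commpageAVX512F bool // kHasAVX512F advertised in the darwin commpage
}

// cpuidOnly trusts the hardware flag alone.
func cpuidOnly(h host) bool { return h.cpuidAVX512F }

// commpageGated also requires the OS to advertise AVX512 in the commpage.
func commpageGated(h host) bool { return h.cpuidAVX512F && h.commpageAVX512F }

func main() {
	hosts := []host{
		{"recent macOS on AVX512 hardware", true, true},
		{"pre-AVX512 macOS in a VM exposing AVX512", true, false},
		{"hardware without AVX512", false, false},
	}
	for _, h := range hosts {
		fmt.Printf("%-42s cpuidOnly=%-5v commpageGated=%v\n",
			h.name, cpuidOnly(h), commpageGated(h))
	}
}
```

The two policies disagree only in the middle row, which is exactly the virtualization case raised above.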

To be clear, “older” here means MacOS 10.12 Sierra or before, i.e. versions of MacOS/OS X that are no longer supported by Apple.