go: cmd/go: slow "native" performance with Mac OS X 10.14.1 and 10.12.6
Following a debug session with @fatih
What version of Go are you using (go version
)?
go version go1.11.2 darwin/amd64 # and go version go1.11.2 linux/amd64
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env
)?
go env
Output
GOARCH="amd64" GOBIN="/Users/fatih/go/bin" GOCACHE="/Users/fatih/Library/Caches/go-build" GOEXE="" GOFLAGS="" GOHOSTARCH="amd64" GOHOSTOS="darwin" GOOS="darwin" GOPATH="/Users/fatih/go" GOPROXY="" GORACE="" GOROOT="/usr/local/Cellar/go/1.11.2/libexec" GOTMPDIR="" GOTOOLDIR="/usr/local/Cellar/go/1.11.2/libexec/pkg/tool/darwin_amd64" GCCGO="gccgo" CC="clang" CXX="clang++" CGO_ENABLED="1" GOMOD="" CGO_CFLAGS="-g -O2" CGO_CPPFLAGS="" CGO_CXXFLAGS="-g -O2" CGO_FFLAGS="-g -O2" CGO_LDFLAGS="-g -O2" PKG_CONFIG="pkg-config" GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/k_/87syx3r50m93m72hvqj2qqlw0000gn/T/go-build130848285=/tmp/go-build -gno-record-gcc-switches -fno-common"and
GOARCH=“amd64” GOBIN=“” GOCACHE=“/root/.cache/go-build” GOEXE=“” GOFLAGS=“” GOHOSTARCH=“amd64” GOHOSTOS=“linux” GOOS=“linux” GOPATH=“/go” GOPROXY=“” GORACE=“” GOROOT=“/usr/local/go” GOTMPDIR=“” GOTOOLDIR=“/usr/local/go/pkg/tool/linux_amd64” GCCGO=“gccgo” CC=“gcc” CXX=“g++” CGO_ENABLED=“1” GOMOD=“” CGO_CFLAGS=“-g -O2” CGO_CPPFLAGS=“” CGO_CXXFLAGS=“-g -O2” CGO_FFLAGS=“-g -O2” CGO_LDFLAGS=“-g -O2” PKG_CONFIG=“pkg-config” GOGCCFLAGS=“-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build416231211=/tmp/go-build -gno-record-gcc-switches”
What did you do?
The symptom that @fatih and I were investigating was godef
performing ~2x more slowly on his fast machine than on my slower machine.
@fatih’s machine specs for reference:
Model Name: iMac Pro
Model Identifier: iMacPro1,1
Processor Name: Intel Xeon W
Processor Speed: 3 GHz
Number of Processors: 1
Total Number of Cores: 10
L2 Cache (per Core): 1 MB
L3 Cache: 13.8 MB
Memory: 64 GB
OS X 10.14.1 (18B75)
We think we have a smaller reproduction that demonstrates what appears to be an OS X 10.14.n issue that may or may not be related to Go. But go list
appears to be a good way to demonstrate the problem and hopefully therefore a good place to start further investigation.
The following was run in a Terminal on @fatih’s setup, and then within a Docker container on the same machine. Run-times of the go list
commands were compared:
/bin/bash
(
set -eux
export GOPATH=$(mktemp -d)
command cd $(mktemp -d)
git clone https://github.com/digitalocean/csi-digitalocean
command cd csi-digitalocean/
git checkout b9ed6c2ea9ad85c5e4956bdfe418940bb5aa883a
rm go.sum
go mod tidy
command cd driver
time go list -export -deps -json -e . > /dev/null 2>&1
time go list -export -deps -json -e . > /dev/null 2>&1
time go list -export -deps -json -e . > /dev/null 2>&1
time go list -export -deps -json -e . > /dev/null 2>&1
) 2>&1 | tee output.txt
The “native” terminal run shows “average” speeds of ~300ms:
+ go list -export -deps -json -e .
real 0m1.524s
user 0m7.224s
sys 0m2.391s
+ go list -export -deps -json -e .
real 0m0.297s
user 0m0.420s
sys 0m0.898s
+ go list -export -deps -json -e .
real 0m0.300s
user 0m0.403s
sys 0m0.919s
+ go list -export -deps -json -e .
real 0m0.308s
user 0m0.402s
sys 0m0.968s
Whereas the Docker run shows “average” speeds of ~180ms:
+ go list -export -deps -json -e .
real 0m4.837s
user 0m12.070s
sys 0m5.030s
+ go list -export -deps -json -e .
real 0m0.186s
user 0m0.180s
sys 0m0.110s
+ go list -export -deps -json -e .
real 0m0.165s
user 0m0.160s
sys 0m0.120s
+ go list -export -deps -json -e .
real 0m0.187s
user 0m0.190s
sys 0m0.100s
What did you expect to see?
Similar run-times in each, possibly even faster times in the “native” OS X environment.
What did you see instead?
The “native” run-times are almost 70% longer.
cc @bcmills (and FYI @ianthehat for any go/packages
side-effects)
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Comments: 19 (17 by maintainers)
For the benefit of others, @marcan also posted this fantastic thread: https://twitter.com/marcan42/status/1494213855387734019
Just FWIW, this is a problem with Apple NVMe drives (T2 and M1 Macs alike). Their cache flush performance is abysmal. This affects both macOS and (native) Linux (presumably existing VM solutions fail to forward this as F_FULLSYNC if they run faster). You get about 46 IOPS of F_FULLSYNC on macOS on an M1 machine, and about the same on native Linux with fsync().
So there isn’t much to be done on the software side other than deciding whether you care about data integrity on macOS or not; on Linux there is no way for user processes to control this as far as I know (i.e. flush to cache but not the cache), though it can be emulated by setting the drive write cache type to “write through” in sysfs (which tells the kernel to never issue flush commands, so fsync() behaves like it does on macOS).
I don’t have an answer to why macOS 10.12 and 10.14 performance would be different, but I think I know why Docker and macOS are different.
As far as I understand, Docker on macOS actually runs a Linux kernel and userspace under a hypervisor. That means overhead for I/O and system calls is more similar to native Linux overhead than it is to macOS overhead. “go list” does a lot of
stat
andread
system calls on small files, so the system call overhead matters a lot.Here are the results from a quick benchmark I ran. The
Read
andWrite
tests are reading and writing slices of various sizes to a file (the file is almost certainly in the kernel file cache, so disk speed doesn’t matter). TheImport
test is measuring the time to callgo/build.Context.Import
on"fmt"
.macOS native:
macOS Docker
Notably, the
Stat
andRead/1K
tests are much, much slower on native macOS. Windows is quite a bit slower than this.@randall77 is there some indication that this situation will improve in the future? Is Apple addressing the
fsync
issue? I’m now at 15sec build times on Darwin vs 4.5 sec in a Fedora VM running on the same super-beefy hardware.