go: runtime: Darwin slow when using signals + int->float instructions
Split off from #37121
package bench1

import (
	"math"
	"testing"
)

const N = 64

func BenchmarkFast(b *testing.B) {
	var x, y, z [N]float32
	for i := 0; i < b.N; i++ {
		mulFast(&x, &y, &z)
	}
}

func mulFast(x, y, z *[N]float32) {
	for i := 0; i < N; i++ {
		z[i] = x[i] * y[i]
	}
}

func BenchmarkSlow(b *testing.B) {
	var z [N]float32
	var x, y [N]uint32
	for i := 0; i < b.N; i++ {
		mulSlow(&x, &y, &z)
	}
}

func mulSlow(x, y *[N]uint32, z *[N]float32) {
	for i := 0; i < N; i++ {
		z[i] = math.Float32frombits(x[i]) * math.Float32frombits(y[i])
	}
}
% ~/go1.12.9/bin/go test bench1_test.go -test.bench .\* -test.benchtime=10000000x
goos: darwin
goarch: amd64
BenchmarkFast-16 10000000 55.9 ns/op
BenchmarkSlow-16 10000000 61.1 ns/op
PASS
% ~/go1.12.9/bin/go test bench1_test.go -test.bench .\* -test.benchtime=10000000x -test.cpuprofile=cpu.prof
goos: darwin
goarch: amd64
BenchmarkFast-16 10000000 89.7 ns/op
BenchmarkSlow-16 10000000 223 ns/op
PASS
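For some strange reason, code that includes int->float instructions runs a lot slower when profiling is on: BenchmarkSlow goes from 61.1 ns/op to 223 ns/op (roughly 3.6x), while BenchmarkFast only goes from 55.9 ns/op to 89.7 ns/op (roughly 1.6x).
This bug has been reproducible since at least Go 1.11.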
About this issue
- State: closed
- Created 4 years ago
- Reactions: 4
- Comments: 28 (25 by maintainers)
We have an office full of people playing musical VZEROUPPERs trying to figure out what the problem is. A whole bunch of things don’t help; we think the bug is in Darwin…
I reported a bug to Apple here. We’ll see if we get any traction on that.
Here’s a C/assembly reproducer:
main.c
add.s:
On our 1.11 builders, my C repro is ~14% slower with the VZEROUPPER commented out. On 1.12, ~13% slower. On 1.14, ~11% faster. On 1.15, ~11% faster.
So for some reason this bug doesn’t appear on the builders like it does on my desktop. I could imagine that some virtualization layer hides the bug, and the VZEROUPPER instruction has a cost.
Yes, #41152 is all we need now.
Ok, sounds fixed. So once we stop supporting macOS releases older than 10.15.6, we can get rid of the VZEROUPPER patch. We need a way to ask googlebot to reopen issues once the minimum supported OS version reaches a given level.
@jyknight suggests lack of a vzeroupper, perhaps in the Darwin signal handler.