go: runtime: make([]byte, n) becomes much slower compared with go 1.11.1
What version of Go are you using (go version)?
go version devel +2e9f081 Tue Oct 30 04:39:53 2018 +0000 linux/amd64
Does this issue reproduce with the latest release?
yes
What operating system and processor architecture are you using (go env)?
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/lni/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/lni/golang_ws"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build682632202=/tmp/go-build -gno-record-gcc-switches"
What did you do?
I tried the devel version of Go (both devel +80b8377 and devel +2e9f081) and my Go program became slower. When checking the benchmarks, I noticed that make([]byte, n) called in parallel is much slower compared to go 1.11.1:
package makebenchmark

import (
	"testing"
)

func benchmarkMakeByteSliceInParalell(b *testing.B, sz int) {
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			m := make([]byte, sz)
			if len(m) != sz {
				b.Errorf("unexpected len")
			}
		}
	})
}

func BenchmarkMakeByteSliceInParalell512(b *testing.B) {
	benchmarkMakeByteSliceInParalell(b, 512)
}

func BenchmarkMakeByteSliceInParalell8(b *testing.B) {
	benchmarkMakeByteSliceInParalell(b, 8)
}
go 1.11.1:
go test -count=1 -v -benchmem -run ^$ -bench=BenchmarkMakeByteSliceInParalell
goos: linux
goarch: amd64
pkg: github.com/lni/goplayground/makebenchmark
BenchmarkMakeByteSliceInParalell512-40 10000000 239 ns/op 512 B/op 1 allocs/op
BenchmarkMakeByteSliceInParalell8-40 300000000 4.08 ns/op 8 B/op 1 allocs/op
PASS
go version
go version go1.11.1 linux/amd64
Both devel +2e9f081 and devel +80b8377 reported similar results:
/home/lni/src/go/bin/go test -count=1 -v -benchmem -run ^$ -bench=BenchmarkMakeByteSliceInParalell
goos: linux
goarch: amd64
pkg: github.com/lni/goplayground/makebenchmark
BenchmarkMakeByteSliceInParalell512-40 3000000 564 ns/op 512 B/op 1 allocs/op
BenchmarkMakeByteSliceInParalell8-40 300000000 5.09 ns/op 8 B/op 1 allocs/op
PASS
/home/lni/src/go/bin/go version
go version devel +2e9f081 Tue Oct 30 04:39:53 2018 +0000 linux/amd64
What did you expect to see?
make([]byte, n) to have similar ns/op to go 1.11.1
What did you see instead?
make([]byte, n) is much slower
About this issue
- State: closed
- Created 6 years ago
- Reactions: 7
- Comments: 22 (13 by maintainers)
Commits related to this issue
- runtime: add iterator abstraction for mTreap This change adds the treapIter type which provides an iterator abstraction for walking over an mTreap. In particular, the mTreap type now has iter() and r... — committed to golang/go by mknyszek 6 years ago
- runtime: make mTreap iterator bidirectional This change makes mTreap's iterator type, treapIter, bidirectional instead of unidirectional. This change helps support moving the find operation on a trea... — committed to golang/go by mknyszek 5 years ago
@lni do you mind trying out this patch with your application/library? (https://go-review.googlesource.com/c/go/+/151538)
The original numbers are a little harder to reproduce now as it seems other runtime/compiler improvements are hiding the original performance regression a little bit, but in some preliminary benchmarking it looked like the change above helped with your original benchmark.
I still need to start some longer runs for better statistical significance so I’ll update here with those soon.
As reusee@ sleuthed out, it's probably my commit (07e738e). Thank you everyone for looking into this.

I took some time to dig in and understand exactly what was going on. This benchmark exercises the case where we're allocating size-segregated spans with object size 512, which we then free, and re-allocate from. At first, I thought that the regression might have been the result of span allocation having to traverse the treap, instead of walking over a fixed-size array. But CPU profiling shows that it's more likely a result of freeing spans.

Before my change, freeing these size-segregated spans was a very fast operation: indexing into an array followed by a linked-list insertion at head. Now, one needs to traverse a treap, allocate a treap node out of a SLAB, and rebalance it. While all of these operations should be reasonably fast, it's definitely more work than what was being done before. I think the regression in performance we see as a result of a higher GOMAXPROCS arises out of the fact that the heap lock is held when a span is freed, and my change causes this benchmark to spend more time holding the heap lock.
So, for a microbenchmark like this which really hammers on the GC and the allocator, it makes sense to me why my change had such an impact. Running other benchmarks (such as the go1 benchmarks, or the garbage benchmark, whose numbers are in my change’s commit message) the actual performance hit isn’t quite so dramatic. I quickly ran github.com/dr2chase/bent (which runs a number of benchmarks from actual Go programs in the wild) with my change and without it and I haven’t yet seen any significant change there either. However, I will run those benchmarks with more iterations and report back with actual numbers.
The motivation behind my change was that it really simplified a number of changes I just landed today for #14045. I’m erring on the side that it was worth it, as it seems as though in practice the performance impact of my change appears to be relatively small.
Bisected to https://github.com/golang/go/commit/07e738ec32025da458cdf968e4f991972471e6e9