go: cmd/compile: generic functions are significantly slower than identical non-generic functions in some cases
What version of Go are you using (go version)?
$ go version go version go1.18beta1 windows/amd64
Does this issue reproduce with the latest release?
It reproduces on the 1.18 Beta 1 release.
What operating system and processor architecture are you using (go env)?
go env Output
$ go env set GO111MODULE= set GOARCH=amd64 set GOBIN= set GOCACHE=C:\Users\argusdusty\AppData\Local\go-build set GOENV=C:\Users\argusdusty\AppData\Roaming\go\env set GOEXE=.exe set GOEXPERIMENT= set GOFLAGS= set GOHOSTARCH=amd64 set GOHOSTOS=windows set GOINSECURE= set GOMODCACHE=C:\Users\argusdusty\Code\pkg\mod set GONOPROXY= set GONOSUMDB= set GOOS=windows set GOPATH=C:\Users\argusdusty\Code set GOPRIVATE= set GOPROXY=https://proxy.golang.org,direct set GOROOT=C:\Program Files\Go set GOSUMDB=sum.golang.org set GOTMPDIR= set GOTOOLDIR=C:\Program Files\Go\pkg\tool\windows_amd64 set GOVCS= set GOVERSION=go1.18beta1 set GCCGO=gccgo set GOAMD64=v1 set AR=ar set CC=gcc set CXX=g++ set CGO_ENABLED=1 set GOMOD=C:\Users\argusdusty\Code\src\foo\go.mod set GOWORK= set CGO_CFLAGS=-g -O2 set CGO_CPPFLAGS= set CGO_CXXFLAGS=-g -O2 set CGO_FFLAGS=-g -O2 set CGO_LDFLAGS=-g -O2 set PKG_CONFIG=pkg-config set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\ARGUSD~1\AppData\Local\Temp\go-build3090171098=/tmp/go-build -gno-record-gcc-switches
What did you do?
sort_test.go:
package sort_test
import (
"sort"
"testing"
)
func IsSortedGeneric[T sort.Interface](data T) bool {
n := data.Len()
for i := n - 1; i > 0; i-- {
if data.Less(i, i-1) {
return false
}
}
return true
}
var data = sort.IntSlice{-10, -5, 0, 1, 2, 3, 5, 7, 11, 100, 100, 100, 1000, 10000}
func BenchmarkIsSorted(b *testing.B) {
for i := 0; i < b.N; i++ {
sort.IsSorted(data)
}
}
func BenchmarkIsSortedGeneric(b *testing.B) {
for i := 0; i < b.N; i++ {
IsSortedGeneric[sort.IntSlice](data)
}
}
go test -bench=. -cpu=1 -benchmem sort_test.go
What did you expect to see?
IsSortedGeneric to be faster than, or at least very similar in runtime to, the otherwise identical sort.IsSorted.
What did you see instead?
goos: windows
goarch: amd64
cpu: AMD Ryzen 5 3600X 6-Core Processor
BenchmarkIsSorted 21008770 55.26 ns/op 24 B/op 1 allocs/op
BenchmarkIsSortedGeneric 2859728 421.3 ns/op 336 B/op 14 allocs/op
PASS
ok command-line-arguments 3.003s
This appears to be a problem specifically when methods are part of the type constraint - and you may note above that the number of allocs (14) is exactly equal to the number of method calls (1 .Len, 13 .Less calls). Compiler output points to runtime.convTslice triggered at each method call as the cause of the allocs.
A generic IsSorted implementation that restricts to a slice of constraints.Ordered does not have this slowdown. Experimentation also finds that supplying a pointer type instead of a value does not have this issue:
type PtrIntSlice []int
func (x *PtrIntSlice) Len() int { return len(*x) }
func (x *PtrIntSlice) Less(i, j int) bool { return (*x)[i] < (*x)[j] }
func (x *PtrIntSlice) Swap(i, j int) { (*x)[i], (*x)[j] = (*x)[j], (*x)[i] }
var ptrdata = PtrIntSlice{-10, -5, 0, 1, 2, 3, 5, 7, 11, 100, 100, 100, 1000, 10000}
func BenchmarkIsSortedPtrGeneric(b *testing.B) {
for i := 0; i < b.N; i++ {
IsSortedGeneric[*PtrIntSlice](&ptrdata)
}
}
BenchmarkIsSortedPtrGeneric 44432265 27.07 ns/op 0 B/op 0 allocs/op
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 22 (14 by maintainers)
Links to this issue
Commits related to this issue
- cmd/compile: implement generic method expressions with closures in dictionary Currently we do quite a dance for method expressions on generic types. We write a new closure, and in that closure conver... — committed to golang/go by randall77 2 years ago
- cmd/compile: use method expression closures to implement bound method calls When we have x.M(args) where x is a value of type parameter type, we currently cast x to the bound of that type parameter (... — committed to golang/go by randall77 2 years ago
- cmd/compile: stop interface conversions for generic method calls from allocating Let T be a type parameter, and say we instantiate it with S, a type that isn't pointer-like (e.g. a pair of ints, or a... — committed to jproberts/go by randall77 2 years ago
Just wanted to share some benchmarks showing the improvement for the example at the top of this issue.
The first change in CL 378178 that was merged in January reduced the time by ~90% (compared to tip as of 4f6f68e, which was before any change for this issue).
The pending CL 385274 reduces the time by another ~20% (and also simplifies the implementation).
Change https://golang.org/cl/378178 mentions this issue:
cmd/compile: stop interface conversions for generic method calls from allocatingIf you use
-gcflags=all='-G=3 -d=unified=1', the generic version becomes much faster than the non-generic one.gotip versiongo version devel go1.18-0f05ed3b78 Tue Dec 14 20:25:03 2021 +0000 linux/amd64gotip test -bench=. -cpu=1 -benchmem ./...gotip test -gcflags=all='-G=3 -d=unified=1' -bench=. -cpu=1 -benchmem ./...I still do not know what the
unifiedflag does. I saw it once on @mdempsky’s twitch livestream@g-talbot The goal of generics is not to provide faster execution time. The goal is to make it easier to write certain kinds of code with less repetitive boilerplate and more compile-time type checking. Rewriting code to use generics may sometimes make it faster. It may sometimes make it slower (as in this example). While we will continue to improve the compiler balancing the needs of fast builds and fast runtimes, it will never be the case that generic code will always be faster than non-generic code. That is not a goal.
@komuw It was interesting to me, so I took a dive, and what I found was: as described here unified IR doesn’t currently do dictionary (shape like) approach for generics, and instead does stenciling (aka separate code for each type no matter memory representation). So resulting binary is indeed faster, but ALOT bigger. That also explains why there was no allocations with this flag.
Further reading: [dev.typeparams] all: add GOEXPERIMENT=unified knob https://github.com/golang/go/issues/46786 [go/dev.typeparams] [dev.typeparams] cmd/compile: unified IR construction
@randall77 But you do know the exact type. It’s passed right here in the brackets. The compiler should be able to deduce that it can short-circuit the type here and that only sort.IntSlice’s Less() implementation is being called. This is a real issue. What’s the point of generics if the compiler can’t short-circuit the interface machinery when it knows the precise types? You lose the performance benefit via inlining if you can’t do that.
@randall77 , why does that lead to 14 allocations per iteration, though? I would expect that to result in 1 additional allocation per iteration. Instead, we seem to be allocating for each element of the slice.
We don’t know it when compiling IsSortedGeneric. We don’t compile
IsSortedGeneric[T]for everyTthat instantiates it. We compile it only for each distinct shape of the instantiatingT. See https://github.com/golang/proposal/blob/master/design/generics-implementation-gcshape.md , and in particular the second bullet in the risks section.