go: strings: 10-30% speed regression in Contains from 1.13 to tip
What version of Go are you using (go version)?
$ go version
go version go1.13.4 linux/amd64
go version devel +8cf5293c Tue Nov 19 06:10:03 2019 +0000 linux/amd64
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/jake/.cache/go-build"
GOENV="/home/jake/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/jake/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/jake/testproj/strtest/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build998582668=/tmp/go-build -gno-record-gcc-switches"
What did you do?
package strtest

import (
	"strconv"
	"strings"
	"testing"
)

// sink1 keeps the result live so the compiler cannot eliminate the call.
var sink1 bool

func BenchmarkContains(b *testing.B) {
	tests := []string{
		"This is just a cool test.",
		"This is just a cool test. (Or is it?)",
		"Hello, (_USER_)!",
		"RIP count reset to (_VARS_RIP-(_GAME_CLEAN_)_SET_0_), neat.",
		"(_USER_) rubs on (_PARAMETER_) 's booty! PogChamp / (_(_|",
	}
	for i, test := range tests {
		b.Run(strconv.Itoa(i), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				sink1 = strings.Contains(test, "(_")
			}
		})
	}
}
Benchmark this against 1.13 and tip.
What did you expect to see?
No difference in performance (or hopefully better).
What did you see instead?
Contains is consistently slower. Here is benchstat comparing runs of go test -run=- -bench . -cpu=1,4 -count=10 . on each version:
name           old time/op   new time/op   delta
Contains/0     10.1ns ± 2%   11.4ns ± 1%   +12.94%  (p=0.000 n=10+10)
Contains/0-4   10.1ns ± 0%   11.4ns ± 2%   +12.57%  (p=0.000 n=7+10)
Contains/1     12.1ns ± 1%   13.0ns ± 1%    +7.56%  (p=0.000 n=9+9)
Contains/1-4   12.1ns ± 2%   12.8ns ± 2%    +5.48%  (p=0.000 n=8+10)
Contains/2     7.70ns ± 3%   9.82ns ± 6%   +27.52%  (p=0.000 n=10+10)
Contains/2-4   7.82ns ± 3%   9.64ns ± 2%   +23.25%  (p=0.000 n=9+10)
Contains/3     9.31ns ± 1%   11.01ns ± 1%  +18.22%  (p=0.000 n=9+10)
Contains/3-4   9.45ns ± 3%   11.11ns ± 2%  +17.56%  (p=0.000 n=9+9)
Contains/4     7.61ns ± 1%   9.87ns ± 1%   +29.60%  (p=0.000 n=8+9)
Contains/4-4   7.81ns ± 3%   9.73ns ± 3%   +24.69%  (p=0.000 n=10+9)
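For reference, the comparison above was produced along these lines (go1.13 and gotip stand in for however the two toolchains are invoked locally):

$ go1.13 test -run=- -bench . -cpu=1,4 -count=10 . > old.txt
$ gotip test -run=- -bench . -cpu=1,4 -count=10 . > new.txt
$ benchstat old.txt new.txt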
This came out of some performance testing between 1.13 and tip in a project I’m working on, which shows a consistent 6% regression for a hand-written parser (https://github.com/hortbot/hortbot/blob/master/internal/cbp/cbp_test.go); this was the first thing I noticed while comparing the profiles. The first thing the parser does is check for one of two tokens before doing any work, on the assumption that if neither is present, there’s no work to be done.
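For context, that fast path looks roughly like this (a sketch only; the real function in cbp has a different name and signature, and the second marker here is a stand-in):

package cbp

import "strings"

// Parse bails out before doing any parsing work when neither token
// marker appears in the input. "(_" matches the benchmark above; the
// second marker is illustrative.
func Parse(s string) (string, bool) {
	if !strings.Contains(s, "(_") && !strings.Contains(s, "_)") {
		return s, false // no tokens present: nothing to parse
	}
	// ... the actual token parsing would happen here ...
	return s, true
}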
In the grand scheme of things, maybe a few extra nanoseconds isn’t such a big deal, but I haven’t tested other functions in strings quite yet.
About this issue
- State: closed
- Created 5 years ago
- Comments: 21 (20 by maintainers)
What CPU does your dev machine run on, and at what CL are you building tip?
Any CL can change how the benchmark code is aligned, at which point the benchmark is partly measuring the effects of branch alignment. I have seen in the past that where the benchmark loop is placed in the file can matter a lot, because it shifts that alignment.
With all the side-channel attacks on caching/branch prediction, there can also be benchmark differences between different microcode versions of the same CPU: https://www.phoronix.com/scan.php?page=article&item=intel-jcc-microcode&num=1