go: runtime: frequent enlisting of short-lived background workers leads to performance regression with async preemption
What version of Go are you using (`go version`)?

```
λ go version
go version go1.14 windows/amd64
λ go version
go version go1.13.8 windows/amd64
```
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (`go env`)?

```
λ go env
set GO111MODULE=
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\klaus\AppData\Local\go-build
set GOENV=C:\Users\klaus\AppData\Roaming\go\env
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GONOPROXY=
set GONOSUMDB=
set GOOS=windows
set GOPATH=e:\gopath
set GOPRIVATE=
set GOPROXY=https://goproxy.io
set GOROOT=c:\go
set GOSUMDB=sum.golang.org
set GOTMPDIR=
set GOTOOLDIR=c:\go\pkg\tool\windows_amd64
set GCCGO=gccgo
set AR=ar
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=d:\temp\wintemp\go-build155272862=/tmp/go-build -gno-record-gcc-switches
```
What did you do?
Run benchmark: https://play.golang.org/p/WeuJg6yaOuJ
`go test -bench=. -test.benchtime=10s` was used to test.
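For reference, a minimal sketch of the pattern the linked benchmark exercises (the exact payload and layout are in the playground link above; this reconstruction only shows the shape): each iteration allocates a fresh compressor and output buffer, so the workload is dominated by short-lived allocations.

```go
package bench

import (
	"bytes"
	"compress/flate"
	"compress/gzip"
	"testing"
)

// Sketch of the benchmarked pattern, not the exact playground code:
// every iteration allocates a new writer and output buffer.
func BenchmarkCompressAllocationsSingle(b *testing.B) {
	payload := bytes.Repeat([]byte("compressible sample data "), 64)

	b.Run("flate", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			var buf bytes.Buffer
			w, err := flate.NewWriter(&buf, flate.DefaultCompression)
			if err != nil {
				b.Fatal(err)
			}
			if _, err := w.Write(payload); err != nil {
				b.Fatal(err)
			}
			if err := w.Close(); err != nil {
				b.Fatal(err)
			}
		}
	})

	b.Run("gzip", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			var buf bytes.Buffer
			w := gzip.NewWriter(&buf)
			if _, err := w.Write(payload); err != nil {
				b.Fatal(err)
			}
			if err := w.Close(); err != nil {
				b.Fatal(err)
			}
		}
	})
}
```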
What did you expect to see?
Benchmark speeds close to those on Go 1.13.8.
What did you see instead?
A 40% performance regression:

```
λ benchcmp go113.txt go114.txt
benchmark                                     old ns/op     new ns/op     delta
BenchmarkCompressAllocationsSingle/flate-32   87026         121741        +39.89%
BenchmarkCompressAllocationsSingle/gzip-32    88654         122632        +38.33%
```
This is not a purely theoretical benchmark. While suboptimal, this is the easiest way to compress a piece of data, so the pattern will be seen in the wild. It could also indicate a general regression for applications that allocate a lot.

Edit: This is not related to changes in the referenced packages; I see the same regression with packages outside the standard library as well.
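For context on "while suboptimal": the allocation-light alternative is to reuse writers across calls, e.g. via `sync.Pool`. A minimal sketch of that pattern (the helper names here are illustrative, not from the issue):

```go
package main

import (
	"bytes"
	"compress/gzip"
	"sync"
)

// gzipPool reuses gzip.Writers across calls; Reset re-targets a pooled
// writer at a new destination, avoiding the per-call allocations that the
// benchmark above deliberately incurs.
var gzipPool = sync.Pool{
	New: func() interface{} { return gzip.NewWriter(nil) },
}

// gzipBytes is an illustrative helper, not code from the issue.
func gzipBytes(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := gzipPool.Get().(*gzip.Writer)
	defer gzipPool.Put(w)
	w.Reset(&buf)
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}
```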
About this issue
- State: open
- Created 4 years ago
- Reactions: 9
- Comments: 40 (36 by maintainers)
@mknyszek Well, what I think is that someone should review https://golang.org/cl/216198, which seems to already do what you want.
Unfortunately, CL 223797 still has some lock ordering issues, so we’ve decided it’s safer to bump this to 1.17.
Change https://golang.org/cl/223797 mentions this issue:
runtime: prefer to wake an idle P when enlisting bg mark workers
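For readers following the thread, a heavily simplified paraphrase of what that CL changes (this is an illustrative sketch with made-up helper names, not the actual runtime source): when the GC needs to enlist a background mark worker, Go 1.14 gets a P by preempting a random running one, which on Linux descends through `preemptM` -> `signalM` -> `tgkill`; the CL prefers the cheap path of waking an idle P when one exists.

```go
package sketch

// Hypothetical stand-ins for runtime internals; every name below is
// illustrative only, not the real runtime API.
func idlePsAvailable() bool { return false }
func wakeIdleP()            {}
func randomRunningP() int   { return 0 }
func preemptP(p int) bool   { return true }

// enlistWorkerSketch paraphrases the mechanism the CL targets.
func enlistWorkerSketch() {
	if idlePsAvailable() {
		wakeIdleP() // no preemption signal needed (the CL's preferred path)
		return
	}
	// Fall back to preempting a running P, which is what Go 1.14 does
	// unconditionally; each attempt sends a preemption signal to that P's M.
	for tries := 0; tries < 5; tries++ {
		if preemptP(randomRunningP()) {
			return
		}
	}
}
```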
@klauspost That’s true, thanks for pointing it out. I’ll fix it again.
@leitzler Looking at the profiles… it looks like it's exactly the same issue. `runtime.tgkill` is suddenly at the top of the profile, and it comes from `signalM`, which in turn comes from `preemptM`. Then it follows the same path up to `enlistWorker`.

Yeah, I think this is consistent with our previous analysis. There is no contention if there is just one thread. The more threads there are, the more threads end up trying to preempt each other, and thus the heavier the contention.
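For anyone wanting to reproduce that profile shape, a self-contained sketch (file name and iteration count are arbitrary): run an allocation-heavy compression loop under a CPU profile, then inspect it with `go tool pprof -top cpu.out`. On Linux with Go 1.14, `runtime.tgkill` under `preemptM`/`signalM` should stand out; running with `GODEBUG=asyncpreemptoff=1` disables signal-based preemption and can help confirm the signals are implicated.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"log"
	"os"
	"runtime/pprof"
)

func main() {
	// Write the CPU profile to cpu.out for later inspection with pprof.
	f, err := os.Create("cpu.out")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	// Allocation-heavy loop mirroring the benchmark: a fresh writer and
	// buffer per iteration; errors are ignored since this is a reproducer.
	payload := bytes.Repeat([]byte("compressible sample data "), 64)
	for i := 0; i < 200000; i++ {
		var buf bytes.Buffer
		w := gzip.NewWriter(&buf)
		_, _ = w.Write(payload)
		_ = w.Close()
	}
}
```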