go: crypto/rsa: Some severe performance regressions in Go 1.20

What version of Go are you using (go version)?

$ go version
1.20.3

Does this issue reproduce with the latest release?

Yes (only on latest release).

What operating system and processor architecture are you using (go env)?

linux/amd64

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/user/.cache/go-build"
GOENV="/home/user/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/user/go-repos/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/user/go-repos:/opt/go/path:/home/user/go-code"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/opt/go/root"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/opt/go/root/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.20.2"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2856124056=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Some services at Uber started seeing severe performance degradation after upgrading to Go 1.20.

Profiles revealed crypto/rsa-related stacks showing up everywhere.

Here is a repro benchmark that shows a roughly 60% regression compared to Go 1.19:

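package cryptosign

import (
        "crypto"
        "crypto/rand"
        "crypto/rsa"
        "crypto/sha256"
        "errors"
        "testing"
)

// Sign hashes msg with SHA-256 and signs the digest with RSA PKCS #1 v1.5.
func Sign(key any, msg []byte) (sig []byte, err error) {
        k, ok := key.(*rsa.PrivateKey)
        if !ok {
                return nil, errors.New("cryptosign: key is not an *rsa.PrivateKey")
        }
        h := sha256.New()
        h.Write(msg)
        return rsa.SignPKCS1v15(rand.Reader, k, crypto.SHA256, h.Sum(nil))
}

func BenchmarkSign(b *testing.B) {
        msg := []byte("secret text")
        rsaKey, err := rsa.GenerateKey(rand.Reader, 2048)
        if err != nil {
                b.Fatal(err)
        }

        // Exclude key generation from the measurement.
        b.ResetTimer()

        for i := 0; i < b.N; i++ {
                if _, err := Sign(rsaKey, msg); err != nil {
                        b.Fatal(err)
                }
        }
}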

$ benchstat before.txt after.txt
goos: linux
goarch: amd64
pkg: github.com/sywhang/issues/cryptosign
cpu: AMD EPYC 7B13
        │ before.txt  │              after.txt              │
        │   sec/op    │   sec/op     vs base                │
Sign-96   1.246m ± 2%   2.009m ± 7%  +61.23% (p=0.000 n=10)

What did you expect to see?

I am aware of the crypto/rsa changes introduced in Go 1.20 that moved the implementation from big.Int to bigmod, which could be related to the regression (https://github.com/golang/go/issues/56980).

This benchmark was created from the profile of a single service that reported the issue internally, and there may be more paths in crypto/rsa that have similar issues. I will update this issue if we find any.

What did you see instead?

A roughly 60% regression, as noted above.

About this issue

  • State: closed
  • Created a year ago
  • Comments: 19 (18 by maintainers)

Most upvoted comments

@evanj We’ll have to see a completed fix, and give it plenty of time for testing, before we are able to consider a backport. While performance is important, it would of course be a bad idea to backport a risky fix to security code.

You make a great point that it would be better if we could test release candidates in production. I would like to do this, but prioritizing that work has been hard, as you might be able to guess by the fact that we are just migrating to 1.20 now, after 1.20.3 has been released!

FWIW I’m not sure our slowdown is specifically EPYC related. I’ll try to extract a reproduction case from our workload and share some details once we have more fully investigated it.

@FiloSottile I’d be happy to share the profiles with you, if that helps.

Certainly, I agree that this regression would have been much better to catch during the RC phase. The issue, though, is that not many service owners want to deploy RC builds to prod or even staging. My team doesn’t own any services that have meaningful traffic, or even one that exercises this code path. We only heard about the regression from service owners after we upgraded the whole company to the new version, following several steps of verification.

On the amd64 platforms I was able to benchmark, the slowdown of RSA-2048 was 20%, as noted in the release notes. It looks like AMD EPYC is suffering more than the Intel CPUs I tested. We have work planned as part of #57752 that should bring the performance of Go 1.21 almost in line with Go 1.19 (if not better).