go: crypto/tls: 500% increase in allocations from `(*tls.Conn).Read` in go 1.17

What version of Go are you using (go version)?

$ go version
1.17.3

Does this issue reproduce with the latest release?

I believe so, but have not yet confirmed.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOOS="linux"

What did you do?

I’m running a service that has many http2 clients.

What did you expect to see?

Fewer allocations when reading data from established connections.

What did you see instead?

At times, around 20-25% of our service’s total allocations come from context.WithCancel calls made by (*tls.Conn).handshakeContext.

It looks like every (*tls.Conn).Read call made by http2’s ReadFrame calls c.Handshake, which calls through to c.handshakeContext, which begins with a call to context.WithCancel, which allocates.
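As a rough standalone illustration of that per-call cost (not part of the original report), testing.AllocsPerRun shows that each context.WithCancel call allocates even when the context is cancelled immediately:

package main

import (
	"context"
	"fmt"
	"testing"
)

func main() {
	// Measure the average allocations of one WithCancel/cancel pair, which is
	// roughly the extra work every (*tls.Conn).Read pays per call under the
	// call chain described above.
	allocs := testing.AllocsPerRun(1000, func() {
		ctx, cancel := context.WithCancel(context.Background())
		cancel()
		_ = ctx
	})
	fmt.Printf("allocations per context.WithCancel call: %.1f\n", allocs)
}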

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 3
  • Comments: 15 (10 by maintainers)

Most upvoted comments

As mentioned in the Gophers Slack discussion, it seems to me we could simply move the early exit checks above the context manipulation functions to remove these allocations in the case where the handshake has already completed.
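A minimal sketch of that reordering, using a hypothetical conn type as a stand-in for *tls.Conn rather than the actual stdlib code: the early exits run before any context is created, so a connection whose handshake has already completed allocates nothing here.

package main

import (
	"context"
	"fmt"
)

// conn is a hypothetical stand-in for *tls.Conn, used only to illustrate the
// proposed ordering; it is not the real crypto/tls implementation.
type conn struct {
	handshakeErr      error
	handshakeComplete bool
}

func (c *conn) handshakeContext(ctx context.Context) error {
	// Early exits first: if the handshake already ran (or already failed),
	// return without touching the context, so nothing is allocated.
	if c.handshakeErr != nil {
		return c.handshakeErr
	}
	if c.handshakeComplete {
		return nil
	}

	// Only a connection that still needs a handshake pays for the
	// cancellable context.
	handshakeCtx, cancel := context.WithCancel(ctx)
	defer cancel()

	if err := c.runHandshake(handshakeCtx); err != nil {
		c.handshakeErr = err
		return err
	}
	c.handshakeComplete = true
	return nil
}

// runHandshake is a placeholder for the real handshake logic.
func (c *conn) runHandshake(ctx context.Context) error {
	select {
	case <-ctx.Done():
		return ctx.Err()
	default:
		return nil
	}
}

func main() {
	c := &conn{}
	// First call performs the (placeholder) handshake.
	if err := c.handshakeContext(context.Background()); err != nil {
		fmt.Println("handshake failed:", err)
		return
	}
	// Subsequent calls, as made by every Read, hit the fast path and return
	// before context.WithCancel is ever reached.
	fmt.Println(c.handshakeContext(context.Background()))
}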

Thanks @dt and @FiloSottile for getting this in ❤️

I did a quick run comparing release-branch.go1.16 and release-branch.go1.17 using crypto/tls’s BenchmarkThroughput. The raw timing numbers are not directly comparable across the two, since an unrelated change in 9d0819b27 switched the cipher configuration to avoid hardware AES, but looking just at allocs/op:

name                                     old allocs/op  new allocs/op  delta
Throughput/MaxPacket/1MB/TLSv12-32            773 ± 0%      1030 ± 0%   +33.25%  (p=0.016 n=5+4)
Throughput/MaxPacket/1MB/TLSv13-32          1.44k ± 0%     1.67k ± 0%   +16.47%  (p=0.008 n=5+5)
Throughput/MaxPacket/2MB/TLSv12-32            843 ± 0%      1546 ± 0%   +83.35%  (p=0.008 n=5+5)
Throughput/MaxPacket/2MB/TLSv13-32          1.51k ± 0%     2.19k ± 0%   +45.50%  (p=0.008 n=5+5)
Throughput/MaxPacket/4MB/TLSv12-32            973 ± 1%      2576 ± 0%  +164.79%  (p=0.008 n=5+5)
Throughput/MaxPacket/4MB/TLSv13-32          1.64k ± 0%     3.22k ± 0%   +96.12%  (p=0.008 n=5+5)
Throughput/MaxPacket/8MB/TLSv12-32          1.24k ± 1%     4.65k ± 0%  +274.66%  (p=0.008 n=5+5)
Throughput/MaxPacket/8MB/TLSv13-32          1.91k ± 1%     5.30k ± 0%  +177.47%  (p=0.008 n=5+5)
Throughput/MaxPacket/16MB/TLSv12-32         1.77k ± 1%     8.78k ± 0%  +394.98%  (p=0.008 n=5+5)
Throughput/MaxPacket/16MB/TLSv13-32         2.43k ± 0%     9.41k ± 0%  +287.08%  (p=0.008 n=5+5)
Throughput/MaxPacket/32MB/TLSv12-32         2.83k ± 1%    17.01k ± 0%  +501.69%  (p=0.008 n=5+5)
Throughput/MaxPacket/32MB/TLSv13-32         3.50k ± 1%    17.65k ± 0%  +404.74%  (p=0.008 n=5+5)
Throughput/MaxPacket/64MB/TLSv12-32         4.93k ± 1%    33.60k ± 0%  +581.96%  (p=0.008 n=5+5)
Throughput/MaxPacket/64MB/TLSv13-32         5.63k ± 1%    34.14k ± 0%  +506.90%  (p=0.008 n=5+5)
Throughput/DynamicPacket/1MB/TLSv12-32        786 ± 0%      1093 ± 0%   +39.00%  (p=0.008 n=5+5)
Throughput/DynamicPacket/1MB/TLSv13-32      1.45k ± 0%     1.73k ± 0%   +19.32%  (p=0.016 n=4+5)
Throughput/DynamicPacket/2MB/TLSv12-32        856 ± 0%      1607 ± 0%   +87.69%  (p=0.008 n=5+5)
Throughput/DynamicPacket/2MB/TLSv13-32      1.52k ± 0%     2.25k ± 0%   +47.69%  (p=0.008 n=5+5)
Throughput/DynamicPacket/4MB/TLSv12-32        987 ± 1%      2640 ± 0%  +167.46%  (p=0.008 n=5+5)
Throughput/DynamicPacket/4MB/TLSv13-32      1.66k ± 0%     3.28k ± 0%   +97.82%  (p=0.008 n=5+5)
Throughput/DynamicPacket/8MB/TLSv12-32      1.26k ± 0%     4.71k ± 0%  +272.60%  (p=0.008 n=5+5)
Throughput/DynamicPacket/8MB/TLSv13-32      1.92k ± 1%     5.35k ± 0%  +178.67%  (p=0.008 n=5+5)
Throughput/DynamicPacket/16MB/TLSv12-32     1.79k ± 1%     8.84k ± 0%  +394.33%  (p=0.008 n=5+5)
Throughput/DynamicPacket/16MB/TLSv13-32     2.46k ± 1%     9.46k ± 0%  +285.13%  (p=0.008 n=5+5)
Throughput/DynamicPacket/32MB/TLSv12-32     2.86k ± 1%    17.10k ± 0%  +498.46%  (p=0.008 n=5+5)

Comparing 1.16 against 90d6bbbe42c15d4 (the last commit to touch crypto/tls prior to the cipher change above), I think we can see what effect these extra allocations have on actual performance:

name                                 old time/op    new time/op    delta
Throughput/MaxPacket/2MB/TLSv12-32     6.20ms ± 1%    6.37ms ± 1%    +2.63%  (p=0.008 n=5+5)
Throughput/MaxPacket/2MB/TLSv13-32     6.46ms ± 1%    6.61ms ± 1%    +2.38%  (p=0.008 n=5+5)
Throughput/MaxPacket/32MB/TLSv12-32    77.9ms ± 0%    80.9ms ± 0%    +3.88%  (p=0.008 n=5+5)
Throughput/MaxPacket/32MB/TLSv13-32    80.3ms ± 1%    82.7ms ± 0%    +2.93%  (p=0.008 n=5+5)

name                                 old speed      new speed      delta
Throughput/MaxPacket/2MB/TLSv12-32    338MB/s ± 1%   329MB/s ± 1%    -2.56%  (p=0.008 n=5+5)
Throughput/MaxPacket/2MB/TLSv13-32    325MB/s ± 1%   317MB/s ± 1%    -2.32%  (p=0.008 n=5+5)
Throughput/MaxPacket/32MB/TLSv12-32   431MB/s ± 0%   415MB/s ± 0%    -3.74%  (p=0.008 n=5+5)
Throughput/MaxPacket/32MB/TLSv13-32   418MB/s ± 1%   406MB/s ± 0%    -2.85%  (p=0.008 n=5+5)

name                                 old alloc/op   new alloc/op   delta
Throughput/MaxPacket/2MB/TLSv12-32      216kB ± 0%     249kB ± 0%   +15.22%  (p=0.008 n=5+5)
Throughput/MaxPacket/2MB/TLSv13-32      257kB ± 0%     290kB ± 0%   +12.85%  (p=0.008 n=5+5)
Throughput/MaxPacket/32MB/TLSv12-32     267kB ± 2%     848kB ± 1%  +217.68%  (p=0.008 n=5+5)
Throughput/MaxPacket/32MB/TLSv13-32     309kB ± 1%     889kB ± 0%  +187.97%  (p=0.008 n=5+5)

name                                 old allocs/op  new allocs/op  delta
Throughput/MaxPacket/2MB/TLSv12-32        843 ± 0%      1558 ± 0%   +84.79%  (p=0.008 n=5+5)
Throughput/MaxPacket/2MB/TLSv13-32      1.51k ± 0%     2.22k ± 0%   +47.52%  (p=0.008 n=5+5)
Throughput/MaxPacket/32MB/TLSv12-32     2.83k ± 1%    17.02k ± 0%  +501.81%  (p=0.008 n=5+5)
Throughput/MaxPacket/32MB/TLSv13-32     3.50k ± 1%    17.69k ± 0%  +405.72%  (p=0.008 n=5+5)

Is this enough of a regression to justify a backport?

I’m unclear whether the performance regression seen here qualifies for a backport under the backport policy’s definition of a serious issue with no workaround:

A “serious” problem is one that prevents a program from working at all.

I’ll go ahead and open backport issues so the discussion can continue there, I guess?

@gopherbot please consider this for backport to 1.17, as it is a regression compared to 1.16.

@gopherbot please consider this for backport to 1.18, as it is a regression compared to 1.16.

Is there any update on this issue?