go: x/net/http2: server sends RST_STREAM w/ PROTOCOL_ERROR to clients it incorrectly believes have violated max advertised num streams
What version of Go are you using (go version)?
$ go version go version go1.15.0 darwin/amd64
Does this issue reproduce with the latest release?
Yes
What’s the issue
I’m seeing an issue with a grpc-go client talking to a http2.Server{}. I’ve documented the issue https://github.com/grpc/grpc-go/issues/4057. To summarize the issue:
- I have a grpc-go client that is making requests to an
http2.Servr{}on the same machine. The client is generating a lot of load on the server. The server is configured to allow a maximum of 320 requests on a single connection. - I’m seeing sporadic, and relatively regular,
RST_STREAMw/PROTOCOL_ERRORframes from the server. - The client usually has upwards of 500 simultaneous requests open at a time, which means there are 320 concurrent streams and ~200 streams waiting for these streams to complete. The machine that hosts the server is under heavy load.
I put some println statements in the client and server code and isolated the error that the server is returning to be from this line. This code checks the sc.curClientStreams and rejects the new stream if it believes the client is violating the advertised max num of concurrent streams. I added further logging to the server and the client to record the number of active streams and found that this number never creeps above 320 on the client or the server - the grpc-go client code believes it is being well behaved, which indicated to me something is fishy in the http2.Server code.
The only place in the http2.Server code that decremtnts sc.curClientStreams is here in sc.closeStream which is called from sc.wroteFrame, and sc.wroteFrame is only ever called in the sc.serve loop. Writes to the client can be triggered asynchronously by sc.writeFrameAsync (i.e. see here), and then once the write is complete the result is sent to the sc.wroteFrameCh which is read by the sc.serve loop (i.e. see here). The sc.serve loop contains a big select statement that reads from the sc.wroteFrameCh, but also the sc.readFrameCh which receives new frames from clients. Since these channels are read from the same select statement, if both contain a message then there is no guarantee which one will be read from first.
I believe what is happening to cause the spurious PROTOCOL_ERRORs is:
- The server is maxed out with 320 concurrent streams.
- One of the streams on the server finishes. This triggers a call to
sc.writeFrameAsyncto be spawned in a separate goroutine that writes a frame f1 that ends the stream. sc.writeFrameAsyncwrites the frame f1 to the TCP connection and adds a message to thesc.wroteFrameCh.- The client reads frame f1 immediately, sees that the stream has completed, and therefore starts a new stream to the server by sending a frame f2.
- The
sc.readFramesloop reads frame f2 and pushes it onto thesc.readFrameChchannel. - Now, the
sc.serveloop goes through another iteration, and since bothsc.readFrameChandsc.writeFrameAsynccontain messages, it’s undefined which one it processes first. In this case, it processes the message onsc.readFrameChfirst, which contains frame f2. - Since the message in the
sc.wroteFrameChhas not been processed yet, thesc.curClientStreamsvariable has not yet been decremented and thus the server blieves thatsc.curClientStreamsis still set to 320. Whensc.processFrameFromReaderis called it will return aPROTOCOL_ERROR.
To attempt to fix this, I patched the http2.Server code:
diff --git a/vendor/golang.org/x/net/http2/server.go b/vendor/golang.org/x/net/http2/server.go
index 686501b4..a79fed98 100644
--- a/vendor/golang.org/x/net/http2/server.go
+++ b/vendor/golang.org/x/net/http2/server.go
@@ -868,6 +868,13 @@ func (sc *serverConn) serve() {
case res := <-sc.wroteFrameCh:
sc.wroteFrame(res)
case res := <-sc.readFrameCh:
+ // Process any wrote frames before reading new frames from the client since a
+ // wrote frame could have triggered a new stream to be started.
+ select {
+ case wroteRes := <-sc.wroteFrameCh:
+ sc.wroteFrame(wroteRes)
+ default:
+ }
if !sc.processFrameFromReader(res) {
return
}
When I ran this on the server it significantly reduced the occurrences of the PRTOCOL_ERRORs I saw. I was seeing 20 every 3 minutes before the patch, and after I observed about 10 over the course of 2 hours.
I believe that this patch solves some of the problem, but not all of it, as the error still persists. Also, I may be totally wrong in my diagnosis above. Does the above reasoning make sense as a possible failure?
I’m going to try and get a minimal repro working, but wanted to create this issue first to get folks thoughts.
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 17 (5 by maintainers)
Commits related to this issue
- http2: don't reuse Transport conns after seeing stream protocol errors Updates tailscale/corp#2363 Updates golang/go#47635 Updates golang/go#42777 — committed to tailscale/net by bradfitz 3 years ago
- http2: don't reuse Transport conns after seeing stream protocol errors Updates tailscale/corp#2363 Updates golang/go#47635 Updates golang/go#42777 — committed to tailscale/net by bradfitz 3 years ago
- http2: complete outstanding async write before reading the next frame For golang/go#42777 Change-Id: I8d7e35ded3fe508f6c285e84ba768104c3406bdf Reviewed-on: https://go-review.googlesource.com/c/net/+... — committed to golang/net by fraenkel 4 years ago
- http2: complete outstanding async write before reading the next frame For golang/go#42777 Change-Id: I8d7e35ded3fe508f6c285e84ba768104c3406bdf Reviewed-on: https://go-review.googlesource.com/c/net/+... — committed to tailscale/net by fraenkel 4 years ago
- [internal-branch.go1.16-vendor] http2: complete outstanding async write before reading the next frame Updates golang/go#42777 Fixes golang/go#47691 Change-Id: I8d7e35ded3fe508f6c285e84ba768104c3406b... — committed to golang/net by fraenkel 4 years ago
- [internal-branch.go1.17-vendor] http2: complete outstanding async write before reading the next frame Updates golang/go#42777 Fixes golang/go#47692 Change-Id: I8d7e35ded3fe508f6c285e84ba768104c3406b... — committed to golang/net by fraenkel 4 years ago
- all: update golang.org/x/net to pull in CL 346890 For #42777. Change-Id: I963db8c666e8bcf0fc4f390b359db6408a0f792b Reviewed-on: https://go-review.googlesource.com/c/go/+/347010 Trust: Damien Neil <d... — committed to golang/go by neild 3 years ago
- http2: make Transport not reuse conns after a stream protocol error If a server sends a stream error of type "protocol error" to a client, that's the server saying "you're speaking http2 wrong". At t... — committed to golang/net by bradfitz 3 years ago
- [internal-branch.go1.17-vendor] http2: make Transport not reuse conns after a stream protocol error If a server sends a stream error of type "protocol error" to a client, that's the server saying "yo... — committed to golang/net by bradfitz 3 years ago
- [internal-branch.go1.16-vendor] http2: make Transport not reuse conns after a stream protocol error If a server sends a stream error of type "protocol error" to a client, that's the server saying "yo... — committed to golang/net by bradfitz 3 years ago
- http2: complete outstanding async write before reading the next frame For golang/go#42777 Change-Id: I8d7e35ded3fe508f6c285e84ba768104c3406bdf Reviewed-on: https://go-review.googlesource.com/c/net/+... — committed to dteh/fhttp by fraenkel 4 years ago
- http2: make Transport not reuse conns after a stream protocol error If a server sends a stream error of type "protocol error" to a client, that's the server saying "you're speaking http2 wrong". At t... — committed to dteh/fhttp by bradfitz 3 years ago
- [internal-branch.go1.17-vendor] http2: complete outstanding async write before reading the next frame Updates golang/go#42777 Fixes golang/go#47692 Change-Id: I8d7e35ded3fe508f6c285e84ba768104c3406b... — committed to fedosgad/oohttp by fraenkel 4 years ago
- [internal-branch.go1.17-vendor] http2: make Transport not reuse conns after a stream protocol error If a server sends a stream error of type "protocol error" to a client, that's the server saying "yo... — committed to fedosgad/oohttp by bradfitz 3 years ago
@gopherbot please open backport issues for 1.16 and 1.17.
Change https://golang.org/cl/330909 mentions this issue:
http2: complete outstanding async write before reading the next frameI understand the failure. The trick is coming up with a test case which may not be so simple.