go: x/net/http2: Blocked Write on single connection causes all future calls to block indefinitely
What version of Go are you using (go version)?
$ go version go version go1.12.7 linux/amd64
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (go env)?
go env Output
$ go env GOARCH="amd64" GOBIN="" GOCACHE="/home/prashant/.cache/go-build" GOEXE="" GOFLAGS="" GOHOSTARCH="amd64" GOHOSTOS="linux" GOOS="linux" GOPATH="/home/prashant/go" GOPROXY="" GORACE="" GOROOT="/home/prashant/.gimme/versions/go1.12.7.linux.amd64" GOTMPDIR="" GOTOOLDIR="/home/prashant/.gimme/versions/go1.12.7.linux.amd64/pkg/tool/linux_amd64" GCCGO="gccgo" CC="gcc" CXX="g++" CGO_ENABLED="1" GOMOD="" CGO_CFLAGS="-g -O2" CGO_CPPFLAGS="" CGO_CXXFLAGS="-g -O2" CGO_FFLAGS="-g -O2" CGO_LDFLAGS="-g -O2" PKG_CONFIG="pkg-config" GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build705930436=/tmp/go-build -gno-record-gcc-switches"
What did you do?
Used the HTTP/2 library to make calls as a client. One of the backends stalled and stopped reading from the TCP connection (although it was still running and the TCP connection was active).
This caused the TCP write buffer on the client to fill up.
What did you expect to see?
I expected that all future calls to the blocked server would timeout eventually, and that calls to any other server would not be affected.
What did you see instead?
All calls on the transport are blocked forever, including calls to working backends. Stack traces should the calls are blocked on a mutex, so any timeouts are ignored.
Repro
Repro: https://github.com/prashantv/go-http2-stuck-repro/
The repro sets up 2 “backends”:
stuckURL: Accepts a TCP connection and does not read from the connection, to simulate a stuck backend
normal1URL: Echoes the request body back to the caller.
- A call is made to
stuckURLwith a payload that never completes. This causes the TCP write buffer to fill up, and theWriteto block, while holding the*ClientConn.wmu(write lock on the connection). We sleep a second to make sure the TCP buffer fills up, and the write is blocked.
Stack trace
goroutine 44 [IO wait]: internal/poll.runtime_pollWait(0x7f74ab342b98, 0x77, 0xffffffffffffffff) /home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/runtime/netpoll.go:182 +0x56 internal/poll.(*pollDesc).wait(0xc000144598, 0x77, 0x4000, 0x4800, 0xffffffffffffffff) /home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/internal/poll/fd_poll_runtime.go:87 +0x9b internal/poll.(*pollDesc).waitWrite(...) /home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/internal/poll/fd_poll_runtime.go:96 internal/poll.(*FD).Write(0xc000144580, 0xc0001f2000, 0x4009, 0x4800, 0x0, 0x0, 0x0) /home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/internal/poll/fd_unix.go:276 +0x26e net.(*netFD).Write(0xc000144580, 0xc0001f2000, 0x4009, 0x4800, 0x3, 0x3, 0xc0001aadaa) /home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/fd_unix.go:220 +0x4f net.(*conn).Write(0xc00013c0a0, 0xc0001f2000, 0x4009, 0x4800, 0x0, 0x0, 0x0) /home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/net.go:189 +0x69 github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.stickyErrWriter.Write(0x93f320, 0xc00013c0a0, 0xc000195058, 0xc0001f2000, 0x4009, 0x4800, 0x4009, 0x0, 0x0) /home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/transport.go:372 +0x68 bufio.(*Writer).Write(0xc000130480, 0xc0001f2000, 0x4009, 0x4800, 0xc000034070, 0xc000034000, 0x0) /home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/bufio/bufio.go:622 +0x145 github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*Framer).endWrite(0xc0001ea000, 0x0, 0x0) /home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/frame.go:366 +0xab github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*Framer).WriteDataPadded(0xc0001ea000, 0x3, 0xc0001ee000, 0x4000, 0x4000, 0x0, 0x0, 0x0, 0x0, 0x0) /home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/frame.go:685 +0x25d github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*Framer).WriteData(...) /home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/frame.go:643 github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*clientStream).writeRequestBody(0xc000154640, 0x7f74ab343070, 0xc000138600, 0x7f74ab343090, 0xc000138600, 0x0, 0x0) /home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/transport.go:1270 +0x337 github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*Transport).getBodyWriterState.func1() /home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/transport.go:2462 +0xc2 created by github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.bodyWriterState.scheduleBodyWrite /home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/transport.go:2509 +0xa3
- Another call to
stuckURLis made, which is blocked inroundTriptrying to get the*ClientConn.wmu(since (1) holds this lock), and with the*ClientConn.mulock held.
Stack trace
goroutine 46 [semacquire]: sync.runtime_SemacquireMutex(0xc000195054, 0x40de00) /home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/runtime/sema.go:71 +0x3d sync.(*Mutex).Lock(0xc000195050) /home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/sync/mutex.go:134 +0x109 github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*ClientConn).roundTrip(0xc000194f00, 0xc000204100, 0x0, 0x0, 0x0, 0x0) /home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/transport.go:1024 +0x467 github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*Transport).RoundTripOpt(0xc00013a4e0, 0xc000204100, 0xc000118000, 0xc0001164b0, 0xc00013e03c, 0xc00013e040) /home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/transport.go:447 +0x172 github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*Transport).RoundTrip(0xc00013a4e0, 0xc000204100, 0xc00013a4e0, 0xbf490e456743e889, 0xe34199ec9) /home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/transport.go:408 +0x3a net/http.send(0xc000204000, 0x93f060, 0xc00013a4e0, 0xbf490e456743e889, 0xe34199ec9, 0xc6c560, 0xc00011c050, 0xbf490e456743e889, 0x1, 0x0) /home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/http/client.go:250 +0x461 net/http.(*Client).send(0xc000134db0, 0xc000204000, 0xbf490e456743e889, 0xe34199ec9, 0xc6c560, 0xc00011c050, 0x0, 0x1, 0xc000020a80) /home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/http/client.go:174 +0xfb net/http.(*Client).do(0xc000134db0, 0xc000204000, 0x0, 0x0, 0x0) /home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/http/client.go:641 +0x279 net/http.(*Client).Do(…) /home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/http/client.go:509 net/http.(*Client).Get(0xc000134db0, 0xc000136180, 0x16, 0x8b330c, 0xf, 0x93f020) /home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/http/client.go:398 +0x9e created by github.com/prashantv/go-http2-stuck-repro.TestSendLargeUnreadPayload /home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/repro_test.go:46 +0x402
- Another call to
stuckURLgrabs the connection pool lock,*clientConnPool.mu, then iterates over all connections to that address to check theidleState()which tries to grab*ClientConn.mu. This blocks since (2) is already holding this lock.
Stack trace
goroutine 45 [semacquire]: sync.runtime_SemacquireMutex(0xc000194f54, 0xc00013e000) /home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/runtime/sema.go:71 +0x3d sync.(*Mutex).Lock(0xc000194f50) /home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/sync/mutex.go:134 +0x109 github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*ClientConn).idleState(0xc000194f00, 0xc000130000) /home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/transport.go:718 +0x41 github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*clientConnPool).getClientConn(0xc000134fc0, 0xc000204300, 0xc00013e080, 0xf, 0xc00013a501, 0xc0001a5a18, 0x7c1030, 0xc00013a4e0) /home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/client_conn_pool.go:89 +0xeb github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*clientConnPool).GetClientConn(0xc000134fc0, 0xc000204300, 0xc00013e080, 0xf, 0xc00013e080, 0xf, 0x2) /home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/client_conn_pool.go:47 +0x4e github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*Transport).RoundTripOpt(0xc00013a4e0, 0xc000204300, 0xc000118100, 0xc0001165d0, 0xc00013e06c, 0xc00013e070) /home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/transport.go:440 +0x105 github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*Transport).RoundTrip(0xc00013a4e0, 0xc000204300, 0xc00013a4e0, 0xbf490e456749078a, 0xe341ebd66) /home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/transport.go:408 +0x3a net/http.send(0xc000204200, 0x93f060, 0xc00013a4e0, 0xbf490e456749078a, 0xe341ebd66, 0xc6c560, 0xc00011c080, 0xbf490e456749078a, 0x1, 0x0) /home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/http/client.go:250 +0x461 net/http.(*Client).send(0xc000134db0, 0xc000204200, 0xbf490e456749078a, 0xe341ebd66, 0xc6c560, 0xc00011c080, 0x0, 0x1, 0xc000020a80) /home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/http/client.go:174 +0xfb net/http.(*Client).do(0xc000134db0, 0xc000204200, 0x0, 0x0, 0x0) /home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/http/client.go:641 +0x279 net/http.(*Client).Do(...) /home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/http/client.go:509 net/http.(*Client).Get(0xc000134db0, 0xc000136180, 0x16, 0x8b330c, 0xf, 0x93f020) /home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/http/client.go:398 +0x9e created by github.com/prashantv/go-http2-stuck-repro.TestSendLargeUnreadPayload /home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/repro_test.go:46 +0x402
- Now any calls made using the same
*clientConnPool(e.g., any call through the same client/transport) block on trying to get*clientConnPool.mu.
Stack trace
3 @ 0x430b8f 0x440d79 0x440d4f 0x440aed 0x4663d9 0x78e6c6 0x78e62e 0x7a8b85 0x7a86ba 0x6af511 0x6aef0b 0x6b0519 0x6b004e 0x6b003b 0x45d481 # 0x440aec sync.runtime_SemacquireMutex+0x3c /home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/runtime/sema.go:71 # 0x4663d8 sync.(*Mutex).Lock+0x108 /home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/sync/mutex.go:134 # 0x78e6c5 github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*clientConnPool).getClientConn+0x65 /home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/client_conn_pool.go:87 # 0x78e62d github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*clientConnPool).GetClientConn+0x4d /home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/client_conn_pool.go:47 # 0x7a8b84 github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*Transport).RoundTripOpt+0x104 /home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/transport.go:440 # 0x7a86b9 github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*Transport).RoundTrip+0x39 /home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/transport.go:408 # 0x6af510 net/http.send+0x460 /home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/http/client.go:250 # 0x6aef0a net/http.(*Client).send+0xfa /home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/http/client.go:174 # 0x6b0518 net/http.(*Client).do+0x278 /home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/http/client.go:641 # 0x6b004d net/http.(*Client).Do+0x9d /home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/http/client.go:509 # 0x6b003a net/http.(*Client).Get+0x8a /home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/http/client.go:398
I’ve reduced the size of TCP read/write buffers to trigger the issue more quickly.
The repro includs a flag for controlling steps 2/3. Setting the --num-stuck-calls to 2 or higher will reproduce the issue (the echo calls will fail), while 0 or 1 will cause the test to pass.
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 1
- Comments: 17 (16 by maintainers)
Commits related to this issue
- turn off http2 for TLS setups for now due to lots of issues with x/net/http2, as well as the hundled h2_bundle.go in the go runtime should be avoided for now. https://github.com/golang/go/issues/235... — committed to harshavardhana/minio by harshavardhana 3 years ago
- turn off http2 for TLS setups for now due to lots of issues with x/net/http2, as well as the bundled h2_bundle.go in the go runtime should be avoided for now. https://github.com/golang/go/issues/235... — committed to harshavardhana/minio by harshavardhana 3 years ago
- turn off http2 for TLS setups for now (#11523) due to lots of issues with x/net/http2, as well as the bundled h2_bundle.go in the go runtime should be avoided for now. https://github.com/golang/... — committed to minio/minio by harshavardhana 3 years ago
- Rebase to upstream master (#9) * fix: metacache should only rename entries during cleanup (#11503) To avoid large delays in metacache cleanup, use rename instead of recursive delete calls, rename... — committed to poornas/minio by krisis 3 years ago
- turn off http2 for TLS setups for now (#11523) due to lots of issues with x/net/http2, as well as the bundled h2_bundle.go in the go runtime should be avoided for now. https://github.com/golang/... — committed to minio/minio by harshavardhana 3 years ago
- turn off http2 for TLS setups for now (#11523) due to lots of issues with x/net/http2, as well as the bundled h2_bundle.go in the go runtime should be avoided for now. https://github.com/golang/go/i... — committed to kerneltime/minio by harshavardhana 3 years ago
- turn off http2 for TLS setups for now (#11523) (#11569) due to lots of issues with x/net/http2, as well as the bundled h2_bundle.go in the go runtime should be avoided for now. https://github.co... — committed to minio/minio by kerneltime 3 years ago
I believe the problem of new requests being blocked by stuck network writes in old ones should have been fixed by CL 353390 and CL 349594. (And maybe a couple other CLs in that general range.)
A bug in ping checking could result in a permanently write-blocked connection never being flagged as dead. CL 354389 should have fixed that.
There’s a remaining case where
Response.Body.Closecan block indefinitely on a write-blocked connection. That’s tracked in #48908.It might also be nice to be able to detect a write-blocked connection independently of a ping timeout. That’s #48830.
I think the cases described in this issue are now all either fixed or covered by more specific issues, so I’m going to close this one. Let me know if there’s anything I’m missing.