go: net/http: `readLoopPeekFailLocked: ` from client when server sends response immediately
What version of Go are you using (`go version`)?
$ go version
go version go1.12.1 linux/amd64
Does this issue reproduce with the latest release?
Yes, it also reproduces with go1.14.1.
What operating system and processor architecture are you using (`go env`)?
go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/user/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/user/go"
GOPROXY=""
GORACE=""
GOROOT="/opt/go"
GOTMPDIR=""
GOTOOLDIR="/opt/go/pkg/tool/linux_amd64"
GCCGO="/usr/bin/gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build705707104=/tmp/go-build -gno-record-gcc-switches"
What did you do?
100% reproducible on Linux x64 and on play.golang.org. The behavior is unpredictable, so the code below may need to be run a few times to get the error and the log message.
The error is:
Get http://127.0.0.1:9101/path: readLoopPeekFailLocked: <nil>
The log is:
transport.go:1829: Unsolicited response received on idle HTTP channel starting with "HTTP/1.1 502 Bad Gateway\r\n\r\n"; err=<nil>
What happens?
The HTTP server sends a response without reading the request and without any delay. This approach is useful for proxy servers that authorize a client by IP address (e.g. against a whitelist): if the client is not whitelisted, there is no need to read the request at all.
Reproducing
(run it a few times to get the error)
package main

import (
	"fmt"
	"log"
	"net"
	"net/http"
	"net/url"
	"sync"
	"time"
)

func mustParseURL(uri string) *url.URL {
	x, err := url.Parse(uri)
	if err != nil {
		panic(err)
	}
	return x
}

func main() {
	l, err := net.Listen("tcp", "127.0.0.1:9100")
	if err != nil {
		log.Fatal(err)
	}
	go func(l net.Listener) {
		for {
			c, err := l.Accept()
			if err != nil {
				log.Fatal(err)
			}
			go func(c net.Conn) {
				defer c.Close()
				fmt.Fprint(c, "HTTP/1.1 502 Bad Gateway\r\n\r\n")
			}(c)
		}
	}(l)
	time.Sleep(100 * time.Millisecond)
	var cl http.Client
	cl.Transport = &http.Transport{
		Proxy:           http.ProxyURL(mustParseURL("http://127.0.0.1:9100/")),
		MaxConnsPerHost: 100,
	}
	const N = 100
	var wg sync.WaitGroup
	wg.Add(N)
	for i := 0; i < N; i++ {
		go func() {
			defer wg.Done()
			resp, err := cl.Get("http://127.0.0.1:9101/path")
			if err != nil {
				log.Print("error:", err)
				return
			}
			if resp.StatusCode != 502 {
				log.Println("status:", resp.StatusCode)
			}
			resp.Body.Close()
		}()
	}
	wg.Wait()
}
Also, playing with the N constant, I got this error for N=10, N=100 and N=1000. For N=1000 the error occurs every time; for N=10, rarely. It seems to depend on goroutine scheduling.
What did you expect to see?
To get this error either every time or never.
What did you see instead?
It’s unpredictable, which makes testing a bit harder.
About this issue
- Original URL
- State: open
- Created 5 years ago
- Reactions: 4
- Comments: 20 (4 by maintainers)
Commits related to this issue
- avoided https://github.com/golang/go/issues/31259 — committed to lamg/proxy by lamg 5 years ago
- fix(trace): remove GotConn delay The delay causes a race condition in the go transport that results in a 502 Bad Gateway with: `endpoint_failure (readLoopPeekFailLocked: %!w(<nil>))`. This happens... — committed to sap-contributions/gorouter by maxmoehl a year ago
- fix(trace): remove GotConn delay The delay causes a race condition in the go transport that results in a 502 Bad Gateway with: `endpoint_failure (readLoopPeekFailLocked: %!w(<nil>))`. This happens... — committed to cloudfoundry/gorouter by maxmoehl a year ago
- fix : https://github.com/golang/go/issues/31259 — committed to yaklang/yaklang by Go0p a year ago
- fix : https://github.com/golang/go/issues/31259 — committed to yaklang/yaklang by Go0p a year ago
For those who are interested, I have finally figured out the reason for the `readLoopPeekFailLocked: <nil>` errors that occurred in our production environment.

TL;DR: An abnormal 204 response that includes a body will cause the `readLoopPeekFailLocked: <nil>` error.

Background
We have deployed two types of backends behind Caddy (a web server in Go):
But only HTTP requests in the 2nd case (i.e. Caddy <-> Erlang Backend) were occasionally encountering the `readLoopPeekFailLocked: <nil>` errors.

Root Cause
The Erlang Backend uses Cowboy 2.7.0, which unfortunately allows developers to construct a 204 response with a body. And the fact is that our Erlang developer does do this!
After correcting these abnormal 204 responses, all `readLoopPeekFailLocked: <nil>` errors disappeared.
NOTE: Cowboy 2.8.0 has fixed this issue (commit).
Reproducing Code
I can report that we’re seeing this issue in production at Let’s Encrypt, using https://github.com/letsencrypt/boulder/. Specifically, there are some “hasty” HTTP servers out there that will send an HTTP response as soon as the TCP connection completes. When we make multiple concurrent requests to such a server, we get `Unsolicited response` in our stdout logs.

The reason we’d like to see this fixed: in general we try to route all log output through our logging system, and this is the one place in `net/http` that regularly prints to stdout. I have a minimized repro case below that is a bit shorter than the ones posted earlier in this thread. It reproduces 100% on Go 1.14.2 (latest as of 2020-04-23). The output is like below (the number of errors varies from run to run, but there’s almost always at least one error):
Expected behavior:
No errors output.
Additional detail:
This test case also reproduces when I set `http.DisableKeepAlives`, and when I set `Connection: close` on the response from the hasty server. Also, each connection yields at most one HTTP response, so this is not really an issue about an unsolicited response on an idle channel as the message suggests; it looks like a race between roundTrip incrementing numExpectedResponses and readLoop checking whether numExpectedResponses is 0. I don’t know this code very well, but I wonder if it makes sense to treat every connection as starting with numExpectedResponses at 1?

I’ve tried to figure out whether hasty responses are allowed or disallowed by RFC 7230. The closest language I can find is:

Which seems to implicitly allow them. At any rate, it’s not clear that it would be possible to consistently reject hasty responses, since there’s an unavoidable network race between writing a request and reading a response. Given that, I think it probably makes sense to make the current behavior (accepting hasty responses) more consistent.
I can confirm that this patch, which starts persistConns with numExpectedResponses = 1, causes the repro code above to run without errors.
0001-Start-persistConns-with-numExpectedResponses-1.patch.txt