go: net/http: `readLoopPeekFailLocked: ` from client when server sends response immediately
What version of Go are you using (`go version`)?
$ go version
go version go1.12.1 linux/amd64
Does this issue reproduce with the latest release?
Yes, it also reproduces with go1.14.1.
What operating system and processor architecture are you using (`go env`)?
go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/user/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/user/go"
GOPROXY=""
GORACE=""
GOROOT="/opt/go"
GOTMPDIR=""
GOTOOLDIR="/opt/go/pkg/tool/linux_amd64"
GCCGO="/usr/bin/gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build705707104=/tmp/go-build -gno-record-gcc-switches"
What did you do?
100% reproducible on Linux x64 and on play.golang.org. The behavior is unpredictable, so the code below may need to be run a few times to get the error and the log message.
The error is:
Get http://127.0.0.1:9101/path: readLoopPeekFailLocked: <nil>
The log is:
transport.go:1829: Unsolicited response received on idle HTTP channel starting with "HTTP/1.1 502 Bad Gateway\r\n\r\n"; err=<nil>
What happens?
The HTTP server sends a response without reading the request and without any delay. This approach is useful for proxy servers that authorize a client by IP address (e.g. against a whitelist): if the client is not whitelisted, there is no need to read the request at all.
Reproducing
(run it a few times to get the error)
package main

import (
	"fmt"
	"log"
	"net"
	"net/http"
	"net/url"
	"sync"
	"time"
)

func mustParseURL(uri string) *url.URL {
	x, err := url.Parse(uri)
	if err != nil {
		panic(err)
	}
	return x
}

func main() {
	l, err := net.Listen("tcp", "127.0.0.1:9100")
	if err != nil {
		log.Fatal(err)
	}
	go func(l net.Listener) {
		for {
			c, err := l.Accept()
			if err != nil {
				log.Fatal(err)
			}
			go func(c net.Conn) {
				defer c.Close()
				fmt.Fprint(c, "HTTP/1.1 502 Bad Gateway\r\n\r\n")
			}(c)
		}
	}(l)
	time.Sleep(100 * time.Millisecond)
	var cl http.Client
	cl.Transport = &http.Transport{
		Proxy:           http.ProxyURL(mustParseURL("http://127.0.0.1:9100/")),
		MaxConnsPerHost: 100,
	}
	const N = 100
	var wg sync.WaitGroup
	wg.Add(N)
	for i := 0; i < N; i++ {
		go func() {
			defer wg.Done()
			resp, err := cl.Get("http://127.0.0.1:9101/path")
			if err != nil {
				log.Print("error:", err)
				return
			}
			if resp.StatusCode != 502 {
				log.Println("status:", resp.StatusCode)
			}
			resp.Body.Close()
		}()
	}
	wg.Wait()
}
Also, playing with the N constant, I got this error for N=10, N=100 and N=1000. For N=1000 the error occurs every time; for N=10, rarely. It seems to depend on goroutine scheduling.
What did you expect to see?
To get this error either every time or never.
What did you see instead?
It’s unpredictable, which makes testing a bit harder.
About this issue
- Original URL
- State: open
- Created 5 years ago
- Reactions: 4
- Comments: 20 (4 by maintainers)
Commits related to this issue
- avoided https://github.com/golang/go/issues/31259 — committed to lamg/proxy by lamg 5 years ago
- fix(trace): remove GotConn delay The delay causes a race condition in the go transport that results in a 502 Bad Gateway with: `endpoint_failure (readLoopPeekFailLocked: %!w(<nil>))`. This happens... — committed to sap-contributions/gorouter by maxmoehl a year ago
- fix(trace): remove GotConn delay The delay causes a race condition in the go transport that results in a 502 Bad Gateway with: `endpoint_failure (readLoopPeekFailLocked: %!w(<nil>))`. This happens... — committed to cloudfoundry/gorouter by maxmoehl a year ago
- fix : https://github.com/golang/go/issues/31259 — committed to yaklang/yaklang by Go0p a year ago
- fix : https://github.com/golang/go/issues/31259 — committed to yaklang/yaklang by Go0p a year ago
For those who are interested, I have finally figured out the reason for the `readLoopPeekFailLocked: <nil>` errors that occurred in our production environment.

TL;DR: An abnormal 204 response that includes a body will cause the `readLoopPeekFailLocked: <nil>` error.

Background
We have deployed two types of backends behind Caddy (a web server in Go):
But only HTTP requests in the 2nd case (i.e. Caddy <-> Erlang Backend) were occasionally encountering the `readLoopPeekFailLocked: <nil>` errors.

Root Cause
The Erlang Backend uses Cowboy 2.7.0, which unfortunately allows developers to construct a 204 response with a body. And the fact is that our Erlang developer does do this!
After correcting these abnormal 204 responses, all `readLoopPeekFailLocked: <nil>` errors disappeared.
NOTE: Cowboy 2.8.0 has fixed this issue (commit).
Reproducing Code
I can report that we’re seeing this issue in production at Let’s Encrypt, using https://github.com/letsencrypt/boulder/. Specifically, there are some “hasty” HTTP servers out there that will send an HTTP response as soon as the TCP connection completes. When we make multiple concurrent requests to such a server, we get `Unsolicited response` in our stdout logs.

The reason we’d like to see this fixed: in general we try to route all log output through our logging system, and this is the one place in `net/http` that regularly prints to stdout. I have a minimized repro case below that is a bit shorter than the ones posted earlier in this thread. It reproduces 100% on Go 1.14.2 (latest as of 2020-04-23). The output is like below (the number of errors varies from run to run, but there’s almost always at least one error):
Expected behavior:
No errors output.
Additional detail:
This test case also reproduces when I set `http.DisableKeepAlives`, and when I set `Connection: close` on the response from the hasty server. Also, each connection yields at most one HTTP response, so this is not really an issue about an unsolicited response on an idle channel as the message suggests; it looks like a race between roundTrip incrementing numExpectedResponses and readLoop checking whether numExpectedResponses is 0. I don’t know this code very well, but I wonder if it makes sense to treat every connection as starting with numExpectedResponses at 1?

I’ve tried to figure out whether hasty responses are allowed or disallowed by RFC 7230. The closest language I can find is:

Which seems to implicitly allow them. At any rate, it’s not clear that it would be possible to consistently reject hasty responses, since there’s an unavoidable network race between writing a request and reading a response. Given that, I think it probably makes sense to make the current behavior (accepting hasty responses) more consistent.
I can confirm that this patch, which starts persistConns with numExpectedResponses = 1, causes the repro code above to run without errors.
0001-Start-persistConns-with-numExpectedResponses-1.patch.txt