go: x/net/http2: investigate data race in Transport Request.Body reuse when GetBody is not defined

What version of Go are you using (go version)?

$ go version
go version go1.10.1 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOCACHE="/home/test/.cache/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GORACE=""
GOROOT="/home/test/go"
GOTMPDIR=""
GOTOOLDIR="/home/test/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build614182611=/tmp/go-build -gno-record-gcc-switches"

The nginx version is nginx/1.13.3.

What did you do?

Here is my client:

package main

import (
    "bytes"
    "errors"
    "fmt"
    "io/ioutil"
    "net/http"
    "sync"
    "time"
)

type Client struct {
    urlBase    string
    httpClient *http.Client
}

func NewClient(url string) *Client {
    cc := &Client{
        urlBase: url,
        httpClient: &http.Client{
            Transport: &http.Transport{
                MaxIdleConnsPerHost:   100,
                MaxIdleConns:          100,
                IdleConnTimeout:       90 * time.Second,
                ExpectContinueTimeout: 5 * time.Second,
            },
        },
    }
    cc.httpClient.Timeout = 5 * time.Second
    return cc
}

func (self *Client) Send(content []byte) error {
    if content == nil {
        return errors.New("nil content")
    }

    body := &bytes.Buffer{}
    body.Write(content)

    begin := time.Now()
    req, err := http.NewRequest("POST", self.urlBase, body)
    if err != nil {
        return fmt.Errorf("Send %+v (from %v to %v)", err, begin, time.Now())
    }
    req.Header = make(http.Header)
    req.Header.Set("Content-Type", "application/json")
    begin = time.Now()
    resp, err := self.httpClient.Do(req)
    if err != nil {
        return fmt.Errorf("Send %+v (from %v to %v)", err, begin, time.Now())
    }
    defer resp.Body.Close()

    if resp.StatusCode >= http.StatusBadRequest {
        bs, _ := ioutil.ReadAll(resp.Body)
        return fmt.Errorf("Send : HTTP : %d , BODY : %s , REQ : %s", resp.StatusCode, string(bs), string(content))
    }

    bs, err := ioutil.ReadAll(resp.Body)
    if err != nil {
        return fmt.Errorf("Send : ERR : %v", err)
    }
    fmt.Println("Send : Resp : ", string(bs))
    return nil
}

func main() {
    client := NewClient(
        "https://test.cn", // I use /etc/hosts to bind the domain
    )
    var content = []byte(`{"test":1}`)
    var wg sync.WaitGroup
    for i := 0; i < 100; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            if err := client.Send(content); err != nil {
                fmt.Println(err)
            }
        }()
    }
    wg.Wait()
}
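
For reference, since Go 1.8 http.NewRequest already populates Request.GetBody for *bytes.Buffer, *bytes.Reader, and *strings.Reader bodies, and GetBody is what the transport uses to re-create the body if it has to replay the request. Below is a minimal sketch of a retry-friendly Send variant (the name SendWithGetBody is hypothetical, and it assumes adding "io" to the imports above) that sets GetBody explicitly; this is a defensive illustration only, not a confirmed fix for this issue:

// SendWithGetBody is a hypothetical variant of Send that sets
// Request.GetBody explicitly, so the transport can re-create the body
// if it needs to replay the request on another connection.
func (self *Client) SendWithGetBody(content []byte) error {
    req, err := http.NewRequest("POST", self.urlBase, bytes.NewReader(content))
    if err != nil {
        return err
    }
    // http.NewRequest already sets GetBody for *bytes.Reader bodies;
    // setting it explicitly also covers arbitrary io.Reader bodies.
    req.GetBody = func() (io.ReadCloser, error) {
        return ioutil.NopCloser(bytes.NewReader(content)), nil
    }
    req.Header.Set("Content-Type", "application/json")
    resp, err := self.httpClient.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    _, err = ioutil.ReadAll(resp.Body)
    return err
}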

And here is my nginx HTTP/2 configuration (http2_max_requests 3 means nginx closes each HTTP/2 connection after serving three requests, which is what makes the client keep hitting connections that are about to go away):

    http2_max_concurrent_streams 40960;
    http2_max_requests 3;
    http2_max_field_size  8k;
    http2_max_header_size  32k;
    http2_recv_buffer_size 1024k;
    http2_recv_timeout  120s;

What did you expect to see?

I expected all requests to succeed.

What did you see instead?

Instead, I occasionally got a 400, and found the line "client prematurely closed stream: only 0 out of 10 bytes of request body received" in the nginx log.

I used GODEBUG http2 debug logging to investigate, and found that the request body DATA was not sent after END_HEADERS:

2018/04/23 18:32:25 http2: Transport encoding header ":method" = "POST"
2018/04/23 18:32:25 http2: Transport encoding header ":path" = "/"
2018/04/23 18:32:25 http2: Transport encoding header ":scheme" = "https"
2018/04/23 18:32:25 http2: Transport encoding header "content-type" = "application/json"
2018/04/23 18:32:25 http2: Transport encoding header "content-length" = "10"
2018/04/23 18:32:25 http2: Transport encoding header "accept-encoding" = "gzip"
2018/04/23 18:32:25 http2: Transport encoding header "user-agent" = "Go-http-client/2.0"
2018/04/23 18:32:25 http2: Framer 0xc4205480e0: wrote HEADERS flags=END_HEADERS stream=5 len=8
2018/04/23 18:32:25 http2: Framer 0xc4205480e0: wrote DATA flags=END_STREAM stream=5 len=0 data=""

That is why nginx responded with 400: the stream was ended by an empty DATA frame even though content-length was 10, so nginx saw a truncated request body.

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Comments: 69 (60 by maintainers)

Most upvoted comments

I can confirm that my go-issue-25009 repro still fails on go1.12 and go1.13 for some reason. It seems like the issue is not fixed (or there is a regression); I will investigate it further.

@odeke-em I’ve created an example via docker-compose. Please take a look at https://github.com/ernado/go-issue-25009

Hi @ernado,

I am still seeing this issue. I have tested on both Go 1.12.9 and 1.13.

It seems that others are seeing it as well in https://github.com/golang/go/issues/32441.

Gentle ping for @bradfitz and @ianlancetaylor: please take a look; it seems like we are close to resolving this issue.

I agree that you found the bug. The fix, while it works, doesn't seem right. @bradfitz: the bug is that when http2 fails to obtain a connection and returns from http2/transport.go (RoundTripOpt:389), the new request is not handed back to be retried from http/transport.go (roundTrip:463); instead the original one is reused, with a body that has already been read.

Something to consider is the way h2 is integrated with h1 in the standard library to handle upgrades (I'm not used to this setup, as I mostly use the h2 lib directly). In configureTransport, the h2 transport is set up with a special version of the pool named noDialClientConnPool, which is not able to dial on a call to GetClientConn. Instead, new connections are only added to the pool on upgrade, by promoting an existing TLS connection created by the h1 Transport.
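
For readers less familiar with that setup, the two integration modes look roughly like this; this is a sketch rather than code from this issue, and it assumes the golang.org/x/net/http2 package is imported directly:

package main

import (
    "net/http"
    "time"

    "golang.org/x/net/http2"
)

func main() {
    // Upgrade path used by the standard library: the h1 Transport owns
    // dialing, and ConfigureTransport wires in an h2 Transport whose pool
    // (noDialClientConnPool) only gains connections promoted from the h1
    // TLS dial; it cannot dial on its own when GetClientConn is called.
    t1 := &http.Transport{IdleConnTimeout: 90 * time.Second}
    if err := http2.ConfigureTransport(t1); err != nil {
        panic(err)
    }
    upgradeClient := &http.Client{Transport: t1}

    // Direct use of the h2 library: this Transport dials its own TLS
    // connections, so its pool can reconnect when a connection goes away.
    directClient := &http.Client{Transport: &http2.Transport{}}

    _ = upgradeClient
    _ = directClient
}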

I'm not sure what happens when this connection becomes invalid while the request is being sent. As seen, h2 may decide to retry, but if the pool no longer contains a valid connection and is not able to dial, you will get the "http2: no cached connection was available" error you are seeing. I guess the proper way to retry would be to go back to h1 and go through the full upgrade process all over again.

So basically, without GetBody set on the request, the request can't be retried, and when it is set, it can't be retried either, because the pool can't reconnect, which yields a different error. Nginx might send a 400 back because Go decides not to send the DATA frames after the HEADERS, having received a GOAWAY frame in the meantime and failed to properly cancel the stream. Note that I did not verify those assumptions; I'm just interpreting what I'm reading in the code.
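
As a small point of reference for the "without GetBody" case in the issue title: net/http only pre-populates GetBody for a few concrete body types, so a request built from a generic io.Reader has nothing the transport can use to replay the body. A standalone illustrative sketch (hypothetical URLs):

package main

import (
    "bytes"
    "fmt"
    "io"
    "net/http"
    "strings"
)

func main() {
    // NewRequest sets GetBody for *bytes.Buffer, *bytes.Reader and
    // *strings.Reader bodies, so such requests can be replayed on retry.
    r1, _ := http.NewRequest("POST", "https://example.invalid", bytes.NewReader([]byte(`{"test":1}`)))
    fmt.Println("bytes.Reader body, GetBody set:", r1.GetBody != nil)

    // A generic io.Reader gives NewRequest nothing to rewind, so GetBody
    // stays nil and the transport cannot safely resend the body.
    var generic io.Reader = struct{ io.Reader }{strings.NewReader(`{"test":1}`)}
    r2, _ := http.NewRequest("POST", "https://example.invalid", generic)
    fmt.Println("generic reader body, GetBody set:", r2.GetBody != nil)
}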

Thank you @ictar for the report, we’ll investigate it.

@agnivade, thank you for handling this. I just wanted to add that I can relate to @ictar's need to have nginx for a repro (many companies that I know use it heavily), and that dismissing the issue just because Go's own servers can't reproduce it might be tricky. Many companies put nginx or some other web server in front of their web apps and proxy from there to polyglot backend services, e.g. Go, Node.js, or Python web servers. Interoperability between protocols and servers matters a lot, and discrepancies between Go clients talking to various servers creep in 😃