go: net/http: Dial I/O Timeout even when request context is not canceled

Seems related to https://github.com/golang/go/commit/869e576517f825aecdc8730b0d22f8d6b59bd749 (c.f. https://github.com/kubernetes-sigs/aws-encryption-provider/issues/61).

Maybe this is an expected behavior… Is it possible to see dial timeouts even when the request context has not been canceled? I am using https://golang.org/pkg/net/http/#RoundTripper with dial timeout 30 seconds and request context is 300µs. If this is possible, can you help understand why this happens?

What version of Go are you using (go version)?

$ go version

go version go1.13.6 darwin/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env

GO111MODULE=“” GOARCH=“amd64” GOBIN=“” GOCACHE=“/Users/leegyuho/Library/Caches/go-build” GOENV=“/Users/leegyuho/Library/Application Support/go/env” GOEXE=“” GOFLAGS=“” GOHOSTARCH=“amd64” GOHOSTOS=“darwin” GONOPROXY=“" GONOSUMDB="” GOOS=“darwin” GOPATH=“/Users/leegyuho/go” GOPRIVATE=“*” GOPROXY=“https://proxy.golang.org,direct” GOROOT=“/usr/local/go” GOSUMDB=“sum.golang.org” GOTMPDIR=“” GOTOOLDIR=“/usr/local/go/pkg/tool/darwin_amd64” GCCGO=“gccgo” AR=“ar” CC=“clang” CXX=“clang++” CGO_ENABLED=“1” CGO_CFLAGS=“-g -O2” CGO_CPPFLAGS=“” CGO_CXXFLAGS=“-g -O2” CGO_FFLAGS=“-g -O2” CGO_LDFLAGS=“-g -O2” PKG_CONFIG=“pkg-config” GOGCCFLAGS=“-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/y_/_dn293xd5kn7xlg6jvp7jpmxs99pm9/T/go-build128444364=/tmp/go-build -gno-record-gcc-switches -fno-common”

What did you do?

package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

func main() {
	// this happens about 50% of the time, so try it a few times.
	for i := 0; i < 10; i++ {
		fmt.Println("trying", i)
		try()
	}
}

func try() {
	mux := http.NewServeMux()
	mux.HandleFunc("/test", func(w http.ResponseWriter, req *http.Request) {
		switch req.Method {
		case "GET":
			fmt.Fprint(w, `test`)
		default:
			http.Error(w, "Method Not Allowed", 405)
		}
	})

	ts := httptest.NewServer(mux)
	defer ts.Close()

	u := ts.URL + "/test"

        // default dial timeout is 30-second
        // https://golang.org/pkg/net/http/#RoundTripper
	cli := http.DefaultClient

	timeout := 300 * time.Microsecond
	reqs := 20
	errc := make(chan error, reqs)
	for i := 0; i < reqs; i++ {
		go func() {
			ctx, cancel := context.WithTimeout(context.TODO(), timeout)
			defer cancel()

			req, err := http.NewRequest(http.MethodGet, u, nil)
			if err != nil {
				errc <- err
				return
			}
			_, err = cli.Do(req.WithContext(ctx))
			if err != nil {
				// can be: ctx.Err() == nil && err == "i/o timeout"
				// Q. but how is that possible?
				fmt.Println("Do failed with", err, "/ context error:", ctx.Err())
			}
			errc <- err
		}()
	}

	// "context deadline exceeded" for requests after timeout
	exp := `context deadline`
	for i := 0; i < reqs; i++ {
		err := <-errc
		if err == nil {
			continue
		}
		fmt.Println("error:", err)
		if !strings.Contains(err.Error(), exp) {
			panic(fmt.Sprintf("#%d: expected %q, got %v", i, exp, err))
		}
	}
}

What did you expect to see?

No panic.

What did you see instead?

trying 0
Do failed with Get http://127.0.0.1:54098/test: dial tcp 127.0.0.1:54098: i/o timeout / context error: <nil>
error: Get http://127.0.0.1:54098/test: dial tcp 127.0.0.1:54098: i/o timeout
Do failed with Get http://127.0.0.1:54098/test: dial tcp 127.0.0.1:54098: i/o timeout / context error: <nil>
Do failed with Get http://127.0.0.1:54098/test: dial tcp 127.0.0.1:54098: i/o timeout / context error: <nil>
Do failed with Get http://127.0.0.1:54098/test: dial tcp 127.0.0.1:54098: i/o timeout / context error: <nil>
Do failed with Get http://127.0.0.1:54098/test: dial tcp 127.0.0.1:54098: i/o timeout / context error: <nil>
Do failed with Get http://127.0.0.1:54098/test: dial tcp 127.0.0.1:54098: i/o timeout / context error: <nil>
Do failed with Get http://127.0.0.1:54098/test: dial tcp 127.0.0.1:54098: i/o timeout / context error: <nil>
Do failed with Get http://127.0.0.1:54098/test: dial tcp 127.0.0.1:54098: i/o timeout / context error: <nil>
Do failed with Get http://127.0.0.1:54098/test: dial tcp 127.0.0.1:54098: i/o timeout / context error: <nil>
Do failed with Get http://127.0.0.1:54098/test: context deadline exceeded / context error: context deadline exceeded
Do failed with Get http://127.0.0.1:54098/test: context deadline exceeded / context error: context deadline exceeded
Do failed with Get http://127.0.0.1:54098/test: context deadline exceeded / context error: context deadline exceeded
Do failed with Get http://127.0.0.1:54098/test: dial tcp 127.0.0.1:54098: i/o timeout / context error: <nil>
panic: #0: expected "context deadline", got Get http://127.0.0.1:54098/test: dial tcp 127.0.0.1:54098: i/o timeout

About this issue

  • Original URL
  • State: open
  • Created 4 years ago
  • Reactions: 1
  • Comments: 17 (8 by maintainers)

Commits related to this issue

Most upvoted comments

@networkimprov Yes, I should’ve done it in the first place, thanks for asking. The following code prints errors that are not deadline exceeded for each request done concurrently as explained in the previous comment.

package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"strconv"
	"strings"
	"sync"
	"time"
)

const (
	n       = 1000
	timeout = 10000 * time.Microsecond
)

func main() {
	ctx, cancel := context.WithDeadline(context.Background(), time.Now().Add(timeout))
	defer cancel()
	doConcurrentRequests(ctx, n)
}

func doConcurrentRequests(ctx context.Context, n int) {
	// Explicitly configure Dial Timeout to 30 seconds
	// (this is not necessary since the default client already has this configuration)
	transport := http.DefaultTransport
	transport.(*http.Transport).DialContext = (&net.Dialer{
		Timeout:   30 * time.Second,
		KeepAlive: 30 * time.Second,
		DualStack: true,
	}).DialContext
	client := http.DefaultClient
	client.Transport = transport

	wg := sync.WaitGroup{}
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			now := time.Now()
			err := doRequest(client, ctx)
			if err != nil && !strings.Contains(err.Error(), "deadline") {
				fmt.Println(strconv.Itoa(id) + " - Request took:" + time.Since(now).String() + " " + err.Error())
			}
		}(i)
	}
	wg.Wait()
}

func doRequest(client *http.Client, ctx context.Context) error {
	req, err := http.NewRequest("GET", "http://example.com", nil)
	if err != nil {
		return fmt.Errorf("while making new request: %w", err)
	}

	_, err = client.Do(req.WithContext(ctx))
	if err != nil {
		return fmt.Errorf("while executing request: %w", err)
	}
	return nil
}

(Btw I know this is far from efficient code 😄 )

Thank you for the clarification with clearer wording Liam, and my apologies for taking it the wrong way!

On Mon, Feb 3, 2020 at 3:08 PM Liam notifications@github.com wrote:

@odeke-em https://github.com/odeke-em I think you misread my comment. This appears to be a question, not a bug report, and usually those are referred to the forums so as not to overburden folks like you!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/36848?email_source=notifications&email_token=ABFL3V572MSD2W57UETYXPDRBCPYBA5CNFSM4KMXB7WKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEKVXVFY#issuecomment-581663383, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABFL3V6MLK5Q6FIL4XSS3MDRBCPYBANCNFSM4KMXB7WA .

Sorry for jumping on this one here but it seems to me this could highlight some kind of inconsistency.

I have a similar issue when performing multiple GET request, each from a separate go routine, each with a context derived from a parent context with a deadline of 2.5 seconds. If the machine is put under heavy load, some of the requests start failing with a deadline exceeded error, and some with i/o timeout which I find misleading since the Timeout set for the dial is 30 seconds (i am using http.DefaultClient and http.DefaultTransport).

I think it would be more consistent to have a deadline exceeded error in this case since the Dial was interrupted because of the deadline set instead of a timeout on a dial as the error would suggest. Does this make sense?

(Sorry if it doesn’t, first time commenting on an issue, please be kind 😄)

In our case this made the difference between investigating a network issue and a performance issue.

(Edited to add comment about impact on investigation)