go: net/http: Dial I/O Timeout even when request context is not canceled
Seems related to https://github.com/golang/go/commit/869e576517f825aecdc8730b0d22f8d6b59bd749 (c.f. https://github.com/kubernetes-sigs/aws-encryption-provider/issues/61).
Maybe this is expected behavior… Is it possible to see dial timeouts even when the request context has not been canceled? I am using the default transport (https://golang.org/pkg/net/http/#RoundTripper), which has a 30-second dial timeout, while each request context has a 300µs timeout. If this is possible, can you help me understand why it happens?
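For context, below is a minimal sketch (not my production code) of the setup I mean: the default transport dials with a 30-second timeout, while each request carries a 300µs context deadline. The explicit Transport fields and the placeholder URL are written out only for illustration.

package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"time"
)

func main() {
	// http.DefaultTransport already uses a net.Dialer with Timeout and
	// KeepAlive of 30 seconds; it is spelled out here only so the two
	// timeouts can be seen side by side.
	tr := &http.Transport{
		DialContext: (&net.Dialer{
			Timeout:   30 * time.Second, // dial timeout
			KeepAlive: 30 * time.Second,
		}).DialContext,
	}
	cli := &http.Client{Transport: tr}

	// Per-request deadline of 300µs, far shorter than the dial timeout.
	ctx, cancel := context.WithTimeout(context.Background(), 300*time.Microsecond)
	defer cancel()

	// "http://localhost:8080/test" is a placeholder URL for this sketch.
	req, err := http.NewRequest(http.MethodGet, "http://localhost:8080/test", nil)
	if err != nil {
		panic(err)
	}
	_, err = cli.Do(req.WithContext(ctx))
	fmt.Println(err) // expected: some deadline-related error
}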
What version of Go are you using (go version)?
$ go version
go version go1.13.6 darwin/amd64
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (go env)?
go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/leegyuho/Library/Caches/go-build"
GOENV="/Users/leegyuho/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GONOPROXY="*"
GONOSUMDB="*"
GOOS="darwin"
GOPATH="/Users/leegyuho/go"
GOPRIVATE="*"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/y_/_dn293xd5kn7xlg6jvp7jpmxs99pm9/T/go-build128444364=/tmp/go-build -gno-record-gcc-switches -fno-common"
What did you do?
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

func main() {
	// this happens about 50% of the time, so try it a few times.
	for i := 0; i < 10; i++ {
		fmt.Println("trying", i)
		try()
	}
}

func try() {
	mux := http.NewServeMux()
	mux.HandleFunc("/test", func(w http.ResponseWriter, req *http.Request) {
		switch req.Method {
		case "GET":
			fmt.Fprint(w, `test`)
		default:
			http.Error(w, "Method Not Allowed", 405)
		}
	})
	ts := httptest.NewServer(mux)
	defer ts.Close()
	u := ts.URL + "/test"

	// default dial timeout is 30 seconds
	// https://golang.org/pkg/net/http/#RoundTripper
	cli := http.DefaultClient

	timeout := 300 * time.Microsecond
	reqs := 20
	errc := make(chan error, reqs)
	for i := 0; i < reqs; i++ {
		go func() {
			ctx, cancel := context.WithTimeout(context.TODO(), timeout)
			defer cancel()
			req, err := http.NewRequest(http.MethodGet, u, nil)
			if err != nil {
				errc <- err
				return
			}
			_, err = cli.Do(req.WithContext(ctx))
			if err != nil {
				// can be: ctx.Err() == nil && err == "i/o timeout"
				// Q. but how is that possible?
				fmt.Println("Do failed with", err, "/ context error:", ctx.Err())
			}
			errc <- err
		}()
	}

	// "context deadline exceeded" for requests after timeout
	exp := `context deadline`
	for i := 0; i < reqs; i++ {
		err := <-errc
		if err == nil {
			continue
		}
		fmt.Println("error:", err)
		if !strings.Contains(err.Error(), exp) {
			panic(fmt.Sprintf("#%d: expected %q, got %v", i, exp, err))
		}
	}
}
What did you expect to see?
No panic.
What did you see instead?
trying 0
Do failed with Get http://127.0.0.1:54098/test: dial tcp 127.0.0.1:54098: i/o timeout / context error: <nil>
error: Get http://127.0.0.1:54098/test: dial tcp 127.0.0.1:54098: i/o timeout
Do failed with Get http://127.0.0.1:54098/test: dial tcp 127.0.0.1:54098: i/o timeout / context error: <nil>
Do failed with Get http://127.0.0.1:54098/test: dial tcp 127.0.0.1:54098: i/o timeout / context error: <nil>
Do failed with Get http://127.0.0.1:54098/test: dial tcp 127.0.0.1:54098: i/o timeout / context error: <nil>
Do failed with Get http://127.0.0.1:54098/test: dial tcp 127.0.0.1:54098: i/o timeout / context error: <nil>
Do failed with Get http://127.0.0.1:54098/test: dial tcp 127.0.0.1:54098: i/o timeout / context error: <nil>
Do failed with Get http://127.0.0.1:54098/test: dial tcp 127.0.0.1:54098: i/o timeout / context error: <nil>
Do failed with Get http://127.0.0.1:54098/test: dial tcp 127.0.0.1:54098: i/o timeout / context error: <nil>
Do failed with Get http://127.0.0.1:54098/test: dial tcp 127.0.0.1:54098: i/o timeout / context error: <nil>
Do failed with Get http://127.0.0.1:54098/test: context deadline exceeded / context error: context deadline exceeded
Do failed with Get http://127.0.0.1:54098/test: context deadline exceeded / context error: context deadline exceeded
Do failed with Get http://127.0.0.1:54098/test: context deadline exceeded / context error: context deadline exceeded
Do failed with Get http://127.0.0.1:54098/test: dial tcp 127.0.0.1:54098: i/o timeout / context error: <nil>
panic: #0: expected "context deadline", got Get http://127.0.0.1:54098/test: dial tcp 127.0.0.1:54098: i/o timeout
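(As an aside, and not part of the original reproducer: both the dial's "i/o timeout" (*net.OpError) and context.DeadlineExceeded report themselves as timeouts through the net.Error interface, so a type-based check like the hypothetical helper below should accept either form, unlike the string match on "context deadline" in try(). This is only a sketch of a workaround, not an explanation of the behavior.)

package main

import (
	"errors"
	"net"
)

// isTimeout is a hypothetical helper (not part of the reproducer above). Both
// the dial's "i/o timeout" error and context.DeadlineExceeded carry a
// Timeout() method that reports true, so this should match either error.
func isTimeout(err error) bool {
	var nerr net.Error
	return errors.As(err, &nerr) && nerr.Timeout()
}

func main() {} // placeholder so the snippet compiles on its own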
About this issue
- State: open
- Created 4 years ago
- Reactions: 1
- Comments: 17 (8 by maintainers)
@networkimprov Yes, I should’ve done it in the first place, thanks for asking. The following code prints errors that are not deadline exceeded for each request done concurrently, as explained in the previous comment. (Btw, I know this is far from efficient code 😄)
Thank you for the clarification with clearer wording Liam, and my apologies for taking it the wrong way!
Sorry for jumping on this one here, but it seems to me this could highlight some kind of inconsistency.
I have a similar issue when performing multiple GET requests, each from a separate goroutine, each with a context derived from a parent context with a deadline of 2.5 seconds. If the machine is put under heavy load, some of the requests start failing with a deadline exceeded error, and some with i/o timeout, which I find misleading since the Timeout set for the dial is 30 seconds (I am using http.DefaultClient and http.DefaultTransport); a minimal sketch of this setup follows below.
I think it would be more consistent to get a deadline exceeded error in this case, since the dial was interrupted because of the deadline that was set, not because of a timeout on the dial as the error suggests. Does this make sense? (Sorry if it doesn’t, first time commenting on an issue, please be kind 😄)
In our case this made the difference between investigating a network issue and a performance issue.
(Edited to add comment about impact on investigation)
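For concreteness, here is a minimal sketch of the pattern I mean; the URL, request count, and error handling are placeholders, not our actual code.

package main

import (
	"context"
	"fmt"
	"net/http"
	"sync"
	"time"
)

func main() {
	// Parent context with a 2.5-second deadline, as described above.
	parent, cancel := context.WithTimeout(context.Background(), 2500*time.Millisecond)
	defer cancel()

	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Each request gets its own context derived from the parent.
			ctx, cancel := context.WithCancel(parent)
			defer cancel()
			// "http://example.com/" is a placeholder URL for this sketch.
			req, err := http.NewRequest(http.MethodGet, "http://example.com/", nil)
			if err != nil {
				fmt.Println(i, "new request:", err)
				return
			}
			// http.DefaultClient / http.DefaultTransport, dial timeout 30s.
			resp, err := http.DefaultClient.Do(req.WithContext(ctx))
			if err != nil {
				// Under load, this shows a mix of
				// "context deadline exceeded" and "i/o timeout".
				fmt.Println(i, "error:", err)
				return
			}
			resp.Body.Close()
		}(i)
	}
	wg.Wait()
}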