go: net, os: Set*Deadline() expiration error should be unique, as .Timeout() is true for keepalive, etc
This is a bug. A keepalive error is a connection failure, not a deadline event.
net Docs say:
A zero value for [deadline] means I/O operations will not time out.
A zero value for [read deadline] means Read will not time out.
KeepAlive specifies the keep-alive period for an active network connection. If zero, keep-alives are enabled if supported by the protocol and operating system. Network protocols or operating systems that do not support keep-alives ignore this field. If negative, keep-alives are disabled.
For a TLS connection that’s been severed, Conn.Read() returns a net.Error with .Timeout()==true due to KeepAlive failure. (Go 1.12, Linux, amd64)
The Error should give .Timeout()==false to comply with the docs. Code that checks for .Timeout()==true would generally assume that an explicit deadline had passed.
The .Error() string should mention keepalive. It’s currently:
“read tcp n.n.n.n:p->n.n.n.n:p: read: connection timed out”
Related: net.Dialer.KeepAlive gets 15*time.Second if unset/zero. This isn’t documented in package net.
cc @bradfitz
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 1
- Comments: 59 (30 by maintainers)
Commits related to this issue
- doc/go1.13: mention confusion between keep-alive timeout and deadline Updates #31449 Change-Id: I4d7075b20cd8171bc792e40b388f4215264a3317 Reviewed-on: https://go-review.googlesource.com/c/go/+/18881... — committed to golang/go by ianlancetaylor 5 years ago
- net: document that a keep-alive failure also returns a timeout Updates #31449 Change-Id: I76490c5e83eb2f7ba529b387a57ba088428aece5 Reviewed-on: https://go-review.googlesource.com/c/go/+/189757 Run-T... — committed to golang/go by ianlancetaylor 5 years ago
- os, net: define and use os.ErrDeadlineExceeded If an I/O operation fails because a deadline was exceeded, return os.ErrDeadlineExceeded. We used to return poll.ErrTimeout, an internal error, and told... — committed to xujianhai666/go-1 by ianlancetaylor 4 years ago
- net: consistently document deadline handling After CL 228645 some mentions of the Deadline methods referred to the Timeout method, and some to os.ErrDeadlineExceeded. Stop referring to the Timeout me... — committed to golang/go by ianlancetaylor 4 years ago
ETIMEDOUT.Timeout() has been true for many releases. It seems quite clear we cannot redefine that now.
I was sympathetic to CL 188657 until I searched the Linux kernel sources for ETIMEDOUT, as Filippo mentioned above. It gets returned from an enormous number of places, most of them having nothing to do with TCP, and many of them having nothing to do with networking. So translating ETIMEDOUT to ErrKeepAlive is clearly incorrect in the overwhelming majority of places where it appears in the kernel source. I don’t believe we can be more precise here without a tremendous amount of system-specific code.
Keepalives have been in previous releases and turned into errors with err.Timeout() == true. I realize they are a little more common now, but even so it does not seem terrible to continue the behavior of the past 13 releases into a 14th.
I suggest we leave this alone for Go 1.13.
For Go 1.14 maybe the answer is to change our net.Conn implementations to return an error that Is(context.DeadlineExceeded), to allow a more precise check than Timeout.
Go 1.13 also enables Keep-Alives by default on the net.Listen side (1abf3aa55bb8b346bb1575ac8db5022f215df65a), so this might be worth fixing now, before exposing a new wave of applications to it.
Above @rsc suggests that we change
net.Conn.Read/Writeto returncontext.DeadlineExceededif they time out due to exceeding a deadline. That will change the error when printed fromi/o timeouttocontext deadline exceeded. I don’t think that would be an ideal change, as I think it is confusing to refer to a context when no context is involved.I suggest that we instead add
net.ErrDeadlineExceededandos.ErrDeadlineExceededand return those.The text added to the release notes should also live in the docs, and be referenced by net.Dial() & .Listen(), net.Error.Timeout(), and net.Conn.Set*Deadline().
The docs are currently explicit that only deadlines cause “time out”. See quote in issue text.
In my opinion we should cover the issue in the release notes. I think many more people will be helped by turning on keep-alive by default than will be hurt by the confusion around the
Timeoutmethod.I feel like for this change to be complete, we should make
syscall.Errno.Timeout()always returnfalse, and makeinternal/poll.ErrTimeoutthe only error withTimeouttrue. That way the method can be used reliably to detect when deadlines are exceeded, which is the only practical use case for it that I can think of. (Said another way, you might want to handle differently deadline errors, as they can be reset, but AFAIK no others.)By the way, I noticed that keep-alive errors are also marked
Temporary, which is definitely not the case, and we should fix that. (Although I admit I don’t entirely grok whatTemporarymeans, asinternal/poll.ErrTimeoutis alsoTemporarybut surely retrying the operation without resetting the deadline would not work, andcrypto/tls.timeoutErrorhasTemporaryeven if a number of permanent reasons can cause a dial to timeout.)https://github.com/golang/go/blob/b41eee4c8a2fe692c1d9fb46972b9047b5dc02b7/src/syscall/syscall_unix.go#L138-L144
@costela, your keepalive patch will trigger this bug…
@gopherbot please open backport 1.12
It is documented that we enable keep-alives if that field is unset/zero. The choice of not specifying 15s in the docs was deliberate and is actually documented in the commit message: https://github.com/golang/go/commit/5bd7e9c54f946eec95d32762e7e9e1222504bfc1