go: net, os: Set*Deadline() expiration error should be unique, as .Timeout() is true for keepalive, etc

This is a bug. A keepalive error is a connection failure, not a deadline event.

net Docs say:

A zero value for [deadline] means I/O operations will not time out.

A zero value for [read deadline] means Read will not time out.

KeepAlive specifies the keep-alive period for an active network connection. If zero, keep-alives are enabled if supported by the protocol and operating system. Network protocols or operating systems that do not support keep-alives ignore this field. If negative, keep-alives are disabled.

For a TLS connection that’s been severed, Conn.Read() returns a net.Error with .Timeout()==true due to KeepAlive failure. (Go 1.12, Linux, amd64)

The Error should give .Timeout()==false to comply with the docs. Code that checks for .Timeout()==true would generally assume that an explicit deadline had passed.

The .Error() string should mention keepalive. It’s currently: “read tcp n.n.n.n:p->n.n.n.n:p: read: connection timed out”

Related: net.Dialer.KeepAlive gets 15*time.Second if unset/zero. This isn’t documented in package net.

cc @bradfitz

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 59 (30 by maintainers)

Commits related to this issue

Most upvoted comments

ETIMEDOUT.Timeout() has been true for many releases. It seems quite clear we cannot redefine that now.

I was sympathetic to CL 188657 until I searched the Linux kernel sources for ETIMEDOUT, as Filippo mentioned above. It gets returned from an enormous number of places, most of them having nothing to do with TCP, and many of them having nothing to do with networking. So translating ETIMEDOUT to ErrKeepAlive is clearly incorrect in the overwhelming majority of places where it appears in the kernel source. I don’t believe we can be more precise here without a tremendous amount of system-specific code.

Keepalives have been in previous releases and turned into errors with err.Timeout() == true. I realize they are a little more common now, but even so it does not seem terrible to continue the behavior of the past 13 releases into a 14th.

I suggest we leave this alone for Go 1.13.

For Go 1.14 maybe the answer is to change our net.Conn implementations to return an error that Is(context.DeadlineExceeded), to allow a more precise check than Timeout.

Go 1.13 also enables Keep-Alives by default on the net.Listen side (1abf3aa55bb8b346bb1575ac8db5022f215df65a), so this might be worth fixing now, before exposing a new wave of applications to it.

Above @rsc suggests that we change net.Conn.Read/Write to return context.DeadlineExceeded if they time out due to exceeding a deadline. That will change the error when printed from i/o timeout to context deadline exceeded. I don’t think that would be an ideal change, as I think it is confusing to refer to a context when no context is involved.

I suggest that we instead add net.ErrDeadlineExceeded and os.ErrDeadlineExceeded and return those.

The text added to the release notes should also live in the docs, and be referenced by net.Dial() & .Listen(), net.Error.Timeout(), and net.Conn.Set*Deadline().

The docs are currently explicit that only deadlines cause “time out”. See quote in issue text.

In my opinion we should cover the issue in the release notes. I think many more people will be helped by turning on keep-alive by default than will be hurt by the confusion around the Timeout method.

I feel like for this change to be complete, we should make syscall.Errno.Timeout() always return false, and make internal/poll.ErrTimeout the only error with Timeout true. That way the method can be used reliably to detect when deadlines are exceeded, which is the only practical use case for it that I can think of. (Said another way, you might want to handle differently deadline errors, as they can be reset, but AFAIK no others.)

By the way, I noticed that keep-alive errors are also marked Temporary, which is definitely not the case, and we should fix that. (Although I admit I don’t entirely grok what Temporary means, as internal/poll.ErrTimeout is also Temporary but surely retrying the operation without resetting the deadline would not work, and crypto/tls.timeoutError has Temporary even if a number of permanent reasons can cause a dial to timeout.)

https://github.com/golang/go/blob/b41eee4c8a2fe692c1d9fb46972b9047b5dc02b7/src/syscall/syscall_unix.go#L138-L144

@costela, your keepalive patch will trigger this bug…

@gopherbot please open backport 1.12

Since 1.11, net.Dialer.KeepAlive gets 15*time.Second if unset/zero. This isn’t documented in package net.

It is documented that we enable keep-alives if that field is unset/zero. The choice of not specifying 15s in the docs was deliberate and is actually documented in the commit message: https://github.com/golang/go/commit/5bd7e9c54f946eec95d32762e7e9e1222504bfc1