go: time: time.Sleep and time.NewTimer: Fix duration parameter error

What version of Go are you using (go version)?

go version go1.7.3 windows/amd64

What operating system and processor architecture are you using (go env)?

set GOARCH=amd64
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows

What did you do?

I wrote a test which is basically a loop consisting of two things: Some busy work for about 500us and a call to time.Sleep(1 * time.Millisecond).

Here’s a link to the test program: https://play.golang.org/p/VYGvIbTPo3

It can’t run on the playground, so you will have to grab it and run it yourself.

What did you expect to see?

  work(~500us) + time.Sleep(1ms): 1.500000ms    ... or slightly more
  work(~500us) + time.After(1ms): 1.500000ms    ... or slightly more

What did you see instead?

  work(~500us) + time.Sleep(1ms): 1.030103ms
  work(~500us) + time.After(1ms): 1.110111ms

Discussion

I believe that there are errors in the docs for time.Sleep and time.NewTimer. They both make similar statements regarding the duration:

Sleep pauses the current goroutine for at least the duration d.

NewTimer creates a new Timer that will send the current time on its channel after at least duration d.

If these are correct, then the test loops should take ~1.5ms or longer per pass. But they don’t.

I believe that "at least the duration d" was an attempt to convey that overhead might result in a longer duration than requested. But there are two other factors that might result in a shortened duration, sometimes MUCH shorter. These factors are the resolution and phase of the time source used for these functions.

In Windows, the resolution is normally 1ms for Go programs. I’ll refer to this value as r, short for resolution. I’m not sure what it is on other platforms. If you run the test program it will measure and display it. Please let me know what you find.

I believe that in general the duration can be as short as d-r (rather than d). How short simply depends on the phase of the time source at the moment you call it. Thus if you ask for a 100ms delay with r = 1ms, you will end up with something between 99 and 100ms (plus overhead). As you use smaller values of d (approaching r), the problem gets worse. By itself, this isn’t too bad. But now the other shoe drops.

In the specific case that you ask for a delay of r (or anything between 1 and r), then the delay can be anything between 0 and r. Actually, for a reason I don’t quite understand, the lowest delay I’ve actually seen is about 50us, not 0. Even so, that is 20 times less than the 1ms requested. This was drastic enough to break some of my code. I changed d from 1ms to 2ms. The resulting delay is between 1 and 2ms (plus overhead).

Can this cause a problem? Generally, in a loop only the first pass is affected, because the loop gets in phase with the timing source. So this might or might not be bad for you. If it is, put a time.Sleep(1 * time.Millisecond) call just before the loop.

But imagine that you are using time.After to set up a timeout, but it occurs too quickly. Perhaps 20 times too quickly. This can cause intermittent, non-deterministic, bogus timeouts, even on a single-threaded program. Ouch! This is what bit me.

Another gotcha: suppose you use a d of 10ms today, and Go on Windows later switches r from 1ms to 15.6ms. Instead of a delay of 9 to 10ms you would then get 0 to 15.6ms.

Possible Solutions

The really bad case, where d is between 1 and r, could be detected, and d could then be changed to r+1. Tacky, in my opinion. Alternatively, vet could look for this case.

Most programming languages I’ve worked in allow this to happen, but they don’t document it incorrectly. Simply fixing the docs to reflect reality seems like the best solution to me.

Perhaps add a paragraph at the top describing what the “resolution of the time source” means, what it is for some typical platforms, and an example of a simple routine to determine what yours is. Maybe even add a system call to retrieve it.

Then, change the current time.Sleep description from

Sleep pauses the current goroutine for at least the duration d. A negative or zero duration causes Sleep to return immediately.

to something like

Sleep pauses the current goroutine for at least the duration d - 1. A duration less than or equal to the time source resolution can cause Sleep to return quickly. A negative or zero duration causes Sleep to return immediately.

And similarly for time.NewTimer’s description.

About this issue

  • State: open
  • Created 8 years ago
  • Comments: 89 (37 by maintainers)

Most upvoted comments

Given an attempt to sleep for less time than the precision of the clock being used, the only possible option that ensures that we sleep at least that long is to extend the sleep to the precision of the clock. Currently we assume that the clock is sufficiently precise, as is generally true on Unix systems.

Perhaps we could change the Windows code to use QueryPerformanceCounter. That would give us a higher precision clock source in Windows.