go: runtime: Timer buckets may get "stuck" for long periods of time after Windows 8/10 systems wake from sleep
What version of Go are you using (go version
)?
PS C:\> go version go version go1.12.4 windows/amd64
Does this issue reproduce with the latest release?
Yes, and also with go 1.11
What operating system and processor architecture are you using (go env
)?
go env
Output
PS C:\> go env set GOARCH=amd64 set GOBIN= set GOCACHE=C:\Users\jordan\AppData\Local\go-build set GOEXE=.exe set GOFLAGS= set GOHOSTARCH=amd64 set GOHOSTOS=windows set GOOS=windows set GOPATH=c:\gopath set GOPROXY= set GORACE= set GOROOT=C:\projects\go1.12.4.windows-amd64\go set GOTMPDIR= set GOTOOLDIR=C:\projects\go1.12.4.windows-amd64\go\pkg\tool\windows_amd64 set GCCGO=gccgo set CC=gcc set CXX=g++ set CGO_ENABLED=1 set GOMOD= set CGO_CFLAGS=-g -O2 set CGO_CPPFLAGS= set CGO_CXXFLAGS=-g -O2 set CGO_FFLAGS=-g -O2 set CGO_LDFLAGS=-g -O2 set PKG_CONFIG=pkg-config set GOGCCFLAGS=-m64 -mthreads -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=C:\Users\jordan\AppData\Local\Temp\go-build618130038=/tmp/go-build -gno-record-gcc-switches
What did you do?
I ran this program on a Windows 10 amd64 VM: https://play.golang.org/p/Jyqki-iIF54. I put the Windows VM to sleep(right click Start button->“Shut down or sign out”->“Sleep”) right after seeing the first “ShortInterval 0 fired at…” print at 15:19:09. I then woke the machine up about 8 minutes later around 15:27:28 and the program continued. I later killed the program with ctrl+c around 15:34:30. Full program output copied below:
PS C:\> .\timertest.exe doStuffLongInterval 2 starting doStuffLongInterval 0 starting doStuffShortInterval 0 starting doStuffLongInterval 1 starting ShortInterval 0 fired at 2019-04-16 15:19:09.159805 -0700 PDT m=+30.006215401 after 30 seconds LongInterval 0 fired at 2019-04-16 15:27:28.4460785 -0700 PDT m=+529.330931901 after 5 minutes ShortInterval 0 fired at 2019-04-16 15:27:28.4468785 -0700 PDT m=+529.331733901 after 30 seconds LongInterval 2 fired at 2019-04-16 15:31:28.2059465 -0700 PDT m=+769.327050901 after 5 minutes ShortInterval 0 fired at 2019-04-16 15:31:28.2059465 -0700 PDT m=+769.327050901 after 30 seconds LongInterval 1 fired at 2019-04-16 15:31:28.2059465 -0700 PDT m=+769.327050901 after 5 minutes ShortInterval 0 fired at 2019-04-16 15:31:58.205199 -0700 PDT m=+799.330143301 after 30 seconds ShortInterval 0 fired at 2019-04-16 15:32:28.2045636 -0700 PDT m=+829.332021501 after 30 seconds LongInterval 0 fired at 2019-04-16 15:32:28.2053959 -0700 PDT m=+829.332853801 after 5 minutes ShortInterval 0 fired at 2019-04-16 15:32:58.2042094 -0700 PDT m=+859.333011301 after 30 seconds ShortInterval 0 fired at 2019-04-16 15:33:28.2022148 -0700 PDT m=+889.334911201 after 30 seconds ShortInterval 0 fired at 2019-04-16 15:33:58.1962279 -0700 PDT m=+919.335067001 after 30 seconds ShortInterval 0 fired at 2019-04-16 15:34:28.1926601 -0700 PDT m=+949.335907901 after 30 seconds
What did you expect to see?
I would expect that while the system is awake the ShortInterval function would regularly print approximately every 30 seconds. I would also expect the LogInterval prints to be consistently placed in time rather than “LongInterval 0 …” seemingly being staggered and firing at different times than “LongInterval 1 …” and “LongInterval 2 …” which started at approximately the same time.
What did you see instead?
After waking up the system the “ShortInterval 0 fired” message was printed once and then was not printed again for 4 minutes, meaning its 30 second timer fired about 3.5 minutes too late.
I believe this is due to the design of the timerBucket mechanism within the Go runtime. Each timerBucket is sorted such that the timer with the lowest firing time(the “when” field on the runtime.timer struct) is always at the front and the goroutines that handle firing of timers sleep until they are:
-
Awakened from sleep at the time to fire the timer
-
Awakened by a new timer being placed at the front of the timerBucket which causes the sleep time to be recalculated.
On Windows, the current time is retrieved using nanotime() which grabs the time from the KUSER_SHARED_DATA structure and which continues “ticking” even while the system is asleep. I’ve tested and observed this behavior both in a VMWare VM running Windows 10 and on a laptop running Windows 10. Relevant code here: https://golang.org/src/runtime/sys_windows_amd64.s?h=nanotime#L468
The timed wakeup, on the other hand, is implemented in semasleep() using WaitForSingleObject which the documentation states does not count time spent in a “low-power state”(i.e. asleep) on Windows 8 and 10. See here: https://golang.org/src/runtime/os_windows.go?h=semasleep#L600 and here: https://docs.microsoft.com/en-us/windows/desktop/api/synchapi/nf-synchapi-waitforsingleobject
What seems to happen here is that when a Windows 8 or 10 machine goes to sleep, the actual time returned by nanotime() can potentially surpass the “when” time of the timer at the front of the queue while the machine is asleep. Newly added timers will thus have a later “when” time and so will get queued behind the element at the front of the timerBucket and since WaitForSingleObject does not count this sleep time the wakeup will not come until the machine has actually been awake for the total amount of time originally specified for the timer, potentially making the new timers added to the timerBucket late by however long the original timer was for. All of this is invisible to a normal go program so from the user or programmer’s perspective, some timers simply stop firing.
In my testing I’ve noticed that this happens more often when machines have a low number of cores, due to the lower number of timerBuckets so most of my testing was done on a 2-core VM. That said, it does reproduce on real hardware if the number of “long interval” goroutines is increased.
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 27 (9 by maintainers)
Commits related to this issue
- runtime: monitor for suspend/resume to kick timeouts Starting in Windows 8, the wait functions don't take into account suspend time, even though the monotonic counters do. This results in timer bucke... — committed to tomocy/go by zx2c4 5 years ago
- runtime: monitor for suspend/resume to kick timeouts Starting in Windows 8, the wait functions don't take into account suspend time, even though the monotonic counters do. This results in timer bucke... — committed to t4n6a1ka/go by zx2c4 5 years ago
- runtime: iterate ms via allm linked list to avoid race It's pointless to reach all ms via allgs, and doing so introduces a race, since the m member of a g can change underneath it. Instead iterate di... — committed to golang/go by zx2c4 5 years ago
- [release-branch.go1.13] runtime: monitor for suspend/resume to kick timeouts Starting in Windows 8, the wait functions don't take into account suspend time, even though the monotonic counters do. Thi... — committed to golang/go by zx2c4 5 years ago
- [release-branch.go1.12] runtime: monitor for suspend/resume to kick timeouts Starting in Windows 8, the wait functions don't take into account suspend time, even though the monotonic counters do. Thi... — committed to golang/go by zx2c4 5 years ago
See #35482