go: time: Timer reset broken under heavy use since go1.16 timer optimizations added [1.16 backport]
@ianlancetaylor requested issue #47329 to be considered for backport to the next 1.16 minor release.
@gopherbot Please open backport to 1.16.
This bug also exists in 1.16. It can cause programs that use Timer.Reset to fail to run a timer when it is ready.
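A rough way to look for this kind of failure is to reset timers from many goroutines at once and count how many fire later than expected. The sketch below is not the reproducer from #47329; the function name `stressReset`, the constants, and the tolerance are all hypothetical and would need tuning for a real workload.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Hypothetical defaults for this sketch; they are not taken from the issue.
const (
	defaultWorkers    = 8
	defaultIterations = 200
	defaultInterval   = time.Millisecond       // duration passed to Reset
	defaultTolerance  = 500 * time.Millisecond // how late a timer may fire
)

// stressReset runs `workers` goroutines. Each one repeatedly resets a single
// timer to `interval`, waits for it to fire, and counts the wait as late if
// it took longer than `tolerance`. It returns the total number of late fires.
func stressReset(workers, iterations int, interval, tolerance time.Duration) int64 {
	var late int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			t := time.NewTimer(time.Hour)
			defer t.Stop()
			for i := 0; i < iterations; i++ {
				// Stop the timer and drain its channel before calling Reset,
				// so a stale value is never mistaken for the new expiry.
				if !t.Stop() {
					select {
					case <-t.C:
					default:
					}
				}
				t.Reset(interval)
				start := time.Now()
				<-t.C
				if time.Since(start) > tolerance {
					atomic.AddInt64(&late, 1)
				}
			}
		}()
	}
	wg.Wait()
	return late
}

func main() {
	late := stressReset(defaultWorkers, defaultIterations, defaultInterval, defaultTolerance)
	fmt.Println("late timers:", late)
}
```

On a runtime without the bug the count should stay at zero. A nonzero count only hints at a problem, since an overloaded machine can also delay timers.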
About this issue
- State: closed
- Created 3 years ago
- Comments: 16 (12 by maintainers)
Commits related to this issue
- [release-branch.go1.16] runtime: don't clear timerModifiedEarliest if adjustTimers is 0 This avoids a race when a new timerModifiedEarlier timer is created by a different goroutine. For #47329 Fixes... — committed to golang/go by ianlancetaylor 3 years ago
- [release-branch.go1.16] runtime: remove adjustTimers counter In CL 336432 we changed adjusttimers so that it no longer cleared timerModifiedEarliest if there were no timersModifiedEarlier timers. Thi... — committed to golang/go by ianlancetaylor 3 years ago
Backport complete.
@lukestoward This issue is closed and as far as we can tell, it is fixed. The fix was in the Go 1.16.7 release. Although the fix may well have introduced the problem you describe, since the patch has been released we should treat this as a new problem. I encourage you to open a new issue, ideally with a case that we can use to reproduce the problem.
See also #47762.
FYI I’m seeing increased CPU usage on various x64 and arm64 Linux nodes after rolling out ed8cbbc3ae96aef98d8f9e9e7003a99ed74992b5. I will roll back to 1.16.6 tomorrow to confirm and capture some CPU profiles.
EDIT: I confirmed that reverting that single commit brought the CPU usage of my otherwise idle Go program from 100% back to 0%.
Here’s a 60s CPU profile with that commit included: [image not preserved]
and back to normal after I reverted it: [image not preserved]
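For anyone who wants to capture a comparable profile from inside a Go program, here is a minimal sketch using the standard `runtime/pprof` package. The helper name `profileFor` and the short `sampleWindow` are hypothetical; the comment above used a 60s window, and a real service would more likely write to a file or expose `net/http/pprof`.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
	"time"
)

// sampleWindow is a hypothetical, deliberately short profiling window.
// Use something like 60*time.Second to match the profile described above.
const sampleWindow = 200 * time.Millisecond

// profileFor records a CPU profile of the current process for duration d
// and returns the raw profile bytes (gzip-compressed protobuf).
func profileFor(d time.Duration) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	time.Sleep(d) // the rest of the program keeps running while we sample
	pprof.StopCPUProfile()
	return buf.Bytes(), nil
}

func main() {
	data, err := profileFor(sampleWindow)
	if err != nil {
		fmt.Println("profiling failed:", err)
		return
	}
	fmt.Println("profile size in bytes:", len(data))
}
```

Writing the returned bytes to a file lets you inspect them with `go tool pprof`, which makes it easy to compare a build with the commit against one with it reverted.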
Closed per Ian’s comment.
I’m uncertain whether CL 337309 should be cherry-picked to the 1.16 branch. The bug fix (CLs 336432, 336689) fixed a real bug that could cause timers to fail to fire. CL 337309 is a performance fix. I’m mildly concerned that CL 337309 will introduce some other unforeseen performance problem. On the 1.16 release branch we needed the bug fix, but I don’t know how hard we want to chase performance issues on that branch.