go: test: locklinear.go failures with "lockmany: too slow"
From the illumos-amd64-joyent
builder, in https://build.golang.org/log/b60463ada3a04bbfdb7f714e54d819176e60a833:
##### ../test
# go run run.go -- locklinear.go
exit status 1
panic: lockmany: too slow: 1000 ops: 22.164624ms; 2000 ops: 433.387975ms
1000 42.726699ms 2000 43.00828ms (1.0X)
1000 13.237367ms 2000 40.902635ms (3.1X)
1000 9.373097ms 2000 39.993182ms (4.3X)
1000 5.415665ms 2000 29.961344ms (5.5X)
1000 22.164624ms 2000 433.387975ms (19.6X)
goroutine 1 [running]:
main.checkLinear(0x4d8124, 0x8, 0x3e8, 0x4df5f0)
/tmp/workdir-host-illumos-amd64-joyent/go/test/locklinear.go:73 +0x590
main.main()
/tmp/workdir-host-illumos-amd64-joyent/go/test/locklinear.go:134 +0x79
exit status 2
FAIL locklinear.go 11.857s
See previously #19276.
About this issue
- State: closed
- Created 5 years ago
- Reactions: 1
- Comments: 22 (19 by maintainers)
Commits related to this issue
- internal/testmath: add two-sample Welch's t-test for performance tests This CL copies code from github.com/aclements/go-moremath/stats and github.com/aclements/go-moremath/mathx for Welch's t-test. S... — committed to golang/go by mknyszek 2 years ago
- sync: move lock linearity test and treat it like a performance test This change moves test/locklinear.go into the sync package tests, and adds a bit of infrastructure since there are other linearity-... — committed to golang/go by mknyszek 2 years ago
It occurs to me that I never stated this plainly: I don’t think this failure represents a real regression. I think it’s very clear from all my runs that the invariant it intends to check is generally upheld, and I also don’t see any significant regression here in terms of how often it’s upheld. What I understand this issue to be about is a flaky test, and trying to make it less flaky.
I think the test is useful, i.e. verifying the asymptotic behavior of contended Mutexes. Currently, when it does fail, it has to fail the check at least 5 times in a row, which at face value seems like a high bar.
Looking more closely at the test, it already has a few workarounds for flakiness that appear to have been added in the few months after it landed. There’s an inherent flakiness to this kind of test, especially since it calls runtime.Gosched.
One thing that jumps out at me, though, is that the test never blocks to clean up goroutines after each attempt, so theoretically one failure could exacerbate subsequent ones if the problem is transient oversubscription (especially given the runtime.Gosched calls; the test creates 1000 or 2000 goroutines each time it tries). 5 is still a lot of consecutive failures for that to be the only culprit, though.
Another thing that seems kind of suspicious is that
lockone generates contention on locks[1] and locks[offset], but lockmany generates contention on locks[1] and locks[0]. offset is 251, which matches the size of the runtime’s semaphore table. That table is indexed by the lock’s 8-byte word address. Based on this, neither locks[offset] nor locks[0] collides with locks[1] in the semaphore table, which is likely intentional. And given that offset is prime, it’s very unlikely that there’s any false sharing between locks[1] and locks[offset]. Meanwhile, locks[0] and locks[1] absolutely do share a cache line, so the lock fast path will contend in cache coherency. Note that this test uses RWMutex values, which are 24 bytes in size. (This might be an oversight? The test doesn’t use the read-lock functionality at all, while a Mutex is exactly 8 bytes.)
Maybe it’s possible that on these slow platforms there’s some kind of cache coherency breakdown? It’s hard for me to tell whether the 0 vs. offset choice is intentional. There aren’t any comments in the review (https://go-review.googlesource.com/c/go/+/37103) pointing out the difference, the commit message doesn’t discuss it, and the test file itself doesn’t have any explanation.
I’m inclined to do two things to make this test more robust:
1. Switch the test from RWMutex to Mutex, since the read-lock functionality is never used.
2. Use offset instead of 0 in lockmany.
And then we wait to see if it fails again. Does this seem reasonable? If we don’t want to perturb things too much, we can just do (2).
I did a large number of all.bash runs on FreeBSD for other purposes, but got a fair number of locklinear failures in the process.
On freebsd-amd64-12_2, 10 of 898 runs failed in locklinear (1.1%). On freebsd-amd64-12_3, 6 of 912 runs failed in locklinear (0.66%). Those are fairly different rates given the number of runs, which I find a little surprising, but I think they give us a sense of the ambient failure rate of this test when it’s run in the context of everything else all.bash does.