go: test: locklinear.go failures with "lockmany: too slow"

From the illumos-amd64-joyent builder, in https://build.golang.org/log/b60463ada3a04bbfdb7f714e54d819176e60a833:

##### ../test
# go run run.go -- locklinear.go
exit status 1
panic: lockmany: too slow: 1000 ops: 22.164624ms; 2000 ops: 433.387975ms

1000 42.726699ms 2000 43.00828ms (1.0X)
1000 13.237367ms 2000 40.902635ms (3.1X)
1000 9.373097ms 2000 39.993182ms (4.3X)
1000 5.415665ms 2000 29.961344ms (5.5X)
1000 22.164624ms 2000 433.387975ms (19.6X)


goroutine 1 [running]:
main.checkLinear(0x4d8124, 0x8, 0x3e8, 0x4df5f0)
	/tmp/workdir-host-illumos-amd64-joyent/go/test/locklinear.go:73 +0x590
main.main()
	/tmp/workdir-host-illumos-amd64-joyent/go/test/locklinear.go:134 +0x79
exit status 2

FAIL	locklinear.go	11.857s

See previously #19276.

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 22 (19 by maintainers)

Most upvoted comments

It occurs to me that I never stated this plainly: I don’t think this failure represents a real regression. I think it’s very clear from all my runs that the invariant it intends to check is generally upheld, and I also don’t see some significant regression here in terms of how often it’s upheld. What I understand this issue to be about is a flaky test and trying to make it less flaky.

I think the test is useful: it verifies that the asymptotic behavior of contended Mutexes stays linear. Currently, when it does fail, it has to fail the check at least 5 times in a row, which at face value seems like a high bar.

Looking more closely at the test, it already has a few workarounds for flakiness that appear to have been added in the few months after it landed. There's an inherent flakiness to this kind of test, especially since it calls runtime.Gosched.

One thing that jumps out at me, though, is that the test never blocks to clean up goroutines after each attempt, so in theory one failure could exacerbate subsequent ones if the problem is transient oversubscription (especially given the runtime.Gosched calls; the test creates 1000 or 2000 goroutines each time it tries). Still, 5 consecutive failures is a lot for that to be the only culprit.

Another thing that seems kind of suspicious is that lockone generates contention on locks[1] and locks[offset], but lockmany generates contention on locks[1] and locks[0]. offset is 251, which matches the size of the runtime's semaphore table; that table is indexed by the lock's 8-byte word address. By that arithmetic, neither locks[offset] nor locks[0] collides with locks[1] in the table, which is likely intentional. However, the cache-line story differs: locks[0] and locks[1] definitely share a cache line, so in lockmany the lock fast path will contend in cache coherency, whereas locks[1] and locks[offset] do not (and since offset is prime, any other false sharing between them is very unlikely). Note that this test uses RWMutex values, which are 24 bytes in size. (This might be an oversight? The test doesn't use the read-lock functionality at all, whereas a Mutex is exactly 8 bytes.)

Maybe on these slow platforms there's some kind of cache-coherency breakdown? It's hard for me to tell whether the 0 vs. offset choice is intentional. There aren't any comments in the review (https://go-review.googlesource.com/c/go/+/37103) pointing out the difference, the commit message doesn't discuss it, and the test file itself doesn't have any explanation.

I’m inclined to do two things to make this test more robust:

  1. Block on cleanup after obtaining the time-to-run data. Maybe also exclude the setup time from the timing call (just to narrow the scope of the test a bit more).
  2. Use offset instead of 0 in lockmany.

And then we wait to see if it fails again. Does this seem reasonable? If we don’t want to perturb things too much, we can just do (2).

I did a large number of all.bash runs on FreeBSD for other purposes, but got a fair number of locklinear failures in the process.

On freebsd-amd64-12_2, 10 of 898 runs failed in locklinear (1.1%). On freebsd-amd64-12_3, 6 of 912 runs failed in locklinear (0.66%). Those are fairly different rates given the number of runs, which I find a little surprising, but I think they give us a sense of the ambient failure rate of this test when it's run in the context of everything else all.bash does.