go: os/signal: TestAtomicStop failing on Illumos
The newly revived Illumos builder (now run by @jclulow, probably on a different host/OS) is failing with:
https://build.golang.org/log/47c0329f33a7c7bd68dc98c35021160b03a3c6a5
ok os 0.809s
ok os/exec 0.869s
--- FAIL: TestAtomicStop (2.15s)
signal_test.go:400: iteration 1: output lost signal on tries: 2
signal_test.go:408: iteration 1: lost signal
FAIL
FAIL os/signal 6.493s
ok os/user 0.020s
ok path 0.025s
FAIL
2019/10/22 14:36:24 Failed: exit status 1
About this issue
- State: closed
- Created 5 years ago
- Comments: 27 (25 by maintainers)
I was just looking at this. It doesn’t, as far as I can tell, always fail – just most of the time. Having watched the buildlet a bit in the last few days while it was getting going, the load averages in the zone get up pretty high.
I think this is because the -p flag for job parallelism defaults to the number of CPUs, so the test runner starts that many tests in parallel. Because this is a zone (i.e., a kind of container) you can actually see (and schedule on!) every CPU in the host; there are 48 in all. Though all of the CPUs are visible, the zone is capped in the amount of CPU time it is able to use; in effect, it’s able to use 2 CPUs’ worth of CPU time per second.
If TestAtomicStop is timing sensitive, it’s possible that when it fails it’s running alongside rather a lot of other tests, and the ~24x oversubscription might mean this test isn’t on CPU soon enough to hit the timing window it’s expecting. When I run the test suite on a standalone machine which doesn’t have this property (i.e., a small system on my desk where I’m the only user) it seems to pass every time.