go: os/signal: TestAtomicStop failing on Illumos

The newly revived Illumos builder (now run by @jclulow, probably on a different host/OS) is failing with:

https://build.golang.org/log/47c0329f33a7c7bd68dc98c35021160b03a3c6a5

ok  	os	0.809s
ok  	os/exec	0.869s
--- FAIL: TestAtomicStop (2.15s)
    signal_test.go:400: iteration 1: output lost signal on tries: 2
    signal_test.go:408: iteration 1: lost signal
FAIL
FAIL	os/signal	6.493s
ok  	os/user	0.020s
ok  	path	0.025s
FAIL
2019/10/22 14:36:24 Failed: exit status 1

/cc @ianlancetaylor @bcmills

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 27 (25 by maintainers)

Most upvoted comments

I was just looking at this. As far as I can tell, it doesn't always fail, just most of the time. While watching the buildlet over the last few days as it was getting going, I've seen the load averages in the zone get pretty high.

I think this is because the -p flag for job parallelism defaults to the number of CPUs, so the test runner runs that many test binaries in parallel. Because this is a zone (i.e., a kind of container), you can actually see (and schedule on!) every CPU in the host; there are 48 in all. Although all of the CPUs are visible, the zone is capped in the amount of CPU time it can use: in effect, it gets 2 CPUs' worth of CPU time per second.
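To make the arithmetic concrete, here is a minimal Go sketch. It assumes that the test runner's parallelism tracks GOMAXPROCS, which is normally the number of CPUs the runtime can see. The helper name oversubscription and the zoneCap constant are hypothetical. The runtime package cannot see a zone's CPU cap, so the sketch hard-codes it to 2.

```go
package main

import (
	"fmt"
	"runtime"
)

// oversubscription returns how many parallel workers compete for each
// CPU's worth of time when `workers` processes share a cap of `cpuCap`
// CPUs. Both inputs are supplied by the caller.
func oversubscription(workers int, cpuCap float64) float64 {
	return float64(workers) / cpuCap
}

func main() {
	// GOMAXPROCS(0) reports the current setting without changing it.
	// Inside an illumos zone this reflects every CPU in the host,
	// not the capped CPU-time budget.
	visible := runtime.GOMAXPROCS(0)
	fmt.Println("visible CPUs (GOMAXPROCS):", visible, "NumCPU:", runtime.NumCPU())

	// Hypothetical: the zone's CPU cap in CPUs' worth of time. This is
	// hard-coded because it is not exposed by the runtime package.
	const zoneCap = 2.0

	// The builder's numbers: 48 parallel slots sharing a 2-CPU cap.
	fmt.Printf("builder: %.0fx oversubscribed\n", oversubscription(48, zoneCap))
	fmt.Printf("this machine: %.1fx oversubscribed\n", oversubscription(visible, zoneCap))
}
```

With the builder's figures this prints "builder: 24x oversubscribed". The figure is a lower bound, because each parallel slot counts as one runnable process even though a single test binary may keep several threads busy.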

If TestAtomicStop is timing-sensitive, it may be running alongside a lot of other tests when it fails. The roughly 24x oversubscription (48 visible CPUs against a 2-CPU cap) might mean this test isn't on CPU soon enough to hit the timing window it expects. When I run the test suite on a standalone machine that doesn't have this property (i.e., a small system on my desk where I'm the only user), it seems to pass every time.