go: os/signal: TestTerminalSignal failures on Darwin builders

2020-02-19T20:37:54-b15fd6b/darwin-amd64-10_11 2019-10-31T21:47:08-ef03c44/darwin-amd64-race

--- FAIL: TestTerminalSignal (10.01s)
    signal_cgo_test.go:145: "PS1='prompt> '\r\n"
    signal_cgo_test.go:145: "bash-3.2$ PS1='prompt> '\r\n"
    signal_cgo_test.go:145: "prompt> GO_TEST_TERMINAL_SIGNALS=1 /var/folders/dx/k53rs1s93538b4x20g46cj_w00\r<GNALS=1 /var/folders/dx/k53rs1s93538b4x20g46cj_w000                         \b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b0gn/T/workdir-host-darwin\r<3rs1s93538b4x20g46cj_w0000gn/T/workdir-host-darwin-                         \b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b10_11/tmp/go-build2962210\r<gn/T/workdir-host-darwin-10_11/tmp/go-build29622109                         \b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b3/b143/signal.test -test.\r<0_11/tmp/go-build296221093/b143/signal.test -test.r                         \b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\bun=TestTerminalSignal\r\n"
    signal_cgo_test.go:145: "test program entering read\r\n"
    signal_cgo_test.go:145: "^Z\r\n"
    signal_cgo_test.go:145: "[1]+  Stopped                 GO_TEST_TERMINAL_SIGNALS=1 /var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir-host-darwin-10_11/tmp/go-build296221093/b143/signal.test -test.run=TestTerminalSignal\r\n"
    signal_cgo_test.go:145: "prompt> fg\r\n"
    signal_cgo_test.go:145: "GO_TEST_TERMINAL_SIGNALS=1 /var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir-host-darwin-10_11/tmp/go-build296221093/b143/signal.test -test.run=TestTerminalSignal\r\n"
    signal_cgo_test.go:145: "\r\n"
    signal_cgo_test.go:145: "[1]+  Stopped                 GO_TEST_TERMINAL_SIGNALS=1 /var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir-host-darwin-10_11/tmp/go-build296221093/b143/signal.test -test.run=TestTerminalSignal\r\n"
    signal_cgo_test.go:145: "prompt> \r\n"
    signal_cgo_test.go:145: "prompt> exit $?\r\n"
    signal_cgo_test.go:145: "exit\r\n"
    signal_cgo_test.go:145: "There are stopped jobs.\r\n"
    signal_cgo_test.go:128: "prompt> "
    signal_cgo_test.go:237: subprogram failed: signal: killed
FAIL
FAIL	os/signal	15.203s

CC @ianlancetaylor @cherrymui @bradfitz

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 15 (13 by maintainers)

Most upvoted comments

Now that we know where to look:

  • Changing foreground process group triggers a wakeup of readers
  • Which I believe gets us here. Note that isbackground is never true except when the process is stopped, but I suspect that the stop itself doesn’t wake the reader, and this check runs before the check for stopped state (which is probably in a caller). Hence this check still runs while the process is stopped.

I finally managed to recreate this test without bash. It should be more reliable now, though I’m not sure how valuable this test is now that we universally retry on EINTR rather than just on Darwin.

For posterity, a read from a PTY returns EINTR on Darwin if:

  1. The PTY foreground process group is in read of the PTY when it is stopped by write of ^Z to the PTY (SIGTSTP).
  2. The parent process group takes over as foreground process group of the PTY.
  3. The parent process group makes the child foreground again prior to continuing the child.
  4. The child is continued with SIGCONT.

i.e., the foreground process group of the PTY must be changed during the read. I didn’t actually double check if stopping the child is necessary, or if changing the foreground process group during read without a stop is sufficient.

I can reproduce this on a gomote (darwin-amd64-11-aws, specifically) simply by running the test many times: gomote run $(gomote-instance) ./go/bin/go test -run=TestTerminalSignal -v -count=100 -failfast os/signal.

IIUC, we are losing the race described here, and we need to make that wait more robust.

That’s too bad, but thanks for taking a look.