go: runtime: hangs in TestGdbBacktrace on linux

2020-02-22T04:31:41-059a5ac/linux-mips64le-mengzhuo

goroutine 23401 [syscall, 11 minutes]:
syscall.Syscall6(0x1475, 0x1, 0x3b6f, 0xc000cad968, 0x1000004, 0x0, 0x0, 0x120153878, 0xc000cad960, 0x120080418)
	/tmp/workdir-host-linux-mipsle-mengzhuo/go/src/syscall/asm_linux_mips64x.s:40 +0x10 fp=0xc000cad910 sp=0xc000cad908 pc=0x1200cc678
os.(*Process).blockUntilWaitable(0xc00001ab70, 0x0, 0x1200103ec, 0x0)
	/tmp/workdir-host-linux-mipsle-mengzhuo/go/src/os/wait_waitid.go:31 +0x88 fp=0xc000cad9f8 sp=0xc000cad910 pc=0x1200e49e8
os.(*Process).wait(0xc00001ab70, 0x1202d17d0, 0x1202d17d8, 0x1202d17c8)
	/tmp/workdir-host-linux-mipsle-mengzhuo/go/src/os/exec_unix.go:22 +0x4c fp=0xc000cada68 sp=0xc000cad9f8 pc=0x1200df43c
os.(*Process).Wait(...)
	/tmp/workdir-host-linux-mipsle-mengzhuo/go/src/os/exec.go:125
os/exec.(*Cmd).Wait(0xc000ad8f20, 0x0, 0x0)
	/tmp/workdir-host-linux-mipsle-mengzhuo/go/src/os/exec/exec.go:502 +0x68 fp=0xc000cadad8 sp=0xc000cada68 pc=0x120153e18
os/exec.(*Cmd).Run(0xc000ad8f20, 0xc00009aff0, 0xc000ad8f20)
	/tmp/workdir-host-linux-mipsle-mengzhuo/go/src/os/exec/exec.go:340 +0x74 fp=0xc000cadaf8 sp=0xc000cadad8 pc=0x12015340c
os/exec.(*Cmd).CombinedOutput(0xc000ad8f20, 0x3, 0xc000cade78, 0xf, 0xf, 0xc000ad8f20)
	/tmp/workdir-host-linux-mipsle-mengzhuo/go/src/os/exec/exec.go:562 +0xbc fp=0xc000cadb20 sp=0xc000cadaf8 pc=0x1201541dc
runtime_test.TestGdbBacktrace(0xc00022a5a0)
	/tmp/workdir-host-linux-mipsle-mengzhuo/go/src/runtime/runtime-gdb_test.go:388 +0x6c4 fp=0xc000cadf80 sp=0xc000cadb20 pc=0x120202e64
testing.tRunner(0xc00022a5a0, 0x1202d2e00)
	/tmp/workdir-host-linux-mipsle-mengzhuo/go/src/testing/testing.go:992 +0xf8 fp=0xc000cadfc8 sp=0xc000cadf80 pc=0x12010a978
runtime.goexit()
	/tmp/workdir-host-linux-mipsle-mengzhuo/go/src/runtime/asm_mips64x.s:646 +0x4 fp=0xc000cadfc8 sp=0xc000cadfc8 pc=0x120084354
created by testing.(*T).Run
	/tmp/workdir-host-linux-mipsle-mengzhuo/go/src/testing/testing.go:1043 +0x378

CC @dr2chase @aclements @mengzhuo

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 24 (15 by maintainers)

Most upvoted comments

Digging around a bit, it looks like signaling a zombie process will not change its exit status, so this is GDB failing to exit when its inferior exits.

Given the “[Inferior 1 (process 1173835) exited normally]” at the end of the GDB output, this is either a bug in GDB where it doesn’t properly exit, or the test is somehow missing the fact that GDB is exiting. I think the “gdb exited with error: signal: killed” indicates that the GDB process was still around to be killed, but I’m not entirely sure what happens if you send a signal to a zombie.

If this is a GDB bug, that’s unfortunate. We could work around it by watching the GDB output as it’s running and killing GDB once the output looks complete enough, or by just using a short timeout and accepting correct output even if it timed out.

Since we’ve at least made tangible progress on diagnosing the problem during the 1.18 cycle, I think it would be ok to move this back to the Backlog milestone and/or mark it WaitingForInfo while we wait for another repro.

It’s unfortunate but not terribly surprising for flaky tests not to reproduce as often during the code freeze, because the rate of test runs (especially for fast and/or scalable builders) tends to be much higher during the active development window.

We could implement our own timeout in TestGdbBacktrace so it can fail cleanly and print the output it has so far from GDB.

This turns out not to be specific to the mips64le builder. See also #39228 (occasional failures instead of hangs).

2021-01-23T19:46:06-9897655/linux-amd64-sid
2020-11-02T03:03:16-0387bed/linux-386-softfloat
2020-06-25T12:02:38-334752d/linux-amd64-staticlockranking