go: os: (*Process).Wait sometimes hangs on netbsd

greplogs --dashboard -md -l -e 'panic: test timed out.*\n\n(?:goroutine .*:\n(?:.+\n\t.+\n)+\n)*goroutine \d+ \[syscall, \d+ minutes\]:\n(?:.+\n\t.+\n)*os\.\(\*Process\)\.Wait(?:.*\n)+FAIL\s+cmd/link'

2021-12-12T06:14:07-9c6e8f6/netbsd-386-9_0-n2

goroutine 29 [syscall, 2 minutes]:
syscall.Syscall6(0x21a8, 0x892ee8c, 0x0, 0x8a2c1e0, 0x0, 0x0, 0x0)
	/tmp/workdir/go/src/syscall/asm_unix_386.s:43 +0x5 fp=0x892ee38 sp=0x892ee34 pc=0x80b9605
syscall.wait4(0x21a8, 0x892ee8c, 0x0, 0x8a2c1e0)
	/tmp/workdir/go/src/syscall/zsyscall_netbsd_386.go:34 +0x5b fp=0x892ee70 sp=0x892ee38 pc=0x80b737b
syscall.Wait4(0x21a8, 0x892eeb0, 0x0, 0x8a2c1e0)
	/tmp/workdir/go/src/syscall/syscall_bsd.go:144 +0x3b fp=0x892ee94 sp=0x892ee70 pc=0x80b558b
os.(*Process).wait(0x8a04660)
	/tmp/workdir/go/src/os/exec_unix.go:43 +0x82 fp=0x892eec8 sp=0x892ee94 pc=0x80de982
os.(*Process).Wait(...)
	/tmp/workdir/go/src/os/exec.go:132
os/exec.(*Cmd).Wait(0x8a18fd0)
	/tmp/workdir/go/src/os/exec/exec.go:507 +0x4d fp=0x892ef0c sp=0x892eec8 pc=0x816b07d
os/exec.(*Cmd).Run(0x8a18fd0)
	/tmp/workdir/go/src/os/exec/exec.go:341 +0x43 fp=0x892ef1c sp=0x892ef0c pc=0x816a463
os/exec.(*Cmd).CombinedOutput(0x8a18fd0)
	/tmp/workdir/go/src/os/exec/exec.go:567 +0x89 fp=0x892ef30 sp=0x892ef1c pc=0x816b549
cmd/link.TestContentAddressableSymbols(0x89290e0)
	/tmp/workdir/go/src/cmd/link/link_test.go:879 +0x136 fp=0x892ef9c sp=0x892ef30 pc=0x83824b6
testing.tRunner(0x89290e0, 0x842c054)
	/tmp/workdir/go/src/testing/testing.go:1410 +0x10d fp=0x892efe4 sp=0x892ef9c pc=0x813d19d
testing.(*T).Run.func1()
	/tmp/workdir/go/src/testing/testing.go:1457 +0x28 fp=0x892eff0 sp=0x892efe4 pc=0x813df78
runtime.goexit()
	/tmp/workdir/go/src/runtime/asm_386.s:1311 +0x1 fp=0x892eff4 sp=0x892eff0 pc=0x80ab211
created by testing.(*T).Run
	/tmp/workdir/go/src/testing/testing.go:1457 +0x36e

2021-10-29T18:34:24-903f313/netbsd-amd64-9_0
2021-10-01T15:59:38-e5ad363/netbsd-arm-bsiegert

goroutine 28 [syscall, 27 minutes]:
syscall.Syscall6(0x1c1, 0xd1f, 0xa09db4, 0x0, 0x9b27e0, 0x0, 0x0)
	/var/gobuilder/buildlet/go/src/syscall/asm_netbsd_arm.s:39 +0x8 fp=0xa09d5c sp=0xa09d58 pc=0x8d3f8
syscall.wait4(0xd1f, 0xa09db4, 0x0, 0x9b27e0)
	/var/gobuilder/buildlet/go/src/syscall/zsyscall_netbsd_arm.go:35 +0x54 fp=0xa09d94 sp=0xa09d5c pc=0x8a694
syscall.Wait4(0xd1f, 0xa09dd8, 0x0, 0x9b27e0)
	/var/gobuilder/buildlet/go/src/syscall/syscall_bsd.go:145 +0x3c fp=0xa09db8 sp=0xa09d94 pc=0x88c58
os.(*Process).wait(0x983290)
	/var/gobuilder/buildlet/go/src/os/exec_unix.go:44 +0x100 fp=0xa09df0 sp=0xa09db8 pc=0xb4f1c
os.(*Process).Wait(...)
	/var/gobuilder/buildlet/go/src/os/exec.go:132
os/exec.(*Cmd).Wait(0x98cc60)
	/var/gobuilder/buildlet/go/src/os/exec/exec.go:507 +0x50 fp=0xa09e2c sp=0xa09df0 pc=0x1482d0
os/exec.(*Cmd).Run(0x98cc60)
	/var/gobuilder/buildlet/go/src/os/exec/exec.go:341 +0x48 fp=0xa09e3c sp=0xa09e2c pc=0x147810
os/exec.(*Cmd).CombinedOutput(0x98cc60)
	/var/gobuilder/buildlet/go/src/os/exec/exec.go:567 +0x98 fp=0xa09e50 sp=0xa09e3c pc=0x14882c
cmd/link.TestIssue33979.func2({0x983200, 0x21}, {0x9ea0a0, 0x9, 0x9})
	/var/gobuilder/buildlet/go/src/cmd/link/link_test.go:199 +0x90 fp=0xa09ea8 sp=0xa09e50 pc=0x368e14
cmd/link.TestIssue33979.func3({0x9ea0a0, 0x9, 0x9})
	/var/gobuilder/buildlet/go/src/cmd/link/link_test.go:206 +0x60 fp=0xa09ecc sp=0xa09ea8 pc=0x368d5c
cmd/link.TestIssue33979(0x8834a0)
	/var/gobuilder/buildlet/go/src/cmd/link/link_test.go:239 +0x3bc fp=0xa09f98 sp=0xa09ecc pc=0x368790
testing.tRunner(0x8834a0, 0x41871c)
	/var/gobuilder/buildlet/go/src/testing/testing.go:1389 +0x118 fp=0xa09fe0 sp=0xa09f98 pc=0x1195d4
testing.(*T).Run.func1()
	/var/gobuilder/buildlet/go/src/testing/testing.go:1436 +0x30 fp=0xa09fec sp=0xa09fe0 pc=0x11a448
runtime.goexit()
	/var/gobuilder/buildlet/go/src/runtime/asm_arm.s:824 +0x4 fp=0xa09fec sp=0xa09fec pc=0x7d028
created by testing.(*T).Run
	/var/gobuilder/buildlet/go/src/testing/testing.go:1436 +0x3a0

2021-09-21T20:39:31-48cf96c/netbsd-arm-bsiegert
2021-09-14T14:27:57-181e8cd/netbsd-arm-bsiegert
2021-04-29T15:47:16-12eaefe/freebsd-amd64-11_4
2021-04-28T13:49:52-4fe324d/netbsd-386-9_0
2021-03-05T02:30:31-b62da08/netbsd-386-9_0
2021-02-19T00:40:05-95a44d2/netbsd-arm64-bsiegert
2019-09-04T21:52:18-aae0b5b/linux-ppc64le-power9osu

#44801 may be closely related.

Note that many of these failures are on architectures not believed to be affected by #49209.

@bsiegert, @coypoop: any ideas?

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 55 (45 by maintainers)

Most upvoted comments

Instead of updating the builder VMs to 9.3, should we be updating them to a more recent snapshot of NetBSD-9 then?

@bsiegert, FWIW I think it would be ok to reapply CL 315281 at this point.

I think we now have enough evidence to show that both wait4 and wait6 lead to deadlocks. Since it seems to be broken either way, we should probably use whichever of those system calls will be easier to report and/or debug upstream.

I have CL 370665 to apply timeouts to nearly every subprocess invocation in the runtime tests (though I wasn't planning to land that until the tree opens). These failures are all in cmd/link or cmd/link/internal/ld, so I could roll a CL to use RunWithTimeout in those tests.