go: all: test failures on `netbsd` blocked on waiting for subprocesses

#!watchflakes
post <- builder ~ `netbsd-.*` && (log ~ `^\s*os\.wait6` || log ~ `^\s*syscall.wait4` || `DETECTED A HANGING GO COMMAND` || `test timed out while running command` || `script_test\.go:\d+: .*: signal: killed`)

About this issue

  • State: open
  • Created a year ago
  • Comments: 131 (17 by maintainers)

Most upvoted comments

Thanks! Some quick followups:

Note that Go by default does not link against libpthread.so at all. It calls the _lwp_create system call directly. That said, there are cases where it calls pthread_create, and I haven’t checked which case applies for these failures.

I examined a go1.20 executable on netbsd-9 with readelf --dynamic and it certainly looks like it’s linked against libpthread:

% readelf --dynamic go | grep NEEDED
 0x0000000000000001 (NEEDED)             Shared library: [libresolv.so.3]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.12]
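
For anyone who wants to repeat this check without readelf, here is a minimal sketch in Go that lists the same NEEDED entries using the standard debug/elf package. The helper name neededLibraries is made up for illustration; it is not part of the Go toolchain, and this only reports what the binary declares, not which thread-creation path the runtime actually takes.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// neededLibraries (hypothetical helper) returns the shared libraries that
// the ELF file at path declares as DT_NEEDED, i.e. the same entries that
// `readelf --dynamic` prints as (NEEDED).
func neededLibraries(path string) ([]string, error) {
	f, err := elf.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	return f.ImportedLibraries()
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: needed <elf-binary>")
		return
	}
	libs, err := neededLibraries(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		return
	}
	for _, lib := range libs {
		fmt.Println(lib)
	}
}
```

Run against the go binary above, this should list the same three libraries; a fully static binary would print nothing.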

If you’re calling the _lwp_create syscall directly instead of using the pthread_create symbol, but you’re calling any other functions via dynamic symbols, that will get you into trouble, and a newer libpthread.so with the rtld/pthread fix won’t help. There is probably lots of other trouble you can get into if you bypass the pthread symbols – I would have to think more about it.

Is there a document anywhere that lays out how this works on each OS, or roughly how things like the syscalls and cgo dynamic symbol calls are put together?

As far as I can tell, the child process has exited.

How do you tell?

That is a fair point. My only real reason for thinking that the problem is with wait6 is that we aren’t seeing the builder hang. We’re only seeing the builder time out while waiting for child processes.

Can you be more specific about what the difference is?

To be clear, I am suggesting that this is a NetBSD kernel bug in wait6.