go: runtime: netpoll port_getn() fails with impossible errno on illumos
What version of Go are you using (go version)?
$ go version
go version go1.16.3 illumos/amd64
Does this issue reproduce with the latest release?
I do not know. I found the issue while running Navidrome (https://github.com/navidrome/navidrome), which enforces building only with Go 1.16.
The relevant bits of netpoll_solaris.go do not seem to have changed in the latest release.
What operating system and processor architecture are you using (go env)?
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/whorfin/.cache/go-build"
GOENV="/home/whorfin/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="illumos"
GOINSECURE=""
GOMODCACHE="/home/whorfin/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="illumos"
GOPATH="/home/whorfin/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/opt/ooce/go-1.16"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/opt/ooce/go-1.16/pkg/tool/illumos_amd64"
GOVCS=""
GOVERSION="go1.16.3"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/whorfin/navidrome/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3292860960=/tmp/go-build -gno-record-gcc-switches"
What did you do?
After I built the latest navidrome (master from https://github.com/navidrome/navidrome), it sporadically crashes with:
runtime: port_getn on fd 4 failed (errno=0)
fatal error: runtime: netpoll failed
runtime stack:
runtime.throw(0xe01def, 0x17)
/opt/ooce/go-1.16/src/runtime/panic.go:1117 +0x72
runtime.netpoll(0x71c11d, 0x12202ce5e8f688)
/opt/ooce/go-1.16/src/runtime/netpoll_solaris.go:249 +0x505
runtime.findrunnable(0xc000034000, 0x0)
/opt/ooce/go-1.16/src/runtime/proc.go:2879 +0x3ee
runtime.schedule()
/opt/ooce/go-1.16/src/runtime/proc.go:3125 +0x2d7
runtime.park_m(0xc000001080)
/opt/ooce/go-1.16/src/runtime/proc.go:3274 +0x9d
runtime.mcall(0x57eac8)
/opt/ooce/go-1.16/src/runtime/asm_amd64.s:327 +0x64
goroutine 1 [chan receive, 26 minutes]:
github.com/oklog/run.(*Group).Run(0xc000052d20, 0xc0003fa0b0, 0x1)
/home/whorfin/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:43 +0xed
github.com/navidrome/navidrome/cmd.runNavidrome()
/home/whorfin/navidrome/cmd/root.go:59 +0x107
github.com/navidrome/navidrome/cmd.glob..func2(0x25e8540, 0x266d098, 0x0, 0x0)
/home/whorfin/navidrome/cmd/root.go:31 +0x25
github.com/spf13/cobra.(*Command).execute(0x25e8540, 0xc000092020, 0x0, 0x0, 0x25e8540, 0xc000092020)
/home/whorfin/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:856 +0x2c2
github.com/spf13/cobra.(*Command).ExecuteC(0x25e8540, 0xc000000180, 0x200000003, 0xc000000180)
/home/whorfin/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:960 +0x375
github.com/spf13/cobra.(*Command).Execute(...)
/home/whorfin/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:897
github.com/navidrome/navidrome/cmd.Execute()
/home/whorfin/navidrome/cmd/root.go:39 +0x65
main.main()
/home/whorfin/navidrome/main.go:11 +0x2b
That (errno=0) appears to be impossible.
What did you expect to see?
Ideally, no crashes. If there were a crash, I’d expect a meaningful errno.
What did you see instead?
A crash with errno=0
About this issue
- State: closed
- Created 3 years ago
- Comments: 27 (12 by maintainers)
Here is a script that will trace some of the port_getn() calls for a process:
e.g., some output from an acme-dns process:
Critically this script looks at two things:
- libc-level port_getn() call, including the per-thread errno value that Go should be collecting via errno()
- portfs system call made by port_getn(), which is what truss is showing you as well, including the errno value that the kernel put in place for the system call return (this is separate but related to the libc errno value)
There is a conditional print if we detect a failure return (-1) where errno is 0, which should print something like:
It would be good to capture this sort of trace output while the problem is occurring, so that we can determine whether the C library is mishandling the error numbers here, or if it is some part of the Go machinery for fetching thread local errno after a C library call, or something else.