go: runtime: panics and/or deadlocks on FreeBSD 12.2-RELEASE

What version of Go are you using (go version)?

$ go version
go version go1.15.6 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/cs/.cache/go-build"
GOENV="/home/cs/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/cs/development/golang/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/cs/development/golang"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/cs/go1.15.6"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/cs/go1.15.6/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build804757202=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I maintain zrepl, a tool for ZFS replication written in Go. Some users have been hitting Go runtime panics and/or lock-ups on FreeBSD.

The problems started in July 2020 with reports from a user on a development version of FreeBSD (12-STABLE, which must have been somewhere between 12.1-RELEASE and 12.2-RELEASE). The problem occurred with both Go 1.13.5 and 1.14.*. At the time I asked for the problem to be reproduced on an official binary release of FreeBSD, but that did not happen.

After I updated my personal FreeBSD server to 12.2-RELEASE, I started to encounter issues similar to those reported in July 2020. I have not yet encountered runtime panics, but I have seen several lock-ups (for lack of a better term) of the Go runtime. The last of the links above contains a stack trace of a goroutine blocked forever in runtime.newobject (stack obtained using dlv).

Summary of my triaging since July:

  • We have ruled out that it’s due to the one use of unsafe in zrepl by removing the unsafe code path in a test build. The panics / lock-ups still occurred.
  • The problems reproducibly stop when the process is limited to one CPU using the OS scheduler (cpuset).
  • Most often the problems happen while sockets are being used. They do not happen when the daemon is idle.
  • We have ruled out faulty hardware.
  • The issue has only occurred on Intel systems on bare metal so far. I was unable to reproduce it in a stress test between two FreeBSD VMs on a Ryzen 1700X.
  • I suspect the root cause is one of the following:
    • FreeBSD kernel bug introduced between 12.1-RELEASE and 12.2-RELEASE
    • Go runtime bug (would need to be present in multiple Go versions though)

It would be very helpful to get a quick explanation of what these panics mean so that I can narrow down my audit of the changes between FreeBSD 12.1-RELEASE and 12.2-RELEASE.

Also, I can offer a Go / FreeBSD developer access (via tmate or similar) to the system with the locked-up daemon. The lock-up usually occurs after 2-3 days on my system, sometimes sooner, but I can leave it in the locked-up state for a day or two.

Related zrepl issues:

About this issue

  • State: open
  • Created 3 years ago
  • Comments: 55 (8 by maintainers)

Most upvoted comments

FYI, the issue I have been seeing with zrepl crashing on FreeBSD RELENG_12 seems to have been fixed (or at least vastly improved) with https://cgit.freebsd.org/src/commit/?h=stable/12&id=1820ca2154611d6f27ce5a5fdd561a16ac54fdd8 (5 days without a crash, where I would normally have seen a few by now).

@egonelbre yes, we use 1.16rc1 after ruling out that it’s a regression in Go.

(See https://github.com/golang/go/issues/43873 on the Go issue tracker as well.)

All right, thanks for clearing this up. I’m “looking forward” to the results 😉