go: log/syslog, net: goroutine permanently stuck in waitWrite() to /dev/log using a unix datagram socket
Writes to syslog over a unix datagram socket (the default on Ubuntu 14.04, and probably on other distributions) can block indefinitely if the syslog daemon closes /dev/log.
The reproducer at https://play.golang.org/p/vp_e6n8VJuX triggers the issue. It is derived from a reproducer posted in the comments of https://github.com/golang/go/issues/5932.
To reproduce:
- Run the reproducer binary several times in the background (3 to 6 copies is usually enough).
- Restart your syslog daemon. I’m using syslog-ng, so
  sudo service syslog-ng restart
  suffices.
- Repeat this exact experiment a few times until you see some of the runs never exit. Kill them with kill -SIGQUIT and you should see a stack like the one linked below. Staggering the runs sometimes helps: start some in the background, restart the daemon, wait a few seconds, start some more in the background, restart the daemon again. Get aggressive. Anger the system.
We first observed this behavior in production last week, during a DNS outage. The outage prevented our syslog-ng daemon from reloading its configuration, because it exits uncleanly on configuration reload when DNS is unavailable. A large portion of our production machines then hung completely, with one goroutine blocked writing to syslog and all the others waiting to acquire the log package's mutex. I believe the syslog-ng restart exposed the bug in the syslog and/or net packages. Our syslog write volume is fairly high.
After that, I reproduced the issue locally using the reproducer above. I suspect the problem is a missed socket close event, or some other invalid state transition in the net code.
Goroutine stack after killing the stuck process with -SIGQUIT: https://pastebin.com/GuV5JZDS
Environment:
$ go version
go version go1.7 linux/amd64
$ go env
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/jesmet/.gvm/pkgsets/go1.7/global"
GORACE=""
GOROOT="/home/jesmet/.gvm/gos/go1.7"
GOTOOLDIR="/home/jesmet/.gvm/gos/go1.7/pkg/tool/linux_amd64"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build126114309=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
About this issue
- State: closed
- Created 6 years ago
- Comments: 20 (13 by maintainers)
Reply on the netdev mailing list, with a kernel patch: https://www.spinics.net/lists/netdev/msg512686.html
Closing this issue, since it doesn’t seem to be a Go bug.