go: runtime: fatal error: checkdead: runnable g

What version of Go are you using (go version)?

$ go version
go version go1.14.6 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/prime/.cache/go-build"
GOENV="/home/prime/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/prime/Code/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/prime/Code/go/src/a.yandex-team.ru/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build172964829=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I encountered the following fatal runtime error. It happens reliably in about one out of every 10000 jobs on a mapreduce cluster. My binary is built with CGO_ENABLED=0 and started with GOMAXPROCS=1.

runtime: checkdead: find g 1 in status 1
fatal error: checkdead: runnable g

runtime stack:
runtime.throw(0xb1ac74, 0x15)
	/usr/local/go/src/runtime/panic.go:1116 +0x72 fp=0x7ffef3709728 sp=0x7ffef37096f8 pc=0x438e02
runtime.checkdead()
	/usr/local/go/src/runtime/proc.go:4407 +0x3c9 fp=0x7ffef37097a8 sp=0x7ffef3709728 pc=0x4457a9
runtime.mput(0xf2e7e0)
	/usr/local/go/src/runtime/proc.go:4824 +0x50 fp=0x7ffef37097c8 sp=0x7ffef37097a8 pc=0x446ee0
runtime.stopm()
	/usr/local/go/src/runtime/proc.go:1832 +0x7d fp=0x7ffef37097f0 sp=0x7ffef37097c8 pc=0x43f03d
runtime.exitsyscall0(0xc000000180)
	/usr/local/go/src/runtime/proc.go:3268 +0xc6 fp=0x7ffef3709820 sp=0x7ffef37097f0 pc=0x442b86
runtime.mcall(0xf2e280)
	/usr/local/go/src/runtime/asm_amd64.s:318 +0x5b fp=0x7ffef3709830 sp=0x7ffef3709820 pc=0x46919b

goroutine 1 [runnable]:
syscall.Syscall(0x1, 0x4, 0xc0000f4000, 0x1000, 0x1000, 0x1000, 0x0)
	/usr/local/go/src/syscall/asm_linux_amd64.s:18 +0x5 fp=0xc0000bc6e8 sp=0xc0000bc6e0 pc=0x48cb05
syscall.write(0x4, 0xc0000f4000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
	/usr/local/go/src/syscall/zsyscall_linux_amd64.go:914 +0xb9 fp=0xc0000bc768 sp=0xc0000bc6e8 pc=0x48ade9
syscall.Write(0x4, 0xc0000f4000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
	/usr/local/go/src/syscall/syscall_unix.go:214 +0x70 fp=0xc0000bc7c8 sp=0xc0000bc768 pc=0x488ff0
internal/poll.(*FD).Write(0xc000054600, 0xc0000f4000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
	/usr/local/go/src/internal/poll/fd_unix.go:268 +0x2b7 fp=0xc0000bc900 sp=0xc0000bc7c8 pc=0x4b0a77
os.(*File).write(0xc00000e7a0, 0xc0000f4000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
	/usr/local/go/src/os/file_unix.go:280 +0x81 fp=0xc0000bc978 sp=0xc0000bc900 pc=0x4bba21
os.(*File).Write(0xc00000e7a0, 0xc0000f4000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
	/usr/local/go/src/os/file.go:153 +0xe9 fp=0xc0000bc9f8 sp=0xc0000bc978 pc=0x4b89c9
bufio.(*Writer).Flush(0xc0000ee180, 0x0, 0x0)
	/usr/local/go/src/bufio/bufio.go:591 +0x141 fp=0xc0000bcaf0 sp=0xc0000bc9f8 pc=0x543a81
bufio.(*Writer).WriteString(0xc0000ee180, 0xc00047bfc2, 0x1a, 0x0, 0x0, 0x0)
	/usr/local/go/src/bufio/bufio.go:694 +0x24b fp=0xc0000bcbd8 sp=0xc0000bcaf0 pc=0x544b0b
a.yandex-team.ru/yt/go/yson.(*Writer).str(0xc0000ee1c0, 0xc00047bfb0, 0x2c)
	/home/prime/Code/go/src/a.yandex-team.ru/yt/go/yson/writer.go:242 +0x29e fp=0xc0000bcce0 sp=0xc0000bcbd8 pc=0x65292e
a.yandex-team.ru/yt/go/yson.(*Writer).String(0xc0000ee1c0, 0xc00047bfb0, 0x2c)
	/home/prime/Code/go/src/a.yandex-team.ru/yt/go/yson/writer.go:371 +0x107 fp=0xc0000bcd40 sp=0xc0000bcce0 pc=0x653aa7
a.yandex-team.ru/yt/go/yson.encodeAny(0xc0000ee1c0, 0xa57740, 0xc000222a38, 0x0, 0x0)
	/home/prime/Code/go/src/a.yandex-team.ru/yt/go/yson/marshal.go:200 +0x20c1 fp=0xc0000bd170 sp=0xc0000bcd40 pc=0x63faf1
a.yandex-team.ru/yt/go/yson.encodeReflectStruct.func1(0xc0000d5d00, 0xa, 0x10, 0x0, 0x0)
	/home/prime/Code/go/src/a.yandex-team.ru/yt/go/yson/marshal.go:463 +0x316 fp=0xc0000bd2b8 sp=0xc0000bd170 pc=0x656dc6
a.yandex-team.ru/yt/go/yson.encodeReflectStruct(0xc0000ee1c0, 0xaf1e40, 0xc000222a10, 0x99, 0x0, 0x0)
	/home/prime/Code/go/src/a.yandex-team.ru/yt/go/yson/marshal.go:487 +0x33d fp=0xc0000bd3d0 sp=0xc0000bd2b8 pc=0x6414ad
a.yandex-team.ru/yt/go/yson.encodeReflect(0xc0000ee1c0, 0xaf1e40, 0xc000222a10, 0x99, 0x0, 0x0)
	/home/prime/Code/go/src/a.yandex-team.ru/yt/go/yson/marshal.go:350 +0x71b fp=0xc0000bd4c8 sp=0xc0000bd3d0 pc=0x64045b
a.yandex-team.ru/yt/go/yson.encodeAny(0xc0000ee1c0, 0xaf1e40, 0xc000222a10, 0x0, 0x0)
	/home/prime/Code/go/src/a.yandex-team.ru/yt/go/yson/marshal.go:326 +0x9a4 fp=0xc0000bd8f8 sp=0xc0000bd4c8 pc=0x63e3d4
a.yandex-team.ru/yt/go/yson.(*Writer).Any(0xc0000ee1c0, 0xaf1e40, 0xc000222a10)
	/home/prime/Code/go/src/a.yandex-team.ru/yt/go/yson/writer.go:668 +0x5d fp=0xc0000bd940 sp=0xc0000bd8f8 pc=0x655d3d
a.yandex-team.ru/yt/go/mapreduce.(*writer).Write(0xc0000e6630, 0xaf1e40, 0xc000222a10, 0x0, 0x0)
	/home/prime/Code/go/src/a.yandex-team.ru/yt/go/mapreduce/writer.go:34 +0x89 fp=0xc0000bd988 sp=0xc0000bd940 pc=0x8cb2c9
a.yandex-team.ru/yt/go/mapreduce.(*writer).MustWrite(0xc0000e6630, 0xaf1e40, 0xc000222a10)
	/home/prime/Code/go/src/a.yandex-team.ru/yt/go/mapreduce/writer.go:39 +0x43 fp=0xc0000bd9d8 sp=0xc0000bd988 pc=0x8cb363
a.yandex-team.ru/yt/jaeger.RebuildIndexJob.Do(0xbb5920, 0xc00006f940, 0xbbec80, 0xc0000b0480, 0xc00000da80, 0x2, 0x2, 0x0, 0x0)
	/home/prime/Code/go/src/a.yandex-team.ru/yt/jaeger/rebuild_index.go:28 +0x2e8 fp=0xc0000bdbf8 sp=0xc0000bd9d8 pc=0xa04998
a.yandex-team.ru/yt/jaeger.(*RebuildIndexJob).Do(0xf58688, 0xbb5920, 0xc00006f940, 0xbbec80, 0xc0000b0480, 0xc00000da80, 0x2, 0x2, 0x0, 0x0)
	<autogenerated>:1 +0xa9 fp=0xc0000bdc68 sp=0xc0000bdbf8 pc=0xa084e9
a.yandex-team.ru/yt/go/mapreduce.JobMain(0x0)
	/home/prime/Code/go/src/a.yandex-team.ru/yt/go/mapreduce/main.go:72 +0x9d1 fp=0xc0000bdf60 sp=0xc0000bdc68 pc=0x8c4d61
main.main()
	/home/prime/Code/go/src/a.yandex-team.ru/yt/jaeger/yt-jaeger/main.go:12 +0x35 fp=0xc0000bdf88 sp=0xc0000bdf60 pc=0xa0c6e5
runtime.main()
	/usr/local/go/src/runtime/proc.go:203 +0x1c8 fp=0xc0000bdfe0 sp=0xc0000bdf88 pc=0x43b4a8
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc0000bdfe8 sp=0xc0000bdfe0 pc=0x46b141

goroutine 2 [force gc (idle)]:
runtime.gopark(0xb3b238, 0xf2ccc0, 0x1411, 0x1)
	/usr/local/go/src/runtime/proc.go:304 +0xcb fp=0xc00003af88 sp=0xc00003af58 pc=0x43b86b
runtime.goparkunlock(0xf2ccc0, 0x1411, 0x1)
	/usr/local/go/src/runtime/proc.go:310 +0x53 fp=0xc00003afb8 sp=0xc00003af88 pc=0x43b923
runtime.forcegchelper()
	/usr/local/go/src/runtime/proc.go:253 +0xaa fp=0xc00003afe0 sp=0xc00003afb8 pc=0x43b70a
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc00003afe8 sp=0xc00003afe0 pc=0x46b141
created by runtime.init.6
	/usr/local/go/src/runtime/proc.go:242 +0x35

goroutine 3 [GC sweep wait]:
runtime.gopark(0xb3b238, 0xf2d300, 0x46140c, 0x1)
	/usr/local/go/src/runtime/proc.go:304 +0xcb fp=0xc00003b780 sp=0xc00003b750 pc=0x43b86b
runtime.goparkunlock(0xf2d300, 0xba140c, 0x1)
	/usr/local/go/src/runtime/proc.go:310 +0x53 fp=0xc00003b7b0 sp=0xc00003b780 pc=0x43b923
runtime.bgsweep(0xc000052000)
	/usr/local/go/src/runtime/mgcsweep.go:89 +0x101 fp=0xc00003b7d8 sp=0xc00003b7b0 pc=0x427161
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc00003b7e0 sp=0xc00003b7d8 pc=0x46b141
created by runtime.gcenable
	/usr/local/go/src/runtime/mgc.go:214 +0x5c

goroutine 4 [sleep]:
runtime.gopark(0xb3b238, 0xf2d2c0, 0xc000011313, 0x2)
	/usr/local/go/src/runtime/proc.go:304 +0xcb fp=0xc00003bf20 sp=0xc00003bef0 pc=0x43b86b
runtime.goparkunlock(0xf2d2c0, 0x2c99138c01313, 0x2)
	/usr/local/go/src/runtime/proc.go:310 +0x53 fp=0xc00003bf50 sp=0xc00003bf20 pc=0x43b923
runtime.scavengeSleep(0x928f8, 0x13b74e7)
	/usr/local/go/src/runtime/mgcscavenge.go:214 +0x83 fp=0xc00003bf80 sp=0xc00003bf50 pc=0x4255b3
runtime.bgscavenge(0xc000052000)
	/usr/local/go/src/runtime/mgcscavenge.go:337 +0x1ce fp=0xc00003bfd8 sp=0xc00003bf80 pc=0x4257ae
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc00003bfe0 sp=0xc00003bfd8 pc=0x46b141
created by runtime.gcenable
	/usr/local/go/src/runtime/mgc.go:215 +0x7e

goroutine 5 [finalizer wait]:
runtime.gopark(0xb3b238, 0xf585d0, 0xf51410, 0x1)
	/usr/local/go/src/runtime/proc.go:304 +0xcb fp=0xc00003a728 sp=0xc00003a6f8 pc=0x43b86b
runtime.goparkunlock(0xf585d0, 0xb31410, 0x1)
	/usr/local/go/src/runtime/proc.go:310 +0x53 fp=0xc00003a758 sp=0xc00003a728 pc=0x43b923
runtime.runfinq()
	/usr/local/go/src/runtime/mfinal.go:175 +0x96 fp=0xc00003a7e0 sp=0xc00003a758 pc=0x41cec6
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc00003a7e8 sp=0xc00003a7e0 pc=0x46b141
created by runtime.createfing
	/usr/local/go/src/runtime/mfinal.go:156 +0x61

goroutine 6 [GC worker (idle)]:
runtime.gopark(0xb3b0b8, 0xc0004340f0, 0x1418, 0x0)
	/usr/local/go/src/runtime/proc.go:304 +0xcb fp=0xc00003c758 sp=0xc00003c728 pc=0x43b86b
runtime.gcBgMarkWorker(0xc00002c000)
	/usr/local/go/src/runtime/mgc.go:1873 +0x119 fp=0xc00003c7d8 sp=0xc00003c758 pc=0x420659
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc00003c7e0 sp=0xc00003c7d8 pc=0x46b141
created by runtime.gcBgMarkStartWorkers
	/usr/local/go/src/runtime/mgc.go:1821 +0x77

I’m unable to reproduce this issue locally, but I was able to collect a core dump. The binary and the core dump are in the checkdead.zip archive. My binary reads data from stdin and writes data to stdout, without spawning any additional goroutines.

I would be glad to hear any suggestions on how to further diagnose this issue.

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 27 (19 by maintainers)

Most upvoted comments

@prattmic I think you’ve found the problem. sysmon calls startm which acquires the only P and releases the scheduler lock. exitsyscall0 acquires the scheduler lock, finds no P, puts the G on the global run queue, releases the lock, calls stopm which calls checkdead. At this point there is no P available, because startm snagged it. But there is also no M available, because startm hasn’t yet made it to the line sched.mnext++ in mcommoninit, and it will never get there because checkdead is holding the scheduler lock.

In short, the problem is that sysmon -> startm -> newm can acquire a P before creating an M to run on that P. When GOMAXPROCS == 1, we are in a situation where there are no P’s and no M’s, so things look bleak.
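The interleaving described above can be replayed with a toy model. This is only an illustrative sketch, not runtime code: the sched struct, its fields, and the simulate function are simplified stand-ins that borrow names from the comment, sysmon's own M is left out, and the countMFirst option is a hypothetical ordering used for contrast, not necessarily the fix that was applied upstream.

```go
package main

import (
	"errors"
	"fmt"
)

// sched is a toy stand-in for the scheduler state discussed above.
type sched struct {
	idleP    int // Ps that can still be acquired
	mnext    int // Ms counted so far (the real counter is bumped in mcommoninit)
	nmidle   int // Ms parked waiting for work
	runnable int // goroutines on the global run queue
}

// checkdead mimics the idea of the real check: if no M is running
// but a goroutine is runnable, the program is considered dead.
func (s *sched) checkdead() error {
	if s.mnext-s.nmidle > 0 {
		return nil
	}
	if s.runnable > 0 {
		return errors.New("checkdead: runnable g")
	}
	return nil
}

// simulate replays the interleaving with a single P. When countMFirst is
// true, the new M is counted before the P is considered taken.
func simulate(countMFirst bool) error {
	// One P, released while the only goroutine sits in a syscall; one M.
	s := &sched{idleP: 1, mnext: 1}

	// sysmon -> startm: take the only P. No idle M exists, so a new M
	// must be created, but it has not been counted yet.
	s.idleP--
	if countMFirst {
		s.mnext++
	}

	// exitsyscall0 on the original M: no P is available, so the
	// goroutine goes onto the global run queue.
	if s.idleP == 0 {
		s.runnable++
	}

	// stopm -> mput: the original M parks itself and runs checkdead.
	s.nmidle++
	return s.checkdead()
}

func main() {
	fmt.Println("P taken before M is counted:", simulate(false))
	fmt.Println("M counted first:", simulate(true))
}
```

In the first call the model sees zero running Ms and one runnable goroutine, which is the state that makes checkdead give up. In the second call the extra M is already counted, so the check passes.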