go: runtime: "runtime·lock: lock count" fatal error when cgo is enabled

What version of Go are you using (go version)?

$ go version
go version go1.19.2 linux/amd64

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/ubuntu/.cache/go-build"
GOENV="/home/ubuntu/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/ubuntu/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/ubuntu/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.19.2"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1867075057=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Ran tests for containerd/containerd#7513

~/containerd$ go test -c ./snapshots/overlay
~/containerd$ sudo ./overlay.test -test.root -test.run TestOverlay/no_opt/128LayersMount

(Unfortunately, root is required as the test issues many mount syscalls. I have not had success creating a more minimal reproducer, but this test case takes only a few seconds to run to completion and reproduces the runtime errors fairly reliably.)

What did you expect to see?

The test either passes or fails.

What did you see instead?

  • Crashes reminiscent of #25128

fatal error: runtime·lock: lock count followed by hundreds (thousands?) of lines of fatal error: runtime·unlock: lock count. Sometimes these are followed by other runtime errors, such as:

fatal: morestack on g0

fatal: systemstack called from unexpected goroutineTrace/breakpoint trap

  • An "impossible" segfault in perfectly ordinary Go code:
unexpected fault address 0x0
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x80 addr=0x0 pc=0x784dcf]
(gdb) disass 0x784dcf
Dump of assembler code for function github.com/containerd/continuity.(*resource).Path:
   0x0000000000784dc0 <+0>:	mov    (%rax),%rcx
   0x0000000000784dc3 <+3>:	cmpq   $0x0,0x8(%rax)
   0x0000000000784dc8 <+8>:	jne    0x784dcf <github.com/containerd/continuity.(*resource).Path+15>
   0x0000000000784dca <+10>:	xor    %eax,%eax
   0x0000000000784dcc <+12>:	xor    %ebx,%ebx
   0x0000000000784dce <+14>:	ret
   0x0000000000784dcf <+15>:	mov    (%rcx),%rax
   0x0000000000784dd2 <+18>:	mov    0x8(%rcx),%rbx
   0x0000000000784dd6 <+22>:	ret
End of assembler dump.

https://github.com/containerd/continuity/blob/5ad51c7aca47b8e742f5e6e7dc841d50f5f6affd/resource.go#L270

A slice with length > 0 somehow had a nil data pointer… or rcx got clobbered in the middle of the function. No unsafe type-punning is used to construct the slice and go test -race does not complain.
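For readers who do not want to follow the link, the faulting function is very small. The following is a minimal sketch reconstructed from the disassembly above, not copied from continuity: the `resource` type and its `paths` field are assumptions standing in for the real definitions. It shows why a fault at `+15` implies a non-empty slice with a nil data pointer.

```go
package main

import "fmt"

// resource is a hypothetical stand-in for continuity's resource type;
// only the field that Path reads is modeled here.
type resource struct {
	paths []string
}

// Path returns the first path, or "" when the slice is empty.
func (r *resource) Path() string {
	if len(r.paths) < 1 {
		// Corresponds to the xor/xor/ret branch in the disassembly.
		return ""
	}
	// Corresponds to the loads through rcx, the slice's data pointer.
	// In safe Go this pointer cannot be nil when len > 0.
	return r.paths[0]
}

func main() {
	empty := &resource{}
	one := &resource{paths: []string{"/a", "/b"}}
	fmt.Printf("%q %q\n", empty.Path(), one.Path())
}
```

Nothing in code of this shape can fault unless the slice header itself was corrupted or overwritten by something outside the function.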

  • fatal error: malloc deadlock / panic during panic, followed by what appeared to be two interleaved stack dumps:
fatal error: runtime·unlock: lock count
fatal error: runtime·unlock: lock count
fatal error: runtime·unlock: lock count
fatal error: malloc deadlock
panic during panic

runtime stack: runtime.throw({0x87177d?, 0x7f7d35498848?}) /usr/local/go/src/runtime/panic.go:1047 +0x5d fp=0x7f7d35498820 sp=0x7f7d354987f0 pc=0x4399fd runtime.unlock2(0x1b?) /usr/local/go/src/runtime/lock_futex.go:127 +0x7a fp=0x7f7d35498840 sp=0x7f7d35498820 pc=0x40db9a runtime.unlockWithRank(…) /usr/local/go/src/runtime/lockrank_off.go:32 runtime.unlock(…) /usr/local/go/src/runtime/lock_futex.go:112 runtime.printunlock() /usr/local/go/src/runtime/print.go:80 +0x3b fp=0x7f7d35498858 sp=0x7f7d35498840 pc=0x43b41b runtime.throw.func1() /usr/local/go/src/runtime/panic.go:1044 +0x55 fp=0x7f7d35498888 sp=0x7f7d35498858 pc=0x439a75 runtime.throw({0x87177d?, 0x7f7d354988e0?}) /usr/local/go/src/runtime/panic.go:1043 +0x46 fp=0x7f7d354988b8 sp=0x7f7d35498888 pc=0x4399e6 runtime.unlock2(0x1b?) /usr/local/go/src/runtime/lock_futex.go:127 +0x7a fp=0x7f7d354988d8 sp=0x7f7d354988b8 pc=0x40db9a runtime.unlockWithRank(…) /usr/local/go/src/runtime/lockrank_off.go:32 runtime.unlock(…) /usr/local/go/src/runtime/lock_futex.go:112 runtime.printunlock() /usr/local/go/src/runtime/print.go:80 +0x3b fp=0x7f7d354988f0 sp=0x7f7d354988d8 pc=0x43b41b runtime.throw.func1() /usr/local/go/src/runtime/panic.go:1044 +0x55 fp=0x7f7d35498920 sp=0x7f7d354988f0 pc=0x439a75 runtime.throw({0x87177d?, 0x7f7d35498978?}) /usr/local/go/src/runtime/panic.go:1043 +0x46 fp=0x7f7d35498950 sp=0x7f7d35498920 pc=0x4399e6 runtime.unlock2(0x1b?) 
/usr/local/go/src/runtime/lock_futex.go:127 +0x7a fp=0x7f7d35498970 sp=0x7f7d35498950 pc=0x40db9a runtime.unlockWithRank(…) /usr/local/go/src/runtime/lockrank_off.go:32 runtime.unlock(…) /usr/local/go/src/runtime/lock_futex.go:112 runtime.printunlock() /usr/local/go/src/runtime/print.go:80 +0x3b fp=0x7f7d35498988 sp=0x7f7d35498970 pc=0x43b41b runtime.throw.func1() /usr/local/go/src/runtime/panic.go:1044 +0x55 fp=0x7f7d354989b8 sp=0x7f7d35498988 pc=0x439a75 runtime.throw({0x87177d?, 0x7f7d35498a10?}) /usr/local/go/src/runtime/panic.go:1043 +0x46 fp=0x7f7d354989e8 sp=0x7f7d354989b8 pc=0x4399e6 runtime.unlock2(0x1b?) /usr/local/go/src/runtime/lock_futex.go:127 +0x7a fp=0x7f7d35498a08 sp=0x7f7d354989e8 pc=0x40db9a runtime.unlockWithRank(…) /usr/local/go/src/runtime/lockrank_off.go:32 runtime.unlock(…) /usr/local/go/src/runtime/lock_futex.go:112 runtime.printunlock() /usr/local/go/src/runtime/print.go:80 +0x3b fp=0x7f7d35498a20 sp=0x7f7d35498a08 pc=0x43b41b runtime.throw.func1() /usr/local/go/src/runtime/panic.go:1044 +0x55 fp=0x7f7d35498a50 sp=0x7f7d35498a20 pc=0x439a75 runtime.throw({0x87177d?, 0x7f7d35498aa8?}) /usr/local/go/src/runtime/panic.go:1043 +0x46 fp=0x7f7d35498a80 sp=0x7f7d35498a50 pc=0x4399e6 runtime.unlock2(0x1b?) /usr/local/go/src/runtime/lock_futex.go:127 +0x7a fp=0x7f7d35498aa0 sp=0x7f7d35498a80 pc=0x40db9a runtime.unlockWithRank(…) /usr/local/go/src/runtime/lockrank_off.go:32 runtime.unlock(…) /usr/local/go/src/runtime/lock_futex.go:112 runtime.printunlock() /usr/local/go/src/runtime/print.go:80 +0x3b fp=0x7f7d35498ab8 sp=0x7f7d35498aa0 pc=0x43b41b runtime.throw.func1() /usr/local/go/src/runtime/panic.go:1044 +0x55 fp=0x7f7d35498ae8 sp=0x7f7d35498ab8 pc=0x439a75 runtime.throw({0x87177d?, 0x7f7d35498b40?}) /usr/local/go/src/runtime/panic.go:1043 +0x46 fp=0x7f7d35498b18 sp=0x7f7d35498ae8 pc=0x4399e6 runtime.unlock2(0x1b?) 
/usr/local/go/src/runtime/lock_futex.go:127 +0x7a fp=0x7f7d35498b38 sp=0x7f7d35498b18 pc=0x40db9a runtime.unlockWithRank(…) /usr/local/go/src/runtime/lockrank_off.go:32 runtime.unlock(…) /usr/local/go/src/runtime/lock_futex.go:112 runtime.printunlock() /usr/local/go/src/runtime/print.go:80 +0x3b fp=0x7f7d35498b50 sp=0x7f7d35498b38 pc=0x43b41b runtime.throw.func1() /usr/local/go/src/runtime/panic.go:1044 +0x55 fp=0x7f7d35498b80 sp=0x7f7d35498b50 pc=0x439a75 runtime.throw({0x87177d?, 0x7f7d35498bd8?}) /usr/local/go/src/runtime/panic.go:1043 +0x46 fp=0x7f7d35498bb0 sp=0x7f7d35498b80 pc=0x4399e6 runtime.unlock2(0x1b?) /usr/local/go/src/runtime/lock_futex.go:127 +0x7a fp=0x7f7d35498bd0 sp=0x7f7d35498bb0 pc=0x40db9a runtime.unlockWithRank(…) /usr/local/go/src/runtime/lockrank_off.go:32 runtime.unlock(…) /usr/local/go/src/runtime/lock_futex.go:112 runtime.printunlock() /usr/local/go/src/runtime/print.go:80 +0x3b fp=0x7f7d35498be8 sp=0x7f7d35498bd0 pc=0x43b41b runtime.throw.func1() /usr/local/go/src/runtime/panic.go:1044 +0x55 fp=0x7f7d35498c18 sp=0x7f7d35498be8 pc=0x439a75 runtime.throw({0x87177d?, 0x7f7d35498c70?}) /usr/local/go/src/runtime/panic.go:1043 +0x46 fp=0x7f7d35498c48 sp=0x7f7d35498c18 pc=0x4399e6 runtime.unlock2(0x1b?) /usr/local/go/src/runtime/lock_futex.go:127 +0x7a fp=0x7f7d35498c68 sp=0x7f7d35498c48 pc=0x40db9a runtime.unlockWithRank(…) /usr/local/go/src/runtime/lockrank_off.go:32 runtime.unlock(…) /usr/local/go/src/runtime/lock_futex.go:112 runtime.printunlock() /usr/local/go/src/runtime/print.go:80 +0x3b fp=0x7f7d35498c80 sp=0x7f7d35498c68 pc=0x43b41b runtime.throw.func1() /usr/local/go/src/runtime/panic.go:1044 +0x55 fp=0x7f7d35498cb0 sp=0x7f7d35498c80 pc=0x439a75 runtime.throw({0x87177d?, 0x7f7d35498d08?}) /usr/local/go/src/runtime/panic.go:1043 +0x46 fp=0x7f7d35498ce0 sp=0x7f7d35498cb0 pc=0x4399e6 runtime.unlock2(0x1b?) 
/usr/local/go/src/runtime/lock_futex.go:127 +0x7a fp=0x7f7d35498d00 sp=0x7f7d35498ce0 pc=0x40db9a runtime.unlockWithRank(…) /usr/local/go/src/runtime/lockrank_off.go:32 runtime.unlock(…) /usr/local/go/src/runtime/lock_futex.go:112 runtime.printunlock() /usr/local/go/src/runtime/print.go:80 +0x3b fp=0x7f7d35498d18 sp=0x7f7d35498d00 pc=0x43b41b runtime.throw.func1() /usr/local/go/src/runtime/panic.go:1044 +0x55 fp=0x7f7d35498d48 sp=0x7f7d35498d18 pc=0x439a75 runtime.throw({0x87177d?, 0x7f7d35498da0?}) /usr/local/go/src/runtime/panic.go:1043 +0x46 fp=0x7f7d35498d78 sp=0x7f7d35498d48 pc=0x4399e6 runtime.unlock2(0x1b?) /usr/local/go/src/runtime/lock_futex.go:127 +0x7a fp=0x7f7d35498d98 sp=0x7f7d35498d78 pc=0x40db9a runtime.unlockWithRank(…) /usr/local/go/src/runtime/lockrank_off.go:32 runtime.unlock(…) /usr/local/go/src/runtime/lock_futex.go:112 runtime.printunlock() /usr/local/go/src/runtime/print.go:80 +0x3b fp=0x7f7d35498db0 sp=0x7f7d35498d98 pc=0x43b41b runtime.throw.func1() /usr/local/go/src/runtime/panic.go:1044 +0x55 fp=0x7f7d35498de0 sp=0x7f7d35498db0 pc=0x439a75 runtime.throw({0x87177d?, 0x7f7d35498e38?}) /usr/local/go/src/runtime/panic.go:1043 +0x46 fp=0x7f7d35498e10 sp=0x7f7d35498de0 pc=0x4399e6 runtime.unlock2(0x1b?) /usr/local/go/src/runtime/lock_futex.go:127 +0x7a fp=0x7f7d35498e30 sp=0x7f7d35498e10 pc=0x40db9a runtime.unlockWithRank(…) /usr/local/go/src/runtime/lockrank_off.go:32 runtime.unlock(…) /usr/local/go/src/runtime/lock_futex.go:112 runtime.printunlock() /usr/local/go/src/runtime/print.go:80 +0x3b fp=0x7f7d35498e48 sp=0x7f7d35498e30 pc=0x43b41b runtime.throw.func1() /usr/local/go/src/runtime/panic.go:1044 +0x55 fp=0x7f7d35498e78 sp=0x7f7d35498e48 pc=0x439a75 runtime.throw({0x87177d?, 0x7f7d35498ed0?}) /usr/local/go/src/runtime/panic.go:1043 +0x46 fp=0x7f7d35498ea8 sp=0x7f7d35498e78 pc=0x4399e6 runtime.unlock2(0x1b?) 
/usr/local/go/src/runtime/lock_futex.go:127 +0x7a fp=0x7f7d35498ec8 sp=0x7f7d35498ea8 pc=0x40db9a runtime.unlockWithRank(…) /usr/local/go/src/runtime/lockrank_off.go:32 runtime.unlock(…) /usr/local/go/src/runtime/lock_futex.go:112 runtime.printunlock() /usr/local/go/src/runtime/print.go:80 +0x3b fp=0x7f7d35498ee0 sp=0x7f7d35498ec8 pc=0x43b41b runtime.throw.func1() /usr/local/go/src/runtime/panic.go:1044 +0x55 fp=0x7f7d35498f10 sp=0x7f7d35498ee0 pc=0x439a75 runtime.throw({0x87177d?, 0x7f7d35498f68?}) /usr/local/go/src/runtime/panic.go:1043 +0x46 fp=0x7f7d35498f40 sp=0x7f7d35498f10 pc=0x4399e6 runtime.unlock2(0x1b?) /usr/local/go/src/runtime/lock_futex.go:127 +0x7a fp=0x7f7d35498f60 sp=0x7f7d35498f40 pc=0x40db9a runtime.unlockWithRank(…) /usr/local/go/src/runtime/lockrank_off.go:32 runtime.unlock(…) /usr/local/go/src/runtime/lock_futex.go:112 runtime.printunlock() /usr/local/go/src/runtime/print.go:80 +0x3b fp=0x7f7d35498f78 sp=0x7f7d35498f60 pc=0x43b41b runtime.throw.func1() /usr/local/go/src/runtime/panic.go:1044 +0x55 fp=0x7f7d35498fa8 sp=0x7f7d35498f78 pc=0x439a75 runtime.throw({0x87177d?, 0x7f7d35499000?}) /usr/local/go/src/runtime/panic.go:1043 +0x46 fp=0x7f7d35498fd8 sp=0x7f7d35498fa8 pc=0x4399e6 runtime.unlock2(0x1b?) /usr/local/go/src/runtime/lock_futex.go:127 +0x7a fp=0x7f7d35498ff8 sp=0x7f7d35498fd8 pc=0x40db9a runtime.unlockWithRank(…) /usr/local/go/src/runtime/lockrank_off.go:32 runtime.unlock(…) /usr/local/go/src/runtime/lock_futex.go:112 runtime.printunlock() /usr/local/go/src/runtime/print.go:80 +0x3b fp=0x7f7d35499010 sp=0x7f7d35498ff8 pc=0x43b41b runtime.throw.func1() /usr/local/go/src/runtime/panic.go:1044 +0x55 fp=0x7f7d35499040 sp=0x7f7d35499010 pc=0x439a75 runtime.throw({0x87177d?, 0x7f7d35499098?}) /usr/local/go/src/runtime/panic.go:1043 +0x46 fp=0x7f7d35499070 sp=0x7f7d35499040 pc=0x4399e6 runtime.unlock2(0x1b?) 
/usr/local/go/src/runtime/lock_futex.go:127 +0x7a fp=0x7f7d35499090 sp=0x7f7d35499070 pc=0x40db9a runtime.unlockWithRank(…) /usr/local/go/src/runtime/lockrank_off.go:32 runtime.unlock(…) /usr/local/go/src/runtime/lock_futex.go:112 runtime.printunlock() /usr/local/go/src/runtime/print.go:80 +0x3b fp=0x7f7d354990a8 sp=0x7f7d35499090 pc=0x43b41b runtime.throw.func1() /usr/local/go/src/runtime/panic.go:1044 +0x55 fp=0x7f7d354990d8 sp=0x7f7d354990a8 pc=0x439a75 runtime.throw({0x87177d?, 0x7f7d35499130?}) /usr/local/go/src/runtime/panic.go:1043 +0x46 fp=0x7f7d35499108 sp=0x7f7d354990d8 pc=0x4399e6 runtime.unlock2(0x1b?) /usr/local/go/src/runtime/lock_futex.go:127 +0x7a fp=0x7f7d35499128 sp=0x7f7d35499108 pc=0x40db9a runtime.unlockWithRank(…) /usr/local/go/src/runtime/lockrank_off.go:32 runtime.unlock(…) /usr/local/go/src/runtime/lock_futex.go:112 runtime.printunlock() /usr/local/go/src/runtime/print.go:80 +0x3b fp=0x7f7d35499140 sp=0x7f7d35499128 pc=0x43b41b runtime.throw.func1() /usr/local/go/src/runtime/panic.go:1044 +0x55 fp=0x7f7d35499170 sp=0x7f7d35499140 pc=0x439a75 runtime.throw({0x87177d?, 0x7f7d354991c8?}) /usr/local/go/src/runtime/panic.go:1043 +0x46 fp=0x7f7d354991a0 sp=0x7f7d35499170 pc=0x4399e6 runtime.unlock2(0x1b?) /usr/local/go/src/runtime/lock_futex.go:127 +0x7a fp=0x7f7d354991c0 sp=0x7f7d354991a0 pc=0x40db9a runtime.unlockWithRank(…) /usr/local/go/src/runtime/lockrank_off.go:32 runtime.unlock(…) /usr/local/go/src/runtime/lock_futex.go:112 runtime.printunlock() /usr/local/go/src/runtime/print.go:80 +0x3b fp=0x7f7d354991d8 sp=0x7f7d354991c0 pc=0x43b41b runtime.throw.func1() /usr/local/go/src/runtime/panic.go:1044 +0x55 fp=0x7f7d35499208 sp=0x7f7d354991d8 pc=0x439a75 runtime.throw({0x87177d?, 0x7f7d35499260?}) /usr/local/go/src/runtime/panic.go:1043 +0x46 fp=0x7f7d35499238 sp=0x7f7d35499208 pc=0x4399e6 runtime.unlock2(0x1b?) 
/usr/local/go/src/runtime/lock_futex.go:127 +0x7a fp=0x7f7d35499258 sp=0x7f7d35499238 pc=0x40db9a runtime.unlockWithRank(…) /usr/local/go/src/runtime/lockrank_off.go:32 runtime.unlock(…) /usr/local/go/src/runtime/lock_futex.go:112 runtime.printunlock() /usr/local/go/src/runtime/print.go:80 +0x3b fp=0x7f7d35499270 sp=0x7f7d35499258 pc=0x43b41b runtime.throw.func1() /usr/local/go/src/runtime/panic.go:1044 +0x55 fp=0x7f7d354992a0 sp=0x7f7d35499270 pc=0x439a75 runtime.throw({0x87177d?, 0x7f7d354992f8?}) /usr/local/go/src/runtime/panic.go: goroutine 8 [running]: runtime.throw({0x86b044?, 0xc0000b0f30?}) /usr/local/go/src/runtime/panic.go:1047 +0x5d fp=0xc0000b0ee8 sp=0xc0000b0eb8 pc=0x4399fd runtime.mallocgc(0x78, 0x83a4a0, 0x1) /usr/local/go/src/runtime/malloc.go:913 +0x8ac fp=0xc0000b0f60 sp=0xc0000b0ee8 pc=0x40f70c runtime.newobject(0x136e7fad0?) /usr/local/go/src/runtime/malloc.go:1192 +0x27 fp=0xc0000b0f88 sp=0xc0000b0f60 pc=0x40f847 crypto/sha256.New() /usr/local/go/src/crypto/sha256/sha256.go:166 +0x25 fp=0xc0000b0fb0 sp=0xc0000b0f88 pc=0x53df45 crypto.Hash.New(0x7f6820?) /usr/local/go/src/crypto/crypto.go:131 +0x4a fp=0xc0000b0ff8 sp=0xc0000b0fb0 pc=0x53bb2a github.com/opencontainers/go-digest.Algorithm.Hash({0x867270, 0x6}) /home/ubuntu/containerd/vendor/github.com/opencontainers/go-digest/algorithm.go:135 +0x97 fp=0xc0000b1040 sp=0xc0000b0ff8 pc=0x77a197 github.com/opencontainers/go-digest.Algorithm.Digester(…) /home/ubuntu/containerd/vendor/github.com/opencontainers/go-digest/algorithm.go:112 github.com/containerd/continuity.simpleDigester.Digest({{0x867270?, 0x800da0?}}, {0x90bbe0?, 0xc0000143b0?}) /home/ubuntu/containerd/vendor/github.com/containerd/continuity/digests.go:42 +0x3f fp=0xc0000b10c0 sp=0xc0000b1040 pc=0x781c5f github.com/containerd/continuity.(*simpleDigester).Digest(0x40d45d?, {0x90bbe0?, 0xc0000143b01043 +0x46? 
fp=}0x7f7d354992d0) sp=0x7f7d354992a0 pc=<autogenerated>0x4399e6: 1runtime.unlock2 +0x45 fp=(0xc0000b10f00x1b sp=?0xc0000b10c0) pc= 0x7862e5/usr/local/go/src/runtime/lock_futex.go :github.com/containerd/continuity.(*context).digest127 +(0x7a0xc0001a2a50 fp=, 0x7f7d354992f0{ sp=0xc0002faa000x7f7d354992d0, pc=0xf0x40db9a} ) runtime.unlockWithRank(…) /home/ubuntu/containerd/vendor/github.com/containerd/continuity/context.go: 634/usr/local/go/src/runtime/lockrank_off.go +:0x18f32 fp= 0xc0000b1170runtime.unlock sp=(…) 0xc0000b10f0 pc=/usr/local/go/src/runtime/lock_futex.go0x78190f: 112github.com/containerd/continuity.(*context).Resource runtime.printunlock(0xc0001a2a50(, ) {0xc0002faa00/usr/local/go/src/runtime/print.go, :0xf80} +, 0x3b{ fp=0x90ef680x7f7d35499308, sp=0xc0000e66800x7f7d354992f0} pc=) 0x43b41b /home/ubuntu/containerd/vendor/github.com/containerd/continuity/context.goruntime.throw.func1:(161)

  • 0x1fc/usr/local/go/src/runtime/panic.go fp=:0xc0000b13c81044 sp= +0xc0000b11700x55 pc= fp=0x77d6bc0x7f7d35499338 sp=github.com/containerd/continuity.BuildManifest.func10x7f7d35499308( pc={0x439a750xc0002faa00 , runtime.throw0xf(}{, 0x87177d{?0x90ef68, , 0x7f7d354993900xc0000e6680?}}, ) { 0x0/usr/local/go/src/runtime/panic.go?:, 10430x0 +?0x46} fp=) 0x7f7d35499368 sp=/home/ubuntu/containerd/vendor/github.com/containerd/continuity/manifest.go0x7f7d35499338: pc=950x4399e6 + 0xc7runtime.unlock2 fp=(0xc0000b14580x1b sp=0xc0000b13c8?) pc=0x783267 /usr/local/go/src/runtime/lock_futex.gogithub.com/containerd/continuity.(*context).Walk.func1:127( +{0x7a0xc00011d740 fp=?0x7f7d35499388, sp=0xc0000e66800x7f7d35499368? pc=}0x40db9a, {runtime.unlockWithRank0x90ef68, (…) 0xc0000e6680}/usr/local/go/src/runtime/lockrank_off.go, :{320xc0000b14e8 ?runtime.unlock, (…) 0x46d747 ?}/usr/local/go/src/runtime/lock_futex.go) :112 /home/ubuntu/containerd/vendor/github.com/containerd/continuity/context.goruntime.printunlock:(596) +0x70 fp=/usr/local/go/src/runtime/print.go0xc0000b14a0: sp=800xc0000b1458 + pc=0x3b0x781470 fp= 0x7f7d354993a0path/filepath.walk sp=0x7f7d35499388( pc={0x43b41b0xc00011d740 , runtime.throw.func10x3f(}) , { 0x90ef68/usr/local/go/src/runtime/panic.go, :0xc0000e66801044} +, 0x550xc000183b90 fp=) /usr/local/go/src/path/filepath/path.go:433 +0x123 fp=0xc0000b1568 sp=0xc0000b14a0 pc=0x500e03 path/filepath.walk({0xc0001dbb80, 0x38}, {0x90ef68, 0xc0000cd380}, 0xc000183b90) /usr/local/go/src/path/filepath/path.go:457 +0x285 fp=0xc0000b1630 sp=0xc0000b1568 pc=0x500f65 path/filepath.walk({0x7f7d354993d00xc00002eb10 sp=, 0x7f7d354993a00x30 pc=}0x439a75, {0x90ef68runtime.throw, (0xc0000cd2b0{}0x87177d, ?0xc000183b90, ) 0x7f7d35499428 ?/usr/local/go/src/path/filepath/path.go}😃 457 +/usr/local/go/src/runtime/panic.go0x285: fp=10430xc0000b16f8 + sp=0xc0000b1630 pc=0x500f65 path/filepath.Walk({0xc00002eb10, 0x300x46 fp=0x7f7d35499400 sp=}0x7f7d354993d0, pc=0xc000183b900x4399e6)

    runtime.unlock2/usr/local/go/src/path/filepath/path.go:520 +(0x1b?) /usr/local/go/src/runtime/lock_futex.go:127 +0x7a fp=0x7f7d35499420 sp=0x7f7d35499400 pc=0x40db9a runtime.unlockWithRank(…) /usr/local/go/src/runtime/lockrank_off.go:32 runtime.unlock(…) /usr/local/go/src/runtime/lock_futex.go:112 runtime.printunlock() /usr/local/go/src/runtime/print.go:80 +0x3b fp=0x7f7d35499438 sp=0x7f7d35499420 pc=0x6c0x43b41b fp= runtime.throw.func10xc0000b1748( sp=) 0xc0000b16f8 pc=/usr/local/go/src/runtime/panic.go0x5010cc: 1044github.com/containerd/continuity/pathdriver.(*pathDriver).Walk +0x55( fp=0x84be400x7f7d35499468, sp={0x7f7d354994380xc00002eb10 pc=0x439a75 runtime.throw({0x87177d?, 0x7f7d354994c0?}) /usr/local/go/src/runtime/panic.go:1043 +0x46 fp=0x7f7d35499498 sp=0x7f7d35499468 pc=0x4399e6 runtime.unlock2(0x1b?) /usr/local/go/src/runtime/lock_futex.go:127 +0x7a fp=0x7f7d354994b8 sp=0x7f7d35499498 pc=0x40db9a runtime.unlockWithRank?, 0x40f847?}, 0x28?) /home/ubuntu/containerd/vendor/github.com/containerd/continuity/pathdriver/path_driver.go:88 +0x27 fp=0xc0000b1770 sp=0xc0000b1748 pc=0x779c47 github.com/containerd/continuity.(*context).Walk(0xc0001a2a50, 0xc000183b60) (…) /home/ubuntu/containerd/vendor/github.com/containerd/continuity/context.go :/usr/local/go/src/runtime/lockrank_off.go594: +320x12b fp=runtime.unlock0xc0000b17b0 sp=(…) 0xc0000b1770 pc=/usr/local/go/src/runtime/lock_futex.go0x7813ab: 112github.com/containerd/continuity.BuildManifest runtime.printunlock({(0x90e248) /usr/local/go/src/runtime/print.go:80 +0x3b fp=0x7f7d354994d0 sp=0x7f7d354994b8 pc=0x43b41b runtime.throw.func1() /usr/local/go/src/runtime/panic.go:1044 +0x55 fp=0x7f7d35499500 sp=0x7f7d354994d0 pc=0x439a75 runtime.throw({0x87177d?, 0x7f7d35499558?}) /usr/local/go/src/runtime/panic.go:1043 +0x46 fp=0x7f7d35499530 sp=0x7f7d35499500 pc=0x4399e6 runtime.unlock2(0x1b?) 
/usr/local/go/src/runtime/lock_futex.go:127 +0x7a fp=0x7f7d35499550 sp=0x7f7d35499530 pc=0x40db9a runtime.unlockWithRank(…) /usr/local/go/src/runtime/lockrank_off.go:32 runtime.unlock(…) /usr/local/go/src/runtime/lock_futex.go:112 runtime.printunlock() /usr/local/go/src/runtime/print.go:80 +0x3b fp=0x7f7d35499568 sp=0x7f7d35499550 pc=0x43b41b? , runtime.throw.func10xc0001a2a50(}) ) /usr/local/go/src/runtime/panic.go/home/ubuntu/containerd/vendor/github.com/containerd/continuity/manifest.go::104485 + +0x550x111 fp= fp=0x7f7d354995980xc0000b18d8 sp= sp=0x7f7d354995680xc0000b17b0 pc= pc=0x439a750x782ed1

runtime.throwgithub.com/containerd/continuity/fs/fstest.CheckDirectoryEqual({0x87177d?, 0x7f7d354995f0?}) ({ 0xc00011d280/usr/local/go/src/runtime/panic.go, :0x3c1043} +, 0x46{ fp=0xc00002eb100x7f7d354995c8, sp=0x300x7f7d35499598} pc=) 0x4399e6 runtime.unlock2/home/ubuntu/containerd/vendor/github.com/containerd/continuity/fs/fstest/compare.go:(440x1b +?0x1ce) fp= 0xc0000b1a38/usr/local/go/src/runtime/lock_futex.go sp=:0xc0000b18d8127 pc= +0x786dee0x7a fp=github.com/containerd/containerd/snapshots/testsuite.check128LayersMount.func10x7f7d354995e8 sp=0x7f7d354995c8( pc={0x40db9a0x90e1d8 , runtime.unlockWithRank0xc00017b350(…) } , /usr/local/go/src/runtime/lockrank_off.go0xc00012eea0, :{320x90ff20 , runtime.unlock0xc0000690e0(…) } , /usr/local/go/src/runtime/lock_futex.go{:0xc00002ea50112, 0x2bruntime.printunlock}) () /home/ubuntu/containerd/snapshots/testsuite/testsuite.go :/usr/local/go/src/runtime/print.go942: +800x14d4 + fp=0x3b0xc0000b1df0 fp= sp=0x7f7d354996000xc0000b1a38 sp= pc=0x7f7d354995e80x79fbb4 pc= 0x43b41bgithub.com/containerd/containerd/snapshots/testsuite.makeTest.func1 runtime.throw.func1(() 0xc00012eea0) /usr/local/go/src/runtime/panic.go/home/ubuntu/containerd/snapshots/testsuite/testsuite.go:1044: +1170x55 + fp=0x4740x7f7d35499630 fp=0xc0000b1f70 sp=0x7f7d35499600 pc=0x439a75 sp=runtime.throw0xc0000b1df0 pc=(0x794fd4{0x87177d?, 0x7f7d35499688?}) /usr/local/go/src/runtime/panic.go:1043 +0x46 fp=0x7f7d35499660 sp=0x7f7d35499630 pc=0x4399e6 runtime.unlock2(0x1b?) /usr/local/go/src/runtime/lock_futex.go:127 +0x7a fp=0x7f7d35499680 sp=0x7f7d35499660 pc=0x40db9a runtime.unlockWithRank(…) /usr/local/go/src/runtime/lockrank_off.go:32

runtime.unlock(…) testing.tRunner (/usr/local/go/src/runtime/lock_futex.go0xc00012eea0:, 1120xc00017a9c0 ) runtime.printunlock (/usr/local/go/src/testing/testing.go) /usr/local/go/src/runtime/print.go::80 +0x3b fp=0x7f7d35499698 sp=14460x7f7d35499680 pc= +0x43b41b0x10b fp=runtime.throw.func10xc0000b1fc0 sp=(0xc0000b1f70) pc=0x5137cb /usr/local/go/src/runtime/panic.gotesting.(*T).Run.func1:(1044 +) 0x55 fp=/usr/local/go/src/testing/testing.go0x7f7d354996c8: sp=14930x7f7d35499698 + pc=0x2a0x439a75 fp= 0xc0000b1fe0 sp=0xc0000b1fc0goroutine pc=80x51466a [ runningruntime.goexit]: (runtime.systemstack_switch) (/usr/local/go/src/runtime/asm_amd64.s) : 1594/usr/local/go/src/runtime/asm_amd64.s +:0x1459 fp= fp=0xc0000b1fe80xc0000b0e78 sp= sp=0xc0000b1fe00xc0000b0e70 pc= pc=0x46d6010x46b3e0

created by runtime.fatalthrowtesting.(*T).Run( 0xb0ec0 ?) /usr/local/go/src/testing/testing.go :/usr/local/go/src/runtime/panic.go1493: +0x35f1122

The crashes also consistently occur on GitHub Actions CI runners, which rules out hardware as a candidate.

Compiling with Cgo is a necessary condition to reproduce the issue. There is no user Cgo code in the built test binary, only runtime and std.

~/containerd$ go test -c -tags osusergo,netgo ./snapshots/overlay
~/containerd$ ldd overlay.test
	not a dynamic executable

I could not reproduce the issue on a pure-Go build.

I loaded up some core dumps into gdb and noticed a consistent pattern to the state of the process at the time of the crash.

  • Most threads were blocked on a futex, epollwait or usleep
  • One thread was blocked on a syscall
  • One thread was a freshly clone3()'d child, without having executed a single instruction (pc pointed to the instruction following the syscall, rax = 0 and rsp was set to exactly .stack + .stack_size of the clone_args struct pointed to by rdi.)
  • One thread was getting into trouble while in the process of exiting

I saw no evidence of heap corruption when examining the core dumps. I learned that curg().m.locks was always set to -1 when the fatal runtime.lock call was made. On a hunch, I patched runtime.releasem(), one of the few places where m.locks is decremented without a guard or a matching increment:

--- a/runtime/runtime1.go
+++ b/runtime/runtime1.go
@@ -482,6 +482,9 @@ func acquirem() *m {
 //go:nosplit
 func releasem(mp *m) {
        _g_ := getg()
+       if mp.locks == 0 {
+               crash()
+       }
        mp.locks--
        if mp.locks == 0 && _g_.preempt {
                // restore the preemption request in case we've cleared it in newstack

and was able to get clean stack traces without the recursive panicking.

(gdb) bt
#0  runtime.raise () at /usr/local/go/src/runtime/sys_linux_amd64.s:159
#1  0x0000000000450945 in runtime.dieFromSignal (sig=6)
    at /usr/local/go/src/runtime/signal_unix.go:870
#2  0x000000000045127e in runtime.sigfwdgo (sig=6, info=<optimized out>,
    ctx=<optimized out>, ~r0=<optimized out>)
    at /usr/local/go/src/runtime/signal_unix.go:1086
#3  0x000000000044f5e7 in runtime.sigtrampgo (sig=0, info=0x0,
    ctx=0x46f521 <runtime.raise+33>)
    at /usr/local/go/src/runtime/signal_unix.go:432
#4  0x000000000046f826 in runtime.sigtramp ()
    at /usr/local/go/src/runtime/sys_linux_amd64.s:359
#5  <signal handler called>
#6  runtime.raise () at /usr/local/go/src/runtime/sys_linux_amd64.s:159
#7  0x0000000000450945 in runtime.dieFromSignal (sig=6)
    at /usr/local/go/src/runtime/signal_unix.go:870
#8  0x000000000044b9ac in runtime.crash ()
    at /usr/local/go/src/runtime/signal_unix.go:962
#9  runtime.releasem (mp=0xc000154400)
    at /usr/local/go/src/runtime/runtime1.go:486
#10 0x0000000000440985 in runtime.startm (_p_=0xc000034000, spinning=false)
    at /usr/local/go/src/runtime/proc.go:2339
#11 0x0000000000440cee in runtime.handoffp (_p_=0x0)
    at /usr/local/go/src/runtime/proc.go:2352
#12 0x000000000043f597 in runtime.mexit (osStack=true)
    at /usr/local/go/src/runtime/proc.go:1537
#13 0x000000000043f1e9 in runtime.mstart0 ()
    at /usr/local/go/src/runtime/proc.go:1391
#14 0x000000000046b905 in runtime.mstart ()
    at /usr/local/go/src/runtime/asm_amd64.s:390
#15 0x0000000000401888 in runtime/cgo(.text) ()
#16 0x00007f94950c1920 in ?? ()
#17 0x00007f94bc5eb850 in ?? () at ./nptl/pthread_create.c:321
   from /lib/x86_64-linux-gnu/libc.so.6
#18 0x0000000000000000 in ?? ()
(gdb) info threads
  Id   Target Id                          Frame
* 1    Thread 0x7f9492565640 (LWP 186374) runtime.raise ()
    at /usr/local/go/src/runtime/sys_linux_amd64.s:159
  2    Thread 0x7f94bc554740 (LWP 186359) runtime.futex ()
    at /usr/local/go/src/runtime/sys_linux_amd64.s:560
  3    Thread 0x7f949371f640 (LWP 186363) runtime.futex ()
    at /usr/local/go/src/runtime/sys_linux_amd64.s:560
  4    Thread 0x7f9494721640 (LWP 186389) runtime.epollwait ()
    at /usr/local/go/src/runtime/sys_linux_amd64.s:706
  5    Thread 0x7f94950c2640 (LWP 186360) runtime.usleep ()
    at /usr/local/go/src/runtime/sys_linux_amd64.s:140
  6    Thread 0x7f9492d66640 (LWP 186390) runtime/internal/syscall.Syscall6
    () at /usr/local/go/src/runtime/internal/syscall/asm_linux_amd64.s:36
  7    Thread 0x7f9493f20640 (LWP 186391) clone3 ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:62

Every core dump I examined has the same traceback in the crashing thread. It’s always a pthreads thread in the process of cleaning up and exiting, calling releasem() while its curg().m.locks == -1.

The garbage collector is also seemingly necessary to cause crashes. Setting GOGC=0 produces more reliable crashes, while I have yet to get a crash with GOGC=off. There seems to be some aspect of timing, as well. Turning the test verbosity on or off affects the probability of a crash, and I have yet to get a crash when running a race-enabled build or under strace.
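When narrowing down a GC dependency like this from inside a test rather than through the environment, the same knob is available programmatically. The sketch below assumes nothing about the containerd tests: `withGCPercent` and `gcPercent` are hypothetical helpers built on the standard `runtime/debug.SetGCPercent`, where a negative value disables collection in the same way as GOGC=off.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// gcPercent reads the current setting by setting it and restoring it,
// because the standard library has no dedicated getter.
func gcPercent() int {
	p := debug.SetGCPercent(100)
	debug.SetGCPercent(p)
	return p
}

// withGCPercent runs f with the collector's target percentage set to
// percent and restores the previous setting afterwards.
func withGCPercent(percent int, f func()) {
	prev := debug.SetGCPercent(percent)
	defer debug.SetGCPercent(prev)
	f()
}

func main() {
	withGCPercent(-1, func() {
		fmt.Println("inside:", gcPercent())
	})
	fmt.Println("after:", gcPercent())
}
```

Scoping the setting to one call makes it easier to compare runs with and without the collector while keeping the rest of the process unchanged.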

(cc @cpuguy83)

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 21 (14 by maintainers)

Most upvoted comments

Change https://go.dev/cl/443716 mentions this issue: runtime: always keep global reference to mp until mexit completes

The M structure is allocated on the heap just like any typical object. I believe the problem is that the M is “dying” too early, allowing the GC to free it and potentially reallocate its memory.

All Ms are reachable by the GC via runtime.allm. mexit removes the M from allm quite a while before it is done with it. I don’t see another way that the M would stay reachable (or the GC be blocked) for the remainder of mexit, so I’m pretty sure this is unsafe.

This isn’t the patch we’d want, but the following diff keeps the Ms alive forever, and I cannot reproduce failures with it applied:

diff --git a/src/runtime/proc.go b/src/runtime/proc.go
index 629f1f8d8f..feb974db8e 100644
--- a/src/runtime/proc.go
+++ b/src/runtime/proc.go
@@ -1578,6 +1578,8 @@ func mexit(osStack bool) {
 
        // Remove m from allm.
        lock(&sched.lock)
+       mp.deadlink = deadm
+       deadm = mp // keepalive
        for pprev := &allm; *pprev != nil; pprev = &(*pprev).alllink {
                if *pprev == mp {
                        *pprev = mp.alllink
diff --git a/src/runtime/runtime2.go b/src/runtime/runtime2.go
index 5b55b55ce1..115960c4e4 100644
--- a/src/runtime/runtime2.go
+++ b/src/runtime/runtime2.go
@@ -557,6 +557,7 @@ type m struct {
        cgoCallers    *cgoCallers   // cgo traceback if crashing in cgo call
        park          note
        alllink       *m // on allm
+       deadlink      *m // on deadm
        schedlink     muintptr
        lockedg       guintptr
        createstack   [32]uintptr // stack that created this thread.
@@ -1124,6 +1125,7 @@ func (w waitReason) isMutexWait() bool {
 
 var (
        allm       *m
+       deadm      *m
        gomaxprocs int32
        ncpu       int32
        forcegc    forcegcstate
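The keepalive idea in the diff above can be modeled in ordinary Go. This is only a sketch under stated assumptions: `node`, `all`, `dead`, `add` and `retire` are hypothetical stand-ins for m, allm, deadm and the relevant part of mexit, and the model says nothing about the runtime's real locking.

```go
package main

import "fmt"

// node is a hypothetical stand-in for the runtime's m structure.
type node struct {
	id       int
	alllink  *node // next on the live list
	deadlink *node // next on the dead list
}

var (
	all  *node // analogous to allm
	dead *node // analogous to deadm in the diff
)

// add pushes n onto the live list.
func add(n *node) {
	n.alllink = all
	all = n
}

// retire pushes n onto the dead list before unlinking it from the live
// list, so a global reference to n exists at every point in between.
func retire(n *node) {
	n.deadlink = dead
	dead = n
	for pprev := &all; *pprev != nil; pprev = &(*pprev).alllink {
		if *pprev == n {
			*pprev = n.alllink
			return
		}
	}
}

// onAll reports whether n is still on the live list.
func onAll(n *node) bool {
	for p := all; p != nil; p = p.alllink {
		if p == n {
			return true
		}
	}
	return false
}

// onDead reports whether n is on the dead list.
func onDead(n *node) bool {
	for p := dead; p != nil; p = p.deadlink {
		if p == n {
			return true
		}
	}
	return false
}

func main() {
	a, b := &node{id: 1}, &node{id: 2}
	add(a)
	add(b)
	retire(a)
	fmt.Println(onAll(a), onDead(a), onAll(b))
}
```

Because the dead list is rooted in a package-level variable, a retired node stays reachable to the collector for as long as it remains linked there, which is the property the diff relies on.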

I can reproduce this very consistently.

$ git clone https://github.com/containerd/containerd
$ cd containerd
$ git remote add cpuguy83 https://github.com/cpuguy83/containerd
$ git fetch cpuguy83
$ git checkout nix_mount_fork
$ go test -c ./snapshots/overlay
$ sudo taskset -a 1 ./overlay.test -test.root -test.run TestOverlay/no_opt/128LayersMount