go: runtime: fatal error: found bad pointer in Go heap

What version of Go are you using (go version)?

$ go version
go version go1.11.2 darwin/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/force/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/force/.golang"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/Cellar/go/1.11.2/libexec"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.11.2/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/n5/c1hcw05942s2pfbdf3pgjnh80000gq/T/go-build447375688=/tmp/go-build -gno-record-gcc-switches -fno-common"

We cross-compile with GOOS=linux GOARCH=amd64 and run the program on a different machine.

Running machine info
$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

$ uname -a
Linux (hostname) 4.9.0-5-amd64 #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04) x86_64 GNU/Linux

What did you do?

We’re building an online game server in Go. It crashes at random with a report like this.

crash report
runtime: pointer 0xc009a038ca to unused region of span span.base()=0xc0035c2000 span.limit=0xc0035c4000 span.state=1
fatal error: found bad pointer in Go heap (incorrect use of unsafe or cgo?)
runtime stack:
runtime.throw(0xc046cf, 0x3e)
        /usr/local/Cellar/go/1.11.2/libexec/src/runtime/panic.go:608 +0x72 fp=0xc0000abf00 sp=0xc0000abed0 pc=0x42bf02
runtime.findObject(0xc009a038ca, 0x0, 0x0, 0xc0024b5380, 0x7f67d219edc0, 0xd)
        /usr/local/Cellar/go/1.11.2/libexec/src/runtime/mbitmap.go:399 +0x3b6 fp=0xc0000abf50 sp=0xc0000abf00 pc=0x413bf6
runtime.wbBufFlush1(0xc000086a00)
        /usr/local/Cellar/go/1.11.2/libexec/src/runtime/mwbbuf.go:252 +0xd1 fp=0xc0000abfb8 sp=0xc0000abf50 pc=0x428121
runtime.wbBufFlush.func1()
        /usr/local/Cellar/go/1.11.2/libexec/src/runtime/mwbbuf.go:195 +0x3a fp=0xc0000abfd0 sp=0xc0000abfb8 pc=0x457e1a
runtime.systemstack(0x0)
        /usr/local/Cellar/go/1.11.2/libexec/src/runtime/asm_amd64.s:351 +0x66 fp=0xc0000abfd8 sp=0xc0000abfd0 pc=0x459af6
runtime.mstart()
        /usr/local/Cellar/go/1.11.2/libexec/src/runtime/proc.go:1229 fp=0xc0000abfe0 sp=0xc0000abfd8 pc=0x4307d0
goroutine 143 [running]:
runtime.systemstack_switch()
        /usr/local/Cellar/go/1.11.2/libexec/src/runtime/asm_amd64.s:311 fp=0xc0053e7d38 sp=0xc0053e7d30 pc=0x459a80
runtime.wbBufFlush(0x0, 0x0)
        /usr/local/Cellar/go/1.11.2/libexec/src/runtime/mwbbuf.go:194 +0x4e fp=0xc0053e7d58 sp=0xc0053e7d38 pc=0x427fbe
runtime.typeBitsBulkBarrier(0xaf3640, 0xc006019b80, 0xc0053e7f28, 0x10)
        /usr/local/Cellar/go/1.11.2/libexec/src/runtime/mbitmap.go:737 +0x111 fp=0xc0053e7dc0 sp=0xc0053e7d58 pc=0x4145c1
runtime.sendDirect(0xaf3640, 0xc00298c600, 0xc0053e7f28)
        /usr/local/Cellar/go/1.11.2/libexec/src/runtime/chan.go:312 +0x50 fp=0xc0053e7df8 sp=0xc0053e7dc0 pc=0x4053d0
runtime.send(0xc0023b4f60, 0xc00298c600, 0xc0053e7f28, 0xc0053e7e88, 0x3)
        /usr/local/Cellar/go/1.11.2/libexec/src/runtime/chan.go:283 +0xde fp=0xc0053e7e28 sp=0xc0053e7df8 pc=0x40533e
runtime.chansend(0xc0023b4f60, 0xc0053e7f28, 0x1, 0x755280, 0x1)
        /usr/local/Cellar/go/1.11.2/libexec/src/runtime/chan.go:191 +0x4df fp=0xc0053e7ea8 sp=0xc0053e7e28 pc=0x40514f
runtime.chansend1(0xc0023b4f60, 0xc0053e7f28)
        /usr/local/Cellar/go/1.11.2/libexec/src/runtime/chan.go:125 +0x35 fp=0xc0053e7ee0 sp=0xc0053e7ea8 pc=0x404c65
(snip)

Other info:

  • We don’t use CGO.
  • The crash happens even with the -race flag, and the race detector reports nothing.
  • We don’t use many “unsafe” libraries. This is our glide.lock. Some packages in the glide.lock, including go-sqlite3, are not used by the game server.
  • span.state is always 1.
  • The stack trace always ends in runtime.wbBufFlush() -> runtime.findObject(), but the callers of wbBufFlush() vary.
  • The crash happens more often with GOMAXPROCS=1; it occurs in less than 30 minutes.
  • Crash happens with Go 1.11.2, 1.11.3, 1.11.4, and 1.12beta1.
  • The crash doesn’t happen with GOMAXPROCS=1 GODEBUG=invalidptr=0, even after more than 6 hours. (False positive?)
  • The crash doesn’t happen with Go 1.10.5 and GOMAXPROCS=1, even after more than 9 hours. (Go 1.11 regression?)
  • No crash with GODEBUG=gcstoptheworld=1
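
When comparing runs like the ones listed above, it can help to have the process log the settings it actually started with, so each crash report can be matched to its configuration. A minimal sketch, assuming the settings come from the environment; parseGODEBUG and runSummary are hypothetical helper names, not part of the game server or of any library:

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strings"
)

// parseGODEBUG splits a GODEBUG value such as "invalidptr=0,gcstoptheworld=1"
// into a name -> value map. Entries without "=" are skipped.
func parseGODEBUG(s string) map[string]string {
	out := make(map[string]string)
	for _, kv := range strings.Split(s, ",") {
		i := strings.IndexByte(kv, '=')
		if i < 0 {
			continue
		}
		out[kv[:i]] = kv[i+1:]
	}
	return out
}

// runSummary (hypothetical helper) describes the Go version, the effective
// GOMAXPROCS, and the parsed GODEBUG settings of the current process.
func runSummary() string {
	return fmt.Sprintf("go=%s GOMAXPROCS=%d GODEBUG=%v",
		runtime.Version(),
		runtime.GOMAXPROCS(0), // an argument of 0 only queries the current value
		parseGODEBUG(os.Getenv("GODEBUG")))
}

func main() {
	// Print once at startup so the line lands next to any later crash report.
	fmt.Println(runSummary())
}
```

Started with, for example, GOMAXPROCS=1 GODEBUG=invalidptr=0 in the environment, the first log line then records which of the configurations above the run belongs to.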

I don’t think this is the same issue as #26243, because the stack trace and environment are different.

Can we do anything to investigate this issue?

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 30 (26 by maintainers)

Most upvoted comments

@aclements Would you take a look? I’m not sure about Go’s GC implementation.

For Keep.func1, stkmap.nbit is 0, but stkmap.bytedata[0] is 0xf. So I think we should check stkmap.nbit > 0 before calling bulkBarrierBitmap():

diff --git a/src/runtime/proc.go b/src/runtime/proc.go
index bdf73e0412..67467b2ee5 100644
--- a/src/runtime/proc.go
+++ b/src/runtime/proc.go
@@ -3303,9 +3303,11 @@ func newproc1(fn *funcval, argp *uint8, narg int32, callergp *g, callerpc uintpt
                if writeBarrier.needed && !_g_.m.curg.gcscandone {
                        f := findfunc(fn.fn)
                        stkmap := (*stackmap)(funcdata(f, _FUNCDATA_ArgsPointerMaps))
-                       // We're in the prologue, so it's always stack map index 0.
-                       bv := stackmapdata(stkmap, 0)
-                       bulkBarrierBitmap(spArg, spArg, uintptr(narg), 0, bv.bytedata)
+                       if stkmap.nbit > 0 {
+                               // We're in the prologue, so it's always stack map index 0.
+                               bv := stackmapdata(stkmap, 0)
+                               bulkBarrierBitmap(spArg, spArg, uintptr(narg), 0, bv.bytedata)
+                       }
                }
        }
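
The effect of the proposed guard can be illustrated with a toy model. This is a sketch under stated assumptions, not runtime code: stackMap, pointerWords, and guardedPointerWords are hypothetical names, and the bitmap is modeled as a plain byte slice. It reproduces the situation described for Keep.func1, where nbit is 0 but the first data byte reads as 0xf.

```go
package main

import "fmt"

// stackMap is a simplified, hypothetical stand-in for the runtime's stackmap:
// nbit is the number of valid bits per bitmap, and data models the raw bytes
// found where the bitmap storage would begin.
type stackMap struct {
	nbit int
	data []byte
}

// pointerWords reports which of the first nwords argument words would be
// treated as pointers if the bytes are read without consulting nbit.
func pointerWords(m stackMap, nwords int) []int {
	var out []int
	for i := 0; i < nwords; i++ {
		if i/8 < len(m.data) && m.data[i/8]>>(uint(i)%8)&1 == 1 {
			out = append(out, i)
		}
	}
	return out
}

// guardedPointerWords checks nbit first, in the spirit of the patch above:
// with no valid bits, no word is treated as a pointer.
func guardedPointerWords(m stackMap, nwords int) []int {
	if m.nbit == 0 {
		return nil
	}
	if nwords > m.nbit {
		nwords = m.nbit // never read past the valid bits
	}
	return pointerWords(m, nwords)
}

func main() {
	// The reported case: nbit is 0, yet the first byte reads as 0xf.
	m := stackMap{nbit: 0, data: []byte{0x0f}}
	fmt.Println(pointerWords(m, 4))        // [0 1 2 3]
	fmt.Println(guardedPointerWords(m, 4)) // []
}
```

In the unguarded path, four non-pointer argument words are reported as pointers, which would be consistent with the "found bad pointer" check firing on arbitrary values. The guarded path reports none.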