go: runtime: unexpected return pc crash on linux-amd64-alpine builder

The revived linux-amd64-alpine builder has flaked twice in its short new lifetime with ‘unexpected return pc’ crashes during the cgo tests.

Here is a repro case using a gomote. Note that if you use gomote ssh rather than gomote run, you have to set up the environment manually: in particular, you have to put /workdir/go/bin at the front of PATH and set GOROOT_BOOTSTRAP=/workdir/go1.4. I'm not sure why the environment is so broken on Alpine; gomote run does not have these problems, only gomote ssh.

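# Create an Alpine builder, push the local tree, and build the toolchain.
VM=$(gomote create linux-amd64-alpine)
gomote push "$VM"
gomote run "$VM" go/src/make.bash
# Upload a script that runs the cgo tests 100 times and prints the log of every failing run.
gomote put -mode 0777 "$VM" - try.sh <<'EOF'
#!/bin/bash
cd /workdir/go/misc/cgo/test || exit 1
for i in $(seq 100); do
    date
    if ! /workdir/go/bin/go test >log 2>&1; then
        cat log
    fi
done
EOF
gomote run "$VM" try.sh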

You may need to run try.sh a few times, depending on how flaky the machine is feeling, but most runs get at least one failure.
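When repeating try.sh many times, it can help to separate this particular crash from unrelated test failures. The following Go driver is a sketch of that idea, assuming the same paths and iteration count as try.sh. The names classify and returnPCRe are hypothetical, and the regular expression is based only on the "runtime: g N: unexpected return pc ..." header lines in the failures quoted in this issue.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"regexp"
)

// returnPCRe matches the header line the runtime printed for each crash in
// this issue, e.g.
// "runtime: g 3: unexpected return pc for runtime.gcenable.func1 called from 0x0".
var returnPCRe = regexp.MustCompile(`runtime: g \d+: unexpected return pc for (\S+) called from (0x[0-9a-f]+)`)

// classify reports the function and bad caller PC if output contains the
// crash header, and ok=false otherwise.
func classify(output []byte) (fn, pc string, ok bool) {
	m := returnPCRe.FindSubmatch(output)
	if m == nil {
		return "", "", false
	}
	return string(m[1]), string(m[2]), true
}

func main() {
	// Same paths and iteration count as try.sh (assumed builder layout).
	const runs = 100
	const goCmd = "/workdir/go/bin/go"
	const dir = "/workdir/go/misc/cgo/test"

	hits := 0
	for i := 1; i <= runs; i++ {
		cmd := exec.Command(goCmd, "test")
		cmd.Dir = dir
		out, err := cmd.CombinedOutput()
		if err == nil {
			continue // test run passed
		}
		if fn, pc, ok := classify(out); ok {
			hits++
			fmt.Printf("run %d: unexpected return pc for %s called from %s\n", i, fn, pc)
		} else {
			fmt.Printf("run %d: other failure:\n%s\n", i, out)
		}
	}
	fmt.Printf("%d of %d runs hit the crash\n", hits, runs)
	if hits > 0 {
		os.Exit(1)
	}
}
```

It could be run on the gomote in place of try.sh using the freshly built toolchain; the exit status then says directly whether the crash reproduced.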

Here are some failures from that script:

runtime: g 3: unexpected return pc for runtime.gcenable.func1 called from 0x0
stack: frame={sp:0xc0000557c8, fp:0xc0000557e0} stack=[0xc000055000,0xc000055800)
0x000000c0000556c8:  0x000000c000055750  0x000000000040d21d <runtime.chansend+0x000000000000055d> 
0x000000c0000556d8:  0x0000000000581220  0x000000c00007e060 
0x000000c0000556e8:  0x00000000005e9f78  0x0000000000000000 
0x000000c0000556f8:  0x0000000000000000  0x0000000000000000 
0x000000c000055708:  0x0000000000000000  0x0000000000000000 
0x000000c000055718:  0x000000c00007e058  0x0000000000000000 
0x000000c000055728:  0x0000000000000000  0x0000000000000000 
0x000000c000055738:  0x0000000000000000  0x0000000000000000 
0x000000c000055748:  0x0000000000000000  0x000000c000055780 
0x000000c000055758:  0x000000000040cc9d <runtime.chansend1+0x000000000000001d>  0x000000c00007e000 
0x000000c000055768:  0x0000000000440bb6 <runtime.gopark+0x00000000000000d6>  0x0000000000000001 
0x000000c000055778:  0x0000000000000000  0x000000c0000557b8 
0x000000c000055788:  0x000000000042ba2e <runtime.bgsweep+0x000000000000008e>  0x0000000000000000 
0x000000c000055798:  0x0000000000000000  0x0000000000000000 
0x000000c0000557a8:  0x0000000000000000  0x000000c00007e000 
0x000000c0000557b8:  0x000000c0000557d0  0x0000000000420706 <runtime.gcenable.func1+0x0000000000000026> 
0x000000c0000557c8: <0x00007f8a890934b6  0x00007f8a61816b64 
0x000000c0000557d8: !0x0000000000000000 >0x0000000000000000 
0x000000c0000557e8:  0x0000000000000000  0x00007f8a890d3600 
0x000000c0000557f8:  0x00007f8a89092acf 
fatal error: unknown caller pc
runtime: g 19: unexpected return pc for runtime.gcenable.func2 called from 0x0
stack: frame={sp:0xc000050fc8, fp:0xc000050fe0} stack=[0xc000050800,0xc000051000)
0x000000c000050ec8:  0x000000000000000e  0x000000c0000061a0 
0x000000c000050ed8:  0x000000c000050f60  0x000000000040d265 <runtime.chansend+0x00000000000005a5> 
0x000000c000050ee8:  0x0000000000000050  0x000000c00009c000 
0x000000c000050ef8:  0x0000000000000000  0x0000010000000000 
0x000000c000050f08:  0x0000000000000003  0x0000000000000030 
0x000000c000050f18:  0x0000000000000000  0x0000000000000050 
0x000000c000050f28:  0x000000c000096058  0x000000c00007e000 
0x000000c000050f38:  0x0000000000000000  0x0000000000000000 
0x000000c000050f48:  0x0000000000440bb6 <runtime.gopark+0x00000000000000d6>  0x000000000040d320 <runtime.chansend.func1+0x0000000000000000> 
0x000000c000050f58:  0x000000c000096000  0x000000c000050f90 
0x000000c000050f68:  0x0000000000429ad3 <runtime.(*scavengerState).park+0x0000000000000053>  0x000000c000096000 
0x000000c000050f78:  0x00000000005e9f78  0x0000000000000001 
0x000000c000050f88:  0x0000000000000000  0x000000c000050fb8 
0x000000c000050f98:  0x000000000042a0a5 <runtime.bgscavenge+0x0000000000000045>  0x00000000006f9960 
0x000000c000050fa8:  0x0000000000000000  0x000000c000096000 
0x000000c000050fb8:  0x000000c000050fd0  0x00000000004206a6 <runtime.gcenable.func2+0x0000000000000026> 
0x000000c000050fc8: <0x00007f47256144b6  0x00007f46fdea3b64 
0x000000c000050fd8: !0x0000000000000000 >0x0000000000000000 
0x000000c000050fe8:  0x0000000000000000  0x00007f4725654600 
0x000000c000050ff8:  0x00007f4725613acf 
fatal error: unknown caller pc

This one did not happen in a garbage collector goroutine; it crashed in the goroutine running TestSetgid:

runtime: g 20: unexpected return pc for testing.tRunner called from 0x7feeabb0dacf
stack: frame={sp:0xc000051770, fp:0xc0000517c0} stack=[0xc000051000,0xc000051800)
0x000000c000051670:  0x000000012a05f200  0x000000c0000880a0 
0x000000c000051680:  0x000000c000094180  0x000000c0000516f8 
0x000000c000051690:  0x000000c000102b80  0x000000c000102b60 
0x000000c0000516a0:  0x0000000000000000  0x00000000005890c0 
0x000000c0000516b0:  0x00000000006d7d50  0x0000000000000000 
0x000000c0000516c0:  0x0000000000000000  0x0000000000000000 
0x000000c0000516d0:  0x0000000000000000  0x000000c000051730 
0x000000c0000516e0:  0x0000000000454a36 <runtime.sigpanic+0x00000000000002f6>  0x00000000005890c0 
0x000000c0000516f0:  0x00000000006d7d50  0x000000c000051748 
0x000000c000051700:  0x0000000000561ceb <misc/cgo/test.testSetgid+0x00000000000000ab>  0x000000c0001121e0 
0x000000c000051710:  0x000000c000102b60  0x0000000000000001 
0x000000c000051720:  0x00000000006ea660  0x00000000005eb418 
0x000000c000051730:  0x000000c000051760  0x0000000000478bfe <sync.(*RWMutex).Lock+0x000000000000001e> 
0x000000c000051740:  0x0000000000000000  0x000000c000051760 
0x000000c000051750:  0x0000000000526bd9 <misc/cgo/test.TestSetgid+0x0000000000000019>  0x000000c0001029c0 
0x000000c000051760:  0x000000c0000517b0  0x00000000004d6d15 <testing.tRunner+0x0000000000000115> 
0x000000c000051770: <0x0000000000000000  0x0300000000000000 
0x000000c000051780:  0x00000000004d6d80 <testing.tRunner.func2+0x0000000000000000>  0x00007feeabb0e4b6 
0x000000c000051790:  0x00007feeabb4ed8c  0x0000000000000000 
0x000000c0000517a0:  0x0000000000000000  0x0000000000000000 
0x000000c0000517b0:  0x00007feeabb4e600 !0x00007feeabb0dacf 
0x000000c0000517c0: >0x0000000000000000  0x00000000ffffffff 
0x000000c0000517d0:  0x0000000000000000  0x00000000004710a1 <runtime.goexit+0x0000000000000001> 
0x000000c0000517e0:  0x0000000000000000  0x0000000000000000 
0x000000c0000517f0:  0x0000000000000000  0x00007feeabb0e5d2 
fatal error: unknown caller pc

runtime stack:
runtime.throw({0x5ae5a1?, 0x6ea660?})
	/workdir/go/src/runtime/panic.go:1047 +0x5d fp=0x7fee843e3648 sp=0x7fee843e3618 pc=0x43de7d
runtime.gentraceback(0x100000000467aba?, 0xc000100000?, 0xc000102b60?, 0x7fee843e3a18?, 0x0, 0x0, 0x7fffffff, 0x7fee843e3a08, 0x0?, 0x0)
	/workdir/go/src/runtime/traceback.go:258 +0x1cf7 fp=0x7fee843e39b8 sp=0x7fee843e3648 pc=0x4658b7
runtime.addOneOpenDeferFrame.func1()
	/workdir/go/src/runtime/panic.go:645 +0x6b fp=0x7fee843e3a30 sp=0x7fee843e39b8 pc=0x43d00b
runtime.systemstack()
	/workdir/go/src/runtime/asm_amd64.s:492 +0x49 fp=0x7fee843e3a38 sp=0x7fee843e3a30 pc=0x46eee9

goroutine 20 [running]:
runtime.systemstack_switch()
	/workdir/go/src/runtime/asm_amd64.s:459 fp=0xc0000515e8 sp=0xc0000515e0 pc=0x46ee80
runtime.addOneOpenDeferFrame(0xc0000221e0?, 0xc000094180?, 0xc000112180?)
	/workdir/go/src/runtime/panic.go:644 +0x69 fp=0xc000051628 sp=0xc0000515e8 pc=0x43cf49
panic({0x5890c0, 0x6d7d50})
	/workdir/go/src/runtime/panic.go:844 +0x112 fp=0xc0000516e8 sp=0xc000051628 pc=0x43d792
runtime.panicmem(...)
	/workdir/go/src/runtime/panic.go:260
runtime.sigpanic()
	/workdir/go/src/runtime/signal_unix.go:837 +0x2f6 fp=0xc000051740 sp=0xc0000516e8 pc=0x454a36
sync.(*RWMutex).Lock(0x0?)
	/workdir/go/src/sync/rwmutex.go:147 +0x1e fp=0xc000051770 sp=0xc000051740 pc=0x478bfe
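Each dump above starts with a "stack:" header giving the frame's sp and fp and the goroutine's stack bounds, and some hex words carry a one-character marker. In the Go runtime's traceback hexdump, < marks sp, > marks fp, and ! marks the slot the bad return PC was read from. The following Go sketch pulls those values out of a dump, assuming exactly the line format shown here; parseHeader and markedWords are hypothetical helpers, not runtime APIs, and the 8-byte word size is specific to amd64.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// frame holds the values from a "stack: frame={sp:..., fp:...} stack=[lo,hi)" line.
type frame struct {
	SP, FP, Lo, Hi uint64
}

var headerRe = regexp.MustCompile(`sp:(0x[0-9a-f]+), fp:(0x[0-9a-f]+)\} stack=\[(0x[0-9a-f]+),(0x[0-9a-f]+)\)`)

// parseHeader extracts sp, fp and the stack bounds from a dump header line.
func parseHeader(line string) (frame, error) {
	m := headerRe.FindStringSubmatch(line)
	if m == nil {
		return frame{}, fmt.Errorf("not a stack header: %q", line)
	}
	var v [4]uint64
	for i := range v {
		n, err := strconv.ParseUint(m[i+1], 0, 64) // base 0 accepts the 0x prefix
		if err != nil {
			return frame{}, err
		}
		v[i] = n
	}
	return frame{SP: v[0], FP: v[1], Lo: v[2], Hi: v[3]}, nil
}

// word is one hex word from a dump line that carries a marker.
type word struct {
	Marker    byte // '<' (sp), '>' (fp) or '!' (bad return PC slot)
	Addr, Val uint64
}

// markedWords returns the marked words on one dump line such as
// "0x000000c0000557d8: !0x0000000000000000 >0x0000000000000000".
// Symbol annotations like "<runtime.gopark+0x...>" are skipped.
func markedWords(line string) []word {
	addrStr, rest, ok := strings.Cut(line, ":")
	if !ok {
		return nil
	}
	addr, err := strconv.ParseUint(strings.TrimSpace(addrStr), 0, 64)
	if err != nil {
		return nil
	}
	var out []word
	i := 0 // index of the hex word within this line
	for _, tok := range strings.Fields(rest) {
		marker := byte(0)
		if strings.IndexByte("<>!", tok[0]) >= 0 && strings.HasPrefix(tok[1:], "0x") {
			marker, tok = tok[0], tok[1:]
		}
		if !strings.HasPrefix(tok, "0x") {
			continue // symbol annotation, not a stack word
		}
		val, err := strconv.ParseUint(tok, 0, 64)
		if err != nil {
			return out
		}
		if marker != 0 {
			out = append(out, word{Marker: marker, Addr: addr + uint64(i)*8, Val: val})
		}
		i++
	}
	return out
}

func main() {
	f, err := parseHeader("stack: frame={sp:0xc0000557c8, fp:0xc0000557e0} stack=[0xc000055000,0xc000055800)")
	if err != nil {
		panic(err)
	}
	fmt.Printf("stack size %d bytes, sp is %d bytes below the top\n", f.Hi-f.Lo, f.Hi-f.SP)
	for _, w := range markedWords("0x000000c0000557d8: !0x0000000000000000 >0x0000000000000000 ") {
		fmt.Printf("%c at %#x = %#x\n", w.Marker, w.Addr, w.Val)
	}
}
```

Applied to the first failure, this gives a 2048-byte (0x800) goroutine stack with the ! slot exactly one word below fp, holding a zero return PC.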

Here are the two build dashboard failures:

https://build.golang.org/log/658036e08c7a1d218c33808fdd1d6612b40502d8

runtime: g 2: unexpected return pc for runtime.forcegchelper called from 0x0
stack: frame={sp:0xc000056fb0, fp:0xc000056fe0} stack=[0xc000056800,0xc000057000)
0x000000c000056eb0:  0x0000000000000000  0x0000000000000000 
0x000000c000056ec0:  0x0000000000000000  0x0000000000000000 
0x000000c000056ed0:  0x0000000000000000  0x0000000000000000 
0x000000c000056ee0:  0x0000000000000000  0x0000000000000000 
0x000000c000056ef0:  0x0000000000000000  0x0000000000000000 
0x000000c000056f00:  0x0000000000000000  0x0000000000000000 
0x000000c000056f10:  0x0000000000000000  0x0000000000000000 
0x000000c000056f20:  0x0000000000000000  0x0000000000000000 
0x000000c000056f30:  0x0000000000000000  0x0000000000000000 
0x000000c000056f40:  0x0000000000000000  0x0000000000000000 
0x000000c000056f50:  0x0000000000000000  0x0000000000000000 
0x000000c000056f60:  0x0000000000000000  0x0000000000000000 
0x000000c000056f70:  0x0000000000000000  0x0000000000000000 
0x000000c000056f80:  0x0000000000000000  0x00005637530dbdb6 <runtime.gopark+0x00000000000000d6> 
0x000000c000056f90:  0x0000000000000000  0x0000000000000000 
0x000000c000056fa0:  0x000000c000056fd0  0x00005637530dbc4d <runtime.forcegchelper+0x00000000000000ad> 
0x000000c000056fb0: <0x0000000000000000  0x0000000000000000 
0x000000c000056fc0:  0x0000000000000000  0x00007efee325e4b6 
0x000000c000056fd0:  0x00007efebba04b64 !0x0000000000000000 
0x000000c000056fe0: >0x0000000000000000  0x0000000000000000 
0x000000c000056ff0:  0x00007efee329e600  0x00007efee325dacf 
fatal error: unknown caller pc

and

https://build.golang.org/log/94cf14d78b116487dc76a921baf6ba76480a4c7a

runtime: g 5: unexpected return pc for runtime.sigpanic called from 0x7f52c162dd8c
stack: frame={sp:0xc000058700, fp:0xc000058758} stack=[0xc000058000,0xc000058800)
0x000000c000058600:  0x0000564cf403107b <runtime.write+0x000000000000003b>  0x0000000000000002 
0x000000c000058610:  0x000000c000058648  0x0000564cf40109ce <runtime.recordForPanic+0x000000000000004e> 
0x000000c000058620:  0x0000564cf403107b <runtime.write+0x000000000000003b>  0x0000000000000002 
0x000000c000058630:  0x0000564cf4144017  0x0000000000000001 
0x000000c000058640:  0x0000000000000001  0x000000c000058680 
0x000000c000058650:  0x0000564cf4010cd2 <runtime.gwrite+0x00000000000000f2>  0x0000564cf4144017 
0x000000c000058660:  0x0000000000000001  0x0000000000000001 
0x000000c000058670:  0x000000c0000586e2  0x000000000000000e 
0x000000c000058680:  0x0000564cf4040210 <runtime.systemstack+0x0000000000000030>  0x0000564cf400f3cc <runtime.fatalthrow+0x000000000000006c> 
0x000000c000058690:  0x000000c0000586a0  0x000000c000007ba0 
0x000000c0000586a0:  0x0000564cf400f400 <runtime.fatalthrow.func1+0x0000000000000000>  0x000000c000007ba0 
0x000000c0000586b0:  0x0000564cf400f07f <runtime.throw+0x000000000000005f>  0x000000c0000586d0 
0x000000c0000586c0:  0x000000c0000586f0  0x0000564cf400f07f <runtime.throw+0x000000000000005f> 
0x000000c0000586d0:  0x000000c0000586d8  0x0000564cf400f0a0 <runtime.throw.func1+0x0000000000000000> 
0x000000c0000586e0:  0x0000564cf414445e  0x0000000000000005 
0x000000c0000586f0:  0x000000c000058748  0x0000564cf4025ca5 <runtime.sigpanic+0x00000000000002c5> 
0x000000c000058700: <0x0000564cf414445e  0x000000c0000161e0 
0x000000c000058710:  0x000000c000058728  0x0000000000000001 
0x000000c000058720:  0x00007f52c162dd8c  0x000000c000007ba0 
0x000000c000058730:  0x0000564cf41800e0  0x0000564cf40a7e14 <testing.tRunner+0x0000000000000034> 
0x000000c000058740:  0x0000000000000000  0x00007f52c15ed4b6 
0x000000c000058750: !0x00007f52c162dd8c >0x0000000000000000 
0x000000c000058760:  0x0000000000000000  0x0000000000000000 
0x000000c000058770:  0x00007f52c162d600  0x00007f52c15ecacf 
0x000000c000058780:  0x0000000000000000  0x00000000ffffffff 
0x000000c000058790:  0x0000564cf40a7fa0 <testing.tRunner.func1+0x0000000000000000>  0x000000c000007a00 
0x000000c0000587a0:  0x000000c000058780  0x000000c000058790 
0x000000c0000587b0:  0x000000c0000587d0  0x00007f52c15ed5d2 
0x000000c0000587c0:  0x00007f52c15f0080  0x00007f52c162d600 
0x000000c0000587d0:  0x00000000ffffffff  0x00007f52c15efbbb 
0x000000c0000587e0:  0x0000000000000000  0x00007f52c15efb6d 
0x000000c0000587f0:  0x00007f52c162d604  0x0000000000000000 

Perhaps this is Alpine-specific, or perhaps it is musl-related. The Alpine image may have an old Linux kernel; maybe we should update it.
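Both hypotheses are cheap to check on the builder. The following Go sketch reports the kernel release and guesses whether the C library is musl. The /lib/ld-musl-*.so.1 check is a heuristic based on musl's usual dynamic loader name, and kernelVersion and looksLikeMusl are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// kernelVersion parses the leading "major.minor" of a Linux release string
// such as "4.19.0-18-amd64" or "5.15-rc1".
func kernelVersion(release string) (major, minor int, err error) {
	parts := strings.SplitN(strings.TrimSpace(release), ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected kernel release %q", release)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	// Drop any non-numeric suffix on the minor component ("15-rc1" -> "15").
	minorStr := parts[1]
	if i := strings.IndexFunc(minorStr, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
		minorStr = minorStr[:i]
	}
	if minor, err = strconv.Atoi(minorStr); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// looksLikeMusl is a heuristic: musl's dynamic loader is conventionally
// installed as /lib/ld-musl-<arch>.so.1, which is usually absent on glibc systems.
func looksLikeMusl() bool {
	matches, err := filepath.Glob("/lib/ld-musl-*.so.1")
	return err == nil && len(matches) > 0
}

func main() {
	rel, err := os.ReadFile("/proc/sys/kernel/osrelease")
	if err != nil {
		fmt.Println("cannot read kernel release:", err)
		os.Exit(1)
	}
	major, minor, err := kernelVersion(string(rel))
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	fmt.Printf("kernel %d.%d, musl loader present: %v\n", major, minor, looksLikeMusl())
}
```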

There are a few other open ‘unexpected return pc’ issues. Maybe they are all stale:

  • #47003 is Go 1.16 on Ubuntu.
  • #35005 is Go 1.13 on Alpine 3.10 (but disappears on Debian and on Alpine 3.9.4).
  • #40401 is Go 1.14.6 on Windows.
  • #40469 is Go 1.13.14 on Windows.
  • #51707 is Go 1.16.2 on an unspecified system.
  • #43496 is Go 1.15.6 on Debian (Docker golang image).

#35005 is the most interesting one, but the repro case is a very large program running under Docker.

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 17 (16 by maintainers)

Most upvoted comments

To summarize:

  • Go uses small goroutine stacks, so there is no guarantee that the stack has room for a signal context and handler frame at all times.
  • To handle this, Go allocates a separate signal stack for each thread and installs it with sigaltstack. All signal handlers must set SA_ONSTACK so that they run on that signal stack instead of smashing the goroutine stack.
  • To try to cooperate with libc, at startup Go inspects all signal handlers (even ones it doesn’t care to handle), and adds SA_ONSTACK if it is not already set.
  • musl uses signal 34 for the various setxid calls, but does not install its handler at startup. Instead, the handler is temporarily installed on each call to the setxid functions (in __synccall).
  • As a result, Go never has a chance to add SA_ONSTACK.
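The ordering problem can be illustrated with a toy model in Go. This is not runtime or musl code: handlers are reduced to an installed bit and a flags word, SA_ONSTACK uses its Linux value 0x08000000, signal 13 stands in arbitrarily for any handler that exists before Go starts, and process, goStartup and muslSynccall are hypothetical names. The model only captures the order of events: Go's startup pass can add SA_ONSTACK to handlers that already exist, but the signal 34 handler appears later and never passes through it.

```go
package main

import "fmt"

// saOnStack is SA_ONSTACK's value on Linux.
const saOnStack = 0x08000000

// handler is a toy stand-in for struct sigaction; only the flags matter here.
type handler struct {
	installed bool
	flags     uint64
}

// process models one process's signal handler table, keyed by signal number.
type process struct {
	handlers map[int]handler
}

// goStartup models Go's startup pass over handlers it does not own:
// any handler that is already installed gets SA_ONSTACK added.
func (p *process) goStartup() {
	for sig, h := range p.handlers {
		if h.installed && h.flags&saOnStack == 0 {
			h.flags |= saOnStack
			p.handlers[sig] = h
		}
	}
}

// muslSynccall models musl's __synccall: it temporarily installs its own
// signal 34 handler with flags chosen by musl (modeled as 0, so no
// SA_ONSTACK), runs it, then restores the previous handler. It returns the
// flags the handler ran with.
func (p *process) muslSynccall() uint64 {
	const sigSynccall = 34
	prev := p.handlers[sigSynccall]
	p.handlers[sigSynccall] = handler{installed: true, flags: 0}
	ran := p.handlers[sigSynccall].flags
	p.handlers[sigSynccall] = prev
	return ran
}

func main() {
	p := &process{handlers: map[int]handler{13: {installed: true}}}
	p.goStartup()
	fmt.Println("pre-existing handler has SA_ONSTACK:", p.handlers[13].flags&saOnStack != 0)
	fmt.Println("signal 34 handler ran with SA_ONSTACK:", p.muslSynccall()&saOnStack != 0)
}
```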

I don’t see how we can work around this in Go: we can’t adjust the signal handler’s flags, and __synccall does not respect flags from an existing signal handler. Otherwise we would have to make goroutine stacks much larger, which would significantly increase stack allocations.
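To put a rough number on "much larger": the dumps above show 2 KiB goroutine stacks (for example stack=[0xc000055000,0xc000055800)), while the Go runtime allocates a 32 KiB signal stack per thread on Linux. Assuming, purely for illustration, that every goroutine stack had to be as large as that signal stack, the cost for a hypothetical 100,000 goroutines works out as follows.

```go
package main

import "fmt"

const (
	// Goroutine stack size seen in the dumps: 0xc000055800-0xc000055000 = 0x800.
	goroutineStack = 2 << 10
	// Per-thread signal stack the Go runtime allocates on Linux.
	signalStack = 32 << 10
)

// stackBytes returns the memory used by n stacks of the given size.
func stackBytes(n, size int64) int64 { return n * size }

func main() {
	const n = 100_000 // hypothetical number of live goroutines
	small := stackBytes(n, goroutineStack)
	large := stackBytes(n, signalStack)
	fmt.Printf("%d goroutines: %.1f MiB with 2 KiB stacks, %.1f MiB with 32 KiB stacks (%dx)\n",
		n, float64(small)/(1<<20), float64(large)/(1<<20), large/small)
}
```

That is a 16x increase (about 195 MiB versus about 3 GiB) before any stack growth, which is why enlarging goroutine stacks is not an attractive fix.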

There are several changes on the musl side that could address this:

  • musl could install the signal 34 handler once at startup so that Go can adjust the flag.
  • Or, __synccall could query for an existing signal handler and, if it has SA_ONSTACK, keep that flag for its own handler. In this case, Go would install a dummy signal 34 handler at startup just to expose SA_ONSTACK.
  • Or, even simpler, according to man 2 sigaction’s SA_ONSTACK description: “If an alternate stack is not available, the default stack will be used.” If this is accurate (I haven’t verified), then __synccall could set SA_ONSTACK unconditionally, which would normally make no difference, but would use Go’s sigaltstack when linked with Go.
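The second and third proposals can be compared with another small Go model. It is a sketch, not musl code: synccallFlags and deliveryStack are hypothetical, base stands for whatever flags musl uses today, and deliveryStack encodes only the man-page rule quoted above (use the alternate stack if SA_ONSTACK is set and one is installed, otherwise the current stack). It ignores details such as a thread that is already running on its alternate stack, and, like the proposal itself, the rule is unverified here.

```go
package main

import "fmt"

// saOnStack is SA_ONSTACK's value on Linux.
const saOnStack = 0x08000000

// proposal names two of the musl-side fixes listed above (hypothetical labels).
type proposal int

const (
	keepExisting  proposal = iota // copy SA_ONSTACK from an existing handler
	alwaysOnStack                 // set SA_ONSTACK unconditionally
)

// synccallFlags returns the flags a patched __synccall might use for its
// temporary signal 34 handler. base models musl's current flags and
// existing the flags of whatever handler was installed before the call.
func synccallFlags(p proposal, base, existing uint64) uint64 {
	switch p {
	case keepExisting:
		return base | existing&saOnStack
	case alwaysOnStack:
		return base | saOnStack
	}
	return base
}

// deliveryStack encodes the man-page rule: with SA_ONSTACK and an installed
// alternate stack the handler runs there, otherwise on the current stack.
func deliveryStack(flags uint64, altInstalled bool) string {
	if flags&saOnStack != 0 && altInstalled {
		return "alternate"
	}
	return "current"
}

func main() {
	// Threads managed by Go have an alternate stack (altInstalled=true);
	// a plain C program may have none (altInstalled=false).
	goDummy := uint64(saOnStack) // dummy handler Go would install at startup
	fmt.Println("keep-existing, Go process:", deliveryStack(synccallFlags(keepExisting, 0, goDummy), true))
	fmt.Println("always-on-stack, Go process:", deliveryStack(synccallFlags(alwaysOnStack, 0, 0), true))
	fmt.Println("always-on-stack, no altstack:", deliveryStack(synccallFlags(alwaysOnStack, 0, 0), false))
}
```

Under this model both proposals move the signal 34 handler onto Go's signal stack in a Go process, and the unconditional variant changes nothing for a program without an alternate stack.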