go: runtime: "invalid pc-encoded table" throw caused by bad cgo traceback
When executing cgo code, the signal handler will call cgoTraceback via x_cgo_callers in order to collect a traceback from the executing cgo code. This calls the C traceback function that the application registered via runtime.SetCgoTraceback.
The frames returned from cgoTraceback are placed on the top of the recorded stack, followed by a Go runtime-provided trace of the preceding Go callers [1].
Later (in the case of CPU profiling), Frames.Next will call cgoSymbolizer to symbolize C frames and use funcInfo to symbolize Go frames.
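For readers unfamiliar with the symbolization step, here is a minimal pure-Go sketch of how raw PCs are turned into frames through runtime.CallersFrames and Frames.Next. The helper names capture and symbolize are made up for this illustration, and because no cgo is involved the cgoSymbolizer path is not exercised; only the Go path is shown.

```go
package main

import (
	"fmt"
	"runtime"
)

// capture records the raw PCs of the current call stack.
// (Hypothetical helper, not part of the runtime.)
func capture() []uintptr {
	pcs := make([]uintptr, 32)
	n := runtime.Callers(1, pcs) // skip 1: omit runtime.Callers itself
	return pcs[:n]
}

// symbolize resolves raw PCs to function names, one Frames.Next call per frame.
// (Hypothetical helper, not part of the runtime.)
func symbolize(pcs []uintptr) []string {
	var names []string
	frames := runtime.CallersFrames(pcs)
	for {
		frame, more := frames.Next()
		names = append(names, frame.Function)
		if !more {
			break
		}
	}
	return names
}

func main() {
	for _, name := range symbolize(capture()) {
		fmt.Println(name)
	}
}
```

The profiler follows the same pattern, except that the PC slice it feeds in may begin with entries produced by cgoTraceback.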
… at least, that is how it is supposed to work. In practice, there are no guarantees on what cgoTraceback returns. Though it should only return non-Go PCs, there is nothing preventing it from returning a Go PC.
Generally, that would work OK (i.e., not crash, though the actual stack trace may not make sense): Frames.Next will simply follow the Go path and symbolize the PC as normal.
However, if this PC fell in the alignment region between functions (filled with 0xcc, int 3 on amd64), then:
- findfunc will find a funcInfo for this PC (the preceding function), as funcInfos cover the entire range from the start of one function to the start of the next, including the alignment region.
- If this funcInfo has inline data, we’ll do a PCDATA lookup for our PC. PCDATA only covers the actual function range, so that will cause a throw like this:
$ ./cgo_traceback
runtime: invalid pc-encoded table f=internal/cpu.doinit pc=0x402f65 targetpc=0x402f6d tab=[0/0]0x0
value=-1 until pc=0x402d3e
value=0 until pc=0x402d45
value=-1 until pc=0x402d4b
value=1 until pc=0x402d52
value=-1 until pc=0x402d5a
value=0 until pc=0x402d60
value=-1 until pc=0x402d68
value=2 until pc=0x402d6f
value=-1 until pc=0x402db4
value=3 until pc=0x402dbb
value=-1 until pc=0x402dff
value=4 until pc=0x402e03
value=-1 until pc=0x402e05
value=5 until pc=0x402e0c
value=-1 until pc=0x402e0e
value=2 until pc=0x402e12
value=-1 until pc=0x402f65
fatal error: invalid runtime symbol table
goroutine 6 [running]:
runtime.throw(0x4ebfa5, 0x1c)
/usr/lib/google-golang/src/runtime/panic.go:1123 +0x72 fp=0xc00005ea18 sp=0xc00005e9e8 pc=0x436952
runtime.pcvalue(0x5572b8, 0x587a40, 0x1f3, 0x402f6d, 0x0, 0x50df01, 0x50df6b, 0x13)
/usr/lib/google-golang/src/runtime/symtab.go:827 +0x5ae fp=0xc00005ead8 sp=0xc00005ea18 pc=0x45434e
runtime.pcdatavalue(0x5572b8, 0x587a40, 0x2, 0x402f6d, 0x0, 0x7fb05d417500)
/usr/lib/google-golang/src/runtime/symtab.go:936 +0x7b fp=0xc00005eb28 sp=0xc00005ead8 pc=0x454c3b
runtime.(*Frames).Next(0xc0001088f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/usr/lib/google-golang/src/runtime/symtab.go:105 +0x495 fp=0xc00005ec38 sp=0xc00005eb28 pc=0x452915
runtime/pprof.allFrames(0x402f6e, 0xc00009c030, 0x402f6e, 0x5bfd00, 0x0)
/usr/lib/google-golang/src/runtime/pprof/proto.go:285 +0xfc fp=0xc00005ed98 sp=0xc00005ec38 pc=0x4b4cfc
runtime/pprof.(*profileBuilder).appendLocsForStack(0xc000098000, 0x0, 0x0, 0x0, 0xc000126000, 0x6, 0x6, 0xe0323c7f6117d1bb, 0xc00007e180, 0xc00005eee8)
/usr/lib/google-golang/src/runtime/pprof/proto.go:482 +0x2e6 fp=0xc00005ee48 sp=0xc00005ed98 pc=0x4b6086
runtime/pprof.(*profileBuilder).build(0xc000098000)
/usr/lib/google-golang/src/runtime/pprof/proto.go:433 +0x151 fp=0xc00005ef58 sp=0xc00005ee48 pc=0x4b5931
runtime/pprof.profileWriter(0x50b1f8, 0xc000010028)
/usr/lib/google-golang/src/runtime/pprof/pprof.go:813 +0x105 fp=0xc00005efd0 sp=0xc00005ef58 pc=0x4b21a5
runtime.goexit()
/usr/lib/google-golang/src/runtime/asm_amd64.s:1371 +0x1 fp=0xc00005efd8 sp=0xc00005efd0 pc=0x46a021
created by runtime/pprof.StartCPUProfile
/usr/lib/google-golang/src/runtime/pprof/pprof.go:784 +0x145
goroutine 1 [chan receive]:
runtime/pprof.StopCPUProfile()
/usr/lib/google-golang/src/runtime/pprof/pprof.go:829 +0xc5
main.main()
/usr/local/google/home/mpratt/Downloads/cgo_traceback/main.go:31 +0xff
Source for this repro in https://github.com/prattmic/scratch/tree/main/cgo_traceback_issue44971.
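The mismatch behind this throw can be illustrated with a toy model. This is not the runtime's code: the fn type, findFunc, inlineIndex, and all addresses are invented for the sketch. It assumes only what is described above, namely that the function lookup is keyed on entry PCs (so it also claims the padding) while the PC-keyed table covers just the real instruction range.

```go
package main

import (
	"fmt"
	"sort"
)

// fn is a simplified, hypothetical stand-in for per-function metadata.
type fn struct {
	name  string
	entry uintptr // first PC of the function
	end   uintptr // one past the last instruction; padding follows
}

// findFunc returns the last function whose entry is <= pc, so a PC in the
// padding after a function still resolves to that function.
func findFunc(table []fn, pc uintptr) (fn, bool) {
	i := sort.Search(len(table), func(i int) bool { return table[i].entry > pc })
	if i == 0 {
		return fn{}, false
	}
	return table[i-1], true
}

// inlineIndex models a table lookup that only covers [entry, end).
// A strict lookup fails outside that range; a non-strict one returns -1.
func inlineIndex(f fn, pc uintptr, strict bool) (int, error) {
	if pc >= f.entry && pc < f.end {
		return 0, nil
	}
	if strict {
		return 0, fmt.Errorf("invalid pc-encoded table f=%s targetpc=%#x", f.name, pc)
	}
	return -1, nil
}

func main() {
	// Made-up layout: a ends at 0x1065, b starts at 0x1070, padding in between.
	table := []fn{{"a", 0x1000, 0x1065}, {"b", 0x1070, 0x10a0}}
	pc := uintptr(0x106d) // lands in the padding after a

	f, _ := findFunc(table, pc)
	_, err := inlineIndex(f, pc, true)
	fmt.Println(f.name, err)
	idx, err := inlineIndex(f, pc, false)
	fmt.Println(f.name, idx, err)
}
```

In the model, the strict lookup is the analogue of the fatal throw, while the non-strict lookup reports "no inline data" and lets symbolization continue.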
The obvious question here is: why would cgoTraceback include such a bogus PC? The answer depends on the traceback engine in use by cgoTraceback.
For example, https://github.com/ianlancetaylor/cgosymbolizer uses libgcc’s unwind functionality, which uses DWARF information to walk the stack. I’ve not found a way to trick that into providing bogus results (rather than stopping early) short of using flat-out incorrect .cfa directives in assembly.
On the other hand, simpler traceback engines like Abseil’s https://github.com/abseil/abseil-cpp/blob/master/absl/debugging/stacktrace.h perform a more naive (but faster) walk simply following RBP frame pointers. They have some heuristics to try to avoid walking off the deep end, but fundamentally can’t fully protect against code that has clobbered the frame pointer. This bug was first encountered with an Abseil-based traceback of assembly code that clobbered RBP to use as a simple argument register, thus resulting in garbage frames that would occasionally point into the alignment region between Go functions.
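To make the failure mode concrete, here is a simulated frame-pointer walk over a fake memory map. It is a rough sketch, not Abseil's implementation: the memory type, walkFramePointers, the single "caller frames sit at higher addresses" heuristic, and every address are assumptions chosen for illustration.

```go
package main

import "fmt"

// memory is a hypothetical flat model of mapped memory: address -> 8-byte word.
type memory map[uintptr]uintptr

// walkFramePointers follows the saved-frame-pointer chain. At each frame the
// word at fp holds the caller's fp and the word at fp+8 the return address.
func walkFramePointers(mem memory, fp uintptr, max int) []uintptr {
	var pcs []uintptr
	for len(pcs) < max && fp != 0 {
		ret, ok := mem[fp+8]
		if !ok {
			break // unmapped address: stop rather than fault
		}
		pcs = append(pcs, ret)
		next := mem[fp]
		if next <= fp {
			break // heuristic: caller frames must sit at higher addresses
		}
		fp = next
	}
	return pcs
}

func main() {
	mem := memory{
		0x7000: 0x7100, 0x7008: 0x401234, // innermost frame
		0x7100: 0x0, 0x7108: 0x405678, // outermost frame
		0x9000: 0x0, 0x9008: 0x402f6d, // unrelated data that happens to be mapped
	}
	// Intact chain starting from the real frame pointer.
	fmt.Printf("%#x\n", walkFramePointers(mem, 0x7000, 16))
	// RBP was reused as an argument register and now holds 0x9000.
	fmt.Printf("%#x\n", walkFramePointers(mem, 0x9000, 16))
}
```

In the second call the walker has no way to know that 0x9000 is not a frame, so it reports whatever word sits at fp+8 as a return address. Nothing in the walk checks whether that value is a plausible PC.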
I don’t think we can reasonably require cgoTraceback to guarantee it always provides valid frames, thus I see a few options here:
- Change Frames.Next to perform a non-strict PCDATA lookup. This is the simplest way to prevent crashes and I think the best approach, but it will make it a bit harder to notice bugs in the native runtime tracebacks.
- Change either funcInfo or PCDATA to make them consistent: either funcInfo does not cover alignment regions, or PCDATA does. I’m not sure how difficult these would be, but this would also potentially mask bugs in Go’s tracebacks.
- In Frames, track which callers came from cgoTraceback, and which came from Go’s traceback. Only the latter would even attempt to do a funcInfo / PCDATA lookup.
This affects tip, 1.16, and I believe earlier to at least 1.14, though I haven’t tested earlier than 1.16 yet.
[1] A similar principle applies to C-to-Go callbacks, except that cgoTraceback is called in the middle of the Go traceback generation.
About this issue
- State: closed
- Created 3 years ago
- Comments: 16 (8 by maintainers)
Commits related to this issue
- [release-branch.go1.15] runtime: non-strict InlTreeIndex lookup in Frames.Next When using cgo, some of the frames can be provided by cgoTraceback, a cgo-provided function to generate C tracebacks. Un... — committed to golang/go by prattmic 3 years ago
- [release-branch.go1.16] runtime: non-strict InlTreeIndex lookup in Frames.Next When using cgo, some of the frames can be provided by cgoTraceback, a cgo-provided function to generate C tracebacks. Un... — committed to golang/go by prattmic 3 years ago
- [release-branch.go1.16] runtime: non-strict InlTreeIndex lookup in expandFinalInlineFrame This is a follow-up to golang.org/cl/301369, which made the same change in Frames.Next. The same logic applie... — committed to golang/go by prattmic 3 years ago
- [release-branch.go1.15] runtime: non-strict InlTreeIndex lookup in expandFinalInlineFrame This is a follow-up to golang.org/cl/301369, which made the same change in Frames.Next. The same logic applie... — committed to golang/go by prattmic 3 years ago
This is indeed the same situation, I’ve sent http://golang.org/cl/309109 to fix.
I believe all of the remaining uses of _PCDATA_InlTreeIndex are OK. One use is in isAsyncSafePoint, which gets the PC directly from signal context, so it must be a valid PC. The others are all in gentraceback, which I believe is always used with Go PCs/SPs only.
@gopherbot please open backport for 1.16 and 1.15. The only workaround is to change the C traceback engine, which isn’t usually feasible. This is a follow-up CL for a previously missed case.