go: runtime: "invalid pc-encoded table" throw caused by bad cgo traceback

When a signal arrives while cgo code is executing, the signal handler will call cgoTraceback via x_cgo_callers in order to collect a traceback from the executing cgo code. This calls the C traceback function that the application registered via runtime.SetCgoTraceback.

The frames returned from cgoTraceback are placed on the top of the recorded stack, followed by a Go runtime-provided trace of the preceding Go callers [1].

Later (in the case of CPU profiling), Frames.Next will call cgoSymbolizer to symbolize C frames and use funcInfo to symbolize Go frames.
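For reference, a minimal sketch of the consumer side in pure Go: runtime.Callers records raw PCs and runtime.CallersFrames / Frames.Next symbolizes them later. The helper name collectFrames is hypothetical, and no cgo frames are involved here, so only the Go symbolization path is exercised.

```go
package main

import (
	"fmt"
	"runtime"
)

// collectFrames (hypothetical helper) records the caller's stack as raw PCs
// and then symbolizes them one by one with Frames.Next.
func collectFrames() []runtime.Frame {
	pcs := make([]uintptr, 32)
	// Skip runtime.Callers itself and collectFrames.
	n := runtime.Callers(2, pcs)
	if n == 0 {
		return nil
	}
	frames := runtime.CallersFrames(pcs[:n])
	var out []runtime.Frame
	for {
		f, more := frames.Next()
		out = append(out, f)
		if !more {
			break
		}
	}
	return out
}

func main() {
	for _, f := range collectFrames() {
		fmt.Printf("%s\n\t%s:%d\n", f.Function, f.File, f.Line)
	}
}
```

The PCs are only interpreted when Next is called, which is why a bad PC recorded at signal time only blows up later, in the profile writer.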

… at least, that is how it is supposed to work. In practice, there are no guarantees on what cgoTraceback returns. Though it should only return non-Go PCs, there is nothing preventing it from returning a Go PC.

Generally, that would work OK (i.e., not crash, though the actual stack trace may not make sense): Frames.Next will simply follow the Go path and symbolize the PC as normal.

However, if this PC fell in the alignment region between functions (filled with 0xcc, int 3 on amd64), then:

  1. findfunc will find a funcInfo for this PC (the preceding function), as funcInfos cover the entire range from the start of one function to the start of the next, including the alignment region.
  2. If this funcInfo has inline data, we’ll do a PCDATA lookup for our PC. PCDATA only covers the actual function range, so that will cause a throw like this:

$ ./cgo_traceback
runtime: invalid pc-encoded table f=internal/cpu.doinit pc=0x402f65 targetpc=0x402f6d tab=[0/0]0x0
        value=-1 until pc=0x402d3e
        value=0 until pc=0x402d45
        value=-1 until pc=0x402d4b
        value=1 until pc=0x402d52
        value=-1 until pc=0x402d5a
        value=0 until pc=0x402d60
        value=-1 until pc=0x402d68
        value=2 until pc=0x402d6f
        value=-1 until pc=0x402db4
        value=3 until pc=0x402dbb
        value=-1 until pc=0x402dff
        value=4 until pc=0x402e03
        value=-1 until pc=0x402e05
        value=5 until pc=0x402e0c
        value=-1 until pc=0x402e0e
        value=2 until pc=0x402e12
        value=-1 until pc=0x402f65
fatal error: invalid runtime symbol table

goroutine 6 [running]:
runtime.throw(0x4ebfa5, 0x1c)
        /usr/lib/google-golang/src/runtime/panic.go:1123 +0x72 fp=0xc00005ea18 sp=0xc00005e9e8 pc=0x436952
runtime.pcvalue(0x5572b8, 0x587a40, 0x1f3, 0x402f6d, 0x0, 0x50df01, 0x50df6b, 0x13)
        /usr/lib/google-golang/src/runtime/symtab.go:827 +0x5ae fp=0xc00005ead8 sp=0xc00005ea18 pc=0x45434e
runtime.pcdatavalue(0x5572b8, 0x587a40, 0x2, 0x402f6d, 0x0, 0x7fb05d417500)
        /usr/lib/google-golang/src/runtime/symtab.go:936 +0x7b fp=0xc00005eb28 sp=0xc00005ead8 pc=0x454c3b
runtime.(*Frames).Next(0xc0001088f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /usr/lib/google-golang/src/runtime/symtab.go:105 +0x495 fp=0xc00005ec38 sp=0xc00005eb28 pc=0x452915
runtime/pprof.allFrames(0x402f6e, 0xc00009c030, 0x402f6e, 0x5bfd00, 0x0)
        /usr/lib/google-golang/src/runtime/pprof/proto.go:285 +0xfc fp=0xc00005ed98 sp=0xc00005ec38 pc=0x4b4cfc
runtime/pprof.(*profileBuilder).appendLocsForStack(0xc000098000, 0x0, 0x0, 0x0, 0xc000126000, 0x6, 0x6, 0xe0323c7f6117d1bb, 0xc00007e180, 0xc00005eee8)
        /usr/lib/google-golang/src/runtime/pprof/proto.go:482 +0x2e6 fp=0xc00005ee48 sp=0xc00005ed98 pc=0x4b6086
runtime/pprof.(*profileBuilder).build(0xc000098000)
        /usr/lib/google-golang/src/runtime/pprof/proto.go:433 +0x151 fp=0xc00005ef58 sp=0xc00005ee48 pc=0x4b5931
runtime/pprof.profileWriter(0x50b1f8, 0xc000010028)
        /usr/lib/google-golang/src/runtime/pprof/pprof.go:813 +0x105 fp=0xc00005efd0 sp=0xc00005ef58 pc=0x4b21a5
runtime.goexit()
        /usr/lib/google-golang/src/runtime/asm_amd64.s:1371 +0x1 fp=0xc00005efd8 sp=0xc00005efd0 pc=0x46a021
created by runtime/pprof.StartCPUProfile
        /usr/lib/google-golang/src/runtime/pprof/pprof.go:784 +0x145

goroutine 1 [chan receive]:
runtime/pprof.StopCPUProfile()
        /usr/lib/google-golang/src/runtime/pprof/pprof.go:829 +0xc5
main.main()
        /usr/local/google/home/mpratt/Downloads/cgo_traceback/main.go:31 +0xff

The source for this repro is at https://github.com/prattmic/scratch/tree/main/cgo_traceback_issue44971.
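The range mismatch behind the throw can be illustrated with a toy model. This is only a sketch, not the runtime's data structures: the fn type, findFunc, and inPadding are hypothetical, and the entry address and the second function are made up; only 0x402f65 (end of the PCDATA table) and 0x402f6d (the target PC) are taken from the log above.

```go
package main

import (
	"fmt"
	"sort"
)

// fn is a hypothetical stand-in for a function table entry.
type fn struct {
	name  string
	entry uintptr // address of the first instruction
	end   uintptr // one past the last real instruction; padding follows
}

// findFunc mimics a lookup in which each entry covers [entry, next entry),
// so it also claims the alignment padding after the function.
// tab must be sorted by entry.
func findFunc(tab []fn, pc uintptr) (fn, bool) {
	i := sort.Search(len(tab), func(i int) bool { return tab[i].entry > pc })
	if i == 0 {
		return fn{}, false
	}
	return tab[i-1], true
}

// inPadding reports whether pc resolves to a function but lies past its code,
// which is the case where a strict per-PC table lookup would fail.
func inPadding(tab []fn, pc uintptr) bool {
	f, ok := findFunc(tab, pc)
	return ok && pc >= f.end
}

func main() {
	// Assumed layout for illustration only.
	tab := []fn{
		{"internal/cpu.doinit", 0x402d20, 0x402f65},
		{"next.function", 0x402f80, 0x402fa0},
	}
	for _, pc := range []uintptr{0x402e00, 0x402f6d} {
		f, _ := findFunc(tab, pc)
		fmt.Printf("pc=%#x func=%s inPadding=%v\n", pc, f.name, inPadding(tab, pc))
	}
}
```

In this model both PCs resolve to the same function, but only the first one has per-PC data behind it.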

The obvious question here is: why would cgoTraceback include such a bogus PC? The answer depends on the traceback engine in use by cgoTraceback.

For example, https://github.com/ianlancetaylor/cgosymbolizer uses libgcc’s unwind functionality, which uses DWARF information to walk the stack. I’ve not found a way to trick that into providing bogus results (rather than stopping early) short of using flat-out incorrect .cfi directives in assembly.

On the other hand, simpler traceback engines like Abseil’s https://github.com/abseil/abseil-cpp/blob/master/absl/debugging/stacktrace.h perform a more naive (but faster) walk simply following RBP frame pointers. They have some heuristics to try to avoid walking off the deep end, but fundamentally can’t fully protect against code that has clobbered the frame pointer. This bug was first encountered with an Abseil-based traceback of assembly code that clobbered RBP to use as a simple argument register, thus resulting in garbage frames that would occasionally point into the alignment region between Go functions.

I don’t think we can reasonably require cgoTraceback to guarantee it always provides valid frames, thus I see a few options here:

  1. Change Frames.Next to perform a non-strict PCDATA lookup. This is the simplest way to prevent crashes and I think the best approach, but it will make it a bit harder to notice bugs in the native runtime tracebacks.
  2. Change either funcInfo or PCDATA to make them consistent: either funcInfo does not cover alignment regions, or PCDATA does. I’m not sure how difficult these would be, but this would also potentially mask bugs in Go’s tracebacks.
  3. In Frames, track which callers came from cgoTraceback, and which came from Go’s traceback. Only the latter would even attempt to do a funcInfo / PCDATA lookup.
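As a rough illustration of option 1, the sketch below models a table lookup with a strict flag. It is a simplified stand-in rather than the runtime's pcvalue: the pcRange type and lookup function are hypothetical, the table holds only the last two entries from the log above, and the model assumes pc is never below the function entry.

```go
package main

import "fmt"

// pcRange is one hypothetical table entry: val applies to PCs below until.
type pcRange struct {
	val   int32
	until uintptr
}

// lookup returns the value covering pc. If pc lies past the end of the table,
// strict mode panics (modelling the runtime throw), while non-strict mode
// reports ok=false so the caller can treat the PC as having no inline data.
func lookup(tab []pcRange, pc uintptr, strict bool) (val int32, ok bool) {
	for _, r := range tab {
		if pc < r.until {
			return r.val, true
		}
	}
	if strict {
		panic(fmt.Sprintf("invalid pc-encoded table: targetpc=%#x", pc))
	}
	return -1, false
}

func main() {
	// Tail of the table printed in the crash log.
	tab := []pcRange{{2, 0x402e12}, {-1, 0x402f65}}

	v, ok := lookup(tab, 0x402e10, true)
	fmt.Println(v, ok)

	// The padding PC from the log: non-strict lookup fails softly.
	v, ok = lookup(tab, 0x402f6d, false)
	fmt.Println(v, ok)
}
```

The trade-off is the one noted in option 1: a soft failure keeps the profiler alive, but a genuinely corrupt table on a Go-generated PC would also pass silently through this path.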

This affects tip and 1.16, and I believe earlier releases back to at least 1.14, though I haven’t tested anything earlier than 1.16 yet.

[1] A similar principle applies to C-to-Go callbacks, except that cgoTraceback is called in the middle of the Go traceback generation.

cc @cherrymui @ianlancetaylor @mknyszek @hyangah

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 16 (8 by maintainers)

Most upvoted comments

This is indeed the same situation; I’ve sent http://golang.org/cl/309109 to fix it.

I believe all of the remaining uses of _PCDATA_InlTreeIndex are OK. One use is in isAsyncSafePoint, which gets the PC directly from signal context, so it must be a valid PC. The others are all in gentraceback, which I believe is always used with Go PCs/SPs only.

@gopherbot please open backport for 1.16 and 1.15. The only workaround is to change the C traceback engine, which isn’t usually feasible. This is a follow-up CL for a previously missed case.