go: runtime: cgo call to symbol from library loaded dynamically will panic with go 1.21.1 and ld >2.38
What version of Go are you using (go version)?
$ go version
go version go1.21.1 linux/amd64
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
go env Output
$ go env
GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/braydonk/.cache/go-build'
GOENV='/home/braydonk/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/braydonk/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/braydonk/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.21.1'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/braydonk/Git/cgo_dl_repro/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1951551231=/tmp/go-build -gno-record-gcc-switches'
What did you do?
I created a minimal reproduction setup at https://github.com/braydonk/cgo_dl_repro
In this scenario, I have a header file that declares a single function, get42. The function is implemented in a shared object, which I load at runtime with dlopen. The program is linked with the ld flags -Wl,--unresolved-symbols=ignore-in-object-files so that the missing definition does not fail the build.
First, I run make liblib, which will compile the C file in this repo that implements the get42 function and then turn it into a shared object.
Then I run go run .
What did you expect to see?
In go1.20.8, and in go1.21.1 with ld version 2.34, I get the expected result:
braydonk@braydonk:~/Git/cgo_dl_repro$ go run .
get42 address: 0x7fb06a1fe0f9
42
What did you see instead?
In go1.21 with an ld version > 2.38 I get a panic:
braydonk@braydonk:~/Git/cgo_dl_repro$ go run .
get42 address: 0x7f2c601c00f9
SIGSEGV: segmentation violation
PC=0x0 m=0 sigcode=1
signal arrived during cgo execution
goroutine 1 [syscall]:
runtime.cgocall(0x48a800, 0xc000065eb8)
/usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc000065e90 sp=0xc000065e58 pc=0x40590b
main._Cfunc_get42()
_cgo_gotypes.go:139 +0x47 fp=0xc000065eb8 sp=0xc000065e90 pc=0x48a007
main.main()
/home/braydonk/Git/cgo_dl_repro/main.go:24 +0xf9 fp=0xc000065f40 sp=0xc000065eb8 pc=0x48a6b9
runtime.main()
/usr/local/go/src/runtime/proc.go:267 +0x2bb fp=0xc000065fe0 sp=0xc000065f40 pc=0x435e9b
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000065fe8 sp=0xc000065fe0 pc=0x45f901
goroutine 2 [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000050fa8 sp=0xc000050f88 pc=0x4362ee
runtime.goparkunlock(...)
/usr/local/go/src/runtime/proc.go:404
runtime.forcegchelper()
/usr/local/go/src/runtime/proc.go:322 +0xb3 fp=0xc000050fe0 sp=0xc000050fa8 pc=0x436173
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000050fe8 sp=0xc000050fe0 pc=0x45f901
created by runtime.init.6 in goroutine 1
/usr/local/go/src/runtime/proc.go:310 +0x1a
goroutine 3 [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000051778 sp=0xc000051758 pc=0x4362ee
runtime.goparkunlock(...)
/usr/local/go/src/runtime/proc.go:404
runtime.bgsweep(0x0?)
/usr/local/go/src/runtime/mgcsweep.go:280 +0x94 fp=0xc0000517c8 sp=0xc000051778 pc=0x422c14
runtime.gcenable.func1()
/usr/local/go/src/runtime/mgc.go:200 +0x25 fp=0xc0000517e0 sp=0xc0000517c8 pc=0x417fa5
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000517e8 sp=0xc0000517e0 pc=0x45f901
created by runtime.gcenable in goroutine 1
/usr/local/go/src/runtime/mgc.go:200 +0x66
goroutine 4 [GC scavenge wait]:
runtime.gopark(0xc00007a000?, 0x4c5128?, 0x1?, 0x0?, 0xc0000071e0?)
/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000051f70 sp=0xc000051f50 pc=0x4362ee
runtime.goparkunlock(...)
/usr/local/go/src/runtime/proc.go:404
runtime.(*scavengerState).park(0x53bfe0)
/usr/local/go/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc000051fa0 sp=0xc000051f70 pc=0x4204a9
runtime.bgscavenge(0x0?)
/usr/local/go/src/runtime/mgcscavenge.go:653 +0x3c fp=0xc000051fc8 sp=0xc000051fa0 pc=0x420a3c
runtime.gcenable.func2()
/usr/local/go/src/runtime/mgc.go:201 +0x25 fp=0xc000051fe0 sp=0xc000051fc8 pc=0x417f45
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000051fe8 sp=0xc000051fe0 pc=0x45f901
created by runtime.gcenable in goroutine 1
/usr/local/go/src/runtime/mgc.go:201 +0xa5
goroutine 5 [finalizer wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000052628 sp=0xc000052608 pc=0x4362ee
runtime.runfinq()
/usr/local/go/src/runtime/mfinal.go:193 +0x107 fp=0xc0000527e0 sp=0xc000052628 pc=0x417027
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000527e8 sp=0xc0000527e0 pc=0x45f901
created by runtime.createfing in goroutine 1
/usr/local/go/src/runtime/mfinal.go:163 +0x3d
rax 0x0
rbx 0xc000065eb8
rcx 0xc000065eb8
rdx 0xc000065e48
rdi 0xc000065eb8
rsi 0x53c080
rbp 0xc000065e48
rsp 0x7ffd04bb7088
r8 0x53c460
r9 0x0
r10 0x1
r11 0x206
r12 0xc000066000
r13 0x53c460
r14 0xc0000061a0
r15 0x8
rip 0x0
rflags 0x10246
cs 0x33
fs 0x0
gs 0x0
exit status 2
Additional Info
This seems to be a result of how cgo handles --unresolved-symbols=ignore-in-object-files. The unresolved symbol results in a SIGSEGV because the address of the symbol is 0x0 (note PC=0x0 and rip 0x0 in the trace above). In go1.20.8, when I skip the dlopen step entirely and just call C.get42() without loading anything, I get an unresolved symbol lookup error:
braydonk@braydonk:~/Git/cgo_dl_repro$ go run .
/tmp/go-build1699640599/b001/exe/cgo_dl_repro: symbol lookup error: /tmp/go-build1699640599/b001/exe/cgo_dl_repro: undefined symbol: get42
exit status 127
However in go1.21.1, I get a panic identical to calling it after loading the library.
Different ld versions
I tested the different ld versions by changing distros entirely. My personal machine runs rolling Debian Testing, and I have VMs on Debian Bullseye (11), Ubuntu Jammy (22.04), and Ubuntu Focal (20.04). The panic in go1.21.1 occurs on every OS except Ubuntu Focal. The only difference I could think of was the lower ld version, which is why I have called that out, but there could be some other difference causing this that I missed.
Why the strange setup?
This setup may seem oddly specific. I am mirroring the setup used by NVIDIA’s Go NVML bindings; we discovered this error through our usage of that library. See https://github.com/NVIDIA/go-nvml/issues/36, in particular the newest comments, which describe how this specific breakage appeared after upgrading to go1.21.1.
About this issue
- State: closed
- Created 9 months ago
- Comments: 21 (4 by maintainers)
Thanks for digging into this.
Good detective work, @braydonk. Yeah, in retrospect the export dynamic change would seem to make sense given what you described.