go: cmd/link: `go tool dist test testshared` failed if linked with lld or mold
What version of Go are you using (`go version`)?
$ go version
go version devel go1.17-962d5c997a Fri Jun 4 01:31:23 2021 +0000 linux/amd64
What operating system and processor architecture are you using (`go env`)?
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/ruiu/.cache/go-build"
GOENV="/home/ruiu/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/ruiu/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/ruiu/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/ruiu/golang"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/ruiu/golang/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="devel go1.17-962d5c997a Fri Jun 4 01:31:23 2021 +0000"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/ruiu/golang/src/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build237113391=/tmp/go-build -gno-record-gcc-switches
What did you do?
I tried to build Go with my own linker, mold (https://github.com/rui314/mold), and noticed that a CGO-related test fails when linked with mold. The same test also fails with lld, so it seems to pass only when you are using GNU ld or GNU gold.
Specifically, these are the exact commands with which I can reproduce the issue on my Ubuntu 20.04 machine.
$ git clone git@github.com:golang/go.git golang
$ cd golang/src
$ ./make.bash
$ sudo ln -sf /usr/bin/ld.lld-11 /usr/bin/ld
$ ../bin/go tool dist test testshared
If I do not substitute the default linker with lld using `sudo ln`, the last test command succeeds. Before running the above commands, please install LLVM lld 11 with `apt-get install lld-11`.
To restore the original ld, run `(cd /usr/bin; sudo ln -sf x86_64-linux-gnu-ld ld)`.
What did you expect to see?
The test succeeds
What did you see instead?
The test fails with the following error message.
--- FAIL: TestGCData (0.61s)
    shared_test.go:50: executing ./main (running gcdata/main) failed exit status 2:
        x[4] == -2401053088876216593, want 12345
        panic: FAIL
        goroutine 1 [running]:
        panic({0x7ff705a2d180, 0x556bea3cb938})
            /home/ruiu/golang/src/runtime/panic.go:1147 +0x3d3 fp=0xc0001b1f00 sp=0xc0001b1e40 pc=0x7ff7059b11f3
        main.main()
            /tmp/shared_test2784724792/gopath/src/testshared/gcdata/main/main.go:34 +0x14c fp=0xc0001b1f80 sp=0xc0001b1f00 pc=0x556bea3b6bec
        runtime.main()
            /home/ruiu/golang/src/runtime/proc.go:255 +0x282 fp=0xc0001b1fe0 sp=0xc0001b1f80 pc=0x7ff7059b4d42
        runtime.goexit()
            /home/ruiu/golang/src/runtime/asm_amd64.s:1581 +0x1 fp=0xc0001b1fe8 sp=0xc0001b1fe0 pc=0x7ff7059ed801
So, the test fails because the Go garbage collector wrongly collects live objects.
I’ve been debugging the issue for two days so far without any luck. It looks like if I build everything except `gopath/pkg/linux_amd64_dynlink/libtestshared-gcdata-p.so` using lld and link that particular DSO using GNU ld, the test passes. But I can’t find the reason why that test dislikes an lld- or mold-linked shared object file. Is there any chance that CGO unnecessarily depends on a GNU ld-specific section or symbol layout or something?
About this issue
- State: open
- Created 3 years ago
- Reactions: 2
- Comments: 15 (14 by maintainers)
Thanks. Yes, you made that clear already in your previous comment 8 days ago, and it is indeed evident that the problem is due to the fact that the Go linker is not applying dynamic relocations to the section in question. Please bear with me while I work on this bug; I have many other demands on my time, and I need to balance working on your bug with working on other bugs as well. Thanks for your patience.
It was extremely puzzling, but I think I found the cause of the issue. It looks like there’s a bug in Go’s linker. Here is what was happening:
- `src/cmd/link/internal/loader/loader.go` has code that reads section contents from a DSO.
- `decodetypeGcmask` in `decodesym.go` calls `ctxt.loader.Data` to read GC bitmaps from a DSO’s `.data.rel.ro` section for type symbols.
- `src/cmd/link/internal/loader/loader.go` does not seem to apply dynamic relocations before reading section contents. Therefore, some values that are non-zero when generated by a GNU linker seem to be just zeros when generated by lld or mold. That causes the difference in GC bitmaps. mold/lld-generated GC bitmaps have more zeros, causing the GC to reclaim more objects than it should.
So, I think the proper fix is to change `loader.go` so that it applies dynamic relocations for a DSO before reading its section contents and returning them to `decodetypeGcmask`.
I’ll take a look.