go: cmd/link: rare corruption of ELF binaries
What version of Go are you using (go version)?
$ go version
go version go1.18 linux/amd64
Does this issue reproduce with the latest release?
Yes, Go 1.18 is the latest (stable) release.
What operating system and processor architecture are you using (go env)?
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/michael/.cache/go-build"
GOENV="/home/michael/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/michael/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/michael/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/michael/sdk/go1.18"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/michael/sdk/go1.18/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.18"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/michael/2022-06-25-squashfs-debug/go.mod"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1868362460=/tmp/go-build -gno-record-gcc-switches"
What did you do?
I am building and deploying Go software from a cron job every day.
Recently, I noticed that some of my executable binaries occasionally fail to start because they are corrupt!
The first time I noticed the issue, the init binary of one of my https://gokrazy.org/ installations was affected, resulting in an installation that wouldn’t boot at all.
The other time, it wasn’t the init binary but a program of mine called regelwerk, which is involved in motion sensor/light control in my home; I noticed because the lights weren’t working as they should.
It’s possible this happened more times and I just didn’t notice it.
Yesterday, I found someone on Twitter who is also running into this issue, but with an entirely different program (not related to gokrazy at all): https://twitter.com/alvs_versteck/status/1546601648532983808
What did you expect to see?
The Go compiler/linker should produce ELF binaries that contain a valid ELF header.
What did you see instead?
The first 4096 bytes of the ELF binary are zeroed out, as well as another block of 4096 bytes at offset 256K.
You can find the files at https://t.zekjur.net/_2022-06-25-init/
In the other occurrence, the zeroed blocks were the first 4096 bytes of the ELF binary and another 4096 bytes at offset 0x9000.
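To check other binaries for the same pattern, a small Go program can scan a file for page-aligned 4096-byte blocks that are entirely zero and check whether the ELF magic number (\x7fELF) is still present at offset 0. This is only a sketch: the 4096-byte block size is assumed from the corruption described above, and the names zeroPages and hasELFMagic are hypothetical. A healthy binary may legitimately contain zero-filled pages (for example alignment padding), so a zeroed page is only a hint; a missing ELF magic at offset 0 is the stronger signal.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// pageSize is the block size observed in the corrupted files (assumed 4096).
const pageSize = 4096

// zeroPages returns the offsets of all page-aligned pageSize blocks in data
// that consist entirely of zero bytes. A trailing partial block is ignored.
func zeroPages(data []byte) []int {
	zero := make([]byte, pageSize)
	var offs []int
	for off := 0; off+pageSize <= len(data); off += pageSize {
		if bytes.Equal(data[off:off+pageSize], zero) {
			offs = append(offs, off)
		}
	}
	return offs
}

// hasELFMagic reports whether data starts with the ELF magic "\x7fELF".
func hasELFMagic(data []byte) bool {
	return bytes.HasPrefix(data, []byte("\x7fELF"))
}

func main() {
	// Check every file named on the command line; with no arguments, do nothing.
	for _, path := range os.Args[1:] {
		data, err := os.ReadFile(path)
		if err != nil {
			fmt.Println(path, err)
			continue
		}
		fmt.Printf("%s: ELF magic present: %v\n", path, hasELFMagic(data))
		for _, off := range zeroPages(data) {
			fmt.Printf("%s: zeroed page at offset %#x\n", path, off)
		}
	}
}
```

Run it with the suspect binaries as arguments, e.g. go run zeropages.go /path/to/init (the file name is arbitrary).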
Unfortunately I have no idea how to reproduce this issue.
About this issue
- State: open
- Created 2 years ago
- Comments: 44 (34 by maintainers)
@cherrymui
I can reproduce fallocate returning EINTR in C by sending signals to the process that calls fallocate, and I found that data loss does occur. My C code: https://github.com/abner-chenc/abner/blob/master/fallocate.c
I can reproduce the same error as on loong64 by running make.bash on linux/amd64 (kernel version 6.1.0-rc2). Here are my logs and corrupted files: bad_go_asm_x86.tar.gz, golang-x86.log
Note: /tmp must be mounted as tmpfs when testing, because some Linux distributions don’t do this by default.
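Before running the reproducer, it may help to confirm that /tmp really is on tmpfs. A minimal Linux-only sketch in Go, assuming the TMPFS_MAGIC value 0x01021994 from <linux/magic.h>, compares the file system type reported by statfs(2); the helper name isTmpfs is hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// tmpfsMagic is TMPFS_MAGIC from <linux/magic.h>.
const tmpfsMagic = 0x01021994

// isTmpfs reports whether path resides on a tmpfs file system (Linux only).
func isTmpfs(path string) (bool, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return false, err
	}
	// The width of Statfs_t.Type differs between architectures, so convert.
	return int64(st.Type) == tmpfsMagic, nil
}

func main() {
	paths := os.Args[1:]
	if len(paths) == 0 {
		paths = []string{"/tmp"}
	}
	for _, p := range paths {
		ok, err := isTmpfs(p)
		if err != nil {
			fmt.Println(p, err)
			continue
		}
		fmt.Printf("%s on tmpfs: %v\n", p, ok)
	}
}
```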
This is most likely a kernel bug (introduced in Linux 5.16-rc4). I have reported it: https://lore.kernel.org/linux-mm/33b85d82.7764.1842e9ab207.Coremail.chenguoqic@163.com/
This is the fix patch provided by our colleagues: https://lore.kernel.org/linux-mm/20221101032248.819360-1-kernel@hev.cc/T/#u
@cherrymui yes, reducing the patch to only the changes in outbuf_mmap.go as shown below is enough to make the issue go away. Thanks for taking the time to look into this, by the way!
Thanks. Frequent corruption of the ELF header is probably enough information. The ELF header is written only by the linker; the compiler and (most of) the go command are not involved. The only place where the go command touches the binary after linking is when it stamps the build ID. So the culprit is either the linker or the build ID stamping.
Is your program a pure-Go binary, or does it use cgo (i.e., is it linked internally or externally)?