go: cmd/link: rare corruption of ELF binaries

What version of Go are you using (go version)?

$ go version
go version go1.18 linux/amd64

Does this issue reproduce with the latest release?

Yes, Go 1.18 is the latest (stable) release.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/michael/.cache/go-build"
GOENV="/home/michael/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/michael/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/michael/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/michael/sdk/go1.18"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/michael/sdk/go1.18/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.18"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/michael/2022-06-25-squashfs-debug/go.mod"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1868362460=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I am building and deploying Go software from a cron job every day.

Recently, I noticed that some of my executables occasionally fail to start because they are corrupt!

The first time I noticed the issue, the init binary of one of my https://gokrazy.org/ installations was affected, resulting in an installation that wouldn’t boot at all.

The other time, it wasn’t the init binary, but a program of mine called regelwerk, which is involved in motion sensor/light control in my home. I noticed because the lights weren’t working as they should.

It’s possible this happened more times and I just didn’t notice it.

Yesterday, I found someone on Twitter who is also running into this issue, but with an entirely different program (not related to gokrazy at all): https://twitter.com/alvs_versteck/status/1546601648532983808

What did you expect to see?

The Go compiler/linker should produce ELF binaries that contain a valid ELF header.

What did you see instead?

The first 4096 bytes of the ELF binary are zeroed out, as well as another block of 4096 bytes at offset 256K.

You can find the files at https://t.zekjur.net/_2022-06-25-init/

In the other occurrence, it was 4096 bytes at the start of the ELF binary, then 4096 bytes at offset 0x9000.
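
The damage pattern described above (missing ELF header, zeroed 4096-byte blocks) can be checked for mechanically. The following Go program is a minimal sketch and not part of the original report: `hasELFMagic` and `zeroPages` are hypothetical helper names, and the 4096-byte block size is simply taken from the observations above. A healthy binary may legitimately contain all-zero blocks, so only a missing ELF magic at offset 0 should be treated as a clear sign of corruption.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// elfMagic is the 4-byte signature at the start of every valid ELF file.
var elfMagic = []byte{0x7f, 'E', 'L', 'F'}

// hasELFMagic reports whether data begins with the ELF signature.
func hasELFMagic(data []byte) bool {
	return bytes.HasPrefix(data, elfMagic)
}

// zeroPages returns the offsets of all pageSize-aligned blocks in data
// that consist entirely of zero bytes. A trailing partial block is ignored.
func zeroPages(data []byte, pageSize int) []int64 {
	if pageSize <= 0 {
		return nil
	}
	var offsets []int64
	zero := make([]byte, pageSize)
	for off := 0; off+pageSize <= len(data); off += pageSize {
		if bytes.Equal(data[off:off+pageSize], zero) {
			offsets = append(offsets, int64(off))
		}
	}
	return offsets
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: elfcheck <binary>")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("ELF magic present: %v\n", hasELFMagic(data))
	for _, off := range zeroPages(data, 4096) {
		fmt.Printf("zeroed 4096-byte block at offset %#x\n", off)
	}
}
```

Running such a check right after each build in the cron job would at least catch a zeroed header before the binary is deployed.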

Unfortunately I have no idea how to reproduce this issue.

About this issue

  • State: open
  • Created 2 years ago
  • Comments: 44 (34 by maintainers)

Most upvoted comments

Thanks for trying the C code.

"when fallocate returns the error code EINTR for the first time, and trying to fallocate again will cause the problem to appear"

Could you try in the C code, having a thread (or process) sending some signals to the fallocate thread, so it may return EINTR?

@cherrymui yes, reducing the patch to only the changes in outbuf_mmap.go as shown below is enough to make the issue go away. Thanks for taking the time to look into this btw!

--- a/src/cmd/link/internal/ld/outbuf_mmap.go
+++ b/src/cmd/link/internal/ld/outbuf_mmap.go
@@ -20,6 +20,10 @@ func (out *OutBuf) Mmap(filesize uint64) (err error) {
                out.munmap()
        }
 
+       err = out.f.Truncate(int64(filesize))
+       if err != nil {
+               Exitf("resize output file failed: %v", err)
+       }
        for {
                if err = out.fallocate(filesize); err != syscall.EINTR {
                        break
@@ -33,10 +37,6 @@ func (out *OutBuf) Mmap(filesize uint64) (err error) {
                        return err
                }
        }
-       err = out.f.Truncate(int64(filesize))
-       if err != nil {
-               Exitf("resize output file failed: %v", err)
-       }
        out.buf, err = syscall.Mmap(int(out.f.Fd()), 0, int(filesize), syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED|syscall.MAP_FILE)
        if err != nil {
                return err

Thanks. Frequent corruption of the ELF header is probably enough information. The ELF header is written by the linker only. The compiler and (most of) the go command are unrelated. The one place where the go command touches the binary after linking is stamping the build ID. So it is either the linker or that build ID stamping.

Is your program a pure-Go binary, or does it use cgo (i.e. is it linked internally or externally)?