go: runtime: corrupt binary export data seen after signal preemption CL

$ go version
go version devel +d2c039fb21 Sun Nov 3 01:44:46 2019 +0000 linux/amd64
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/mvdan/.cache/go-build"
GOENV="/home/mvdan/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GONOPROXY="brank.as/*"
GONOSUMDB="brank.as/*"
GOOS="linux"
GOPATH="/home/mvdan/go"
GOPRIVATE="brank.as/*"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/mvdan/tip"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/mvdan/tip/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/mvdan/src/gio/cmd/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build819987201=/tmp/go-build -gno-record-gcc-switches"

After building Go from master, I sometimes see errors like:

$ go test -race
# encoding/json
vet: /home/mvdan/tip/src/encoding/json/decode.go:13:2: could not import fmt (cannot import "fmt" (unknown bexport format version -1 ("\x80\x16\x13\x00\xc0\x00\x00\x00\x80\x17\x13\x00\xc0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x06format\x01a\x00\x04esc:\x05esc:\x02\x18$GOROOT/src/fmt/print.go\x05Write\x01b\x01n\x03err\x05Width\x03wid\x02ok\tPrecision\x04prec\x04Flag\x01c\x06Format\x01f\x05State\x06String\bGoString\x01w\x06Writer\x02io")), possibly version skew - reinstall package)
PASS
ok      gioui.org/cmd/gogio     4.559s

Here’s another crash from earlier today, with a slightly modified (and freshly built) Go master - you can see the error mentions a different std package:

$ go test
# mime
vet: /home/mvdan/tip/src/mime/encodedword.go:12:2: could not import io (cannot import "io" (unknown bexport format version -1 ("DX\xcdq㦔d_\xbf\x97\xa64h\xf7\x8f\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00d\x01p\x01n\x03err\x05Write\x05Close\x04Seek\x06offset\x06whence\x06Reader\x06Writer\x06Closer\x06Seeker\bReadFrom\x01r\aWriteTo\x01w\x06ReadAt\x03off\aWriteAt\bReadByte")), possibly version skew - reinstall package)
PASS
ok      gioui.org/cmd/gogio     7.199s

@heschik correctly points out that this could be a bad version of vet in play, since the bexport format has been replaced with the iexport. However, I already nuked my $GOBIN, and go test -x seems to be running /home/mvdan/tip/pkg/tool/linux_amd64/vet, which is freshly built.

Usually I’d assume this is an issue with my setup, but I can’t find anything wrong with it, and I’ve only started seeing these errors today.

/cc @mdempsky @griesemer @alandonovan

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 88 (74 by maintainers)

Commits related to this issue

Most upvoted comments

I just got the C reproducer working. I’m working on tidying it up and I’ll post it. Both madvising and mmaping the sigaltstack work to clear the pages (and that is necessary). The other missing ingredient was just running lots of the processes simultaneously.

Here’s the C reproducer. This fails almost instantly on 5.3.0-1008-gcp, and torvalds/linux@d9c9ce34ed5c892323cbf5b4f9a4c498e036316a (5.1.0-rc3+). It does not fail at the parent of that commit (torvalds/linux@a5eff7259790d5314eff10563d6e59d358cce482).

I’ll work on filing this upstream with Linux.

// Build with: gcc -pthread test.c
//
// This demonstrates an issue where AVX state becomes corrupted when a
// signal is delivered where the signal stack pages aren't faulted in.
//
// There appear to be three necessary ingredients, which are marked
// with "!!!" below:
//
// 1. A thread doing AVX operations using YMM registers.
//
// 2. A signal where the kernel must fault in stack pages to write the
//    signal context.
//
// 3. Context switches. Having a single task isn't sufficient.

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <sys/wait.h>

static int sigs;

static stack_t altstack;
static pthread_t tid;

static void die(const char* msg, int err) {
  if (err != 0) {
    fprintf(stderr, "%s: %s\n", msg, strerror(err));
  } else {
    fprintf(stderr, "%s\n", msg);
  }
  exit(EXIT_FAILURE);
}

void handler(int sig __attribute__((unused)),
             siginfo_t* info __attribute__((unused)),
             void* context __attribute__((unused))) {
  sigs++;
}

void* sender(void *arg) {
  int err;

  for (;;) {
    usleep(100);
    err = pthread_kill(tid, SIGWINCH);
    if (err != 0)
      die("pthread_kill", err);
  }
  return NULL;
}

void dump(const char *label, unsigned char *data) {
  printf("%s =", label);
  for (int i = 0; i < 32; i++)
    printf(" %02x", data[i]);
  printf("\n");
}

void doAVX(void) {
  unsigned char input[32];
  unsigned char output[32];

  // Set input to a known pattern.
  for (int i = 0; i < sizeof input; i++)
    input[i] = i;
  // Mix our PID in so we detect cross-process leakage, though this
  // doesn't appear to be what's happening.
  pid_t pid = getpid();
  memcpy(input, &pid, sizeof pid);

  while (1) {
    for (int i = 0; i < 1000; i++) {
      // !!! Do some computation we can check using YMM registers.
      asm volatile(
        "vmovdqu %1, %%ymm0;"
        "vmovdqa %%ymm0, %%ymm1;"
        "vmovdqa %%ymm1, %%ymm2;"
        "vmovdqa %%ymm2, %%ymm3;"
        "vmovdqu %%ymm3, %0;"
        : "=m" (output)
        : "m" (input)
        : "memory", "ymm0", "ymm1", "ymm2", "ymm3");
      // Check that input == output.
      if (memcmp(input, output, sizeof input) != 0) {
        dump("input ", input);
        dump("output", output);
        die("mismatch", 0);
      }
    }

    // !!! Release the pages of the signal stack. This is necessary
    // because the error happens when copy_fpstate_to_sigframe enters
    // the failure path that handles faulting in the stack pages.
    // (mmap with MMAP_FIXED also works.)
    //
    // (We do this here to ensure it doesn't race with the signal
    // itself.)
    if (madvise(altstack.ss_sp, altstack.ss_size, MADV_DONTNEED) != 0)
      die("madvise", errno);
  }
}

void doTest() {
  // Create an alternate signal stack so we can release its pages.
  void *altSigstack = mmap(NULL, SIGSTKSZ, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
  if (altSigstack == MAP_FAILED)
    die("mmap failed", errno);
  altstack.ss_sp = altSigstack;
  altstack.ss_size = SIGSTKSZ;
  if (sigaltstack(&altstack, NULL) < 0)
    die("sigaltstack", errno);

  // Install SIGWINCH handler.
  struct sigaction sa = {
    .sa_sigaction = handler,
    .sa_flags = SA_ONSTACK | SA_RESTART,
  };
  sigfillset(&sa.sa_mask);
  if (sigaction(SIGWINCH, &sa, NULL) < 0)
    die("sigaction", errno);

  // Start thread to send SIGWINCH.
  int err;
  pthread_t ctid;
  tid = pthread_self();
  if ((err = pthread_create(&ctid, NULL, sender, NULL)) != 0)
    die("pthread_create sender", err);

  // Run test.
  doAVX();
}

void *exiter(void *arg) {
  sleep(60);
  exit(0);
}

int main() {
  int err;
  pthread_t ctid;

  // !!! We need several processes to cause context switches. Threads
  // probably also work. I don't know if the other tasks also need to
  // be doing AVX operations, but here we do.
  int nproc = sysconf(_SC_NPROCESSORS_ONLN);
  for (int i = 0; i < 2 * nproc; i++) {
    pid_t child = fork();
    if (child < 0) {
      die("fork failed", errno);
    } else if (child == 0) {
      // Exit if the parent dies.
      prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
      doTest();
    }
  }

  // Exit after a while.
  if ((err = pthread_create(&ctid, NULL, exiter, NULL)) != 0)
    die("pthread_create exiter", err);

  // Wait for a failure.
  int status;
  if (wait(&status) < 0)
    die("wait", errno);
  if (status == 0)
    die("child unexpectedly exited with success", 0);
  fprintf(stderr, "child process failed\n");
  exit(1);
}

I’ve filed the upstream kernel bug here: https://bugzilla.kernel.org/show_bug.cgi?id=205663

I set out to bisect the kernel and discovered that it also depends on the GCC version:

Kernel tag GCC Outcome
v5.3 9 Reproducible
v5.3 8 Not reproducible
v5.2 9 Kernel doesn’t build
v5.2 8 Not reproducible

So this may happen in earlier kernel versions but be masked by their incompatibility with GCC 9. My guess is that GCC 9 started using AVX registers for something that GCC 8 didn’t?

I’m going to see if a later v5.2.x kernel builds with GCC 9.

I’m finally ready to actually bisect. Setting CONFIG_RETPOLINE to n works around the GCC 9 incompatibility. With this configuration, I’ve been able to reproduce the failure with v5.2, but NOT with v5.1. I’m now bisecting between the two.

For reference, here’s the script I’m using to configure, build, and load the kernel. This is based on the ubuntu-1910-eoan-v20191114 GCE image (grown to 50 GiB to fit the kernel build, sigh).

#!/bin/sh
set -e
# Start with system's 5.3 config
cp /boot/config-5.3.0-1008-gcp .config
# Work around GCC 9 flag incompatibility
sed -r -i 's/^(CONFIG_RETPOLINE)=y/\1=n/' .config
# Remove file systems that are modules
sed -r -i 's/^(CONFIG_.*_FS)=m/\1=n/' .config
# Disable big hardware subsystems we don't need
sed -r -i 's/^(CONFIG_(BT|WLAN|WIRELESS|SOUND|INFINIBAND|USB_SUPPORT|MEDIA_SUPPORT|FIREWIRE|ISDN|IIO|INPUT_(JOYSTICK|TOUCHSCREEN|TABLET|MISC)))=./\1=n/' .config
# Turn off ETHERNET (we use VIRTIO_NET) and things that force it on
sed -r -i 's/^(CONFIG_(ETHERNET|SCSI_CXGB3_ISCSI|SCSI_BNX2_ISCSI|SCSI_BNX2X_FCOE))=./\1=n/' .config
# Clean up the config
make olddefconfig
# Build kernel and install modules
make -j12 && sudo make modules_install
# Load kexec image. Command line from /boot/grub/grub.cfg
sudo kexec -l arch/x86/boot/bzImage --append="root=PARTUUID=2954cd84-339a-4c4a-b80d-c039bd33478c ro scsi_mod.use_blk_mq=Y console=ttyS0" --initrd=/boot/initrd.img-5.3.0-1008-gcp
# Run sudo kexec -e to reboot into loaded kernel (does NOT do a clean shut down)

I’ve bisected the issue to kernel commit torvalds/linux@d9c9ce34ed5c892323cbf5b4f9a4c498e036316a. I haven’t dug into this commit yet, but it appears to be a fix for bug that was introduced earlier (or possibly a redo of an earlier commit).

For reference, torvalds/linux@a352a3b7b7920212ee4c45a41500c66826318e92 in that same commit series introduced another bug first that produced similar failure types, but at a far higher rate (I couldn’t even run cmd/go). However, this was fixed later in the series, somewhere between torvalds/linux@e0d3602f933367881bddfff310a744e6e61c284c and torvalds/linux@1d731e731c4cd7cbd3b1aa295f0932e7610da82f.

My first bisect log is below. This led to the fail-fast failure.

# bad: [0ecfebd2b52404ae0c54a878c872bb93363ada36] Linux 5.2
# good: [e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd] Linux 5.1
git bisect start 'v5.2' 'v5.1'
# bad: [a2d635decbfa9c1e4ae15cb05b68b2559f7f827c] Merge tag 'drm-next-2019-05-09' of git://anongit.freedesktop.org/drm/drm
git bisect bad a2d635decbfa9c1e4ae15cb05b68b2559f7f827c
# bad: [82efe439599439a5e1e225ce5740e6cfb777a7dd] Merge tag 'devicetree-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
git bisect bad 82efe439599439a5e1e225ce5740e6cfb777a7dd
# bad: [78438ce18f26dbcaa8993bb45d20ffb0cec3bc3e] Merge branch 'stable-fodder' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
git bisect bad 78438ce18f26dbcaa8993bb45d20ffb0cec3bc3e
# good: [275b103a26e218b3d739e5ab15be6b40303a1428] Merge tag 'edac_for_5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
git bisect good 275b103a26e218b3d739e5ab15be6b40303a1428
# good: [962d5ecca101e65175a8cdb1b91da8e1b8434d96] Merge tag 'regmap-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
git bisect good 962d5ecca101e65175a8cdb1b91da8e1b8434d96
# good: [9bff9dfc513bd5de72cb59f4bffb72cf0a5aa526] Merge tag 'spi-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
git bisect good 9bff9dfc513bd5de72cb59f4bffb72cf0a5aa526
# good: [573de2a6e844cb230c4483833f29b8344a6a17cc] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching
git bisect good 573de2a6e844cb230c4483833f29b8344a6a17cc
# good: [b62de322579702f07175fc275ecb2c3afae6cd96] hugetlb: make use of ->free_inode()
git bisect good b62de322579702f07175fc275ecb2c3afae6cd96
# bad: [d9c9ce34ed5c892323cbf5b4f9a4c498e036316a] x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails
git bisect bad d9c9ce34ed5c892323cbf5b4f9a4c498e036316a
# good: [577ff465f5a6a5a0866d75a033844810baca20a0] x86/fpu: Only write PKRU if it is different from current
git bisect good 577ff465f5a6a5a0866d75a033844810baca20a0
#
# Below here the failure mode changed to fail almost instantly in cmd/go,
# though the types of failures looked similar.
#
# bad: [e0d3602f933367881bddfff310a744e6e61c284c] x86/fpu: Inline copy_user_to_fpregs_zeroing()
git bisect bad e0d3602f933367881bddfff310a744e6e61c284c
# good: [383c252545edcc708128e2028a2318b05c45ede4] x86/entry: Add TIF_NEED_FPU_LOAD
git bisect good 383c252545edcc708128e2028a2318b05c45ede4
# bad: [a352a3b7b7920212ee4c45a41500c66826318e92] x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD
git bisect bad a352a3b7b7920212ee4c45a41500c66826318e92
# good: [69277c98f5eef0d9839699b7825923c3985f665f] x86/fpu: Always store the registers in copy_fpstate_to_sigframe()
git bisect good 69277c98f5eef0d9839699b7825923c3985f665f
# first bad commit: [a352a3b7b7920212ee4c45a41500c66826318e92] x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD

After this I backed up and looked for the original failure. This bisect also happened to reveal that the fast failure had been fixed before the original failure was introduced.

# bad: [d9c9ce34ed5c892323cbf5b4f9a4c498e036316a] x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails
# good: [e0d3602f933367881bddfff310a744e6e61c284c] x86/fpu: Inline copy_user_to_fpregs_zeroing()
git bisect start 'd9c9ce34ed5c892323cbf5b4f9a4c498e036316a' 'e0d3602f933367881bddfff310a744e6e61c284c'
# good: [1d731e731c4cd7cbd3b1aa295f0932e7610da82f] x86/fpu: Add a fastpath to __fpu__restore_sig()
git bisect good 1d731e731c4cd7cbd3b1aa295f0932e7610da82f
# good: [06b251dff78704c7d122bd109384d970a7dbe94d] x86/fpu: Restore regs in copy_fpstate_to_sigframe() in order to use the fastpath
git bisect good 06b251dff78704c7d122bd109384d970a7dbe94d
# good: [a5eff7259790d5314eff10563d6e59d358cce482] x86/pkeys: Add PKRU value to init_fpstate
git bisect good a5eff7259790d5314eff10563d6e59d358cce482
# first bad commit: [d9c9ce34ed5c892323cbf5b4f9a4c498e036316a] x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails

Linux kernel version-specific signal handling behavior sounds fuuuuun.

@ianlancetaylor, I just tried with Go 1.13 and sending SIGWINCH in a loop and was able to reproduce the same vet version header corruption.

The change you mentioned in the kernel suggests that the kernel is now using AVX registers and probably wasn’t before. I could definitely see that introducing bugs around save/restore (especially if that’s done lazily; I’m not sure if it is). But it’s a good question why we’re seeing this.

Well, at least you’ve proved that this is a pre-existing bug that has nothing to do with signal preemption, so clearly we don’t have to fix it for 1.14.

(Edit: this is a joke.)

I’ve now reproduced this on another Fedora 30 linux/amd64 machine, also with kernel 5.3.7-200.fc30.x86_64 but this time with a Ryzen 1800X (not overclocked, 8/16 cores) with 32 GB.

I’ve git bisected this and on both of my Fedora 30 machines, the commit where the failures start is 62e53b79227dafc6afcd92240c89acb8c0e1dd56, ‘runtime: use signals to preempt Gs for suspendG’, which is targeted at #10958 and #24543. I think this makes this issue CC @aclements.

@dr2chase One difference I see in the disassembly is that GCC 9 is caching the address of the thread-local variable checked by test_thread_flag across the function, while GCC 8 is reloading it each time. If the retry loop can cause a change between threads, then the call to test_thread_flag(TIF_NEED_FPU_LOAD) will be looking at the wrong thread if the retry occurs.

@ianlancetaylor, I tried your suggestion and it does indeed still crash with a signal completely unrelated to preemption, delivered from outside the process.

I ran while GODEBUG=asyncpreemptoff=1 ./bin/go vet all; do true; done for 10 minutes with no failures.

I then started (while true; do killall -WINCH vet; done) (we don’t ignore XCPU, but do ignore WINCH) and the vet loop failed three times in 10 minutes (one header corruption, one index out of range, and one weird nil dereference).

For your nerd-sniping amusement (and not mine), the bug is somewhere in the differences in these two disassembled files. Link Linux with the gcc8 one, all is well, link with the other, it goes bad.

signal.8.dis.txt signal.9.dis.txt

There’s a ridiculous amount of inlining going on, not sure it helps to have the source annotation in the disassembly, but here it is:

signal.8.il.dis.txt signal.9.il.dis.txt

The source file in question, arch/x86/kernel/fpu/signal.o, is from this commit: https://github.com/torvalds/linux/commit/d9c9ce34ed5c892323cbf5b4f9a4c498e036316a

I verified this by building two kernels, one entirely compiled by gcc8 except for arch/x86/kernel/fpu/signal.o, compiled by gcc9. It fails . The other kernel is built entirely by gcc9, except for arch/x86/kernel/fpu/signal.o, built by gcc8. It does not fail.

Enjoy!

I assume that the corruption does not occur if you disable preemption. You could try disabling preemption, and then run the program, and have another program hit it regularly with some signal that the Go runtime will catch but ignore, like SIGXCPU. If that still fails, there is something wrong with the way that the AVX registers are being saved and restored when a signal occurs, and the problem has nothing to do with preemption as such.

@mdempsky There is some discussion of workarounds over at #35777 (I forget why there are two different issues).

FYI, I reproduced, at Linux tip (5.4 + a little), the dependence of the bug on how arch/x86/kernel/fpu/signal.o is compiled –

  • if that is the only file compiled with gcc9, the rest with gcc8, then the bug appears;
  • if that is the only file compiled with gcc8, the rest with gcc9 then the bug does not appear.

The exact gcc versions are

gcc-8 --version
gcc-8 (Ubuntu 8.3.0-23ubuntu2) 8.3.0

and

gcc-9 --version
gcc-9 (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008

Would somebody like to try this C program on a 5.3 kernel and see whether it reports any memory corruption? It runs without error on my 4.19 kernel.

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>

static void die(const char* msg, int err) {
  if (err != 0) {
    fprintf(stderr, "%s: %s\n", msg, strerror(err));
  } else {
    fprintf(stderr, "%s\n", msg);
  }
  exit(EXIT_FAILURE);
}

static int sigs;

void handler(int sig __attribute__((unused)),
	     siginfo_t* info __attribute__((unused)),
	     void* context __attribute__((unused))) {
  sigs++;
}

void* thread(void* arg) {
  pthread_t tid;
  int err;

  tid = *(pthread_t*)(arg);
  for (;;) {
    usleep(100);
    err = pthread_kill(tid, SIGWINCH);
    if (err != 0) {
      die("pthread_kill", err);
    }
  }
  return NULL;
}

static char altSigstack[SIGSTKSZ];

/* Looking at the glibc source code it appears that it will use rep
   movsb somewhere above 2048 bytes.  */
static char buf1[2040];
static char buf2[2040];

/* Pretend to clobber memory to avoid compiler optimizations.  */
static void clobber() {
  asm volatile("" : : : "memory");
}

int main() {
  stack_t ss;
  struct sigaction sa;
  pthread_t tid;
  pthread_t ctid;
  int err;
  int i;

  memset(&ss, 0, sizeof ss);
  ss.ss_sp = altSigstack;
  ss.ss_size = sizeof altSigstack;
  if (sigaltstack(&ss, NULL) < 0) {
    die("sigaltstack", errno);
  }

  memset(&sa, 0, sizeof sa);
  sa.sa_sigaction = handler;
  sigfillset(&sa.sa_mask);
  sa.sa_flags = SA_ONSTACK | SA_RESTART;
  if (sigaction(SIGWINCH, &sa, NULL) < 0) {
    die("sigaction", errno);
  }

  tid = pthread_self();
  err = pthread_create(&ctid, NULL, thread, (void*)(&tid));
  if (err != 0) {
    die("pthread_create", err);
  }

  for (i = 0; i < sizeof buf1; i++) {
    buf1[i] = (char)(i);
  }

  for (i = 0; i < 10000000; i++) {
    memset(buf2, 0, sizeof buf2);
    clobber();
    memcpy(buf2, buf1, sizeof buf1);
    clobber();
    if (memcmp(buf1, buf2, sizeof buf1) != 0) {
      printf("memory corruption at iteration %d\n", i);
    }
    clobber();
  }

  printf("received %d signals\n", sigs);
}

Sorry for confusion, I was being sarcastic.

Of course we have to fix it, one way or another.

Change https://golang.org/cl/208218 mentions this issue: runtime: stress testing for non-cooperative preemption