go-internal: testscript: "signal: killed" exec errors on macOS 12

    --- FAIL: TestScript/flags (0.11s)
        testscript.go:429: > exec shfmt -h
            [signal: killed]
            FAIL: testdata/script/flags.txtar:1: unexpected command failure

I’ve seen this in a number of projects of mine, like:

@rvagg mentions the same crash in https://github.com/ipld/go-car/pull/364, and in the past, others like @mr-joshcrane have mentioned the same error on Slack.

This must be something going wrong with either testscript or Go. For example, the TestScript/flags test above was only running exec shfmt -h, which prints the help output of a Go program. You can see that the testscript file is rather boring, so it’s not doing anything particularly worrying.

Personally, I’ve worked around this by downgrading from macos-latest on GitHub Actions (which switched to macos-12 late last year) to macos-11, which seems to make the failures go away entirely. But of course that’s not a complete fix.

I first hoped that this would be fixed in Go 1.20 by https://go-review.googlesource.com/c/go/+/460476, and that may still be true, given that it references four distinct os/exec bugs for Mac. But that is only an educated guess: I haven’t verified it yet, nor do I have a Mac to test with. Help would be appreciated.

The only other recent upstream mentions of “signal: killed” on Mac are https://github.com/golang/go/issues/57418 and https://github.com/golang/go/issues/57239, and both seem to point to processes being OOM-killed by the system. That could be the case for us as well, either because the OS version upgrade changed the OOM behavior, or because the macos-12 GitHub machines have less available memory. Still, I find it hard to believe, given that testscript doesn’t use a particularly large amount of memory.

Filing this issue to track investigation and progress.

About this issue

  • State: closed
  • Created a year ago

Most upvoted comments

Full copy on Windows, clonefile on Mac, and hard links on Linux sound good to me. Beware that we likely need build tags now, since unix.Clonefile is only defined for GOOS=darwin. Perhaps we can add a cloneFile func which uses os.Link on unix && !darwin, unix.Clonefile on darwin, and returns fmt.Errorf(“unavailable”) on !unix.

I can prepare a PR with something ala the above in a few hours.

There is no guarantee that the source and destination are on the same filesystem, which is why there is a fallback to regular file copying. However, they will often be on the same filesystem, as go test and testing both place temporary files under os.TempDir.

@bep - thanks very much for digging in here to get this fixed.

It’s clonefile, the one you linked to above: https://cs.opensource.google/go/x/sys/+/refs/tags/v0.8.0:unix/zsyscall_darwin_amd64.go;l=1011

Where do you see clone2? I only see clonefile(2), but that is just the rather confusing man page syntax for saying that clonefile is in section 2, system calls (functions provided by the kernel), per man man.

This has become a real issue for me, so I decided to take a look at it, and I found a correlation between this issue and the size/number of commands added to RunMain. I added a patch in the PR below that fixes this on my MacBook: a sleep after the commands are written to disk (which I assume allows the OS to flush) and before the test execution.

See https://github.com/rogpeppe/go-internal/pull/219

@rogpeppe has been seeing the same failures now on macos-11, in PRs to this very repo. In particular, the OS version appears to be 11.7.4, and the failures happen on both Go 1.19.x and 1.20.x.

Perhaps whatever changed in macos-12 to trigger this bug was backported to macos-11 now.