go-internal: testscript: "signal: killed" exec errors on MacOS 12
--- FAIL: TestScript/flags (0.11s)
testscript.go:429: > exec shfmt -h
[signal: killed]
FAIL: testdata/script/flags.txtar:1: unexpected command failure
I’ve seen this in a number of projects of mine, like:
- https://github.com/burrowers/garble/pull/614
- https://github.com/mvdan/sh/pull/934#issuecomment-1287113938
- https://github.com/mvdan/sh/commit/461be7eb20ff0eca28755287f192b258f0ac3a34
@rvagg mentions the same crash in https://github.com/ipld/go-car/pull/364, and in the past, others like @mr-joshcrane have mentioned the same error on Slack.
This must be something going wrong with either testscript or Go, because for example, that TestScript/flags test from above was just running exec shfmt -h, showing the help output from a Go program. You can see that the testscript file is rather boring, so it’s not doing anything particularly worrying.
Personally, I’ve worked around this by downgrading from macos-latest on GitHub Actions (which switched to macos-12 late last year) to macos-11, which seems to make the failures go away entirely. But of course that’s not a complete fix.
I first hoped that this would be fixed in Go 1.20 with https://go-review.googlesource.com/c/go/+/460476, and that may still be true, given that there are four distinct os/exec bugs for Mac there. But that’s just a guess: I haven’t verified it yet, nor do I have a Mac machine to test with. Help would be appreciated.
The only other recent mentions of “signal: killed” upstream for Mac are https://github.com/golang/go/issues/57418 and https://github.com/golang/go/issues/57239, and they both seem to point to processes being OOM-killed by the system. This could be the case for us as well, perhaps either due to the OS version upgrade changing the OOM behavior, or perhaps because the macos-12 GitHub machines have less available memory. But I’d also find it hard to believe, given that testscript doesn’t use a particularly high amount of memory.
Filing this issue to track investigation and progress.
About this issue
- State: closed
- Created a year ago
- Reactions: 2
- Comments: 29 (8 by maintainers)
Commits related to this issue
- CI: downgrade macos to 11 See https://github.com/rogpeppe/go-internal/issues/200. — committed to mvdan/xurls by mvdan a year ago
- CI: downgrade from macos-12, drop test-gotip Per #200, macos-12 can cause sporadic `signal: killed` testscript failures, and we have started seeing them in some jobs within go-internal itself as well... — committed to mvdan/go-internal by mvdan a year ago
- CI: downgrade from macos-12, drop test-gotip Per #200, macos-12 can cause sporadic `signal: killed` testscript failures, and we have started seeing them in some jobs within go-internal itself as well... — committed to rogpeppe/go-internal by mvdan a year ago
- testscript: fix "signal: killed" exec errors By adding a small sleep before `TestingM.Run()` to allow the write of the test commands to be flushed to disk. Fixes #200 — committed to bep/go-internal by bep a year ago
- testscript: fix "signal: killed" exec errors on darwin By adding a small sleep before `TestingM.Run()` to allow the write of the test commands to be flushed to disk. Fixes #200 — committed to bep/go-internal by bep a year ago
- testscript: fix "signal: killed" exec errors on MacOS On `MacOS` there are lots of reports of unexpected failing tests with output similar to this: ``` [signal: killed] FAIL: testscript... — committed to bep/go-internal by bep a year ago
- testscript: fix "signal: killed" exec errors on MacOS By doing a full copy and not a hard link of the binaries. This is the fall back used already for Windows. This is tested OK to remove unexpecte... — committed to bep/go-internal by bep a year ago
- testscript: fix "signal: killed" exec errors on MacOS By using `os.Symlink` on Darwin. See #200 — committed to bep/go-internal by bep a year ago
- testscript: fix "signal: killed" exec errors on MacOS By using `os.Symlink` instead of `os.Link`.. See #200 — committed to bep/go-internal by bep a year ago
- Use unix.CloneFile on MacOs To fix unexpected errors of type: ``` [signal: killed] FAIL: testscripts/myecho.txt:1: unexpected command failure ``` Fixes #200 — committed to bep/go-internal by bep a year ago
- testscript: use unix.CloneFile on MacOs To fix unexpected errors of type: ``` [signal: killed] FAIL: testscripts/myecho.txt:1: unexpected command failure ``` Fixes #200 — committed to bep/go-internal by bep a year ago
I can prepare a PR with something along the lines of the above in a few hours.
There is no guarantee that the source and destination are on the same filesystem, which is why there is a fallback to regular file copying. However, they will often be on the same filesystem, as `go test` and `testing` both place temporary files under `os.TempDir`.
@ldemailly see the previous comments, particularly https://github.com/rogpeppe/go-internal/issues/200#issuecomment-1536127648.
@bep - thanks very much for digging in here to get this fixed.
It’s `clonefile`, the one you linked to above: https://cs.opensource.google/go/x/sys/+/refs/tags/v0.8.0:unix/zsyscall_darwin_amd64.go;l=1011
Where do you see `clone2`? I only see `clonefile(2)`, but that is just the rather confusing man page syntax to say that `clonefile` is in the category `2 System calls (functions provided by the kernel)`, per `man man`.
This has become a real issue for me, so I decided to take a look at it, and I found that there’s a correlation between the size/amount of commands added to `RunMain` and this issue. I added a patch in the PR below that fixes this on my MacBook: a sleep after the commands get written to disk (which I assume allows the OS to flush) and before the test execution.
See https://github.com/rogpeppe/go-internal/pull/219
@rogpeppe has been seeing the same failures now on `macos-11`, in PRs to this very repo. In particular, the OS version appears to be 11.7.4, and the failures happen on both Go 1.19.x and 1.20.x.
Perhaps whatever changed in macos-12 to trigger this bug was backported to macos-11 now.