go: runtime,cmd/compile: `exit status 0xc0000374` (`STATUS_HEAP_CORRUPTION`) on windows-amd64-longtest
#!watchflakes
post <- builder ~ `windows` && `0xc0000374`
XXXBANNERXXX:Test execution environment.
# GOARCH: amd64
# CPU: Intel(R) Xeon(R) CPU @ 2.20GHz
# GOOS: windows
# OS Version: 10.0.14393
go tool compile: exit status 0xc0000374
go tool dist: FAILED: go list -f={{if .Stale}} STALE {{.ImportPath}}: {{.StaleReason}}{{end}} std: exit status 1
According to https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55, this exit code means:
0xC0000374 STATUS_HEAP_CORRUPTION A heap has been corrupted.
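
For anyone scripting around this failure, here is a minimal sketch of how a Go program could recognize this exit code from a child process. The helper name `isHeapCorruption` and the constant name are hypothetical and are not part of any Go package; the sketch only assumes the standard `os/exec` API.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// statusHeapCorruption is the NTSTATUS value quoted above.
// The constant name is made up for this sketch.
const statusHeapCorruption = 0xC0000374

// isHeapCorruption (hypothetical helper) reports whether err is an
// *exec.ExitError whose exit code equals STATUS_HEAP_CORRUPTION.
func isHeapCorruption(err error) bool {
	var ee *exec.ExitError
	if !errors.As(err, &ee) {
		return false
	}
	// ExitCode returns an int; compare its low 32 bits with the NTSTATUS value.
	return uint32(ee.ExitCode()) == statusHeapCorruption
}

func main() {
	// Any child process works here; "go version" is only a placeholder.
	err := exec.Command("go", "version").Run()
	fmt.Println("heap corruption:", isHeapCorruption(err))
}
```
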
greplogs --dashboard -md -l -e \(\?ms\)\\Awindows-.\*0xc0000374
2022-04-27T14:23:28-f0c0e0f/windows-amd64-longtest
Since this has only been seen once, leaving on the backlog to see whether this is a recurring pattern or a one-off fluke. (CC @golang/runtime)
About this issue
- State: closed
- Created 2 years ago
- Comments: 38 (35 by maintainers)
I wrote a simple program that ran `$GOROOT/pkg/tool/$GOOS_$GOARCH/compile -V=full` four times in parallel. I ran that program 100,000 times on a gomote, while in a separate terminal running `go test std cmd` in a loop. I never saw the `STATUS_HEAP_CORRUPTION` failure.
No idea what is happening here.
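
A stress program of the kind described above might look roughly like the following. This is a sketch, not the program that was actually run: the helper `runParallel`, the iteration count, and the use of `go env GOTOOLDIR` to locate the compiler are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
	"sync"
)

// runParallel (hypothetical helper) starts n copies of the command at once,
// waits for all of them, and returns one error per copy that failed.
func runParallel(n int, name string, args ...string) []error {
	var (
		mu   sync.Mutex
		errs []error
		wg   sync.WaitGroup
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			out, err := exec.Command(name, args...).CombinedOutput()
			if err != nil {
				mu.Lock()
				errs = append(errs, fmt.Errorf("%s: %w\n%s", name, err, out))
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return errs
}

func main() {
	// Assumption: locate the compiler through GOTOOLDIR rather than
	// spelling out $GOROOT/pkg/tool/$GOOS_$GOARCH by hand.
	out, err := exec.Command("go", "env", "GOTOOLDIR").Output()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	compile := filepath.Join(strings.TrimSpace(string(out)), "compile")

	const iterations = 100 // arbitrary; the comment above reports 100,000 runs
	for i := 0; i < iterations; i++ {
		if errs := runParallel(4, compile, "-V=full"); len(errs) > 0 {
			for _, e := range errs {
				fmt.Fprintln(os.Stderr, e)
			}
			os.Exit(1)
		}
	}
	fmt.Println("no failures")
}
```
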
OK, I see where the `go tool compile` error message is coming from: `(*Builder).toolID` in cmd/go/internal/work/buildid.go. That code is invoked as, among other things, `b.toolID("compile")`. It runs the compiler binary directly; it does not run `go tool compile`. However, if the compiler fails, it prints a message as `go tool compile:` followed by the error message.
This is invoked while running the `go list` command. I’m fairly confident that this is why we are seeing these error messages.
The `toolID` method will always run the compiler with `-V=full`. So it appears that very occasionally running the compiler with `-V=full` is causing it to exit with `STATUS_HEAP_CORRUPTION`.
One thing I haven’t tried is testing at exactly one of the commits that previously failed. To that end, I’ll test at f0c0e0f255c59c8ee6e463103d0b8491b8f9b1af (commit from the 2022-04-27 failure).
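
To illustrate the `toolID` reporting behavior described in this comment, here is a simplified sketch. It is not the code in cmd/go/internal/work/buildid.go: the function `toolVersion` and its signature are hypothetical, and only the shape of the behavior (run the binary directly with `-V=full`, label any failure as `go tool <name>:`) is taken from the comment.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// toolVersion (hypothetical) runs the tool binary at path directly with
// -V=full and returns its trimmed output. On failure the error is labeled
// "go tool <name>:", even though "go tool" itself is never invoked.
func toolVersion(name, path string) (string, error) {
	var stdout, stderr bytes.Buffer
	cmd := exec.Command(path, "-V=full")
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		return "", fmt.Errorf("go tool %s: %v\n%s", name, err, stderr.String())
	}
	return strings.TrimSpace(stdout.String()), nil
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: toolversion <name> <path-to-binary>")
		os.Exit(2)
	}
	out, err := toolVersion(os.Args[1], os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(out)
}
```

With this shape, a compiler binary that exits with a failure status while printing its version would surface as a `go tool compile: ...` line, which matches the form of the message in the failure log.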
I’ve instrumented `checkNotStale` and you are right that we don’t run it very often in standard all.bash (once per `#####` test block). With sharding it should be running every few packages, I believe. So I can try increasing the number of staleness checks. That said, by my back-of-the-envelope calculations I think I’ve done ~5000 all.bash runs, so I’ve still run the staleness check quite a bit. (I have 578 other windows test failure logs sitting in /tmp!)
Three days of continuous testing on 25 windows gomotes has gotten me zero of these failures, so I suspect I am missing some required component of the failure.