go: runtime,cmd/compile: frequent memory corruption on NetBSD and OpenBSD since 2021-10-11
greplogs --dashboard -md -l -e 'freeIndex is not valid' --since=2021-05-01
2021-10-28T19:04:41-4e1c44d-18b9702/netbsd-386-9_0
2021-10-28T18:17:57-f229e70/netbsd-386-9_0
2021-10-28T18:01:34-03971e3-18b9702/netbsd-386-9_0
2021-10-28T01:15:26-103d89b-b2fe2eb/netbsd-386-9_0
2021-10-27T20:03:17-7b0b504-68bd512/netbsd-386-9_0
2021-10-27T16:39:27-94870a3-4f73fd0/netbsd-386-9_0
2021-10-27T13:12:49-d418f37-cfb5321/netbsd-386-9_0
2021-10-27T06:23:35-5786a54/netbsd-386-9_0
2021-10-27T05:33:58-ca5f65d/netbsd-386-9_0
2021-10-26T22:24:36-591e12a-80be4a4/netbsd-386-9_0
2021-10-26T22:05:53-80be4a4/netbsd-amd64-9_0
2021-10-26T18:40:06-9626607-11b64b4/netbsd-386-9_0
2021-10-26T15:46:18-c4ead46-1b2362b/netbsd-386-9_0
2021-10-19T07:45:46-98f6e03-ee92daa/netbsd-386-9_0
2021-10-18T21:52:05-98f6e03-425db64/netbsd-386-9_0
2021-10-18T21:52:05-425db64/netbsd-amd64-9_0
About this issue
- State: closed
- Created 3 years ago
- Comments: 45 (39 by maintainers)
Commits related to this issue
- dashboard: add n1 hosts for OpenBSD and NetBSD We recently migrated in CL 354757 from n1 to e2 hosts around the same time issues started appearing with these builders. n1 hosts are notably different ... — committed to golang/build by toothrot 3 years ago
- dashboard: add n2 and n2d hosts for OpenBSD and NetBSD We recently migrated in CL 354757 from n1 to e2 hosts around the same time issues started appearing with these builders. n1 hosts are notably di... — committed to golang/build by toothrot 3 years ago
- dashboard: clean up builders affected by memory corruption Remove freebsd 12.2, which is replaced by 12.3 with the XSAVE fix. Move freebsd 11.* to N2 machines, which are not affected. Remove openbsd ... — committed to golang/build by heschi 2 years ago
- cmd/dist: log CPU model when testing Knowing whether test failures are correlated with specific CPU models on has proven useful on several issues. Log it for prior to testing so it is always availabl... — committed to golang/go by prattmic 3 years ago
- cmd/dist: log CPU model when testing Knowing whether test failures are correlated with specific CPU models on has proven useful on several issues. Log it for prior to testing so it is always availabl... — committed to jproberts/go by prattmic 3 years ago
Reproduced on a bare metal Ryzen 3600 running NetBSD 9.99.92
Using the quicker reproducer at #34988 on NetBSD with the help of several NetBSD folk:
- AMD 10h: OK (Turion II Neo N40L)
- AMD 15h: OK
- AMD 17h: NOT OK (Zen 1950X, Zen2 3600)
- AMD 19h: NOT OK (Zen3 5950X)
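To check where a given machine falls in the per-family results above, the family number can be decoded from the EAX value returned by CPUID leaf 1. The following is a minimal sketch, not something posted in this thread: the names `displayFamily`, `reported`, and `status` are hypothetical, and the two signature values in `main` are illustrative examples assumed to correspond to Zen 2 and Zen 3 parts. It does not check the CPU vendor, so the lookup is only meaningful for AMD signatures.

```go
package main

import "fmt"

// displayFamily decodes the CPU family from the EAX value of CPUID leaf 1.
// The base family is in bits 11:8; when it is 0xF, the extended family
// (bits 27:20) is added to it.
func displayFamily(eax uint32) uint32 {
	base := (eax >> 8) & 0xF
	ext := (eax >> 20) & 0xFF
	if base == 0xF {
		return base + ext
	}
	return base
}

// reported maps an AMD family to the result reported in this thread.
// Families not listed here were not tested.
var reported = map[uint32]string{
	0x10: "OK",
	0x15: "OK",
	0x17: "NOT OK",
	0x19: "NOT OK",
}

// status returns the reported result for the family encoded in eax,
// or "untested" if the thread has no data for it.
func status(eax uint32) string {
	if s, ok := reported[displayFamily(eax)]; ok {
		return s
	}
	return "untested"
}

func main() {
	// Illustrative signatures only (assumed values, not taken from the thread).
	for _, eax := range []uint32{0x00870F10, 0x00A20F10} {
		fmt.Printf("family %02Xh: %s\n", displayFamily(eax), status(eax))
	}
}
```

On a real machine the signature would come from the operating system or from a CPUID helper rather than from hard-coded constants.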
We have narrowed this down a bit to being specifically related to AMD CPUs. The E2 instances we switched to are a mix of Intel or AMD machines. https://golang.org/cl/367534 added explicit Intel-only (`-n2`) and AMD-only (`-n2d`) builders and we found:

Thus far we’ve only been able to test on GCE instances, but would love to know if these crashes reproduce on OpenBSD/NetBSD on bare-metal AMD machines.
cc @4a6f656c @bsiegert @tklauser or anyone else that may have an OpenBSD or NetBSD AMD machine, just running `GOARCH=386 ./all.bash` (perhaps a couple of times) should be sufficient to reproduce some kind of memory corruption crash.

This is not a bug in Go. The failing builders will be annotated with a known issue until it is resolved. Because of this, it is no longer a release blocker.
Per https://github.com/golang/go/issues/49209#issuecomment-982057154 you would need to be running OpenBSD i386 (not amd64) to be able to reproduce the issue (OpenBSD amd64 does not run i386 binaries, which would presumably be needed to trigger the problem).
I have yet (after about a dozen tries) to reproduce this on an openbsd/amd64 host using 1.17.3 as bootstrap to build the 386 dist. `GOARCH=386 ./all.bash` only starts erroring with `exec format error` when tests begin to run, which is expected.

That said, I have long suspected memory corruption errors specifically related to Go, OpenBSD, AMD, and forking; see issue #34988.
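Repeated runs of `GOARCH=386 ./all.bash`, like the "couple of times" and "about a dozen tries" mentioned above, can be scripted. The following is a rough sketch under stated assumptions, not a tool used in this thread: the helper names, the attempt count, and the command-line argument are hypothetical, and the only failure signature it knows is the `freeIndex is not valid` string from the greplogs query at the top of this issue. Other corruption crashes would need their own signatures added.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
)

// signatures are substrings treated as evidence of memory corruption.
// Only this one comes from the issue; the list is an assumption to extend.
var signatures = []string{"freeIndex is not valid"}

// matchSignature returns the first known signature found in out, or "".
func matchSignature(out []byte) string {
	for _, s := range signatures {
		if bytes.Contains(out, []byte(s)) {
			return s
		}
	}
	return ""
}

// runOnce runs ./all.bash with GOARCH=386 in dir (expected to be the src
// directory of a Go checkout) and returns its combined stdout and stderr.
func runOnce(dir string) ([]byte, error) {
	cmd := exec.Command("./all.bash")
	cmd.Dir = dir
	cmd.Env = append(os.Environ(), "GOARCH=386")
	return cmd.CombinedOutput()
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: repro <path to go/src>")
		return
	}
	const attempts = 5 // hypothetical; the thread suggests a few runs
	for i := 1; i <= attempts; i++ {
		out, err := runOnce(os.Args[1])
		if sig := matchSignature(out); sig != "" {
			fmt.Printf("run %d: matched %q\n", i, sig)
			continue
		}
		// A failure without a known signature may still be relevant.
		fmt.Printf("run %d: no known signature (err: %v)\n", i, err)
	}
}
```

Per the comment above, on an openbsd/amd64 host the 386 tests fail with `exec format error`, so a harness like this is only meaningful where 386 binaries can actually run.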
I’m not sure this error is entirely a regression, as I’ve seen this with pre-Go-1.18 on NetBSD. But perhaps something is making it much more frequent.