go: runtime: possible memory corruption on FreeBSD
Several failures in the last month on FreeBSD builders have failure modes that are very difficult to explain, such as SIGSEGVs in hot runtime paths accessing values that don’t quite make sense (e.g. spanOf sees a bad arena entry, fixalloc accesses an out-of-bounds slot, a broken return PC in a runtime stack frame). I suspect they share an underlying cause. Three issues have already been opened for these: #45977, #46103, #46182.
As far as I know, these all seem to be specific to FreeBSD, and even more specifically, the “race” and “12_2” builders.
The relevant logs are available below.
- 2021-05-04T20:50:35-d19e549/freebsd-amd64-race
- 2021-05-10T23:42:56-5c48951/freebsd-amd64-12_2
- 2021-05-14T16:42:01-3d324f1/freebsd-amd64-race
About this issue
- State: closed
- Created 3 years ago
- Comments: 86 (59 by maintainers)
Commits related to this issue
- dashboard: add freebsd-[amd64|386]-13_0 builders This change adds builders for FreeBSD 13.0. Updates golang/go#46272 Fixes golang/go#45903 Change-Id: I92f68f8d5eeee2d3bae8864dbfe5ee61b93ea677 Revie... — committed to golang/build by cagedmantis 3 years ago
- dashboard: update freebsd-amd64-race builder host This change updates the version of FreeBSD used for the freebsd-amd64-race builder to 13.0. The slowbot aliases have been updated to use FreeBSD 13.0... — committed to golang/build by cagedmantis 3 years ago
- amd64: correct size of the SSE area in the xsave layout PR: https://github.com/golang/go/issues/46272 Reviewed by: emaste Sponsored by: The FreeBSD Foundation MFC after: 3 days — committed to freebsd/freebsd-src by kostikbel 3 years ago
- build/env/freebsd-amd64: add 13.0-STABLE snapshot option Including the fix for the XSAVE kernel bug. For golang/go#46272. Change-Id: Ie3a74db9199ac18c256447847c6cbd39105a8d00 Reviewed-on: https://g... — committed to golang/build by heschi 2 years ago
- dashboard,env/freebsd: add 12.3-STABLE With the XSAVE bug fix. For golang/go#49967, golang/go#46272. Change-Id: Ie0711ba83e4f15d26e6fe89e89c80b7c3ad2c23c Reviewed-on: https://go-review.googlesource... — committed to golang/build by heschi 2 years ago
- cmd/dist: log CPU model when testing Knowing whether test failures are correlated with specific CPU models has proven useful on several issues. Log it prior to testing so it is always availabl... — committed to golang/go by prattmic 3 years ago
Extracting more bits from that failure log.
There are two test failures, both using the same global test data table, and both failed on the 100,000-zero-byte entry. At first glance it looks like they failed in different ways, because they report different hash values, but they did not: the first reports the hash of all 100,000 bytes, while the second reports the hash of only the first half, 50,000 bytes. Both make copies in a string -> []byte conversion before running the hash code, but the error prints the original string. And the reported hashes for both the full and half string are correct for the corrupted string.
This suggests the string in the table, built at init time with strings.Repeat("\x00", 1e5), is the problem, not a copy.
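For reference, the failing entry looks something like this (paraphrased from the package’s golden test table; surrounding entries elided):

```go
// Paraphrased golden test table for hash/adler32: each entry pairs an
// input string with its expected checksum.
var golden = []struct {
	out uint32
	in  string
}{
	// ... many shorter entries ...
	{0x86af0001, strings.Repeat("\x00", 1e5)}, // the failing 100,000-zero entry
}
```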
Also,
GODEBUG=gctrace=1 ./adler32.test -test.short
literally doesn’t run a garbage collection. That suggests the GC is not involved, and in particular that the memory in question was not being recycled by Go. (It could be non-GC’ed memory freed and then reused for heap-allocated memory, but since the allocation is huge and made at init time, that seems unlikely as well.) That means in all likelihood Go mmap’ed new pages for this allocation. It crossed my mind that maybe the OS is bad at zeroing, but we would have reinitialized the data in strings.Repeat anyway.
strings.Repeat is interesting. For efficient copying, we do it by doubling the input as far as it will go: 1 byte becomes 2, 2 become 4, and so on, up to 32,768 in this case. Then we will double that, to 65,536, and then we append the first 34,464 bytes once more to complete the 100,000. So that’s why the pattern is there three times: it appeared during the doubling of the first 16kB to 32kB, and then it was copied twice more while tripling the 32kB into 100,000 bytes.
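A minimal sketch of that doubling pattern (illustrative only; the real strings.Repeat builds through a strings.Builder and is more careful about chunk sizes):

```go
package main

import "fmt"

// repeatZeros mimics the doubling copy described above: each round
// appends a prefix of what has been built so far, doubling the length
// until a final, possibly partial, top-up round.
func repeatZeros(n int) []byte {
	b := append(make([]byte, 0, n), 0) // seed with one byte
	for len(b) < n {
		chunk := len(b)
		if chunk > n-len(b) {
			chunk = n - len(b) // final partial round: 34,464 bytes for n = 100,000
		}
		b = append(b, b[:chunk]...)
	}
	return b
}

func main() {
	fmt.Println(len(repeatZeros(100000))) // 100000
}
```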
Now the question is: why did the 16kB copy introduce spurious bytes? That happens in strings.Builder.WriteString, which does:
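(Quoting the standard library; minor details may differ across Go versions.)

```go
// strings.Builder.WriteString appends the contents of s to b's buffer.
// copyCheck guards against use of a copied Builder.
func (b *Builder) WriteString(s string) (int, error) {
	b.copyCheck()
	b.buf = append(b.buf, s...)
	return len(s), nil
}
```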
In this case s is b.buf, as it exists before the append. So there’s no overlap, but the destination is adjacent to (just beyond) the source. That append boils down to runtime.memmove, and again the destination should already be zeroed by the OS, so it’s not that the move was short.
runtime.memmove on freebsd/amd64 depends on runtime.useAVXmemmove to decide whether to use AVX. That is set up as:
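The setup, paraphrased from runtime/cpuflags_amd64.go (the exact model checks may differ between Go versions):

```go
// AVX is used for memmove whenever the CPU supports it, except on
// Intel Sandy Bridge / Ivy Bridge parts, where the AVX path regressed.
var useAVXmemmove bool

func init() {
	// Strip stepping and reserved fields from the CPUID version info.
	processor := processorVersionInfo & 0x0FFF3FF0

	isIntelBridgeFamily := isIntel &&
		(processor == 0x206A0 || processor == 0x206D0 ||
			processor == 0x306A0 || processor == 0x306E0)

	useAVXmemmove = cpu.X86.HasAVX && !isIntelBridgeFamily
}
```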
I’m not sure how to find out what processor we have in that trace. Let’s assume for now that useAVXmemmove is true.
The AVX code starts with:
CX and BX will be equal in this case, so as I read that, there should be no carry in JC and we fall through to the forward loop. And it’s not big enough for the big data case. So we end up in this nicely commented algorithm, which handles the end of the buffer specially:
Here’s the assembly:
So the bytes that ended up all 0xFF are the ones that were stored in X8 through X12.
Sounds like something in FreeBSD is smashing those registers out from under us?
I got through 912 iterations of all.bash on 12_3 and 898 on 12_2. I got two failures that appear to be memory corruption on 12_2 and zero on 12_3. That alone isn’t enough to base particularly strong statistical conclusions on, I’m afraid, so I’m going to take a Bayesian out and say that in combination with the fact that we’re already pretty sure the bug was fixed, we can close this issue as fixed.
Thank you everyone!
The FreeBSD fix has been cherry-picked to stable/13 in https://github.com/freebsd/freebsd-src/commit/1d6ebddb62bc18833b21a15f8e7eb86102140100 and to stable/12 in https://github.com/freebsd/freebsd-src/commit/7e45b72844768d7fd5c3c4d4e29f4e84b4bc0402. We release regular weekly snapshot builds from these branches; I will leave a note here when the fix is available in an errata update.
I do not think, right now, that the context switch is vulnerable.
That said, why do you want an app workaround when this is an OS issue and should be fixed in the OS? The fix is already in HEAD and will be merged to the stable branches shortly. Most likely, there will be errata notices for supported releases.
It only matters when there are two factors in play.
This is where the problem happens. For state components 0 (x87) and 1 (SSE/XMM), Intel does not provide an enumeration of the layout in CPUID, assuming that the OS knows about those regions anyway. The bug was that the amd64 kernel hardcoded the 32-bit size of the XMM save area, effectively filling %XMM8-%XMM15 with garbage on signal return when the init optimization kicked in, because only the specified part of the SSE save area was copied from the canonical copy.
So to hit this bug you must have a CPU that actually performs the init optimization (not just claims to support it), and you also need a signal delivered at the right time.
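To make the sizes concrete, an illustrative sketch (my numbers follow the Intel SDM layout; they are not quoted from the FreeBSD source):

```go
// Sizes in the legacy (FXSAVE) region of an XSAVE area, 64-bit mode
// (Intel SDM Vol. 1, Ch. 13.4). The buggy kernel copied only the
// 32-bit-mode portion of the XMM save area, so %XMM8-%XMM15 were
// restored from whatever was left in the buffer.
const (
	xmmRegBytes   = 16               // bytes per %XMM register
	xmmSaveSize32 = 8 * xmmRegBytes  // 128 bytes: what the kernel copied
	xmmSaveSize64 = 16 * xmmRegBytes // 256 bytes: what 64-bit mode needs
	staleBytes    = xmmSaveSize64 - xmmSaveSize32 // 128 bytes left stale
)
```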
I would not expect this to be fixed in a stock FreeBSD 12.3 release image (but it should be fixed by an errata update available later today via freebsd-update).
From the change referenced above (https://golang.org/cl/375695), it looks like Go has an entry host-freebsd-12_3 which uses the image freebsd-amd64-123-stable-20211230, a snapshot built from the stable/12 branch in late December 2021; it will indeed have the fix.
The distinction is likely immaterial for Go CI, but in case anyone encounters this issue on a production FreeBSD 12.3 deployment they’ll want to ensure that they’ve updated to at least 12.3-RELEASE-p1.
This reproduces pretty consistently on 13.0-RELEASE-p5 running in a bhyve VM with 6 cores and 8 GB of RAM under an AMD Ryzen 7 5800X.
Also look at the uninitialized memory on this one:
The failure in https://build.golang.org/log/46d12b0622c0a0cc8f7ede6959235b1b06f6ecef shows the actual corrupted memory. In an allocation of 100,000 (0x186a0) zeros, these indexes ended up being 0xFF instead:
7fb0  - 7fff
ffb0  - ffff
17fb0 - 17fff
!!!
For information, the freebsd-amd64-12_2 builder was recently added as part of #44431, and in that same change, the freebsd-amd64-race builder was updated to also use FreeBSD 12.2 (up from FreeBSD 12.0 previously).
CC @paulzhol, @dmgk, @cagedmantis.
@tarkhil We believe this is fixed by a FreeBSD update. You want to be using at least version 12.3-RELEASE-p1.
We believe this is fixed with FreeBSD 12.3-RELEASE-p1. I’m doing a stress test run on both the before and after builders to verify by running the following with VER=12_2 and VER=12_3:
Edited: Clarified that this is 12.3-RELEASE-p1, not just 12.3.
@kostikbel Thanks, that makes perfect sense.
For Go programs, GODEBUG=asyncpreemptoff=1 will be a partial workaround by reducing the number of signals, but not a complete one.
For a more complete workaround (I’m not sure we want to bother with this), I think we may be able to look at the signal context and, if XSTATE_BV[1] (SSE state present) is zero, explicitly zero the XMM region of the XSAVE area in the signal context. I think this is OK modulo extra details I missed, per Intel SDM Vol 1, Ch 13.9 “Operation of XSAVEOPT”: “If RFBM[i] = 1, XSTATE_BV[i] is set to the value of XINUSE[i].”
However, I think this would still leave us vulnerable to corruption if the kernel interrupted execution just for a context switch rather than a signal?
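A rough sketch of that idea (entirely hypothetical: fixupXSaveArea and the way the XSAVE area is reached are illustrative, not FreeBSD’s actual mcontext plumbing; the offsets follow the Intel SDM layout):

```go
package xsavefix

import "encoding/binary"

// Offsets per the Intel SDM XSAVE layout (illustrative).
const (
	xmmOffset      = 160 // start of the XMM save area in the legacy region
	xmmSize        = 256 // 16 registers * 16 bytes in 64-bit mode
	xstateBVOffset = 512 // XSTATE_BV is the first field of the XSAVE header
)

// fixupXSaveArea is a hypothetical helper: if XSTATE_BV[1] says the SSE
// component is not present (the init optimization fired), the XMM region
// may hold kernel garbage under the buggy kernel, so zero it to match
// the architectural init state before the context is XRSTOR-ed.
func fixupXSaveArea(xsave []byte) {
	xstateBV := binary.LittleEndian.Uint64(xsave[xstateBVOffset:])
	if xstateBV&(1<<1) == 0 { // bit 1: SSE state component
		for i := xmmOffset; i < xmmOffset+xmmSize; i++ {
			xsave[i] = 0
		}
	}
}
```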
@kostikbel sorry for the delay. With your fix applied directly on the 13.0-RELEASE-p5 kernel, neither the gist nor the ./adler32.test loop triggers under bhyve. Thanks!
XSAVE != XSAVES or XSAVEC. Yes, FreeBSD uses XSAVE/XSAVEOPT for saving FPU context, either on context switches or on signal delivery (it is slightly more nuanced; sometimes the kernel needs to save FPU registers for other reasons). For signals, the context is then copied out to userspace as part of the ucontext, and then copied in and XRSTOR-ed as part of the return from the signal handler.
The bad news is that I cannot reproduce the abort from avx_sig, either on my hosts or in a bhyve VM. BTW, I updated the gist: there was a stupid typo in the signal generation, and the program now also works under Linux. I tested on fresh HEAD and a several-days-old stable/13, which might be relevant; there are some changes done around this time that could affect this problem: https://cgit.freebsd.org/src/commit/?id=bd9e0f5df681da8b5ef05a587b4b5b07572d3fc2 But then I tried it on 12.2 with no abort as well.
It’s particularly interesting because the hash is running on 100,000 zero bytes, and the low 16 bits of the hash are 1 + sum of the bytes mod 65521, which of course is just 1. But the failure got 0x4fb1 in the low 16 bits. So clearly the bytes were not zeroed.
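A cross-check (my arithmetic, consistent with the corruption dump and the half-string explanation above): for the 50,000-byte half-string, only the 0x7fb0-0x7fff range of 80 corrupted bytes falls inside the input, and that reproduces the observed value:

```go
package main

import "fmt"

func main() {
	sum := 80 * 0xFF           // 80 corrupted bytes, each 0xFF
	low16 := (1 + sum) % 65521 // Adler-32 low 16 bits: (1 + byte sum) mod 65521
	fmt.Printf("%#x\n", low16) // 0x4fb1, matching the failing hash
}
```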
If only the print statement had not elided the bytes in the middle of the input, although maybe by then they would have printed as zeros.
We have a potentially interesting test case from #49862 that must be memory corruption on freebsd/amd64 in a very small and deterministic hash function. It’s particularly interesting that the function got the same wrong answer two out of the three times we’ve seen it fail: 2021-11-29T19:45:58-f598e29/freebsd-amd64-12_2 and 2021-11-29T22:02:45-1970e3e/freebsd-amd64-13_0.
Well, here’s a theory. We upgraded our builders to 12.2 and 11.4 at the same time. Perhaps it’s possible that there was some bug fix that was backported to 11.X but happens to be broken?
@problame Back when this first started happening, what FreeBSD did you upgrade from?
@problame Wow, thank you for that reference. That’s incredibly useful. Given the crashes that I’ve poked in that issue, this definitely seems to be the same thing. It’s good to know that this has been happening for y’all since at least January.
@dmitshur I’m not sure this qualifies as a release blocker anymore. Based on zrepl/zrepl#411 I don’t think it’s a new issue with Go 1.17.
FWIW, I think we have been hitting this bug in zrepl since FreeBSD 12.2 was released: https://github.com/zrepl/zrepl/issues/411
The workaround is to pin the process to a single core using cpuset.
Another friendly ping via a weekly release meeting, since this is currently labeled release-blocker.
Since we’re approaching the RC and the porting policy doesn’t include FreeBSD as a first class port, we may need to move in the direction of resolving the release-blocking issue by documenting the known problem, rather than blocking the release on its resolution.