go: proposal: unsafe: add Unreachable
When optimizing code, I often run up against cases in which the compiler is missing a key fact. Sometimes it could but does not infer it; sometimes there’s no reasonable way for the compiler to know.
In those cases, there is nothing to do but drop to assembly. (In normal code you could write if !x { panic }, but in many of these cases, that is prohibitively expense.)
I propose that we add unsafe.Assume. It accepts a boolean expression. The expression is typechecked but never evaluated. However, the compiler may assume that it evaluates to true when compiling other code.
I imagine the most common uses would be things like unsafe.Assume(p != nil), unsafe.Assume(0 <= i && i < len(s)), and unsafe.Assume(x < 64), for nil checks, bounds checks, and shift amounts, respectively.
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 29
- Comments: 39 (33 by maintainers)
Commits related to this issue
- cmd/compile: add unsafe.Unreachable DO NOT SUBMIT This is a rudimentary implementation, so that I can experiment with it. Updates #30582 Change-Id: I2e3a9e776490a9b01d4f166014b133625e419418 — committed to josharian/go by josharian 5 years ago
- cmd/compile: add unsafe.Unreachable DO NOT SUBMIT This is a rudimentary implementation, so that I can experiment with it. Updates #30582 Change-Id: I2e3a9e776490a9b01d4f166014b133625e419418 — committed to josharian/go by josharian 5 years ago
- cmd/compile: add unsafe.Unreachable DO NOT SUBMIT This is a rudimentary implementation, so that I can experiment with it. Updates #30582 Change-Id: I2e3a9e776490a9b01d4f166014b133625e419418 — committed to josharian/go by josharian 5 years ago
I’m simultaneously awed and horrified. Congratudolences?
I like this a lot for my code, I’m terrified of what happens if other people have it.
I wrote a rudimentary implementation so that I could experiment with it: https://golang.org/cl/165358
Some data points, for
addVV_ginmath/big, based just on what I could hack in a half hour…Going from my recently-optimized CLs to adding a few unsafe.Unreachable annotations:
Going from regular to an unrolled loop, which requires yet more annotations:
This gets us within striking distance of the hand-optimized assembly for small n. Going from my unrolled Go+Unreachable code to assembly:
And the main difference now is that the hand-rolled assembly can do
ADC mem reg, rather than loading from memory into a register and thenADC reg reg. That’s fixable straightforwardly in the compiler. So I think it is possible this could let us remove the assembly entirely.I plan to do the ADC upgrade anyway. I’ll report back once I’ve done it; might be a few days.
None of the
unsafe.Unreachableannotations I’ve provided could ever be inferred by the compiler; they come from upstream invariants.Then you’re in UB world: It depends on the compiler and the details. I’m guessing that in practice, with cmd/compile right now, probably not too much. With gccgo, you probably start spewing random numbers until you generate the kubernetes code base.
I was thinking some more about the properties of
unsafe.Unreachable. Specifically:Unreachablestatement is ever reached, and should do so by default.stderroutput of such a crash should indicate where the invariant violation occurred.Unreachablestatement.It occurs to me that we actually already have a statement that satisfies those properties:
Since the
panicoccurs a new goroutine, the compiler should know that it cannot be recovered and the program will terminate, and the compiler and runtime are allowed to schedule and run that goroutine immediately. The traceback for a panic currently indicates where the panic occurs, and to my knowledge nothing prevents us from adding more information about how the goroutine was created.On the other hand, since the new goroutine does not communicate or synchronize with any other goroutine in the program — and, in particular, does not have any happens-before relationship with any remaining statement in
main.main— the spec (by my reading) does not require the compiler or runtime to ever schedule it at all, and thus a very aggressive optimizing compiler could omit it entirely.@griesemer points out that we added unsafe to provide the ability to do things that the language could not express but that needed to be available, specifically type-unsafe conversions. unsafe.Unreachable is different: it’s purely a compiler optimization, not giving new expressive power.
I worry a lot about debugability of buggy code, especially after years of suffering C and C++ compilers and “undefined behavior”. In general Go’s approach has been to prioritize safe code execution over absolute raw performance. C/C++ compilers have shown us where we end up when the compiler assumes things like “array indexes are always in bound” or “pointers are never null”. It’s true that unsafe.Unreachable would only produce more limited effects, since “always” and “never” are replaced by “in this specific instance”. Even so, I fear that adding unsafe.Unreachable will encourage overuse for the sake of performance and lead to code that misbehaves in mysterious ways.
Separately, the fact that the compiler is not smart enough to optimize away certain checks creates a constructive tension on developers that makes us search out new analyses or idioms that optimize better but are still safe. An example of this is the
_ = b[7]bounds check hint in encoding/binary’s implementation, which replaces 8 bounds checks by one. That’s not zero, but it stays safe and ends up being a significant win. If we’d had unsafe.Unreachable or unsafe.Assume, we’d have been under pressure to use it there instead of inventing something safe. I would rather keep the constructive tension and encourage people to look for other safe idioms, perhaps backed by new, safe compiler analyses.Another approach would be
unsafe.Unreachable(). The compiler can assume that no code path will lead to a call of that function. This effectively lets you writeand then at some point during code generation the compiler removes the
p == niltest and theUnreachablecall.I’m not sure we should go down this road. This is basically a “performance annotation” and it’s one more thing everyone needs to understand. I’m happy to give up 5% in performance to never have to make (or see) any such annotations.
There’s a lot of other annotations people could want.
unsafe.Unlikely,unsafe.UnrollThisLoop, etc.If we do this I like Ian’s API.
To me, this sounds like “Assume”. I would think “Assert” means if the assertion fails, the program crashes (more like the C assert function).
I think there is unanimity at this point that Unreachable is the better API.
@bcmills that’s a very clever bit of language-lawyering. But I think I’d rather be explicit. And force importing unsafe, with all the signaling and extra-linguistic guardrails that come with that.
If the compiler doesn’t know that
x <= -1, sure UB, but, if the compiler has proved thatx <= -1and you ask it to assume thatx > 0, it seems like it should error out.If I write your example above slightly differently (and encapsulate into a complete package so I can compile it):
It looks like the compiler is already smart enough to do the right thing. The inner loop appears to be:
or, copying from godbolt.org:
So at least for this code we may not have to do anything.
Interesting, but the idea that there should be a difference between debug mode and release mode is false. I have spent years doing life critical software in a large, famous, company, and there the rule was the production binary must be the tested binary, which almost always meant the binary with debug information available, in order to be able to do postmortem analysis in case of a crash.
Nevertheless
unsafe.Unreachableseems to give some substantial performance increases. But I think one of the strong points of Go is that is has few undefined behavior. So I think unsafe.Unreachable can still serve as a performance hint for statement that follow it, but it should always cause a panic in case it is reached, and the check itself that leads to unafe.Unreachable should not be optimized out in any case.https://ziglang.org/documentation/master/#unreachable
Perhaps optionally? I think that a compiler that simply ignored
unsafe.Unreachable/unsafe.Assumeshould be spec-compliant.Fair enough. That said, I think this is plausibly qualitatively different:
I can see wanting a way to hint about branch likeliness, although that’s not actually unsafe. And there have been myriad performance compiler requests around nil checks, BCE, etc., and almost none around branch likeliness.
Loop unrolling can be done manually when you really need it; these annotations cannot.
And if you want to see what the simple, hinted code looks like:
I’d be interested to see the benchmarks for
addVV_gwith the bounds checks hoisted out of the loop:I suspect the code using
unsafe.Unreachable()is faster for a small number of iterations. I would expect, however, for the performance of the two different implementations to converge as the number of iterations is increased.Also, if the compiler inlined code using
forloops (https://github.com/golang/go/issues/14768) then we might see bounds checks optimized/merged away in real code (probably not in micro-benchmarks). Inlining seems likely to be a big win foraddVV_gwhich has a very short and simple loop. It is also a massive advantage that pure Go implementations of small functions like this have over their assembly equivalents.I can see annotations like
unsafe.Unreachable()being useful when the slice index is coming from an external source that is known in-range. This happens a lot in the compiler. But this also seems like a scenario where we really do want a bounds check to catch bugs.EDIT: dropped the length check panics, I don’t think they are necessary for this use case and with them the function doesn’t work when
len(z) == 0.Based on the discussion above, this proposal seems like a likely decline. — rsc for the proposal review group
I apologize that my previous comment were a bit unclear.
As you targeted this proposal at the
unsafepackage, I believe we agree that its direct goal runs counter to the Go philosophy of being a “memory-safe” language and that it tries to increase undefined behavior in return for increased performance. If this proposal had been made specifically to remove bounds-checks that a programmer decided were unnecessary, I believe the proposal would have been dismissed out of hand, as it harkens back to the C philosophy of trusting the programmer to not do something stupid. In this era of cheap and abundant CPU power, I think these micro-optimizations are foolhardy. I think time is better spent helping equip the compiler to predict these more complicated relationships itself.This leads me to the leap that I made to turn this into a meaningful
assert()invariant and the exciting compiler optimizations possible if the compiler took full advantage of these assertion expressions. (I realize this extrapolation is somewhat of a 180° from the original intent of this proposal, but may result in similarly optimized code that is actually safer.)Firstly, let me clarify my expectations of assertions:
So, how can adding assertions make the code more optimized?
I realize this is essentially a counter-proposal to the spirit of the one you were proposing, and requires much deeper compiler inference to fully realize it’s full potential, but I think it can offer many of the gains you were looking for in a much safer way.
I welcome your comments and would like to hear alternate viewpoints defending the original proposal as well.
unsafe.Unreachablewould also address the sanitization problem: the compiler can always choose to insert dynamic checks — and report violations — on any path marked unreachable.So perhaps that’s another argument in favor of
UnreachableoverAssume.That seems dangerous to me: it adds a large space of undefined behavior (that is, any program that violates the assumption), but no way to validate that the program does not actually exhibit that behavior (contrast
-fsanitize=undefined).To me, the biggest advantage (by far!) of undefined behavior is that it can be sanitized: unlike a Go
panic(which can berecovered and relied upon for control flow, as in the case offmtwith nil pointers), a violation of anunsafe.Assumecondition always indicates a programmer error — and can be reported as such.So instead, I would propose that the expression always be evaluated, and the result discarded except in sanitized (perhaps race-detector-enabled?) builds. That makes it possible to optimize away arithmetic expressions and pure functions (by far the common use-cases), but allows a sanitizer to verify the assumptions — without producing inconsistent behavior if the expression has side-effects.