go: proposal: unsafe: add Unreachable

When optimizing code, I often run up against cases in which the compiler is missing a key fact. Sometimes it could but does not infer it; sometimes there’s no reasonable way for the compiler to know.

In those cases, there is nothing to do but drop to assembly. (In normal code you could write if !x { panic(...) }, but in many of these cases, that is prohibitively expensive.)

I propose that we add unsafe.Assume. It accepts a boolean expression. The expression is typechecked but never evaluated. However, the compiler may assume that it evaluates to true when compiling other code.

I imagine the most common uses would be things like unsafe.Assume(p != nil), unsafe.Assume(0 <= i && i < len(s)), and unsafe.Assume(x < 64), for nil checks, bounds checks, and shift amounts, respectively.
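
For illustration, a use might look like the following sketch (unsafe.Assume is only the proposed API; the function and the invariant are made up for the example):

func sumRange(s []uint64, i, j int) (total uint64) {
	// Caller-guaranteed invariant that the compiler cannot infer on its own.
	unsafe.Assume(0 <= i && i <= j && j <= len(s))
	for ; i < j; i++ {
		total += s[i] // with the assumption, the bounds check here could be elided
	}
	return total
}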

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 29
  • Comments: 39 (33 by maintainers)

Most upvoted comments

I’m simultaneously awed and horrified. Congratudolences?

I like this a lot for my code, I’m terrified of what happens if other people have it.

I wrote a rudimentary implementation so that I could experiment with it: https://golang.org/cl/165358

I’m happy to give up 5% in performance to never have to make (or see) any such annotations.

Some data points, for addVV_g in math/big, based just on what I could hack in a half hour…

Going from my recently-optimized CLs to adding a few unsafe.Unreachable annotations:

name            old time/op    new time/op    delta
AddVV/1-8         5.49ns ± 5%    4.36ns ± 1%  -20.58%  (p=0.000 n=45+41)
AddVV/2-8         6.95ns ± 4%    4.88ns ± 1%  -29.81%  (p=0.000 n=48+43)
AddVV/3-8         8.25ns ± 6%    6.18ns ± 1%  -25.14%  (p=0.000 n=44+46)
AddVV/4-8         9.34ns ± 4%    7.01ns ± 2%  -24.94%  (p=0.000 n=47+47)
AddVV/5-8         10.7ns ± 2%     7.7ns ± 4%  -28.48%  (p=0.000 n=45+44)
AddVV/10-8        17.7ns ± 4%    11.1ns ± 2%  -37.22%  (p=0.000 n=50+39)
AddVV/100-8        159ns ± 1%     118ns ± 2%  -25.78%  (p=0.000 n=34+42)
AddVV/1000-8      1.43µs ± 5%    1.14µs ± 3%  -19.99%  (p=0.000 n=47+49)
AddVV/10000-8     14.2µs ± 5%    11.4µs ± 2%  -20.15%  (p=0.000 n=49+50)
AddVV/100000-8     143µs ± 4%     113µs ± 2%  -21.01%  (p=0.000 n=48+48)

Going from regular to an unrolled loop, which requires yet more annotations:

name            old time/op    new time/op    delta
AddVV/1-8         6.38ns ± 1%    4.31ns ± 1%   -32.50%  (p=0.000 n=38+43)
AddVV/2-8         9.52ns ± 1%    5.17ns ± 1%   -45.67%  (p=0.000 n=43+44)
AddVV/3-8         11.4ns ± 1%     6.3ns ± 2%   -44.14%  (p=0.000 n=46+43)
AddVV/4-8         13.6ns ± 1%     7.2ns ± 1%   -47.09%  (p=0.000 n=44+40)
AddVV/5-8         22.5ns ± 3%     9.8ns ± 2%   -56.62%  (p=0.000 n=45+43)
AddVV/10-8        40.6ns ± 4%    14.5ns ± 4%   -64.35%  (p=0.000 n=48+49)
AddVV/100-8        414ns ± 2%     163ns ± 2%   -60.61%  (p=0.000 n=49+50)
AddVV/1000-8      4.04µs ± 2%    1.62µs ± 4%   -59.89%  (p=0.000 n=35+46)
AddVV/10000-8     40.3µs ± 1%    16.8µs ± 4%   -58.38%  (p=0.000 n=32+49)
AddVV/100000-8     403µs ± 2%     173µs ± 8%   -57.18%  (p=0.000 n=34+49)

This gets us within striking distance of the hand-optimized assembly for small n. Going from my unrolled Go+Unreachable code to assembly:

name            old time/op    new time/op    delta
AddVV/1-8         4.07ns ± 3%    4.31ns ± 1%    +5.76%  (p=0.000 n=42+43)
AddVV/2-8         4.78ns ± 2%    5.17ns ± 1%    +8.17%  (p=0.000 n=42+44)
AddVV/3-8         5.84ns ± 2%    6.34ns ± 2%    +8.65%  (p=0.000 n=46+43)
AddVV/4-8         6.75ns ± 4%    7.18ns ± 1%    +6.35%  (p=0.000 n=50+40)
AddVV/5-8         7.51ns ± 2%    9.76ns ± 2%   +29.86%  (p=0.000 n=47+43)
AddVV/10-8        9.84ns ± 4%   14.48ns ± 4%   +47.11%  (p=0.000 n=49+49)
AddVV/100-8       49.5ns ± 5%   163.1ns ± 2%  +229.25%  (p=0.000 n=48+50)
AddVV/1000-8       434ns ± 4%    1622ns ± 4%  +273.53%  (p=0.000 n=50+46)
AddVV/10000-8     5.50µs ± 4%   16.79µs ± 4%  +204.95%  (p=0.000 n=41+49)
AddVV/100000-8    61.0µs ± 9%   172.8µs ± 8%  +183.11%  (p=0.000 n=49+49)

And the main difference now is that the hand-rolled assembly can do ADC mem reg, rather than loading from memory into a register and then ADC reg reg. That’s fixable straightforwardly in the compiler. So I think it is possible this could let us remove the assembly entirely.

I plan to do the ADC upgrade anyway. I’ll report back once I’ve done it; might be a few days.

None of the unsafe.Unreachable annotations I’ve provided could ever be inferred by the compiler; they come from upstream invariants.

What happens if you ask the compiler to assume something it knows is false?

Then you’re in UB world: It depends on the compiler and the details. I’m guessing that in practice, with cmd/compile right now, probably not too much. With gccgo, you probably start spewing random numbers until you generate the kubernetes code base.

I was thinking some more about the properties of unsafe.Unreachable. Specifically:

  1. The program should be allowed to crash immediately and unrecoverably if the Unreachable statement is ever reached, and should do so by default.
    • …and the stderr output of such a crash should indicate where the invariant violation occurred.
  2. An aggressively optimizing compiler should be allowed to omit all of the code required to detect the execution of an Unreachable statement.

It occurs to me that we actually already have a statement that satisfies those properties:

func Unreachable() {
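	// A panic on a brand-new goroutine can never be recovered, so the program must crash.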
	go panic("unreachable")
}

Since the panic occurs on a new goroutine, the compiler should know that it cannot be recovered and the program will terminate, and the compiler and runtime are allowed to schedule and run that goroutine immediately. The traceback for a panic currently indicates where the panic occurs, and to my knowledge nothing prevents us from adding more information about how the goroutine was created.

On the other hand, since the new goroutine does not communicate or synchronize with any other goroutine in the program — and, in particular, does not have any happens-before relationship with any remaining statement in main.main — the spec (by my reading) does not require the compiler or runtime to ever schedule it at all, and thus a very aggressive optimizing compiler could omit it entirely.

@griesemer points out that we added unsafe to provide the ability to do things that the language could not express but that needed to be available, specifically type-unsafe conversions. unsafe.Unreachable is different: it’s purely a compiler optimization, not giving new expressive power.

I worry a lot about debuggability of buggy code, especially after years of suffering C and C++ compilers and “undefined behavior”. In general Go’s approach has been to prioritize safe code execution over absolute raw performance. C/C++ compilers have shown us where we end up when the compiler assumes things like “array indexes are always in bounds” or “pointers are never null”. It’s true that unsafe.Unreachable would only produce more limited effects, since “always” and “never” are replaced by “in this specific instance”. Even so, I fear that adding unsafe.Unreachable will encourage overuse for the sake of performance and lead to code that misbehaves in mysterious ways.

Separately, the fact that the compiler is not smart enough to optimize away certain checks creates a constructive tension on developers that makes us search out new analyses or idioms that optimize better but are still safe. An example of this is the _ = b[7] bounds check hint in encoding/binary’s implementation, which replaces 8 bounds checks with one. That’s not zero, but it stays safe and ends up being a significant win. If we’d had unsafe.Unreachable or unsafe.Assume, we’d have been under pressure to use it there instead of inventing something safe. I would rather keep the constructive tension and encourage people to look for other safe idioms, perhaps backed by new, safe compiler analyses.
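
For reference, the idiom looks roughly like this (a sketch modeled on encoding/binary’s little-endian Uint64; the standalone name uint64LE is mine):

func uint64LE(b []byte) uint64 {
	_ = b[7] // bounds check hint: one check up front covers the eight reads below
	return uint64(b[0]) | uint64(b[1])<<8 | uint64(b[2])<<16 | uint64(b[3])<<24 |
		uint64(b[4])<<32 | uint64(b[5])<<40 | uint64(b[6])<<48 | uint64(b[7])<<56
}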

Another approach would be unsafe.Unreachable(). The compiler can assume that no code path will lead to a call of that function. This effectively lets you write

    if p == nil {
        unsafe.Unreachable()
    } else {
        // Here the compiler knows that p != nil.
    }

and then at some point during code generation the compiler removes the p == nil test and the Unreachable call.

I’m not sure we should go down this road. This is basically a “performance annotation” and it’s one more thing everyone needs to understand. I’m happy to give up 5% in performance to never have to make (or see) any such annotations.

There’s a lot of other annotations people could want. unsafe.Unlikely, unsafe.UnrollThisLoop, etc.

If we do this I like Ian’s API.

To me, this sounds like “Assume”. I would think “Assert” means if the assertion fails, the program crashes (more like the C assert function).

I think there is unanimity at this point that Unreachable is the better API.

@bcmills that’s a very clever bit of language-lawyering. But I think I’d rather be explicit. And force importing unsafe, with all the signaling and extra-linguistic guardrails that come with that.

If the compiler doesn’t know that x <= -1, sure, that’s UB; but if the compiler has proved that x <= -1 and you ask it to assume that x > 0, it seems like it should error out.

If I write your example above slightly differently (and encapsulate into a complete package so I can compile it):

package main

import "math/bits"

type Word uint64

func addVV_g(z, x, y []Word) (c Word) {
	if len(x) != len(z) || len(y) != len(z) {
		panic("vector lengths don't match")
	}
	// lengths of x, y, z are the same
	for i := 0; i < len(z); i++ {
		zi, cc := bits.Add(uint(x[i]), uint(y[i]), uint(c))
		z[i] = Word(zi)
		c = Word(cc)
	}
	return
}

func main() {}

It looks like the compiler is already smart enough to do the right thing. The inner loop appears to be:

	0x0033 00051 (x.go:13)	NEGL	DX
	0x0035 00053 (x.go:13)	MOVQ	(DI)(CX*8), SI
	0x0039 00057 (x.go:13)	MOVQ	(R9)(CX*8), R8
	0x003d 00061 (x.go:13)	ADCQ	SI, R8
	0x0040 00064 (x.go:14)	MOVQ	R8, (AX)(CX*8)
	0x0044 00068 (x.go:13)	SBBQ	DX, DX
	0x0047 00071 (x.go:12)	INCQ	CX
	0x004a 00074 (x.go:13)	NEGQ	DX
	0x004d 00077 (x.go:12)	CMPQ	BX, CX
	0x0050 00080 (x.go:12)	JGT	51

or, copying from godbolt.org:

        JMP     addVV_g_pc77
addVV_g_pc51:
        NEGL    DX
        MOVQ    (DI)(CX*8), SI
        MOVQ    (R9)(CX*8), R8
        ADCQ    SI, R8
        MOVQ    R8, (AX)(CX*8)
        SBBQ    DX, DX
        INCQ    CX
        NEGQ    DX
addVV_g_pc77:
        CMPQ    BX, CX
        JGT     addVV_g_pc51

So at least for this code we may not have to do anything.

Interesting, but the idea that there should be a difference between debug mode and release mode is false. I have spent years doing life-critical software at a large, famous company, and there the rule was that the production binary must be the tested binary, which almost always meant the binary with debug information available, in order to be able to do postmortem analysis in case of a crash.

Nevertheless unsafe.Unreachable seems to give some substantial performance increases. But I think one of the strong points of Go is that it has very little undefined behavior. So I think unsafe.Unreachable can still serve as a performance hint for the statements that follow it, but it should always cause a panic in case it is reached, and the check itself that leads to unsafe.Unreachable should not be optimized out in any case.

it seems like it should error out.

Perhaps optionally? I think that a compiler that simply ignored unsafe.Unreachable/unsafe.Assume should be spec-compliant.

There’s a lot of other annotations people could want. unsafe.Unlikely, unsafe.UnrollThisLoop, etc.

Fair enough. That said, I think this is plausibly qualitatively different:

  • I can see wanting a way to hint about branch likeliness, although that’s not actually unsafe. And there have been myriad compiler performance requests around nil checks, BCE, etc., and almost none around branch likeliness.

  • Loop unrolling can be done manually when you really need it; these annotations cannot.

And if you want to see what the simple, hinted code looks like:

func addVV_g(z, x, y []Word) (c Word) {
	for i := 0; i < len(z); i++ {
		if i >= len(x) || i >= len(y) {
			// The calling code ensures that x and y are no longer than z.
			// The compiler doesn't see this; this hint prevents bounds
			// checks in the bits.Add call below.
			unsafe.Unreachable()
		}
		zi, cc := bits.Add(uint(x[i]), uint(y[i]), uint(c))
		z[i] = Word(zi)
		c = Word(cc)
	}
	return
}

I’d be interested to see the benchmarks for addVV_g with the bounds checks hoisted out of the loop:

func addVV_g(z, x, y []Word) (c Word) {
	n := len(z)
	x, y = x[:n:n], y[:n:n]
	for i := 0; i < n; i++ {
		zi, cc := bits.Add(uint(x[i]), uint(y[i]), uint(c))
		z[i] = Word(zi)
		c = Word(cc)
	}
	return
}

I suspect the code using unsafe.Unreachable() is faster for a small number of iterations. I would expect, however, that the performance of the two implementations will converge as the number of iterations increases.

Also, if the compiler could inline functions containing for loops (https://github.com/golang/go/issues/14768), then we might see bounds checks optimized/merged away in real code (probably not in micro-benchmarks). Inlining seems likely to be a big win for addVV_g, which has a very short and simple loop. It is also a massive advantage that pure Go implementations of small functions like this have over their assembly equivalents.

I can see annotations like unsafe.Unreachable() being useful when the slice index is coming from an external source that is known in-range. This happens a lot in the compiler. But this also seems like a scenario where we really do want a bounds check to catch bugs.

EDIT: dropped the length check panics; I don’t think they are necessary for this use case, and with them the function doesn’t work when len(z) == 0.

Based on the discussion above, this proposal seems like a likely decline. — rsc for the proposal review group

I apologize that my previous comments were a bit unclear.

As you targeted this proposal at the unsafe package, I believe we agree that its direct goal runs counter to the Go philosophy of being a “memory-safe” language and that it accepts more undefined behavior in return for increased performance. If this proposal had been made specifically to remove bounds checks that a programmer decided were unnecessary, I believe it would have been dismissed out of hand, as it harkens back to the C philosophy of trusting the programmer not to do something stupid. In this era of cheap and abundant CPU power, I think these micro-optimizations are foolhardy. I think time is better spent helping equip the compiler to predict these more complicated relationships itself.

This leads me to the leap that I made to turn this into a meaningful assert() invariant and the exciting compiler optimizations possible if the compiler took full advantage of these assertion expressions. (I realize this extrapolation is somewhat of a 180° from the original intent of this proposal, but may result in similarly optimized code that is actually safer.)

Firstly, let me clarify my expectations of assertions:

  • assertions come from C and C++, where there is traditionally a distinct debug build and a production build. The debug build is used for development and testing and does lots of extra error checking, including assertions. Assertion failures are fatal and are typically only used to detect logic errors in the code, like array bounds errors. Circumstantial errors (like I/O) should never be checked with assertions. The theory is that all the bugs will be worked out of the debug build, so in production the fatal assertion failures will never happen, and the assertions are simply compiled out of the build as an optimization.
  • Part of the reason “memory-safe” languages like Go exist is that we have learned from our mistakes and decided that integrity checks, including bounds checking, are important enough to leave in production code. I’m not sure whether a fatal crash is “acceptable production code behavior”, but there never was a deliberate intent to ignore assertion failures in production code, so much as to remove them for optimization reasons out of the naive belief that they would never fail in production, and the somewhat quixotic notion that if one of these “impossible” situations did occur, it was preferable to gallantly fight on rather than crash (so the user doesn’t lose their data).

So, how can adding assertions make the code more optimized?

  • Having a specific expression to deconstruct might give the compiler enough information to prove that it’s true. This would allow it to optimize exactly the way this proposal originally suggested
  • Similarly, the compiler may be able to prove that the expression can’t be true and generate a compile error
  • It may be able to work backwards to re-order the instructions, hoisting the assertions to operate on the function arguments at the beginning of the function.
  • These hoisted assertions can then be used by the compiler as function precondition decorators in a few interesting ways (see the sketch after this list):
    • The compiler can identify function calls that would violate these preconditions and generate compile errors
    • They could identify function calls that can never violate the preconditions, and jump to an alternate function entry point, just past the precondition, skipping the assertion in this case.
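
For instance (a hypothetical sketch of my own; Go has no assert builtin, and the compiler behavior described here is speculative), an assertion over the arguments could double as a compiler-visible precondition:

func assert(cond bool) {
	if !cond {
		panic("assertion failed")
	}
}

func addVV_checked(z, x, y []Word) (c Word) {
	// A sufficiently smart compiler could hoist this assertion to a precondition,
	// prove it at some call sites, and skip the check via an alternate entry point
	// when the caller already guarantees it.
	assert(len(x) == len(z) && len(y) == len(z))
	for i := 0; i < len(z); i++ {
		zi, cc := bits.Add(uint(x[i]), uint(y[i]), uint(c))
		z[i] = Word(zi)
		c = Word(cc)
	}
	return
}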

I realize this is essentially a counter-proposal to the spirit of the one you were proposing, and requires much deeper compiler inference to realize its full potential, but I think it can offer many of the gains you were looking for in a much safer way.

I welcome your comments and would like to hear alternate viewpoints defending the original proposal as well.

unsafe.Unreachable would also address the sanitization problem: the compiler can always choose to insert dynamic checks — and report violations — on any path marked unreachable.

So perhaps that’s another argument in favor of Unreachable over Assume.

The expression is typechecked but never evaluated. However, the compiler may assume that it evaluates to true when compiling other code.

That seems dangerous to me: it adds a large space of undefined behavior (that is, any program that violates the assumption), but no way to validate that the program does not actually exhibit that behavior (contrast -fsanitize=undefined).

To me, the biggest advantage (by far!) of undefined behavior is that it can be sanitized: unlike a Go panic (which can be recovered and relied upon for control flow, as in the case of fmt with nil pointers), a violation of an unsafe.Assume condition always indicates a programmer error — and can be reported as such.

So instead, I would propose that the expression always be evaluated, and the result discarded except in sanitized (perhaps race-detector-enabled?) builds. That makes it possible to optimize away arithmetic expressions and pure functions (by far the common use-cases), but allows a sanitizer to verify the assumptions — without producing inconsistent behavior if the expression has side-effects.
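
Concretely, those semantics could be modeled like this (a hypothetical sketch; sanitizerEnabled stands in for a build-mode flag, and none of these names are a real API):

var sanitizerEnabled = false // stand-in for a sanitized / race-detector build flag (hypothetical)

func Assume(cond bool) {
	// The condition is always evaluated, so any side effects behave consistently
	// across build modes.
	if sanitizerEnabled && !cond {
		panic("unsafe.Assume: assumption violated") // sanitized builds report the programmer error
	}
	// In regular builds the result is discarded, and the compiler may assume cond
	// is true when compiling the code that follows.
	_ = cond
}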