go: runtime: nil-dereference panic refers to addr=0x8

What version of Go are you using (go version)?

$ go version
go version go1.18.7 linux/amd64

Does this issue reproduce with the latest release?

Yes, confirmed via playground.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/snap/go/9981"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/snap/go/9981/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.18.7"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build662566094=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Run the following program:

package main

type A struct {
	F int
}

func (a *A) Foo() {
	if a != nil {
		a.F++
	}
}

type B struct {
	A
}

func main() {
	var a *A
	a.Foo()
	var b *B
	b.Foo()
}

Link to a playground.

What did you expect to see?

No panic for both method calls.

What did you see instead?

Panic on the b.Foo() method call:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x454ea2]

goroutine 1 [running]:
main.main()
	/tmp/sandbox1079324543/prog.go:21 +0x2

Discussion

Technically, I would expect b.Foo() to be equivalent to (*A).Foo(nil) in this case, at least when the struct offset of embedded A field is zero.

It is clear that the following would case panics in any case (struct offset is not zero, equivalent to (*A).Foo(0x8)):

type B struct {
	F2 int
	A
}

As well as this code (always requires pointer de-reference):

type B struct {
	*A
}

If I understand correctly, as of now the behavior of b.Foo() is similar to (*A).Foo(&b.A), where first b.A causes an operation analogous to pointer de-reference. I think this is a surprising behavior that makes it hard to embed structs (promoting methods) and make the new type nil-safe.

The only non-error-prone way to fix it currently, is to make a named type instead of embedding and manually expose all the methods:

type B A

func (b *B) Foo() {
	(*A)(b).Foo()
}

As I mentioned previously, I see no technical reason why the compiler cannot call (*A).Foo(nil) instead of requiring an additional boilerplate. Otherwise it defeats the purpose of struct composition, when nil-safety is required for promoted methods.

I believe this change doesn’t break the backward compatibility promise, only makes more programs safe(er).

Open questions are:

  • Can this be implemented without slowing down existing programs?
  • Should the case of (*A).Foo(0x8) be translated to (*A).Foo(nil) as well, so that the struct offset doesn’t matter? This complicates the implementation, so likely is unnecessary.

I admit this is not a bug per-se and works “as intended”, however it’s surprising enough to be worth fixing. Also, when using unsafe/cgo, I was able to somehow confuse the compiler to emit case similar to (*A).Foo(0x8) (thus skipping the initial nil check). I will try to provide a reproducer for it as well.

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 16 (15 by maintainers)

Commits related to this issue

Most upvoted comments

Yeah, the fix for #42673 fixes this one as well.

Ideally, if the compiler/runtime can enforce <nil ptr> + <any off> -> nil for method calls, it will make the original use case I described much safer (where methods of embedded struct at off=0 can test for a nil pointer, which is propagated from a parent struct being nil).

I hope that these cases with constants can start a larger discussion for possible spec change/clarification.

Well, with the CL above it still panic with addr=0x8. The SSA dump before scheduling is

v1 (?) = InitMem <mem>
v17 (?) = MOVQconst <*B> [0] (b[*B], b[*B])
v19 (+32) = InlMark <void> [2] v1
v20 (+25) = LoweredNilCheck <void> v17 v1
v22 (25) = InlMark <void> [4] v1
v24 (11) = MOVQconst <**A> [8]
v25 (+11) = MOVQload <*A> v24 v1 (~R0[*A], ~R0[*B], c[*B])
v33 (+33) = LoweredNilCheck <void> v25 v1

Basically, after inlining and constant folding, the compiler is smart enough to know that B is at constant address 0 and A is at constant address 8. We issue a nil check for B, but there is no scheduling edge between that nil check and the constant 8.

Usually if a pointer is derived from another pointer, that derivation (pointer addition, or folded into a load with offset) is scheduled later than the nil check. But in this case the pointer is really a constant, which can be scheduled arbitrarily (and happened to be before the nil check in this case).

I don’t see an easy way to enforce scheduling order for nil check vs. constants. If we know statically a pointer is constant nil, maybe we can make derived pointers all constant nil, i.e. (OffPtr [n] (ConstNil)) -> (ConstNil)? Not sure if that is too much…

That second case does seem like a potential quality of implementation issue. gccgo reports

panic: runtime error: invalid memory address or nil pointer dereference

goroutine 1 [running]:
main.B.NextB
	foo.go:25
main.main
	foo.go:32

That is, I think we agree that the program should crash, but the crash report from the gc compiler could be clearer as to where the bad nil dereference occurs.

Go 1.4 reports

[signal 0xb code=0x1 addr=0x0 pc=0x400c45]

goroutine 1 [running]:
main.(*B).NextB(0x0, 0x0)
	/home/iant/foo11.go:25 +0x15
main.main()
	/home/iant/foo11.go:32 +0x37

but after that it gets more confusing due to the fact that older gc releases were not as reliable at reporting failures in inlined functions. At least I think that’s the reason, I didn’t look too deeply. The reference to addr=0x8 starts in Go 1.10.

Reopening in case this is something we want to fix in the gc compiler.

I would expect b.Foo() to be equivalent to (*A).Foo(nil)

I’d expect b.Foo() be equivalent to b.A.Foo() (the embedding makes the .A implicit). And per spec, for a nil b, b.A panics.