go: net: slice bounds out of range

What version of Go are you using (go version)?

$ go version
go version go1.21rc2 darwin/arm64

Does this issue reproduce with the latest release?

It occurs with 1.21 and 1.20, and probably with earlier versions too (untested).

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
darwin/amd64

What did you do?

Probably tried to resolve an IP.

What did you expect to see?

It resolves.

What did you see instead?

panic: runtime error: slice bounds out of range [54:45]

goroutine 65 [running]:
internal/poll.(*FD).Write(0xc0001f6080, {0xc00015e002, 0x2d, 0x200})
	/opt/homebrew/Cellar/go/1.20.3/libexec/src/internal/poll/fd_unix.go:383 +0x49c
net.(*netFD).Write(0xc0001f6080, {0xc00015e002, 0x2d, 0x200})
	/opt/homebrew/Cellar/go/1.20.3/libexec/src/net/fd_posix.go:96 +0x48
net.(*conn).Write(0xc0001200e0, {0xc00015e002, 0x2d, 0x200})
	/opt/homebrew/Cellar/go/1.20.3/libexec/src/net/net.go:195 +0x88
net.dnsPacketRoundTrip({_, _}, _, {{{0x6d, 0x79, 0x69, 0x70, 0x2e, 0x6f, 0x70, ...}, ...}, ...}, ...)
	/opt/homebrew/Cellar/go/1.20.3/libexec/src/net/dnsclient_unix.go:102 +0x88
net.(*Resolver).exchange(_, {_, _}, {_, _}, {{{0x6d, 0x79, 0x69, 0x70, 0x2e, ...}, ...}, ...}, ...)
	/opt/homebrew/Cellar/go/1.20.3/libexec/src/net/dnsclient_unix.go:187 +0x3ec
net.(*Resolver).tryOneName(_, {_, _}, _, {_, _}, _)
	/opt/homebrew/Cellar/go/1.20.3/libexec/src/net/dnsclient_unix.go:277 +0x40c
net.(*Resolver).goLookupIPCNAMEOrder.func3.1(0x1c?)
	/opt/homebrew/Cellar/go/1.20.3/libexec/src/net/dnsclient_unix.go:653 +0xa0
created by net.(*Resolver).goLookupIPCNAMEOrder.func3
	/opt/homebrew/Cellar/go/1.20.3/libexec/src/net/dnsclient_unix.go:652 +0x244

About this issue

  • State: open
  • Created a year ago
  • Comments: 26 (20 by maintainers)

Most upvoted comments

We have a similar issue on Intel Macs (go1.20.3, go1.20.5): the program panics while connecting to a VPN using Cisco AnyConnect. arm64 Macs running the same Go code compiled for Apple Silicon don’t panic.

It seems that at some point syscall.Write returns 33554436.

panic: runtime error: slice bounds out of range [33554436:43]
goroutine 7738 [running]:
internal/poll.(*FD).Write(0xc000842100, {0xc0005f01e0, 0x2b, 0x1c5})
  /usr/local/go/src/internal/poll/fd_unix.go:383 +0x4aa
net.(*netFD).Write(0xc000842100, {0xc0005f01e0?, 0xc000837890?, 0x100ed61e0?})
  /usr/local/go/src/net/fd_posix.go:96 +0x29
net.(*conn).Write(0xc000506000, {0xc0005f01e0?, 0xc0008378f0?, 0xc0005f01e0?})
  /usr/local/go/src/net/net.go:195 +0x45
crypto/tls.(*Conn).write(0xc0007dd500, {0xc0005f01e0?, 0x5?, 0x1c5?})
  /usr/local/go/src/crypto/tls/conn.go:923 +0x10d
crypto/tls.(*Conn).writeRecordLocked(0xc0007dd500, 0x17, {0xc000516000, 0x15, 0x1000})
  /usr/local/go/src/crypto/tls/conn.go:991 +0x354
crypto/tls.(*Conn).Write(0x0?, {0xc000516000, 0x15, 0x1000})
  /usr/local/go/src/crypto/tls/conn.go:1186 +0x411
net/http.http2stickyErrWriter.Write({{0x101381f18?, 0xc0007dd500?}, 0xc0004a2460?, 0xc0006602c0?}, {0xc000516000, 0x15, 0x1000})
  /usr/local/go/src/net/http/h2_bundle.go:7429 +0x149
bufio.(*Writer).Flush(0xc000596340)
  /usr/local/go/src/bufio/bufio.go:628 +0x62
net/http.(*http2ClientConn).writeHeaders(0xc000660180, 0x9, 0x0, 0x4000, {0xc00066db00?, 0x0?, 0x2403?})
  /usr/local/go/src/net/http/h2_bundle.go:8579 +0x195
net/http.(*http2clientStream).encodeAndWriteHeaders(0xc000acb380, 0xc0007f3900)
  /usr/local/go/src/net/http/h2_bundle.go:8455 +0x38e
net/http.(*http2clientStream).writeRequest(0xc000acb380, 0xc0007f3900)
  /usr/local/go/src/net/http/h2_bundle.go:8343 +0x528
net/http.(*http2clientStream).doRequest(0xc000ac34f0?, 0xc0005bc701?)
  /usr/local/go/src/net/http/h2_bundle.go:8261 +0x1e
created by net/http.(*http2ClientConn).RoundTrip
  /usr/local/go/src/net/http/h2_bundle.go:8190 +0x34a

What’s interesting is that the left index is always the same, 33554436. In hex that is 0x02000004, which corresponds to the write syscall number: (SYSCALL_CLASS_UNIX << SYSCALL_CLASS_SHIFT) + 4
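The arithmetic can be checked with a short Go sketch. The class and shift values (2 and 24) are not stated in this thread; they are assumed from XNU's syscall_sw.h, and the constant names below are hypothetical Go spellings of the C macros.

```go
package main

import "fmt"

// Assumed values from XNU's syscall_sw.h; not quoted anywhere in this thread.
const (
	syscallClassShift = 24
	syscallClassUnix  = 2
	sysWrite          = 4 // BSD syscall number for write
)

// writeSyscallNumber builds the class-tagged syscall number
// used for write on darwin/amd64.
func writeSyscallNumber() int {
	return (syscallClassUnix << syscallClassShift) + sysWrite
}

func main() {
	n := writeSyscallNumber()
	fmt.Printf("%d = %#x\n", n, n)
}
```

If the assumption holds, the value that shows up as the slice's left index is the syscall number itself rather than a byte count.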

Could this be a trampoline messing with the memory layout, or some kind of alignment issue?

UPD: Apple Silicon Macs panic as well, but with an arbitrary left index.

UPD2: Managed to build a minimal repro program. It needs to be run while the VPN client is establishing a connection:

package main

import (
	"log"
	"net/http"
	"time"
)

func main() {
	for {
		time.Sleep(500 * time.Millisecond)
		log.Println("hello")
		resp, err := http.Get("https://google.com")
		if err != nil {
			continue
		}
		resp.Body.Close()
	}
}

Disabling HTTP keep-alives “fixes” the issue. It seems the VPN client rewrites some network routes, and when Go tries to reuse a connection from the pool, something in libc or the kernel breaks and the write syscall returns an incorrect value.

I can’t reproduce this bug on the macOS Sonoma Beta 7 release, so Apple has probably fixed it.

  • Can you reproduce this same behavior with a program written in C?

I tried, but no luck so far.

When I run the Go reproducer under dtruss, I can’t see any write calls returning more bytes than were passed in. So I think it’s not a kernel bug but a libc or Go runtime bug.

  • Does the Go reproducer also reproduce the bug in a darwin/amd64 binary?

Will try next week when I get access to my Intel Mac.

Disabling IPv6 also “fixes” this issue.

Personally I don’t think that the Go standard library should have to double-check that the write system call behaves as expected.

I agree with that in principle, but I also think that if we have reason to believe that a particular system call may be broken, it benefits our users to make the problem easier to diagnose — and the run-time cost of an else if nn > len(p) here should be negligible compared to the cost of the syscall.

(I don’t think we need to try to rush a check into 1.21 or backport it to older releases, but I do think we should consider it for 1.22 so that if this happens for other users they will be able to figure out what’s going on more easily.)
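To make the proposed check concrete, here is a minimal sketch of a write loop that rejects an impossible byte count instead of slicing with it. It is not the actual internal/poll code or a proposed patch; writeAll, errInvalidWrite, and the injected write function are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// errInvalidWrite is a hypothetical error for a write that reports
// more bytes than it was given (or a negative count).
var errInvalidWrite = errors.New("write returned impossible byte count")

// writeAll is a simplified stand-in for the loop in (*FD).Write.
// It validates n before adding it to nn, so a bogus count becomes
// an error rather than a slice-bounds panic on the next iteration.
func writeAll(write func([]byte) (int, error), p []byte) (int, error) {
	nn := 0
	for nn < len(p) {
		n, err := write(p[nn:])
		if n < 0 || n > len(p)-nn {
			return nn, errInvalidWrite
		}
		nn += n
		if err != nil {
			return nn, err
		}
		if n == 0 {
			return nn, io.ErrShortWrite
		}
	}
	return nn, nil
}

func main() {
	// A fake write that over-reports by 9 bytes, like the 54-of-45 case.
	broken := func(b []byte) (int, error) { return len(b) + 9, nil }
	n, err := writeAll(broken, make([]byte, 45))
	fmt.Println(n, err)
}
```

The extra comparison runs once per write call, which is the cost being weighed against the syscall above.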

panic: runtime error: slice bounds out of range [54:45]

goroutine 65 [running]:
internal/poll.(*FD).Write(0xc0001f6080, {0xc00015e002, 0x2d, 0x200})
	/opt/homebrew/Cellar/go/1.20.3/libexec/src/internal/poll/fd_unix.go:383 +0x49c

The relevant block of code (in go1.20.3) is here: https://cs.opensource.google/go/go/+/refs/tags/go1.20.3:src/internal/poll/fd_unix.go;l=379-386;drc=a2baae6851a157d662dff7cc508659f66249698a

That would seem to imply that at that point nn is 54 and max is 45.

  • The upper bound on max is len(p) or nn + maxRW, whichever is smaller. (maxRW is 1 << 30, so in this case it must be len(p).)
  • Each iteration of the Write loop calls syscall.Write with the slice p[nn:max], whose length is max - nn.
  • The loop then increments nn by the number of bytes reported by syscall.Write, and terminates when nn == len(p).
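The reasoning in the list above can be replayed with a small simulation. This is a simplified sketch of the loop's shape, not the real fd_unix.go code: uncheckedWrite, panicMessage, and the fake write function are hypothetical, and the over-reported count of 54 is chosen to match the trace.

```go
package main

import (
	"fmt"
	"io"
)

// uncheckedWrite mimics the shape of the loop described above: it trusts
// the count returned by write and only stops when nn == len(p).
func uncheckedWrite(write func([]byte) (int, error), p []byte) (int, error) {
	nn := 0
	for {
		max := len(p)
		n, err := write(p[nn:max]) // panics once nn > max
		if n > 0 {
			nn += n
		}
		if nn == len(p) {
			return nn, err
		}
		if err != nil {
			return nn, err
		}
		if n == 0 {
			return nn, io.ErrUnexpectedEOF
		}
	}
}

// panicMessage runs f and returns the text of any panic it raises.
func panicMessage(f func()) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	f()
	return ""
}

func main() {
	// A fake write that claims 54 bytes were written from a 45-byte buffer.
	overReport := func(b []byte) (int, error) { return 54, nil }
	msg := panicMessage(func() {
		uncheckedWrite(overReport, make([]byte, 45))
	})
	fmt.Println(msg)
}
```

After the first call nn is 54, which is not equal to len(p), so the loop runs again and evaluates p[54:45].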

Unfortunately, the most plausible explanations both seem unlikely:

  • either the previous syscall.Write returned an n larger than len(p[nn:max]),
  • or something in the program (cgo, or unsafe, or a bug in runtime or syscall, or a kernel or libc bug?) corrupted some local variable in (*FD).Write or syscall.Write or syscall.write.

The latter possibility makes me think of #60449, but note that that is for amd64 whereas this report is for arm64.

But the fact that this reproduces for you “[w]hile using the Mullvad VPN” makes me wonder if something about the VPN is causing the libc write call to return an incorrect count. Perhaps (*FD).Write should check for that explicitly and return an error for it?

(CC @ianlancetaylor, @golang/runtime)

Sorry, I forgot to follow up. I tested on Windows directly and there was no issue as far as I can recall.

Wait, no. In that stack trace goroutine 65 is running, not panicking. Maybe that goroutine stack is a red herring.

@anacrolix, can you post the complete goroutine dump from a failure?