go: net: slice bounds out of range
What version of Go are you using (go version)?
$ go version
go version go1.21rc2 darwin/arm64
Does this issue reproduce with the latest release?
It occurs with 1.21, 1.20, and earlier versions too (untested).
What operating system and processor architecture are you using (go env)?
go env Output
$ go env darwin/amd64
What did you do?
Probably tried to resolve an IP address.
What did you expect to see?
It resolves.
What did you see instead?
panic: runtime error: slice bounds out of range [54:45]
goroutine 65 [running]:
internal/poll.(*FD).Write(0xc0001f6080, {0xc00015e002, 0x2d, 0x200})
/opt/homebrew/Cellar/go/1.20.3/libexec/src/internal/poll/fd_unix.go:383 +0x49c
net.(*netFD).Write(0xc0001f6080, {0xc00015e002, 0x2d, 0x200})
/opt/homebrew/Cellar/go/1.20.3/libexec/src/net/fd_posix.go:96 +0x48
net.(*conn).Write(0xc0001200e0, {0xc00015e002, 0x2d, 0x200})
/opt/homebrew/Cellar/go/1.20.3/libexec/src/net/net.go:195 +0x88
net.dnsPacketRoundTrip({_, _}, _, {{{0x6d, 0x79, 0x69, 0x70, 0x2e, 0x6f, 0x70, ...}, ...}, ...}, ...)
/opt/homebrew/Cellar/go/1.20.3/libexec/src/net/dnsclient_unix.go:102 +0x88
net.(*Resolver).exchange(_, {_, _}, {_, _}, {{{0x6d, 0x79, 0x69, 0x70, 0x2e, ...}, ...}, ...}, ...)
/opt/homebrew/Cellar/go/1.20.3/libexec/src/net/dnsclient_unix.go:187 +0x3ec
net.(*Resolver).tryOneName(_, {_, _}, _, {_, _}, _)
/opt/homebrew/Cellar/go/1.20.3/libexec/src/net/dnsclient_unix.go:277 +0x40c
net.(*Resolver).goLookupIPCNAMEOrder.func3.1(0x1c?)
/opt/homebrew/Cellar/go/1.20.3/libexec/src/net/dnsclient_unix.go:653 +0xa0
created by net.(*Resolver).goLookupIPCNAMEOrder.func3
/opt/homebrew/Cellar/go/1.20.3/libexec/src/net/dnsclient_unix.go:652 +0x244
About this issue
- State: open
- Created a year ago
- Comments: 26 (20 by maintainers)
We actually have a similar issue on Intel Macs (go1.20.3, go1.20.5): panics while connecting to a VPN using Cisco AnyConnect. arm64 Macs running the same Go code compiled for Apple Silicon don’t panic.
It seems that at some point syscall.Write returns 33554436.
What’s interesting is that the left index is always the same, 33554436. If you convert it to hex you get 0x02000004, which corresponds to the write syscall: (SYSCALL_CLASS_UNIX << SYSCALL_CLASS_SHIFT) + 4
Can this be a trampoline messing with memory layout? Or some kind of alignment issue?
UPD: Apple Silicon Macs panic as well but with arbitrary left index.
UPD2: Managed to build a minimal repro script; it needs to be run while the VPN client is establishing a connection.
Disabling HTTP KeepAlives “fixes” the issue. It seems the VPN client rewrites some network routes, and when Go tries to reuse a connection from the pool, something in libc or the kernel breaks and the Write() syscall returns an incorrect value.
Can’t reproduce this bug on macOS Sonoma Beta 7 release, probably Apple fixed it.
I tried, but no luck so far.
When I run the Go reproducer under dtruss I can’t see any write calls returning more bytes than the number of bytes passed in. So I think it’s not a kernel bug but a libc or Go runtime bug.
Will try next week when I get access to my Intel Mac.
Disabling IPv6 also “fixes” this issue.
I agree with that in principle, but I also think that if we have reason to believe that a particular system call may be broken, it benefits our users to make the problem easier to diagnose, and the run-time cost of an else if nn > len(p) here should be negligible compared to the cost of the syscall.
(I don’t think we need to try to rush a check into 1.21 or backport it to older releases, but I do think we should consider it for 1.22 so that if this happens for other users they will be able to figure out what’s going on more easily.)
The relevant block of code (in go1.20.3) is here: https://cs.opensource.google/go/go/+/refs/tags/go1.20.3:src/internal/poll/fd_unix.go;l=379-386;drc=a2baae6851a157d662dff7cc508659f66249698a
That would seem to imply that at that point nn is 54 and max is 45.
- max is len(p) or nn + maxRW, whichever is smaller. (maxRW is 1 << 30, so in this case it must be len(p).)
- The Write loop terminates when nn == len(p) after a call to syscall.Write with a slice of length max - nn.
- The loop checks for nn == len(p), and increments nn by the number of bytes reported by syscall.Write.
Unfortunately, the most plausible explanations both seem unlikely:
- syscall.Write returned an n larger than len(p[nn:max]), or
- something (cgo, or unsafe, or a bug in runtime or syscall, or a kernel or libc bug?) corrupted some local variable in (*FD).Write or syscall.Write or syscall.write.
The latter possibility makes me think of #60449, but note that that is for amd64 whereas this report is for arm64.
But the fact that this reproduces for you “[w]hile using the Mullvad VPN” makes me wonder if something about the VPN is causing the libc write call to return an incorrect count. Perhaps (*FD).Write should check for that explicitly and return an error for it?
(CC @ianlancetaylor, @golang/runtime)
https://mullvad.net/en/blog/2023/9/13/bug-in-macos-14-sonoma-prevents-our-app-from-working/
Sorry I forgot to follow up, I tested on Windows directly and there was no issue as far as I can recall.
Wait, no. In that stack trace goroutine 65 is running, not panicking. Maybe that goroutine stack is a red herring.
@anacrolix, can you post the complete goroutine dump from a failure?