tailscale: gvisor: panic: Incrementing non-positive count 0xc000533180 on stack.PacketBuffer
What is the issue?
From an email report:
We compiled 1.22.1 and are experiencing problems on two of our macOS machines: tailscaled crashes directly after startup.
Ports 6883 and 6885 are used by a local program that is already running in the background before tailscaled is started.
We see this error on both of our Intel macOS systems; the Arm macOS machine does not show it.
We had a similar problem with 1.22.0. Downgrading one of the Intel machines to 1.20.4 solved the problem.
2022/03/10 15:08:59 logtail started
2022/03/10 15:08:59 Program starting: v1.22.1-tc8fb4f8c7, Go 1.17.8-tsdce70b6: []string{"/usr/local/bin/tailscaled", "-state=/var/db/tailscale/tailscaled.state", "-tun", "utun9"}
2022/03/10 15:08:59 LogID: 459c4a8ac464c58028859fb0d0bb79ee88adf8db6a27e083c7a7d63242103f9e
2022/03/10 15:08:59 logpolicy: using system state directory "/Library/Tailscale"
2022/03/10 15:08:59 wgengine.NewUserspaceEngine(tun "utun9") ...
2022/03/10 15:08:59 dns: using dns.noopManager
2022/03/10 15:08:59 link state: interfaces.State{defaultRoute=en13 ifs={en13:[10.0.5.6/24]} v4=true v6=false}
2022/03/10 15:08:59 magicsock: disco key = d:f376519d5ab5ccd3
2022/03/10 15:08:59 Creating wireguard device...
2022/03/10 15:08:59 Bringing wireguard device up...
2022/03/10 15:08:59 Bringing router up...
2022/03/10 15:08:59 external route: up
2022/03/10 15:08:59 Clearing router settings...
2022/03/10 15:08:59 Starting link monitor...
2022/03/10 15:08:59 Engine created.
2022/03/10 15:09:01 Start
2022/03/10 15:09:01 using backend prefs for "_daemon": Prefs{ra=true dns=true want=true routes=[0.0.0.0/0 ::/0 10.0.5.0/24] snat=true Persist{lm=, o=, n=[jR0DB] u="REDACTED"}}
2022/03/10 15:09:01 Backend: logs: be:REDACTED fe:
2022/03/10 15:09:01 control: client.Login(false, 0)
2022/03/10 15:09:01 control: doLogin(regen=false, hasUrl=false)
2022/03/10 15:09:01 control: control server key [nlFWp] from https://controlplane.tailscale.com/
2022/03/10 15:09:01 control: RegisterReq: onode= node=[jR0DB] fup=false
2022/03/10 15:09:02 control: RegisterReq: got response; nodeKeyExpired=false, machineAuthorized=true; authURL=false
2022/03/10 15:09:02 active login: REDACTED
2022/03/10 15:09:02 Switching ipn state NoState -> Starting (WantRunning=true, nm=true)
2022/03/10 15:09:02 health("overall"): error: state=Starting, wantRunning=true
2022/03/10 15:09:02 magicsock: SetPrivateKey called (init)
2022/03/10 15:09:02 wgengine: Reconfig: configuring userspace wireguard config (with 1/10 peers)
2022/03/10 15:09:02 wgengine: Reconfig: configuring router
2022/03/10 15:09:02 wgengine: Reconfig: configuring DNS
2022/03/10 15:09:02 dns: Set: {DefaultResolvers:[] Routes:{} SearchDomains:[] Hosts:11}
2022/03/10 15:09:02 dns: Resolvercfg: {Routes:{} Hosts:11 LocalDomains:[]}
2022/03/10 15:09:02 dns: OScfg: {Nameservers:[] SearchDomains:[] MatchDomains:[]}
2022/03/10 15:09:02 peerapi: serving on http://100.x.y.z:35528/
2022/03/10 15:09:02 peerapi: serving on http://[fd7a:115c:a1e0:ab12:q:r:s:t]:35528/
2022/03/10 15:09:02 magicsock: home is now derp-1 (nyc)
2022/03/10 15:09:02 magicsock: adding connection to derp-1 for home-keep-alive
2022/03/10 15:09:02 control: NetInfo: NetInfo{varies=false hairpin=false ipv6=false udp=true derp=#1 portmap= link=""}
2022/03/10 15:09:02 magicsock: 1 active derp conns: derp-1=cr0s,wr0s
2022/03/10 15:09:02 derphttp.Client.Connect: connecting to derp-1 (nyc)
2022/03/10 15:09:02 Switching ipn state Starting -> Running (WantRunning=true, nm=true)
2022/03/10 15:09:02 magicsock: endpoints changed: a.b.c.d:57147 (stun), 10.0.5.6:57147 (local)
2022/03/10 15:09:02 magicsock: adding connection to derp-4 for [/+7hj]
2022/03/10 15:09:02 magicsock: 2 active derp conns: derp-1=cr83ms,wr83ms derp-4=cr0s,wr0s
2022/03/10 15:09:02 derphttp.Client.Recv: connecting to derp-4 (fra)
2022/03/10 15:09:03 magicsock: derp-1 connected; connGen=1
2022/03/10 15:09:03 health("overall"): ok
2022/03/10 15:09:03 magicsock: derp-4 connected; connGen=1
2022/03/10 15:09:03 wgengine: idle peer [HOh7S] now active, reconfiguring wireguard
2022/03/10 15:09:03 wgengine: Reconfig: configuring userspace wireguard config (with 2/10 peers)
2022/03/10 15:09:03 magicsock: disco: node [HOh7S] d:c2b10c247333c6cc now using 10.0.5.8:62659
2022/03/10 15:09:03 Accept: UDP{100.x.y.1:6883 > 10.0.5.6:6885} 48 ok
2022/03/10 15:09:03 Accept: UDP{100.x.y.1:6883 > 10.0.5.6:6885} 48 ok
2022/03/10 15:09:03 Accept: UDP{100.x.y.1:6883 > 10.0.5.6:6885} 48 ok
2022/03/10 15:09:03 netstack: could not bind local port 6883: listen udp 0.0.0.0:6883: bind: address already in use, trying again with random port
panic: Incrementing non-positive count 0xc000533180 on stack.PacketBuffer
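For reference, a minimal, hypothetical sketch of a bystander process that pre-binds the UDP ports named in the report, mirroring the "address already in use" fallback visible in the log just before the panic. Only the port numbers come from the report; everything else is illustrative:

```go
// prebind.go: hypothetical stand-in for the local program from the report.
// It binds UDP ports 6883 and 6885 before tailscaled starts, which is what
// produces the "address already in use" fallback seen in the log above.
package main

import (
	"log"
	"net"
)

func main() {
	for _, addr := range []string{":6883", ":6885"} {
		conn, err := net.ListenPacket("udp", addr)
		if err != nil {
			log.Fatalf("bind %s: %v", addr, err)
		}
		defer conn.Close() // held until the process is killed
		log.Printf("holding %s", conn.LocalAddr())
	}
	select {} // block forever so the ports stay bound
}
```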
Steps to reproduce
No response
Are there any recent changes that introduced the issue?
Panic first appeared in 1.22.0.
OS
macOS
OS version
No response
Tailscale version
1.22
Bug report
No response
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 21 (9 by maintainers)
Commits related to this issue
- go.mod: bump netstack for clone reset fix In tracking down issue #4144 and reading through the netstack code in detail, I discovered that the packet buf Clone path did not reset the packetbuf it was ... — committed to tailscale/tailscale by raggi 2 years ago
- go.mod: bump netstack for clone reset fix (#4379) In tracking down issue #4144 and reading through the netstack code in detail, I discovered that the packet buf Clone path did not reset the packetb... — committed to tailscale/tailscale by raggi 2 years ago
Ok, reproduced.
The crash is caused by fragment handling in the userspace networking path. The easiest way to reproduce the issue is to use a normal-mode node to send a larger-than-host-MTU packet to a node running in userspace-networking mode.
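A minimal sketch of such a send, assuming a hypothetical peer address and port and Tailscale's default ~1280-byte tunnel MTU; any payload comfortably over the MTU should force fragmentation:

```go
// sendbig.go: sends one UDP datagram larger than the tunnel MTU so it must be
// fragmented on its way to a node running with --tun=userspace-networking.
// The peer address and port below are placeholders.
package main

import (
	"log"
	"net"
)

func main() {
	conn, err := net.Dial("udp", "100.101.102.103:9999") // hypothetical peer
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	payload := make([]byte, 2000) // comfortably over the ~1280-byte tunnel MTU
	if _, err := conn.Write(payload); err != nil {
		log.Fatal(err)
	}
	log.Printf("sent %d-byte datagram", len(payload))
}
```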
Further digging in gvisor reveals this is fixed upstream in https://github.com/google/gvisor/commit/6a28dc7c59632b4007a095377073b8b74df85bea, so this issue will be addressed in the 1.24 release.
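For context, a rough sketch (not gvisor's actual code) of the bug pattern that the commit message and the panic describe: a reference-counted buffer whose clone path does not re-initialize the count, so the next IncRef trips the non-positive-count check. All type and method names below are hypothetical:

```go
// A reference-counted buffer in the spirit of gvisor's stack.PacketBuffer.
// The bug pattern: a clone/reuse path copies the struct but forgets to
// re-initialize the reference count, so the clone starts at 0 and the next
// IncRef panics with "Incrementing non-positive count".
package main

import "fmt"

type buffer struct {
	refCount int64
	data     []byte
}

// IncRef mimics the invariant check: incrementing a buffer whose count is
// already <= 0 means it was never (re)initialized or was already released.
func (b *buffer) IncRef() {
	if b.refCount <= 0 {
		panic(fmt.Sprintf("Incrementing non-positive count %p", b))
	}
	b.refCount++
}

// brokenClone copies the buffer but carries over whatever refCount was there;
// if the source was already released (count 0), the clone is born dead.
func (b *buffer) brokenClone() *buffer {
	c := *b
	return &c
}

// fixedClone resets the count so the clone starts life with one reference,
// which is the kind of reset the upstream fix adds to the clone path.
func (b *buffer) fixedClone() *buffer {
	c := *b
	c.refCount = 1
	return &c
}

func main() {
	released := &buffer{refCount: 0, data: []byte("fragment")}

	good := released.fixedClone()
	good.IncRef() // fine: count goes 1 -> 2
	fmt.Println("fixedClone ok")

	bad := released.brokenClone()
	bad.IncRef() // panics, matching the crash in the log above
}
```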