tailscale: Slow throughput over tsnet compared to Windows tailscaled
What is the issue?
I have a Go HTTP server that serves up https://github.com/librespeed/speedtest.
I made this server accessible on my tailnet using both the system Tailscale (Windows 1.50) and embedded tsnet.
On a remote node across the country, from which I access the speed test, I confirmed via tailscale status that there is a direct connection to both IPs.
When I invoke the browser-based speed test against the system Tailscale IP, the speeds stabilize after a few seconds and in my case measure 35 Mbps. Hitting the same server on the same node via its tsnet IP, the speeds never stabilize, keep dropping, and are near 4 Mbps by the end of the test.
cc @eric
Steps to reproduce
Full standalone repro to come.
Here are the gin-gonic routes used to drive the speed test:
r.GET("/garbage", func(c *gin.Context) {
c.Header("Content-Description", "File Transfer")
c.Header("Content-Type", "application/octet-stream")
c.Header("Content-Disposition", "attachment; filename=random.dat")
c.Header("Content-Transfer-Encoding", "binary")
ckSize, _ := strconv.ParseInt(c.Query("ckSize"), 10, 64)
if ckSize == 0 || ckSize > 2048 {
ckSize = 4
}
data := make([]byte, 1024*1024)
io.ReadAtLeast(rand.Reader, data, len(data))
for ckSize > 0 {
_, err := c.Writer.Write(data)
if err != nil {
break
}
ckSize--
}
})
r.GET("/empty", func(c *gin.Context) {
c.Status(200)
})
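For context, here is a minimal sketch of how a router like this can be served over tsnet (the hostname and port below are illustrative, not the exact values from my setup):

package main

import (
	"log"
	"net/http"

	"github.com/gin-gonic/gin"
	"tailscale.com/tsnet"
)

func main() {
	// Hostname is illustrative; it becomes the node name on the tailnet.
	srv := &tsnet.Server{Hostname: "speedtest-tsnet"}
	defer srv.Close()

	// Listen on the tailnet. This node gets its own Tailscale IP,
	// separate from the system tailscaled's IP.
	ln, err := srv.Listen("tcp", ":80")
	if err != nil {
		log.Fatal(err)
	}

	r := gin.Default()
	// ... register the /garbage and /empty routes shown above ...

	log.Fatal(http.Serve(ln, r))
}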
Are there any recent changes that introduced the issue?
Not sure, but I upgraded tsnet from 1.48 to main (which includes a gvisor bump) and did not observe any improvement.
I also ran the same test against Linux and macOS servers and observed the same behavior: the speed declines continuously over tsnet.
OS
Windows
OS version
Windows 11
Tailscale version
1.51.0-dev20231007
Other software
No response
Bug report
No response
About this issue
- Original URL
- State: open
- Created 9 months ago
- Comments: 25 (16 by maintainers)
Commits related to this issue
- tsnet: add CapturePcap method for debugging Application code can call the tsnet s.CapturePcap(filename) method to write all packets, sent and received, to a pcap file. The cleartext packets are writt... — committed to tailscale/tailscale by DentonGentry 7 months ago
- wgengine/netstack: disable RACK on Windows Updates #9707 Signed-off-by: Jordan Whited <jordan@tailscale.com> — committed to tailscale/tailscale by jwhited 7 months ago
- wgengine/netstack: disable RACK on Windows (#10402) Updates #9707 Signed-off-by: Jordan Whited <jordan@tailscale.com> — committed to tailscale/tailscale by jwhited 7 months ago
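Per the CapturePcap commit message above, application code can ask its tsnet server to write a cleartext pcap for debugging. A hedged sketch follows; the exact method signature may differ between tsnet versions, so check the tsnet docs for the release you are on:

package main

import (
	"context"
	"log"

	"tailscale.com/tsnet"
)

func main() {
	srv := &tsnet.Server{Hostname: "speedtest-debug"} // hostname is illustrative
	defer srv.Close()

	// Write all packets this tsnet node sends and receives, in cleartext,
	// to a pcap file, as described in the commit message above.
	if err := srv.CapturePcap(context.Background(), "tsnet-debug.pcap"); err != nil {
		log.Fatalf("CapturePcap: %v", err)
	}

	// ... start listeners and serve traffic as usual ...
}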
Traffic is being generated by the tsnet app to send to the Mac. The Mac receives the frames and generates ACKs back to tsnet.
1/ Looking at the tsnet app sending packets, to the left of the 1.08 second timestamp, we see tsnet send a burst of four packets followed a short time later by a burst of three packets.
2/ Looking at the Mac between 1.2 and 1.205 seconds:
The timestamps are relative to the start of the capture and are not synchronized between the Mac and tsnet systems, so don’t read anything into 1.08 versus 1.2 seconds.
3/ Looking at tsnet again, the ACKs sent by the Mac are not received until almost 1.086. Zooming in on that timestamp:
These are clearly the ACKs for the four-packet and three-packet bursts, which the Mac sent some time ago in response to receiving the first transmission of those packets. They can’t possibly have arrived instantaneously after the retransmission; the ACKs must have been sitting in a queue waiting to be processed.
To me it looks like tsnet is only processing ACKs when it decides to transmit something, so it finally receives the ACK after already having decided to retransmit. The retransmission is unnecessary.
There’s definitely some kind of bug in the flow with gVisor, and it is significantly worsened with SACK enabled. The results are also skewed when RACK is enabled, but disabling both does not produce satisfactory statistics on a local link.
For example, I still see ~15k fast retransmits over a run with both RACK and SACK disabled. Disabling SACK does, however, improve throughput by 60-100%, depending on other luck; this mitigates part of the problem, but it is not a solution.
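For reference, this is roughly how the SACK and RACK knobs are toggled on a gVisor netstack stack; a sketch only, with an illustrative helper name, mirroring the kind of change in the "disable RACK on Windows" commit above:

package netstacktuning

import (
	"fmt"

	"gvisor.dev/gvisor/pkg/tcpip"
	"gvisor.dev/gvisor/pkg/tcpip/stack"
	"gvisor.dev/gvisor/pkg/tcpip/transport/tcp"
)

// disableSACKAndRACK turns off SACK and the RACK-based loss detection
// options on a gVisor stack, for experimentation.
func disableSACKAndRACK(s *stack.Stack) error {
	sackOpt := tcpip.TCPSACKEnabled(false)
	if err := s.SetTransportProtocolOption(tcp.ProtocolNumber, &sackOpt); err != nil {
		return fmt.Errorf("disabling SACK: %v", err)
	}
	rackOpt := tcpip.TCPRecovery(0) // 0 clears the RACK recovery flags
	if err := s.SetTransportProtocolOption(tcp.ProtocolNumber, &rackOpt); err != nil {
		return fmt.Errorf("disabling RACK: %v", err)
	}
	return nil
}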
I suspect this is probably happening on Linux as well, but the impact may be lower.
I’ve not yet validated whether this also occurs when running in privileged mode; certainly I would expect to have noticed it in tun mode. There is no platform-specific code on the gVisor side related to timers and retries, so intuition suggests this may be an issue with unprivileged userspace failing to set socket buffers and then bursting far too much traffic (see the sketch below). What isn’t immediately obvious from that theory is why the window size doesn’t eventually get this under control, since retransmits continue throughout a test run. Plenty to look into still.
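As a rough illustration of the socket-buffer part of that theory, the standard library calls below show what explicit buffer sizing looks like; the socket here is a stand-in for the node's WireGuard UDP socket, not the actual magicsock code:

package main

import (
	"log"
	"net"
)

func main() {
	// Stand-in for the UDP socket a userspace node sends WireGuard traffic
	// over; the real socket lives inside Tailscale's magicsock.
	conn, err := net.ListenUDP("udp", &net.UDPAddr{Port: 0})
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Ask the OS for larger buffers. The kernel may clamp these values,
	// which is part of the theory above about bursty unprivileged sends.
	if err := conn.SetReadBuffer(4 << 20); err != nil {
		log.Printf("SetReadBuffer: %v", err)
	}
	if err := conn.SetWriteBuffer(4 << 20); err != nil {
		log.Printf("SetWriteBuffer: %v", err)
	}
}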
cc @jwhited who will also be interested in this.
There is no mechanism by which Windows would retransmit a UDP packet, UDP is send-and-forget. The retransmission is gvisor within the tsnet app deciding to re-send a TCP packet.
Because there is clearly some sort of Windows involvement in the result, my guess would be that either:
Using a Windows 11 VM running https://github.com/fancybits/tsnetspeedtest, I get about 25 Mbps downloading from tsnet. A time-sequence graph from Wireshark is attached. Zooming way in, we see a lot of selective ACKs and retransmissions.
Using the LAN to the same VM achieves 500 Mbps. A time-sequence graph shows no significant retransmissions.
I think this is as far as we’re going to be able to take it for now. We’d need to have a full team of engineers go dig into netstack at this point and then push any fixes upstream, and right now we don’t have staff available to do that. It will be a while before we can do significantly more work on tsnet throughput for Windows.
@tmm1 & @eric - can you re-test with https://github.com/tailscale/tailscale/commit/5e861c38718ffcde3ded6d2922ca464886e41321 or later? This commit improves performance significantly in my local testing of tsnet on Windows.
The results I gathered before the change align with @DentonGentry’s findings: gVisor appears to be slow to handle ACKs, or possibly mishandles them, until one with the SACK option is seen. Without TCP RACK this behavior is less apparent, as seen via fewer spurious retransmissions and higher throughput.
I added a runtime/debug stack dump (debug.PrintStack()) in injectInbound() to be sure of how it was being called.
The backtrace is:
This calls netstack.go:injectInbound(): https://github.com/tailscale/tailscale/blob/d5d84f1a68dc9495ed7b91b4655a869268ac3cfd/wgengine/netstack/netstack.go#L710
which calls ns.linkEP.InjectInbound() to send the packet into netstack: https://github.com/tailscale/tailscale/blob/d5d84f1a68dc9495ed7b91b4655a869268ac3cfd/wgengine/netstack/netstack.go#L753
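For reference, the instrumentation described above amounts to a stack dump at the top of the handler. A minimal standalone illustration follows; handleInbound is a stand-in, and the real change was inside wgengine/netstack's injectInbound:

package main

import "runtime/debug"

// handleInbound stands in for netstack's injectInbound; printing the stack
// at the top of the function shows the call path that delivered the packet.
func handleInbound() {
	debug.PrintStack() // writes the current goroutine's stack to stderr
	// ... normal packet handling would continue here ...
}

func main() {
	handleInbound()
}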
The Windows machine appears to be retransmitting packets which shouldn’t need to be retransmitted.
The test setup is a browser on a Mac connecting over Tailscale to the tsspeedtest app running on Windows. The pcaps are gathered on the Mac, where a full Tailscale install uses a tun device, so we can capture the cleartext content inside the WireGuard tunnel. We don’t have a way to get a pcap of the cleartext from inside the tsnet app on Windows.
This is called a time-sequence graph, which shows transmitted packets as vertical lines and the ACKs coming back as horizontal lines. It can be found in Wireshark under Statistics -> TCP Stream Graphs.
The red lines are four packets which the Windows machine decided to retransmit for unclear reasons. The Mac had sent back an ACK, but the Windows system retransmitted anyway. The Windows system appears either to be losing ACKs or at least to be very tardy in processing them.
Packet dissections from the pcap of the retransmission marked “A” are shown below.
The impact of this is more than just the airtime spent retransmitting: the netstack TCP implementation will treat these retransmissions as congestion and slow down.