tailscale: userspace-networking incoming TCP doesn't always work right away
I’ve noticed a few times now, most recently with 1.12.3, that a tailscaled running in userspace-networking mode doesn’t reliably forward incoming TCP connections to localhost for a short while after the process starts.
Note that I can ping it, yet the first curl fails; a later curl then works:
crow5k:~ $ ping foo
PING foo.tailscale.com.beta.tailscale.net (100.127.0.49): 56 data bytes
64 bytes from 100.127.0.49: icmp_seq=0 ttl=64 time=161.106 ms
^C
--- foo.tailscale.com.beta.tailscale.net ping statistics ---
2 packets transmitted, 1 packets received, 50.0% packet loss
round-trip min/avg/max/stddev = 161.106/161.106/161.106/0.000 ms
crow5k:~ $ curl http://foo:8383/debug/varz
curl: (7) Failed to connect to foo port 8383: Connection refused
crow5k:~ $ curl http://foo:8383/debug/varz
[works, output omitted]
About this issue
- State: closed
- Created 3 years ago
- Reactions: 1
- Comments: 25 (21 by maintainers)
Commits related to this issue
- go.mod: bump inet.af/netstack Updates #2642 (might fix it?) But might cause other problems. Change-Id: Id54af7c90a1206bc7018215957e20e954782b911 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.... — committed to tailscale/tailscale by bradfitz 3 years ago
- go.mod: bump inet.af/netstack Updates #2642 (I'd hoped, but doesn't seem to fix it) Change-Id: Id54af7c90a1206bc7018215957e20e954782b911 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> — committed to tailscale/tailscale by bradfitz 3 years ago
- wgengine/netstack: don't remove 255.255.255.255/32 from netstack Updates #2642 (maybe fixes?) Change-Id: I37cb23f8e3f07a42a1a55a585689ca51c2be7c60 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale... — committed to tailscale/tailscale by bradfitz 3 years ago
- wgengine/netstack: don't remove 255.255.255.255/32 from netstack The intent of the updateIPs code is to add & remove IP addresses to netstack based on what we get from the netmap. But netstack itsel... — committed to tailscale/tailscale by bradfitz 3 years ago
- wgengine/netstack: add env knob to turn on netstack debug logs Except for the super verbose packet-level dumps. Keep those disabled by default with a const. Updates #2642 Change-Id: Ia9eae1677e8b3f... — committed to tailscale/tailscale by bradfitz 2 years ago
- wgengine/netstack: add missing error logging in a RST case Updates #2642 Change-Id: I9f2f8fd28fc980208b0739eb9caf9db7b0977c09 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> — committed to tailscale/tailscale by bradfitz 2 years ago
- wgengine/netstack: clear TCP ECN bits before giving to gvisor Updates #2642 Change-Id: Ic219442a2656dd9dc99ae1dd91e907fd3d924987 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> — committed to tailscale/tailscale by bradfitz 2 years ago
- wgengine/netstack: add missing error logging in a RST case Updates #2642 Change-Id: I9f2f8fd28fc980208b0739eb9caf9db7b0977c09 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> (cherry picked ... — committed to tailscale/tailscale by bradfitz 2 years ago
- wgengine/netstack: clear TCP ECN bits before giving to gvisor Updates #2642 Change-Id: Ic219442a2656dd9dc99ae1dd91e907fd3d924987 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> (cherry pick... — committed to tailscale/tailscale by bradfitz 2 years ago
Found it!
Note the s.flags != header.TCPFlagSyn. 😂
I CAN REPRODUCE IT!!! Only on Linux, but good enough to convince me that https://github.com/tailscale/tailscale/pull/3770 is the fix.
On Linux:
(For me, it was set to 2 before, which means “Enable ECN when requested by incoming connections but do not request ECN on outgoing connections”)
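The command itself is not preserved above. The quoted description of the value 2 matches the Linux kernel documentation for the net.ipv4.tcp_ecn sysctl, so, assuming that is the setting meant here, the following is a minimal Go sketch for checking what a Linux client is currently set to. The helper name describeTCPECN is made up for this illustration.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// tcpECNPath is the procfs file behind the net.ipv4.tcp_ecn sysctl (Linux only).
// Assumption: this is the setting the comment above refers to.
const tcpECNPath = "/proc/sys/net/ipv4/tcp_ecn"

// describeTCPECN maps a net.ipv4.tcp_ecn value to its meaning per the kernel docs.
// Hypothetical helper, not part of tailscale.
func describeTCPECN(v int) string {
	switch v {
	case 0:
		return "ECN disabled"
	case 1:
		return "ECN accepted on incoming connections and requested on outgoing connections"
	case 2:
		return "ECN accepted on incoming connections but not requested on outgoing connections"
	default:
		return "unknown value"
	}
}

func main() {
	raw, err := os.ReadFile(tcpECNPath)
	if err != nil {
		// Expected on non-Linux systems, where this file does not exist.
		fmt.Fprintln(os.Stderr, "cannot read", tcpECNPath+":", err)
		return
	}
	v, err := strconv.Atoi(strings.TrimSpace(string(raw)))
	if err != nil {
		fmt.Fprintln(os.Stderr, "unexpected contents:", err)
		return
	}
	fmt.Printf("net.ipv4.tcp_ecn = %d (%s)\n", v, describeTCPECN(v))
}
```

With the value at 2, a Linux client never sends an ECN-setup SYN on its own, which would explain why the failure is not visible from such a client until ECN is forced on.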
But once you force ECN on in Linux and modify that PR to log when it removes the ECN bits:
Then you’ll see curl to the userspace tailscaled IP work every time, with the logging that it removed ECN:
But if you comment out the RemoveECNBits line instead, it fails every time:
Seems 100% reproducible, which is great.
macOS does its weird 5% or 50% or automatic thing with ECN, which makes it really hard to debug when ECN is actually being used. I think there’s an ioctl/sockopt to force ECN, but I’m not exactly sure what it is. But for now I don’t care, since I can reproduce perfectly from Linux.
Still not sure why gVisor hates ECN so much, but at least we have a workaround.
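To illustrate how a check of the form s.flags != header.TCPFlagSyn and ECN can interact, here is a minimal Go sketch. It assumes the standard TCP flag bit layout (SYN = 0x02, ECE = 0x40, CWR = 0x80, per RFC 793 and RFC 3168). The names isExactSYN and stripECN are invented for this example; this is not the gVisor or tailscale code, only a demonstration that a SYN requesting ECN fails an exact-equality comparison and passes once the ECN bits are masked off.

```go
package main

import "fmt"

// TCP flag bits in the header's flags byte (RFC 793, RFC 3168).
const (
	flagSYN uint8 = 1 << 1 // 0x02
	flagECE uint8 = 1 << 6 // 0x40, ECN-Echo
	flagCWR uint8 = 1 << 7 // 0x80, Congestion Window Reduced
	ecnMask       = flagECE | flagCWR
)

// isExactSYN mimics a strict check that accepts only a bare SYN.
// Hypothetical helper for illustration.
func isExactSYN(flags uint8) bool {
	return flags == flagSYN
}

// stripECN clears the ECE and CWR bits and leaves all other flags alone.
// Hypothetical helper; rewriting a real packet would also have to account
// for the TCP checksum, which covers the flags byte.
func stripECN(flags uint8) uint8 {
	return flags &^ ecnMask
}

func main() {
	plainSYN := flagSYN
	ecnSetupSYN := flagSYN | flagECE | flagCWR // SYN from a client requesting ECN

	fmt.Println(isExactSYN(plainSYN))              // true
	fmt.Println(isExactSYN(ecnSetupSYN))           // false: strict check rejects it
	fmt.Println(isExactSYN(stripECN(ecnSetupSYN))) // true: accepted after masking
}
```

Masking keeps the connection working at the cost of never negotiating ECN, which fits its description above as a workaround to be removed once the upstream fix is in use.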
cc @danderson @DentonGentry
When the 1.22 tree is branched I’ll remove our ECN bit masking workaround on main and close this bug.
Fixed (with workaround) in 1.20.2 and 1.21.x unstable builds at the same time.
Keeping this open to track removing our workarounds and using the upstream fix (https://github.com/google/gvisor/commit/0b81b32c95f6643936d0894671d0b00809fe22d6) instead.
I dug through the git history on google/gvisor, and it goes all the way back to the first code dump from the Google tree to GitHub, when the code was:
…
@bradfitz Happy to help - your product is probably the single most magical piece of software I use on a daily basis these days 😃
Yes, I can reproduce 100% of the time when connecting to at least one particular host. Let me know if it’s easiest to connect live for a minute. Happy to demo.