calico: TCP offloading on vxlan.calico adaptor causing 63 second delays in VXLAN communications node->nodeport or node->clusterip:port.
I am experiencing a 63 second delay in VXLAN communications node->node:nodeport or node->clusterip:port. After inspecting pcaps on both sending and receiving nodes it appears related to TCP offloading on the vxlan.calico interface. Disabling this through ethtool appears to ‘resolve’ the issue, but I’m entirely unsure whether this is a good idea or not, or if there’s a better fix?
Expected Behavior
From a node, the following should work in all cases:
curl localhost: [nodeport]
curl [nodeip]:[nodeport]
curl [clusterip]:[port]
Current Behavior
consider the following:
A: node running pod [web], a simple web service (containous/whoami) B: node running pod [alpine], a base container; exec sh C: node not doing anything in particular.
With service defined:
$ kubectl describe svc whoami-cluster-nodeport
...
Type: NodePort
NodePort: web 32081/TCP
External Traffic Policy: Cluster
I get the following results when trying to access the web service:
from/to | [web_ip]:80 | [cluster_ip]:80 | localhost:32081 | [a_ip]:32081 | [b_ip]:32081 | [c_ip]:32081 |
---|---|---|---|---|---|---|
node A | ok | ok | ok | ok | ok | ok |
node B | ok | 63 seconds | 63 seconds | ok | 63 seconds | ok |
node C | ok | 63 seconds | 63 seconds | ok | ok | 63 seconds |
pod alpine | ok | ok | - | ok | 63 seconds | ok |
external host | - | - | - | ok | ok | ok |
Further, if I change replicas for the pods so that pods are on A & C (rr load balancing b/t 2 nodes), the 63 second delay will occur half of the time from those hosting nodes:
- from A: curl localhost:32081
- from A: curl [node_a_ip]:32081
The problem seems to stem from trying to route sourced from a node to another node. I did a trace and tcp dump from C -> A (via localhost:32081 on C… see below). On both nodes, the TCPDUMP shows repeated SYN packets attempting to establish the connection. They all show “bad udp cksum 0xffff -> 0x76dc!” in the results. After 63 seconds, a SYN packet is set with ‘no cksum’ and the connection is established.
I had to disable TCP Offloading… after issuing this command, curl localhost:32081 worked consistently on all nodes.
# ethtool --offload vxlan.calico rx off tx off
Actual changes:
rx-checksumming: off
tx-checksumming: off
tx-checksum-ip-generic: off
tcp-segmentation-offload: off
tx-tcp-segmentation: off [requested on]
tx-tcp-ecn-segmentation: off [requested on]
tx-tcp6-segmentation: off [requested on]
tx-tcp-mangleid-segmentation: off [requested on]
udp-fragmentation-offload: off [requested on]
So… I’m entirely unsure whether this is a good idea or not? Or whether there’s a way to fix this through iptables? Or whether this needs fixed in the OS/hosting (VSphere)?
TRACE from Node C (client)
# sudo perf trace --no-syscalls --event 'net:*' wget -q -O /dev/null localhost:32081
0.000 net:net_dev_queue:dev=vxlan.calico skbaddr=0xffff8c6ed93d00f8 len=66
0.033 net:net_dev_queue:dev=eth0 skbaddr=0xffff8c6ed93d00f8 len=116
0.084 net:net_dev_xmit:dev=eth0 skbaddr=0xffff8c6ed93d00f8 len=116 rc=0
0.088 net:net_dev_xmit:dev=vxlan.calico skbaddr=0xffff8c6ed93d00f8 len=66 rc=0
63122.053 net:net_dev_queue:dev=vxlan.calico skbaddr=0xffff8c607b6be0f8 len=165
63122.070 net:net_dev_queue:dev=eth0 skbaddr=0xffff8c607b6be0f8 len=215
63122.078 net:net_dev_xmit:dev=eth0 skbaddr=0xffff8c607b6be0f8 len=215 rc=0
63122.080 net:net_dev_xmit:dev=vxlan.calico skbaddr=0xffff8c607b6be0f8 len=165 rc=0
63123.135 net:net_dev_queue:dev=vxlan.calico skbaddr=0xffff8c607b6b90f8 len=54
63123.154 net:net_dev_queue:dev=eth0 skbaddr=0xffff8c607b6b90f8 len=104
63123.162 net:net_dev_xmit:dev=eth0 skbaddr=0xffff8c607b6b90f8 len=104 rc=0
63123.165 net:net_dev_xmit:dev=vxlan.calico skbaddr=0xffff8c607b6b90f8 len=54 rc=0
TCPDUMP from Node C (client)
# tcpdump -vv host [node_a_ip]
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
11:51:31.656624 IP (tos 0x0, ttl 64, id 59791, offset 0, flags [none], proto UDP (17), length 102)
nodeC.local.26898 > nodeA.local.4789: [bad udp cksum 0xffff -> 0x76dc!] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 8153, offset 0, flags [DF], proto TCP (6), length 52)
nodeC.39274 > 10.244.143.193.http: Flags [S], cksum 0xe849 (correct), seq 2295512834, win 43690, options [mss 65495,nop,nop,sackOK,nop,wscale 7], length 0
11:51:32.657185 IP (tos 0x0, ttl 64, id 60020, offset 0, flags [none], proto UDP (17), length 102)
nodeC.local.26898 > nodeA.local.4789: [bad udp cksum 0xffff -> 0x76dc!] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 8154, offset 0, flags [DF], proto TCP (6), length 52)
nodeC.39274 > 10.244.143.193.http: Flags [S], cksum 0xe849 (correct), seq 2295512834, win 43690, options [mss 65495,nop,nop,sackOK,nop,wscale 7], length 0
11:51:34.661166 IP (tos 0x0, ttl 64, id 61933, offset 0, flags [none], proto UDP (17), length 102)
nodeC.local.26898 > nodeA.local.4789: [bad udp cksum 0xffff -> 0x76dc!] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 8155, offset 0, flags [DF], proto TCP (6), length 52)
nodeC.39274 > 10.244.143.193.http: Flags [S], cksum 0xe849 (correct), seq 2295512834, win 43690, options [mss 65495,nop,nop,sackOK,nop,wscale 7], length 0
11:51:36.669116 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has nodeA.local tell nodeC.local, length 28
11:51:36.669385 ARP, Ethernet (len 6), IPv4 (len 4), Reply nodeA.local is-at 00:50:56:a3:d4:91 (oui Unknown), length 46
11:51:38.669155 IP (tos 0x0, ttl 64, id 65370, offset 0, flags [none], proto UDP (17), length 102)
nodeC.local.26898 > nodeA.local.4789: [bad udp cksum 0xffff -> 0x76dc!] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 8156, offset 0, flags [DF], proto TCP (6), length 52)
nodeC.39274 > 10.244.143.193.http: Flags [S], cksum 0xe849 (correct), seq 2295512834, win 43690, options [mss 65495,nop,nop,sackOK,nop,wscale 7], length 0
11:51:46.685204 IP (tos 0x0, ttl 64, id 3142, offset 0, flags [none], proto UDP (17), length 102)
nodeC.local.26898 > nodeA.local.4789: [bad udp cksum 0xffff -> 0x76dc!] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 8157, offset 0, flags [DF], proto TCP (6), length 52)
nodeC.39274 > 10.244.143.193.http: Flags [S], cksum 0xe849 (correct), seq 2295512834, win 43690, options [mss 65495,nop,nop,sackOK,nop,wscale 7], length 0
11:52:02.733178 IP (tos 0x0, ttl 64, id 5028, offset 0, flags [none], proto UDP (17), length 102)
nodeC.local.26898 > nodeA.local.4789: [bad udp cksum 0xffff -> 0x76dc!] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 8158, offset 0, flags [DF], proto TCP (6), length 52)
nodeC.39274 > 10.244.143.193.http: Flags [S], cksum 0xe849 (correct), seq 2295512834, win 43690, options [mss 65495,nop,nop,sackOK,nop,wscale 7], length 0
11:52:34.797203 IP (tos 0x0, ttl 64, id 30608, offset 0, flags [none], proto UDP (17), length 102)
nodeC.local.60563 > nodeA.local.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 8159, offset 0, flags [DF], proto TCP (6), length 52)
nodeC.39274 > 10.244.143.193.http: Flags [S], cksum 0xe849 (correct), seq 2295512834, win 43690, options [mss 65495,nop,nop,sackOK,nop,wscale 7], length 0
11:52:34.797731 IP (tos 0x0, ttl 64, id 33394, offset 0, flags [none], proto UDP (17), length 102)
nodeA.local.36276 > nodeC.local.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 52)
10.244.143.193.http > nodeC.39274: Flags [S.], cksum 0x4348 (correct), seq 4013223269, ack 2295512835, win 28000, options [mss 1400,nop,nop,sackOK,nop,wscale 7], length 0
11:52:34.797854 IP (tos 0x0, ttl 64, id 30609, offset 0, flags [none], proto UDP (17), length 90)
nodeC.local.60563 > nodeA.local.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 8160, offset 0, flags [DF], proto TCP (6), length 40)
nodeC.39274 > 10.244.143.193.http: Flags [.], cksum 0xefe8 (correct), seq 1, ack 1, win 342, length 0
11:52:34.797936 IP (tos 0x0, ttl 64, id 30610, offset 0, flags [none], proto UDP (17), length 203)
nodeC.local.60563 > nodeA.local.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 8161, offset 0, flags [DF], proto TCP (6), length 153)
nodeC.39274 > 10.244.143.193.http: Flags [P.], cksum 0x7ffc (correct), seq 1:114, ack 1, win 342, length 113: HTTP, length: 113
GET / HTTP/1.1
User-Agent: Wget/1.14 (linux-gnu)
Accept: */*
Host: localhost:32081
Connection: Keep-Alive
11:52:34.798117 IP (tos 0x0, ttl 64, id 33395, offset 0, flags [none], proto UDP (17), length 90)
nodeA.local.37789 > nodeC.local.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 26251, offset 0, flags [DF], proto TCP (6), length 40)
10.244.143.193.http > nodeC.39274: Flags [.], cksum 0xeff2 (correct), seq 1, ack 114, win 219, length 0
11:52:34.798547 IP (tos 0x0, ttl 64, id 33396, offset 0, flags [none], proto UDP (17), length 458)
nodeA.local.37789 > nodeC.local.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 26252, offset 0, flags [DF], proto TCP (6), length 408)
10.244.143.193.http > nodeC.39274: Flags [P.], cksum 0xe168 (correct), seq 1:369, ack 114, win 219, length 368: HTTP, length: 368
HTTP/1.1 200 OK
Date: Mon, 20 Jan 2020 16:52:34 GMT
Content-Length: 250
Content-Type: text/plain; charset=utf-8
Hostname: whoami-66686d967d-mzk8p
IP: 127.0.0.1
IP: ::1
IP: 10.244.143.193
IP: fe80::d830:89ff:fe7f:2703
RemoteAddr: 10.244.90.192:39274
GET / HTTP/1.1
Host: localhost:32081
User-Agent: Wget/1.14 (linux-gnu)
Accept: */*
Connection: Keep-Alive
11:52:34.798602 IP (tos 0x0, ttl 64, id 30611, offset 0, flags [none], proto UDP (17), length 90)
nodeC.local.60563 > nodeA.local.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 8162, offset 0, flags [DF], proto TCP (6), length 40)
nodeC.39274 > 10.244.143.193.http: Flags [.], cksum 0xedff (correct), seq 114, ack 369, win 350, length 0
11:52:34.799551 IP (tos 0x0, ttl 64, id 30612, offset 0, flags [none], proto UDP (17), length 90)
nodeC.local.60563 > nodeA.local.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 8163, offset 0, flags [DF], proto TCP (6), length 40)
nodeC.39274 > 10.244.143.193.http: Flags [F.], cksum 0xedfe (correct), seq 114, ack 369, win 350, length 0
11:52:34.799731 IP (tos 0x0, ttl 64, id 33397, offset 0, flags [none], proto UDP (17), length 90)
nodeA.local.37789 > nodeC.local.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 26253, offset 0, flags [DF], proto TCP (6), length 40)
10.244.143.193.http > nodeC.39274: Flags [F.], cksum 0xee80 (correct), seq 369, ack 115, win 219, length 0
11:52:34.799779 IP (tos 0x0, ttl 64, id 30613, offset 0, flags [none], proto UDP (17), length 90)
nodeC.local.60563 > nodeA.local.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 8164, offset 0, flags [DF], proto TCP (6), length 40)
nodeC.39274 > 10.244.143.193.http: Flags [.], cksum 0xedfd (correct), seq 115, ack 370, win 350, length 0
11:52:39.805144 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has nodeA.local tell nodeC.local, length 28
11:52:39.805399 ARP, Ethernet (len 6), IPv4 (len 4), Reply nodeA.local is-at 00:50:56:a3:d4:91 (oui Unknown), length 46
TCPDUMP from Node A (hosting service pod)
# tcpdump -vv host [node_c_ip]
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
11:51:31.656705 IP (tos 0x0, ttl 64, id 59791, offset 0, flags [none], proto UDP (17), length 102)
nodeC.local.26898 > nodeA.local.4789: [bad udp cksum 0xffff -> 0x76dc!] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 8153, offset 0, flags [DF], proto TCP (6), length 52)
10.244.90.192.39274 > 10.244.143.193.http: Flags [S], cksum 0xe849 (correct), seq 2295512834, win 43690, options [mss 65495,nop,nop,sackOK,nop,wscale 7], length 0
11:51:32.657297 IP (tos 0x0, ttl 64, id 60020, offset 0, flags [none], proto UDP (17), length 102)
nodeC.local.26898 > nodeA.local.4789: [bad udp cksum 0xffff -> 0x76dc!] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 8154, offset 0, flags [DF], proto TCP (6), length 52)
10.244.90.192.39274 > 10.244.143.193.http: Flags [S], cksum 0xe849 (correct), seq 2295512834, win 43690, options [mss 65495,nop,nop,sackOK,nop,wscale 7], length 0
11:51:34.661301 IP (tos 0x0, ttl 64, id 61933, offset 0, flags [none], proto UDP (17), length 102)
nodeC.local.26898 > nodeA.local.4789: [bad udp cksum 0xffff -> 0x76dc!] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 8155, offset 0, flags [DF], proto TCP (6), length 52)
10.244.90.192.39274 > 10.244.143.193.http: Flags [S], cksum 0xe849 (correct), seq 2295512834, win 43690, options [mss 65495,nop,nop,sackOK,nop,wscale 7], length 0
11:51:36.669200 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has nodeA.local tell nodeC.local, length 46
11:51:36.669224 ARP, Ethernet (len 6), IPv4 (len 4), Reply nodeA.local is-at 00:50:56:a3:d4:91 (oui Unknown), length 28
11:51:38.669297 IP (tos 0x0, ttl 64, id 65370, offset 0, flags [none], proto UDP (17), length 102)
nodeC.local.26898 > nodeA.local.4789: [bad udp cksum 0xffff -> 0x76dc!] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 8156, offset 0, flags [DF], proto TCP (6), length 52)
10.244.90.192.39274 > 10.244.143.193.http: Flags [S], cksum 0xe849 (correct), seq 2295512834, win 43690, options [mss 65495,nop,nop,sackOK,nop,wscale 7], length 0
11:51:46.685305 IP (tos 0x0, ttl 64, id 3142, offset 0, flags [none], proto UDP (17), length 102)
nodeC.local.26898 > nodeA.local.4789: [bad udp cksum 0xffff -> 0x76dc!] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 8157, offset 0, flags [DF], proto TCP (6), length 52)
10.244.90.192.39274 > 10.244.143.193.http: Flags [S], cksum 0xe849 (correct), seq 2295512834, win 43690, options [mss 65495,nop,nop,sackOK,nop,wscale 7], length 0
11:52:02.733268 IP (tos 0x0, ttl 64, id 5028, offset 0, flags [none], proto UDP (17), length 102)
nodeC.local.26898 > nodeA.local.4789: [bad udp cksum 0xffff -> 0x76dc!] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 8158, offset 0, flags [DF], proto TCP (6), length 52)
10.244.90.192.39274 > 10.244.143.193.http: Flags [S], cksum 0xe849 (correct), seq 2295512834, win 43690, options [mss 65495,nop,nop,sackOK,nop,wscale 7], length 0
11:52:34.797359 IP (tos 0x0, ttl 64, id 30608, offset 0, flags [none], proto UDP (17), length 102)
nodeC.local.60563 > nodeA.local.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 8159, offset 0, flags [DF], proto TCP (6), length 52)
10.244.90.192.39274 > 10.244.143.193.http: Flags [S], cksum 0xe849 (correct), seq 2295512834, win 43690, options [mss 65495,nop,nop,sackOK,nop,wscale 7], length 0
11:52:34.797533 IP (tos 0x0, ttl 64, id 33394, offset 0, flags [none], proto UDP (17), length 102)
nodeA.local.36276 > nodeC.local.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 52)
10.244.143.193.http > 10.244.90.192.39274: Flags [S.], cksum 0x4348 (correct), seq 4013223269, ack 2295512835, win 28000, options [mss 1400,nop,nop,sackOK,nop,wscale 7], length 0
11:52:34.797895 IP (tos 0x0, ttl 64, id 30609, offset 0, flags [none], proto UDP (17), length 90)
nodeC.local.60563 > nodeA.local.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 8160, offset 0, flags [DF], proto TCP (6), length 40)
10.244.90.192.39274 > 10.244.143.193.http: Flags [.], cksum 0xefe8 (correct), seq 1, ack 1, win 342, length 0
11:52:34.797960 IP (tos 0x0, ttl 64, id 30610, offset 0, flags [none], proto UDP (17), length 203)
nodeC.local.60563 > nodeA.local.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 8161, offset 0, flags [DF], proto TCP (6), length 153)
10.244.90.192.39274 > 10.244.143.193.http: Flags [P.], cksum 0x7ffc (correct), seq 1:114, ack 1, win 342, length 113: HTTP, length: 113
GET / HTTP/1.1
User-Agent: Wget/1.14 (linux-gnu)
Accept: */*
Host: localhost:32081
Connection: Keep-Alive
11:52:34.797995 IP (tos 0x0, ttl 64, id 33395, offset 0, flags [none], proto UDP (17), length 90)
nodeA.local.37789 > nodeC.local.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 26251, offset 0, flags [DF], proto TCP (6), length 40)
10.244.143.193.http > 10.244.90.192.39274: Flags [.], cksum 0xeff2 (correct), seq 1, ack 114, win 219, length 0
11:52:34.798460 IP (tos 0x0, ttl 64, id 33396, offset 0, flags [none], proto UDP (17), length 458)
nodeA.local.37789 > nodeC.local.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 26252, offset 0, flags [DF], proto TCP (6), length 408)
10.244.143.193.http > 10.244.90.192.39274: Flags [P.], cksum 0xe168 (correct), seq 1:369, ack 114, win 219, length 368: HTTP, length: 368
HTTP/1.1 200 OK
Date: Mon, 20 Jan 2020 16:52:34 GMT
Content-Length: 250
Content-Type: text/plain; charset=utf-8
Hostname: whoami-66686d967d-mzk8p
IP: 127.0.0.1
IP: ::1
IP: 10.244.143.193
IP: fe80::d830:89ff:fe7f:2703
RemoteAddr: 10.244.90.192:39274
GET / HTTP/1.1
Host: localhost:32081
User-Agent: Wget/1.14 (linux-gnu)
Accept: */*
Connection: Keep-Alive
11:52:34.798635 IP (tos 0x0, ttl 64, id 30611, offset 0, flags [none], proto UDP (17), length 90)
nodeC.local.60563 > nodeA.local.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 8162, offset 0, flags [DF], proto TCP (6), length 40)
10.244.90.192.39274 > 10.244.143.193.http: Flags [.], cksum 0xedff (correct), seq 114, ack 369, win 350, length 0
11:52:34.799589 IP (tos 0x0, ttl 64, id 30612, offset 0, flags [none], proto UDP (17), length 90)
nodeC.local.60563 > nodeA.local.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 8163, offset 0, flags [DF], proto TCP (6), length 40)
10.244.90.192.39274 > 10.244.143.193.http: Flags [F.], cksum 0xedfe (correct), seq 114, ack 369, win 350, length 0
11:52:34.799655 IP (tos 0x0, ttl 64, id 33397, offset 0, flags [none], proto UDP (17), length 90)
nodeA.local.37789 > nodeC.local.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 26253, offset 0, flags [DF], proto TCP (6), length 40)
10.244.143.193.http > 10.244.90.192.39274: Flags [F.], cksum 0xee80 (correct), seq 369, ack 115, win 219, length 0
11:52:34.799814 IP (tos 0x0, ttl 64, id 30613, offset 0, flags [none], proto UDP (17), length 90)
nodeC.local.60563 > nodeA.local.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 8164, offset 0, flags [DF], proto TCP (6), length 40)
10.244.90.192.39274 > 10.244.143.193.http: Flags [.], cksum 0xedfd (correct), seq 115, ack 370, win 350, length 0
11:52:39.805231 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has nodeA.local tell nodeC.local, length 46
11:52:39.805256 ARP, Ethernet (len 6), IPv4 (len 4), Reply nodeA.local is-at 00:50:56:a3:d4:91 (oui Unknown), length 28
Possible Solution
# ethtool --offload vxlan.calico rx off tx off
Steps to Reproduce (for bugs)
- kubeadm init
- install calico configured for vxlan
- install a simple web pod and a nodeport service
- attempt to access nodeport from cluster
Context
There are times when we need to be able to access a service from a node (eg, log shipping from the node to a hosted service, hosted app api access, k8s hosted registry) where this defect will interfere with normal communications
Your Environment
- Calico version: v3.11
- Orchestrator version (e.g. kubernetes, mesos, rkt): k8s 1.17
- Operating System and version: RHEL 7 (version 3.10.0-1062.9.1.el7.x86_64)
- Link to your project (optional):
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 5
- Comments: 31 (8 by maintainers)
Last Kubernetes release shoud fix this issue. https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#changes-by-kind
I tried it in the following environment:
Problem still there, i have to
sudo ethtool --offload vxlan.calico rx off tx off
on all hosts to have a working cluster. Did you found a fix that doesn’t require to disable vxlan.calico interface offload? This workaround isn’t persistent after reboot so it can’t be applied in a production environment.Hello,
We have the exact same issue on CentOS 7 (3.10.0-1062.9.1.el7.x86_64) running Calico/Canal and flannel. After spending the better part of a week trying to figure out why two thirds of our cluster was unable to reliably to talk to the other third I stumbled upon this issue.
I can report that disabling offloading completely works around this issue. In our case, the command is:
sudo ethtool --offload flannel.1 rx off tx off
(because we are running flannel).Your Environment
I’m just updating this thread based on a recent Slack conversation with regards to a very similar problem with a valid workaround. We were seeing traffic fail after converting the overlay to VXLAN, specifically for UDP traffic when going from kubenode -> kubernetes service clusterIP, TCP seemed fine. @fasaxc recommended that we try Calico 3.20.0 and use a new feature detection override that’s available, which I can confirm as making this traffic flow now work.
Details of the setup are
Kubernetes: 1.20.8 OS: Flatcar 2765.2.1 Kernel: 5.10.21 Calico: v3.20.0 Overlay Type: VXLAN
Example configuration used to apply the
ChecksumOffloadBroken=true
config:If anyone is interested by a reboot-proof solution to apply the workarround (disable offloading on
vxlan.calico
), you can use these 2 files:/etc/systemd/system/disable-offloading-on-vxlan.service
/usr/local/bin/disable-offloading-on-vxlan
On my Kubernetes 1.18.5 and CentOS 7 cluster i use a kube-proxy custom image.
ethtool --offload interface rx off tx off
thx it works. flannel+k8s1.17.2 centos7. but how do you know disable tcp offload works?
FTR Calico does SNAT traffic from hosts to service VIPs that get load balanced to remote pods when in VXLAN (or IPIP mode). We do that because the source IP selection algorithm chooses the source IP based on the the route that appears to apply to the service VIP. The, after the source IP selection, the dest IP is changed by kube-proxy’s DNAT. The source IP is then “wrong”; it’ll be the eth0 IP, when it needs to be the Calico “tunnel IP” to reach a remote pod. Hence, we SNAT to the tunnel IP in that case.
FELIX_FEATUREDETECTOVERRIDE="MASQFullyRandom=false"
in order to disable our use of--random-fully
.FWIW, the upstream kernel patch that fixed the issues we had root-caused for OpenShift and some Kubernetes use-cases is https://github.com/torvalds/linux/commit/ea64d8d6c675c0bb712689b13810301de9d8f77a and is present in the 5.7 and later kernels, and the RHEL 8.2’s kernel-4.18.0-193.13.2.el8_2 and later as of 2020-Jul-21. I presume CentOS 8.2 has this fix already.
Other distros (Ubuntu 20.04.1) may not yet have it, if they haven’t updated their kernel or backported the patch.
Still using a custom kube-proxy Dockerfile. Kubernetes 1.18.6 fresh install doesn’t fix the problem. Maybe Calico 1.16 with the following option can fix the problem https://github.com/projectcalico/libcalico-go/pull/1264
I don’t know exactly what cases were triggering that upstream. There are weird edge cases, like the rule that says if a pod connects to a service that it is an endpoint of, it has to be SNAT’ed. So given pods A and B in Service X on different nodes, a connection from pod A to Service X’s Cluster IP to pod B would end up both SNAT’ed and tunneled over the VXLAN.
But in general, if you have any rules that do “if marked then masquerade”, it is best to rewrite them to do “if marked then unmark and then masquerade”.
Same issue with k8s 1.18 + ubuntu 16.04 /w calico 3.11, it causes 3s delays only.
ethtool --offload interface rx off tx off
can workaround perfectly!There’s a thread in SIG network about this: https://groups.google.com/forum/#!topic/kubernetes-sig-network/JxkTLd4M8WM
Summary so far seems to be that this is a kernel bug related to VXLAN offload where the checksum calculation is not properly offloaded.