flannel: Incorrect MAC address observed for flannel interface across Kubernetes cluster nodes
Incorrect MAC address observed for the flannel interface across cluster nodes.
Expected Behavior
When we perform ARP resolution for a flannel subnet IP from all nodes of the cluster, every node should report the same, correct MAC address. That is not happening in this case: we observe different MAC address values for the same flannel IP across the nodes.
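For example, the subnet 192.168.143.192/26 is owned by node 10.14.7.43, whose VtepMAC in etcd is 2a:36:32:7b:41:32 (see the etcd dump below), so every other node should hold the same permanent neighbour entry, roughly:
$ ip neigh show dev flannel.1 | grep 192.168.143.192
192.168.143.192 lladdr 2a:36:32:7b:41:32 PERMANENT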
Current Behavior
$ "arp -an | grep 192.168.143.192"
// Result of the above command from all the nodes
=========== 10.14.7.42 ===========
? (192.168.143.192) at 32:48:e7:bb:74:d8 [ether] PERM on flannel.1
=========== 10.14.7.44 ===========
? (192.168.143.192) at 52:83:c1:6b:df:08 [ether] PERM on flannel.1
=========== 10.14.7.55 ===========
? (192.168.143.192) at 32:48:e7:bb:74:d8 [ether] PERM on flannel.1
=========== 10.14.7.56 ===========
? (192.168.143.192) at 52:83:c1:6b:df:08 [ether] PERM on flannel.1
=========== 10.14.7.62 ===========
? (192.168.143.192) at 32:48:e7:bb:74:d8 [ether] PERM on flannel.1
=========== 10.14.7.63 ===========
? (192.168.143.192) at 2a:36:32:7b:41:32 [ether] PERM on flannel.1
=========== 10.14.7.64 ===========
? (192.168.143.192) at 32:48:e7:bb:74:d8 [ether] PERM on flannel.1
=========== 10.14.7.43 ===========
Non-zero exit status: 1
$ ifconfig flannel.1
flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 192.168.143.192 netmask 255.255.255.255 broadcast 0.0.0.0
inet6 fe80::2836:32ff:fe7b:4132 prefixlen 64 scopeid 0x20<link>
ether 2a:36:32:7b:41:32 txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 5 overruns 0 carrier 0 collisions 0
For reference, the values stored in etcd are correct:
$ etcdctl get --prefix /flannel/network
/flannel/network/config
{
"EnableIPv4": true,
"Network": "192.168.128.0/18",
"SubnetLen": 26,
"Backend": {
"Type": "vxlan",
"DirectRouting": false
}
}
/flannel/network/subnets/192.168.134.192-26
{"PublicIP":"10.14.7.44","PublicIPv6":null,"BackendType":"vxlan","BackendData":{"VNI":1,"VtepMAC":"b6:f1:e3:69:06:ad"}}
/flannel/network/subnets/192.168.137.64-26
{"PublicIP":"10.14.7.55","PublicIPv6":null,"BackendType":"vxlan","BackendData":{"VNI":1,"VtepMAC":"66:14:5a:2b:ae:b0"}}
/flannel/network/subnets/192.168.140.192-26
{"PublicIP":"10.14.7.62","PublicIPv6":null,"BackendType":"vxlan","BackendData":{"VNI":1,"VtepMAC":"f2:17:19:89:06:eb"}}
/flannel/network/subnets/192.168.143.192-26
{"PublicIP":"10.14.7.43","PublicIPv6":null,"BackendType":"vxlan","BackendData":{"VNI":1,"VtepMAC":"2a:36:32:7b:41:32"}}
/flannel/network/subnets/192.168.144.128-26
{"PublicIP":"10.14.7.42","PublicIPv6":null,"BackendType":"vxlan","BackendData":{"VNI":1,"VtepMAC":"46:6e:38:8a:7e:c6"}}
/flannel/network/subnets/192.168.146.192-26
{"PublicIP":"10.14.7.64","PublicIPv6":null,"BackendType":"vxlan","BackendData":{"VNI":1,"VtepMAC":"1a:01:1e:7f:fa:1d"}}
/flannel/network/subnets/192.168.148.128-26
{"PublicIP":"10.14.7.56","PublicIPv6":null,"BackendType":"vxlan","BackendData":{"VNI":1,"VtepMAC":"02:26:18:53:4f:a8"}}
/flannel/network/subnets/192.168.152.128-26
{"PublicIP":"10.14.7.63","PublicIPv6":null,"BackendType":"vxlan","BackendData":{"VNI":1,"VtepMAC":"06:e9:34:03:4e:f4"}}
Possible Solution
Periodically resync the ARP entries on flannel.1 against the VtepMAC values stored in etcd.
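Until something like that is built in, a rough workaround along these lines could be run on each node to force the neighbour and FDB entries back in sync with etcd. This is our own sketch, not part of flannel; it assumes etcdctl (v3 API) and jq are installed, and that any endpoint/TLS flags for etcdctl are added as needed:
#!/bin/bash
# Workaround sketch: reprogram flannel.1 neighbour/FDB entries from the
# subnet leases stored in etcd. Hypothetical script, not part of flannel.
own_ip=$(ip -4 -o addr show flannel.1 | awk '{print $4}' | cut -d/ -f1)
for key in $(etcdctl get --prefix /flannel/network/subnets/ --keys-only); do
    subnet_ip=${key##*/}        # e.g. 192.168.143.192-26
    subnet_ip=${subnet_ip%-*}   # e.g. 192.168.143.192
    [ "$subnet_ip" = "$own_ip" ] && continue   # skip our own lease
    value=$(etcdctl get "$key" --print-value-only)
    vtep_mac=$(echo "$value" | jq -r '.BackendData.VtepMAC')
    public_ip=$(echo "$value" | jq -r '.PublicIP')
    # Reinstall the permanent ARP entry and the VXLAN FDB entry,
    # roughly what flanneld programs for each remote subnet lease.
    ip neigh replace "$subnet_ip" lladdr "$vtep_mac" dev flannel.1 nud permanent
    bridge fdb replace "$vtep_mac" dev flannel.1 dst "$public_ip"
done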
Steps to Reproduce (for bugs)
Nothing special was done to reproduce this, but it was observed regularly in our environment.
Context
This regularly causes communication issues between pods in our environment. As part of a preliminary investigation we checked the flannel logs in journalctl and observed the following:
Jul 27 05:23:15 system-test-03-cc30210035-node-2 flanneld[27730]: E0727 05:23:15.200382 27730 iptables.go:307] Failed to bootstrap IPTables: failed to apply partial iptables-restore unable to run iptables-restore (, ): exit status 4
Jul 27 05:23:15 system-test-03-cc30210035-node-2 flanneld[27730]: I0727 05:23:15.245615 27730 iptables.go:421] Some iptables rules are missing; deleting and recreating rules
Jul 27 05:23:15 system-test-03-cc30210035-node-2 flanneld[27730]: E0727 05:23:15.276349 27730 iptables.go:307] Failed to bootstrap IPTables: failed to apply partial iptables-restore unable to run iptables-restore (, ): exit status 4
Jul 27 05:23:15 system-test-03-cc30210035-node-2 flanneld[27730]: E0727 05:23:15.313107 27730 iptables.go:320] Failed to ensure iptables rules: error setting up rules: failed to apply partial iptables-restore unable to run iptables-restore (, ): exit status 4
Jul 27 05:23:15 system-test-03-cc30210035-node-2 flanneld[27730]: I0727 05:23:15.320792 27730 iptables.go:421] Some iptables rules are missing; deleting and recreating rules
Jul 27 05:23:15 system-test-03-cc30210035-node-2 flanneld[27730]: I0727 05:23:15.491607 27730 iptables.go:283] bootstrap done
We then tried to manually apply the firewall rules that flanneld was failing to restore, by running iptables-restore with the same payload, and hit the following error:
$ cat test1.txt
*filter
-D FORWARD -m comment --comment "flanneld forward" -j FLANNEL-FWD
-A FORWARD -m comment --comment "flanneld forward" -j FLANNEL-FWD
-A FLANNEL-FWD -s 192.168.192.0/18 -m comment --comment "flanneld forward" -j ACCEPT
-A FLANNEL-FWD -d 192.168.192.0/18 -m comment --comment "flanneld forward" -j ACCEPT
COMMIT
$ sudo iptables-restore < test1.txt
iptables-restore v1.4.21: Couldn't load target `FLANNEL-FWD':No such file or directory
Error occurred at line: 2
Try `iptables-restore -h' or 'iptables-restore --help' for more information.
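The manual failure itself is expected: test1.txt references the FLANNEL-FWD chain without declaring it, and the chain did not exist on that node, so iptables-restore cannot resolve the -j FLANNEL-FWD target. (This is likely separate from the exit status 4 seen in the flanneld logs, which for iptables often indicates a resource problem such as xtables lock contention.) Declaring the chain in the payload, and dropping the -D line (which fails if the FORWARD rule is not present yet), lets the manual test go through; a minimal variant of test1.txt (test2.txt is our own name, --noflush keeps the rest of the filter table intact):
$ cat test2.txt
*filter
:FLANNEL-FWD - [0:0]
-A FORWARD -m comment --comment "flanneld forward" -j FLANNEL-FWD
-A FLANNEL-FWD -s 192.168.192.0/18 -m comment --comment "flanneld forward" -j ACCEPT
-A FLANNEL-FWD -d 192.168.192.0/18 -m comment --comment "flanneld forward" -j ACCEPT
COMMIT
$ sudo iptables-restore --noflush < test2.txt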
Your Environment
- Flannel version: 0.22.0
- Backend used (e.g. vxlan or udp): VXLAN
- Etcd version: 3.5.8
- Kubernetes version (if used): 1.27.3
- Operating System and version: CentOS Linux release 7.9.2009 (Core)
- Link to your project (optional):
About this issue
- Original URL
- State: open
- Created a year ago
- Comments: 23 (15 by maintainers)
Right now, when a new node is added to the cluster, flannel calls the part of the code you mentioned, which adds the MAC address it gets from the etcd entry. The MAC address of an interface shouldn't change unless something forces that change. I am running some tests to check whether there are strange cases where it does.