flannel: Incorrect MAC address observed for flannel interface across Kubernetes cluster nodes

Incorrect MAC address observed for the flannel interface across cluster nodes.

Expected Behavior

When we perform ARP resolution for a node's flannel.1 IP from the other nodes of the cluster, every node should report the same, correct MAC address. That is not happening here: we observe different MAC addresses for the same flannel.1 interface across the nodes.
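
A concrete way to verify this expectation, as a minimal sketch using iproute2 and etcdctl (the subnet IP below is simply the one from our setup):

$ # MAC that flannel programmed locally for the remote VTEP
$ ip neigh show dev flannel.1 | grep 192.168.143.192
$ # MAC that the owning node published in etcd (the VtepMAC field)
$ etcdctl get /flannel/network/subnets/192.168.143.192-26
$ # The two MAC addresses should be identical, on every node of the cluster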

Current Behavior

$ "arp -an | grep 192.168.143.192"
// Result of the above command from all the nodes
=========== 10.14.7.42 ===========
? (192.168.143.192) at 32:48:e7:bb:74:d8 [ether] PERM on flannel.1
=========== 10.14.7.44 ===========
? (192.168.143.192) at 52:83:c1:6b:df:08 [ether] PERM on flannel.1
=========== 10.14.7.55 ===========
? (192.168.143.192) at 32:48:e7:bb:74:d8 [ether] PERM on flannel.1
=========== 10.14.7.56 ===========
? (192.168.143.192) at 52:83:c1:6b:df:08 [ether] PERM on flannel.1
=========== 10.14.7.62 ===========
? (192.168.143.192) at 32:48:e7:bb:74:d8 [ether] PERM on flannel.1
=========== 10.14.7.63 ===========
? (192.168.143.192) at 2a:36:32:7b:41:32 [ether] PERM on flannel.1
=========== 10.14.7.64 ===========
? (192.168.143.192) at 32:48:e7:bb:74:d8 [ether] PERM on flannel.1
=========== 10.14.7.43 ===========
Non-zero exit status: 1
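
For completeness, flannel's vxlan backend programs both a permanent ARP entry and a VXLAN FDB entry for each remote VTEP, so the forwarding database can be cross-checked the same way (a sketch; 10.14.7.43 is the public IP of the node owning the 192.168.143.192/26 subnet):

$ bridge fdb show dev flannel.1 | grep 10.14.7.43
$ # The MAC in this entry should also match the VtepMAC stored in etcd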

$ ifconfig flannel.1
flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 192.168.143.192  netmask 255.255.255.255  broadcast 0.0.0.0
        inet6 fe80::2836:32ff:fe7b:4132  prefixlen 64  scopeid 0x20<link>
        ether 2a:36:32:7b:41:32  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 5 overruns 0  carrier 0  collisions 0

For reference, the values stored in etcd are correct:

$ etcdctl get --prefix /flannel/network
/flannel/network/config
{
      "EnableIPv4": true,
      "Network": "192.168.128.0/18",
      "SubnetLen": 26,
      "Backend": {
          "Type": "vxlan",
          "DirectRouting": false
      }
}
/flannel/network/subnets/192.168.134.192-26
{"PublicIP":"10.14.7.44","PublicIPv6":null,"BackendType":"vxlan","BackendData":{"VNI":1,"VtepMAC":"b6:f1:e3:69:06:ad"}}
/flannel/network/subnets/192.168.137.64-26
{"PublicIP":"10.14.7.55","PublicIPv6":null,"BackendType":"vxlan","BackendData":{"VNI":1,"VtepMAC":"66:14:5a:2b:ae:b0"}}
/flannel/network/subnets/192.168.140.192-26
{"PublicIP":"10.14.7.62","PublicIPv6":null,"BackendType":"vxlan","BackendData":{"VNI":1,"VtepMAC":"f2:17:19:89:06:eb"}}
/flannel/network/subnets/192.168.143.192-26
{"PublicIP":"10.14.7.43","PublicIPv6":null,"BackendType":"vxlan","BackendData":{"VNI":1,"VtepMAC":"2a:36:32:7b:41:32"}}
/flannel/network/subnets/192.168.144.128-26
{"PublicIP":"10.14.7.42","PublicIPv6":null,"BackendType":"vxlan","BackendData":{"VNI":1,"VtepMAC":"46:6e:38:8a:7e:c6"}}
/flannel/network/subnets/192.168.146.192-26
{"PublicIP":"10.14.7.64","PublicIPv6":null,"BackendType":"vxlan","BackendData":{"VNI":1,"VtepMAC":"1a:01:1e:7f:fa:1d"}}
/flannel/network/subnets/192.168.148.128-26
{"PublicIP":"10.14.7.56","PublicIPv6":null,"BackendType":"vxlan","BackendData":{"VNI":1,"VtepMAC":"02:26:18:53:4f:a8"}}
/flannel/network/subnets/192.168.152.128-26
{"PublicIP":"10.14.7.63","PublicIPv6":null,"BackendType":"vxlan","BackendData":{"VNI":1,"VtepMAC":"06:e9:34:03:4e:f4"}}

Possible Solution

Periodically resync the ARP entries.
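
Until such a resync exists, a temporary manual workaround is to overwrite the stale entries by hand (a sketch; the IP, MAC and public IP below are the values for the 192.168.143.192-26 subnet taken from the etcd dump above):

$ # On each node holding a stale entry, replace it with the VtepMAC from etcd
$ sudo ip neigh replace 192.168.143.192 lladdr 2a:36:32:7b:41:32 dev flannel.1 nud permanent
$ # If the VXLAN FDB entry is also stale, point that MAC at the owning node's public IP
$ sudo bridge fdb replace 2a:36:32:7b:41:32 dev flannel.1 dst 10.14.7.43 self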

Steps to Reproduce (for bugs)

We did not do anything special to reproduce this; it is observed regularly in our environment.

Context

This regularly causes communication issues between pods in our environment. As part of a preliminary investigation, we checked the flannel logs in journalctl and observed the following:

Jul 27 05:23:15 system-test-03-cc30210035-node-2 flanneld[27730]: E0727 05:23:15.200382   27730 iptables.go:307] Failed to bootstrap IPTables: failed to apply partial iptables-restore unable to run iptables-restore (, ): exit status 4
Jul 27 05:23:15 system-test-03-cc30210035-node-2 flanneld[27730]: I0727 05:23:15.245615   27730 iptables.go:421] Some iptables rules are missing; deleting and recreating rules
Jul 27 05:23:15 system-test-03-cc30210035-node-2 flanneld[27730]: E0727 05:23:15.276349   27730 iptables.go:307] Failed to bootstrap IPTables: failed to apply partial iptables-restore unable to run iptables-restore (, ): exit status 4
Jul 27 05:23:15 system-test-03-cc30210035-node-2 flanneld[27730]: E0727 05:23:15.313107   27730 iptables.go:320] Failed to ensure iptables rules: error setting up rules: failed to apply partial iptables-restore unable to run iptables-restore (, ): exit status 4
Jul 27 05:23:15 system-test-03-cc30210035-node-2 flanneld[27730]: I0727 05:23:15.320792   27730 iptables.go:421] Some iptables rules are missing; deleting and recreating rules
Jul 27 05:23:15 system-test-03-cc30210035-node-2 flanneld[27730]: I0727 05:23:15.491607   27730 iptables.go:283] bootstrap done 
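
Exit status 4 from the iptables tools generally signals a resource problem rather than a malformed rule, with contention on the xtables lock (e.g. kube-proxy rewriting rules at the same time) being the usual suspect. One check worth running while these errors appear, sketched here under the assumption that the lock file lives at /run/xtables.lock as on our CentOS 7 nodes:

$ # List any processes currently holding the xtables lock file open
$ sudo fuser -v /run/xtables.lock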

For the firewall rules that were failing, we tried to run iptables-restore manually with the same rules and ran into the following error:

$ cat test1.txt
*filter
-D FORWARD -m comment --comment "flanneld forward" -j FLANNEL-FWD
-A FORWARD -m comment --comment "flanneld forward" -j FLANNEL-FWD
-A FLANNEL-FWD -s 192.168.192.0/18 -m comment --comment "flanneld forward" -j ACCEPT
-A FLANNEL-FWD -d 192.168.192.0/18 -m comment --comment "flanneld forward" -j ACCEPT
COMMIT

$ sudo iptables-restore < test1.txt
iptables-restore v1.4.21: Couldn't load target `FLANNEL-FWD':No such file or directory
Error occurred at line: 2
Try `iptables-restore -h' or 'iptables-restore --help' for more information.
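
This particular failure seems expected for the file as written: iptables-restore resolves -j targets against chains that either already exist in the kernel or are declared in the same file, and test1.txt references FLANNEL-FWD without declaring it (the -D line would also fail whenever the rule is not present). A corrected sketch of the same rules is shown below; the file name and the use of --noflush, to avoid flushing the rest of the filter table, are our own choices and not something flannel documents:

$ cat test2.txt
*filter
:FLANNEL-FWD - [0:0]
-A FORWARD -m comment --comment "flanneld forward" -j FLANNEL-FWD
-A FLANNEL-FWD -s 192.168.192.0/18 -m comment --comment "flanneld forward" -j ACCEPT
-A FLANNEL-FWD -d 192.168.192.0/18 -m comment --comment "flanneld forward" -j ACCEPT
COMMIT

$ sudo iptables-restore --noflush < test2.txt

Note that this only explains the manual test; flanneld's own failures above exited with status 4, which is more consistent with the lock contention mentioned earlier than with a missing chain.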

Your Environment

  • Flannel version: 0.22.0
  • Backend used (e.g. vxlan or udp): VXLAN
  • Etcd version: 3.5.8
  • Kubernetes version (if used): 1.27.3
  • Operating System and version: CentOS Linux release 7.9.2009 (Core)
  • Link to your project (optional):

About this issue

  • Original URL
  • State: open
  • Created a year ago
  • Comments: 23 (15 by maintainers)

Most upvoted comments

Right now when a new nodes is added to the cluster flannel will call that part of the code that you mentioned where it adds the MAC address that it get from the etcd entry. MAC address of an interface shouldn’t change if you somehow didn’t force that change. I am doing some tests to check if there are strange cases where it applies.