flannel: Master to Pod communication is broken in kube-flannel

I am trying to set up a Kubernetes cluster with kube-flannel using the vxlan backend. Node-to-node communication works, but master-to-pod networking does not. I am not a Linux networking expert, but I can see that the master's flannel.1 interface is assigned the network address of the flannel network, which seems to be causing issues with ARP.

# cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
# ip route show
default via 159.203.160.1 dev eth0 
10.17.0.0/16 dev eth0  proto kernel  scope link  src 10.17.0.8 
10.132.0.0/16 dev eth1  proto kernel  scope link  src 10.132.22.4 
10.244.0.0/16 dev flannel.1  proto kernel  scope link  src 10.244.0.0
159.203.160.0/20 dev eth0  proto kernel  scope link  src 159.203.168.74 
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1 
root@k-211935-master:~# tcpdump -e -i flannel.1 -n arp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on flannel.1, link-type EN10MB (Ethernet), capture size 262144 bytes
07:03:26.552296 96:f0:7d:42:39:7c > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.244.0.1 tell 10.244.0.0, length 28
07:03:27.552313 96:f0:7d:42:39:7c > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.244.0.1 tell 10.244.0.0, length 28
07:03:27.552326 96:f0:7d:42:39:7c > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.244.1.0 tell 10.244.0.0, length 28
07:03:28.552290 96:f0:7d:42:39:7c > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.244.0.1 tell 10.244.0.0, length 28
07:03:28.552307 96:f0:7d:42:39:7c > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.244.1.0 tell 10.244.0.0, length 28
07:03:28.560535 12:05:88:6f:fb:01 > 96:f0:7d:42:39:7c, ethertype ARP (0x0806), length 42: Request who-has 10.244.0.1 tell 10.244.1.0, length 28
07:03:29.560472 12:05:88:6f:fb:01 > 96:f0:7d:42:39:7c, ethertype ARP (0x0806), length 42: Request who-has 10.244.0.1 tell 10.244.1.0, length 28
07:03:30.560456 12:05:88:6f:fb:01 > 96:f0:7d:42:39:7c, ethertype ARP (0x0806), length 42: Request who-has 10.244.0.1 tell 10.244.1.0, length 28

The problem seems to be that the master's flannel.1 is assigned the first IP of subnet zero, 10.244.0.0, which is indistinguishable from the network address of the whole 10.244.0.0/16 flannel network. Can you confirm that this will break master-to-pod communication?

I am thinking about using the next Subnet of Node.Spec.PodCIDR in kubeSubnetManager. Will that fix this issue?
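For reference, a quick way to see why 10.244.0.0 is problematic is to compute the network address of the FLANNEL_NETWORK CIDR. This sketch does the math in plain POSIX shell, with the values taken from the subnet.env and route output above (no live cluster needed):

```shell
#!/bin/sh
# Show that 10.244.0.0 equals the network address of 10.244.0.0/16,
# i.e. the address flannel.1 got on the master is the all-zeros host
# of the flannel network, which is why ARP for it behaves oddly.

ip_to_int() {
  # Convert a dotted-quad address to a 32-bit integer.
  IFS=. read -r a b c d <<EOF
$1
EOF
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

cidr=10.244.0.0/16          # FLANNEL_NETWORK from /run/flannel/subnet.env
addr=${cidr%/*}
bits=${cidr#*/}
mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
net=$(( $(ip_to_int "$addr") & mask ))
iface_ip=$(ip_to_int 10.244.0.0)   # src address on the flannel.1 route

if [ "$iface_ip" -eq "$net" ]; then
  echo "10.244.0.0 is the network address of $cidr"
fi
```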

cc: @mikedanese

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 4
  • Comments: 22 (1 by maintainers)

Most upvoted comments

My issue was solved; it was a Vagrant-specific problem. Vagrant assigns the IP 10.0.2.15 to every machine, and flannel was using that address as its key, so only one subnet was created where ideally there should be one subnet per node. The solution was to pass --iface=eth1 when launching flanneld. I noticed this after deploying etcd and flannel natively on clean VMs.
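Flannel's interface auto-detection picks the NIC that carries the default route, which on a stock Vagrant box is the NAT adapter with the same 10.0.2.15 address on every VM. A small illustrative sketch (the route table below is a typical Vagrant layout with a hypothetical 192.168.50.0/24 host-only network, not output from this cluster):

```shell
#!/bin/sh
# Illustrative only: a typical Vagrant routing table. The default route sits
# on eth0 (the NAT adapter), so without --iface, flanneld registers 10.0.2.15
# as its public IP on *every* node and the nodes collide on one subnet.
routes='default via 10.0.2.2 dev eth0
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
192.168.50.0/24 dev eth1 proto kernel scope link src 192.168.50.11'

# flanneld effectively does this when no --iface is given:
auto_iface=$(printf '%s\n' "$routes" | awk '$1 == "default" {print $5}')
echo "auto-detected interface: $auto_iface"
echo "fix: flanneld --iface=eth1   # use the host-only/private network instead"
```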

The same fix was applied to flannel's startup command in Kubernetes.

Kube-Flannel yaml:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "type": "flannel",
      "delegate": {
        "isDefaultGateway": true
      }
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  labels:
    tier: node
    app: flannel
spec:
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      hostNetwork: true
      nodeSelector:
        beta.kubernetes.io/arch: amd64
      serviceAccountName: flannel
      containers:
      - name: kube-flannel
        image: quay.io/coreos/flannel:v0.7.0
        command: [ "/opt/bin/flanneld", "--ip-masq", "--kube-subnet-mgr" , "--iface=enp0s8"]
        securityContext:
          privileged: true
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: run
          mountPath: /run
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      - name: install-cni
        image: quay.io/coreos/flannel:v0.7.0
        command: [ "/bin/sh", "-c", "set -e -x; cp -f /etc/kube-flannel/cni-conf.json /etc/cni/net.d/10-flannel.conf; while true; do sleep 3600; done" ]
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      volumes:
        - name: run
          hostPath:
            path: /run
        - name: cni
          hostPath:
            path: /etc/cni/net.d
        - name: flannel-cfg
          configMap:
            name: kube-flannel-cfg

Note the --iface= flag in the flanneld command (set to enp0s8 above).

[root@kmaster ~]#
[root@kmaster ~]# kubectl describe hello-service
the server doesn't have a resource type "hello-service"
[root@kmaster ~]# kubectl describe service hello-service
Name:                   hello-service
Namespace:              default
Labels:                 <none>
Selector:               app=hello
Type:                   ClusterIP
IP:                     10.104.194.162
Port:                   http    80/TCP
Endpoints:              10.244.0.2:8080,10.244.1.3:8080,10.244.1.4:8080
Session Affinity:       None
No events.
[root@kmaster ~]#

Shows DNS resolution.

[root@kmaster ~]# dig +short  @10.96.0.10 _http._tcp.hello-service.default.svc.cluster.local SRV
10 100 80 hello-service.default.svc.cluster.local.
[root@kmaster ~]# dig +short  @10.96.0.10 hello-service.default.svc.cluster.local.
10.104.194.162
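The SRV answer above packs four fields into one line: priority, weight, port, and target. A tiny sketch splitting them out of the dig output:

```shell
#!/bin/sh
# Decode the SRV record fields returned by the dig query above.
srv='10 100 80 hello-service.default.svc.cluster.local.'
set -- $srv
priority=$1 weight=$2 port=$3 target=$4
echo "priority=$priority weight=$weight port=$port target=$target"
```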

The service is reachable.

[root@kmaster ~]# curl http://10.104.194.162:80
Hello, "/"
HOST: hello-deployment-1725651635-pb9mv
ADDRESSES:
    127.0.0.1/8
    10.244.1.4/24
    ::1/128
    fe80::f067:16ff:fe96:7295/64
[root@kmaster ~]#
[root@kmaster ~]#
[root@kmaster ~]# curl http://10.104.194.162:80
Hello, "/"
HOST: hello-deployment-1725651635-0t8xx
ADDRESSES:
    127.0.0.1/8
    10.244.1.3/24
    ::1/128
    fe80::c59:b2ff:fe82:ee1a/64
[root@kmaster ~]#
[root@kmaster ~]# curl http://10.104.194.162:80
Hello, "/"
HOST: hello-deployment-1725651635-51df9
ADDRESSES:
    127.0.0.1/8
    10.244.0.2/24
    ::1/128
    fe80::c4a1:84ff:fe82:ec83/64
[root@kmaster ~]#
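Each of the three curl responses above comes from a different backend pod. As a sanity check, the pod IPs reported in the responses should all appear in the service's Endpoints list; a small sketch with the values copied from the transcript:

```shell
#!/bin/sh
# Check that every pod IP seen in the curl responses is a registered
# endpoint of hello-service (values copied from the transcript above).
endpoints="10.244.0.2:8080 10.244.1.3:8080 10.244.1.4:8080"
seen="10.244.1.4 10.244.1.3 10.244.0.2"

ok=1
for ip in $seen; do
  case " $endpoints " in
    *" $ip:"*) ;;                        # found among the endpoints
    *) ok=0; echo "unexpected pod IP: $ip" ;;
  esac
done
[ "$ok" -eq 1 ] && echo "all responses came from registered endpoints"
```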

This issue is fixed for me with v0.7.0.

@autostatic I am glad that your cluster is working. The flannel bridge gets created the first time the CNI plugin is called. Since Kubernetes does not schedule regular pods on the master, the cbr0 bridge has not been created there yet.

It also seems that you don't need my patch. I needed it because we run an HAProxy-based ingress controller on the master that load balances across pods on the regular nodes, so HAProxy on the master had to be able to connect to pods on those nodes.