flannel: ClusterIP services not accessible from host machines when using flannel CNI in Kubernetes

I am trying to access a Kubernetes service through its ClusterIP, from a pod that is attached to its host’s network and has access to DNS, with:

  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet
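
For context, these two fields sit directly in the pod spec. A minimal sketch (the pod name and image are placeholders, not from the issue):

  apiVersion: v1
  kind: Pod
  metadata:
    name: hostnet-debug
  spec:
    hostNetwork: true
    dnsPolicy: ClusterFirstWithHostNet
    containers:
      - name: debug
        image: busybox
        command: ["sleep", "3600"]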

However, the host machine has no IP routes set up for the service CIDR, for example:

➜  ~ k get services
NAME             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
kubernetes       ClusterIP   100.64.0.1      <none>        443/TCP    25m
redis-headless   ClusterIP   None            <none>        6379/TCP   19m
redis-master     ClusterIP   100.64.63.204   <none>        6379/TCP   19m
➜  ~ k get pods -o wide
NAME                       READY   STATUS      RESTARTS   AGE   IP              NODE                                             NOMINATED NODE   READINESS GATES
redis-master-0             1/1     Running     0          18m   100.96.1.3      ip-172-20-39-241.eu-central-1.compute.internal   <none>           <none>
root@ip-172-20-39-241:/home/admin# ip route
default via 172.20.32.1 dev eth0
10.32.0.0/12 dev weave proto kernel scope link src 10.46.0.0
100.96.0.0/24 via 100.96.0.0 dev flannel.11 onlink
100.96.1.0/24 dev cni0 proto kernel scope link src 100.96.1.1
100.96.2.0/24 via 100.96.2.0 dev flannel.11 onlink
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
172.20.32.0/19 dev eth0 proto kernel scope link src 172.20.39.241

Expected Behavior

I expect to be able to reach services running on Kubernetes from the host machines, but I can only access headless services, i.e. those that resolve to a pod IP.

The pod CIDR has IP routes set up, but the service CIDR doesn’t.

Current Behavior

Services cannot be accessed through their ClusterIPs from the host network.
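
For illustration, a hypothetical check (not output from the issue) using the redis-master service above: probing the ClusterIP from the host times out, while the pod IP answers.

  # from the host / a hostNetwork pod: times out against the ClusterIP
  nc -vz -w 3 100.64.63.204 6379
  # the same probe against the pod IP succeeds
  nc -vz -w 3 100.96.1.3 6379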

Possible Solution

If I manually add an IP route to 100.64.0.0/16 via 100.96.1.1, ClusterIPs become accessible, but this route is not there by default.
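
For reference, on this node the manual workaround looks like the following (a sketch using the addresses above; the cni0 address differs per node):

  # route the service CIDR via this node’s cni0 address
  ip route add 100.64.0.0/16 via 100.96.1.1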

Your Environment

  • Flannel version: v0.11.0
  • kops version: Version 1.17.0-alpha.1 (git-501baf7e5)
  • Backend used (e.g. vxlan or udp): vxlan
  • Kubernetes version (if used):
  • Operating System and version:
  • Link to your project (optional):

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 47 (13 by maintainers)

Most upvoted comments

@nonsense I fixed it by changing the backend of flannel to host-gw instead of vxlan:

kubectl edit cm -n kube-system kube-flannel-cfg
  • replace vxlan with host-gw
  • save
  • not sure if needed, but I did it anyway: kubectl delete pods -l app=flannel -n kube-system

maybe this works for you as well
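
For orientation, the net-conf.json key inside that ConfigMap ends up looking roughly like this; only the backend type changes, and the Network value stays whatever your cluster already uses (shown as a placeholder here):

  {
    "Network": "<existing pod CIDR - leave unchanged>",
    "Backend": {
      "Type": "host-gw"
    }
  }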

The ‘host-gw’ option is only possible on infrastructures that support layer 2 interaction between nodes; most cloud providers don’t.

I have the same problem. Adding a route to cni0 fixed it for me:

ip r add 10.96.0.0/16 dev cni0

Just changed to host-gw and realized then that the problem was much bigger than I supposed: there is a big routing problem with Kubernetes 1.17 and flannel with vxlan, which affects ClusterIPs, NodePorts and even LoadBalancer IPs managed by MetalLB.

Changing to host-gw fixes all of them. I wonder why this is not fixed or at least documented in a very prominent way.

Here is my report of the response time of a MinIO service (in seconds) before and after changing. The checks ran on the nodes themselves.

(Screenshots: response times before and after the change.)

Can anyone explain why this issue does not occur on older Kubernetes releases? At least I’m not facing it in 1.16.4 with the exact same setup as in 1.17.5 and 1.18.2. Did Kubernetes disable checksum offloading in the past?
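
(For anyone wanting to check on their own nodes: the current offload state of the vxlan interface can be inspected like this, assuming the interface is named flannel.1 as in the comments below.)

  ethtool -k flannel.1 | grep tx-checksum-ip-generic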

@malikbenkirane change ipfs/testground to testground/infra - repo moved - https://github.com/testground/infra/blob/master/k8s/sidecar.yaml

Thanks, I like the idea. Though I’ve found that using Calico rather than flannel works for me. I just set --flannel-backend=none and followed the Calico k3s steps, changing the pod CIDR accordingly.

Executing this command on every node fixes it:

ethtool -K flannel.1 tx-checksum-ip-generic off
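
(One way to roll that out to every node, sketched with SSH; the node names, SSH user and access are assumptions about your environment, and the setting does not survive a reboot.)

  for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
    ssh "admin@${node}" sudo ethtool -K flannel.1 tx-checksum-ip-generic off
  done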

I’m having this issue with the vxlan backend with flannel versions 0.11 and 0.12 as well. Affected Kubernetes versions: 1.16.x, 1.17.x and 1.18.x.

Finally, setting up a static route on my nodes to the service network through the cni0 interface helped me instantly: ip route add 10.96.0.0/12 dev cni0

OS: CentOS 7, install method: kubeadm, underlying platform: VirtualBox 6

@Gacko could you link the issue/PR for that, please?

Well, things get weird…

Just reset my whole cluster to 1.16.9. Everything works as expected. Then did the following:

kubectl edit daemonset -n kube-system kube-proxy

… and set the image version of kube-proxy to 1.17.5. Well, that’s not really an update nor a recommended change, but my test does not work anymore. When I roll it back to 1.16.9, everything starts to work again.

So it really depends on the version of kube-proxy. I totally understand that there might be some kind of kernel bug and that changes made to kube-proxy are totally legit. But I’m still interested in what’s different between those versions and the way they set up iptables, for example.
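
(For reference, the same image swap can be done non-interactively; a sketch assuming the default kube-proxy DaemonSet with a container named kube-proxy:)

  # switch kube-proxy to 1.17.5
  kubectl -n kube-system set image daemonset/kube-proxy kube-proxy=k8s.gcr.io/kube-proxy:v1.17.5
  # roll back to 1.16.9
  kubectl -n kube-system set image daemonset/kube-proxy kube-proxy=k8s.gcr.io/kube-proxy:v1.16.9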

I may have found something relevant… out of desperation I went poking around a diff of the release-1.16 and release-1.17 branches of Kubernetes.

I think this has something to do with it: kubernetes/kubernetes#83576

The changes in that dependency specifically have aspects relating to vxlan and checksum handling.

This is speculation at this point, but it does speak to the workarounds people are finding.

issue on kubernetes/kubernetes: kubernetes/kubernetes#87852

Our workaround is to manually add the route to DNS through a DaemonSet as soon as there is at least one pod running on all workers (so that the cni0 interface appears).

@nonsense have an example?

Yes, here it is: https://github.com/ipfs/testground/blob/master/infra/k8s/sidecar.yaml#L23

Note that this won’t work unless you have at least one pod on every host (e.g. another DaemonSet), so that cni0 exists. I know this is a hack, but I don’t have a better solution.

In our case the first pod we expect on every host is s3fs - https://github.com/ipfs/testground/blob/master/infra/k8s/kops-weave/s3bucket.yml
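
A stripped-down sketch of that approach (not the actual testground manifest; the service CIDR, namespace and images are assumptions to adjust for your cluster):

  apiVersion: apps/v1
  kind: DaemonSet
  metadata:
    name: service-cidr-route
    namespace: kube-system
  spec:
    selector:
      matchLabels:
        app: service-cidr-route
    template:
      metadata:
        labels:
          app: service-cidr-route
      spec:
        hostNetwork: true
        initContainers:
          - name: add-route
            image: busybox
            securityContext:
              privileged: true
            # only works once cni0 exists on the node, i.e. after at least one CNI-attached pod has run there
            command: ["sh", "-c", "ip route replace 100.64.0.0/16 dev cni0"]
        containers:
          - name: pause
            # keeps the pod running after the route has been added
            image: k8s.gcr.io/pause:3.2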