calico: Calico networking broken when host OS uses iptables >= 1.8

Pods cannot communicate with each other or the internet when running with Calico networking on Debian Testing (aka Buster)

Expected Behavior

Installing Calico using the getting started manifests (k8s datastore, not etcd) should result in a cluster where pods can talk to each other.

Current Behavior

I bootstrapped a single-node k8s cluster on a Debian Testing (Buster) machine, using kubeadm init --pod-network-cidr=192.168.0.0/16 and KUBECONFIG=/etc/kubernetes/admin.conf kubectl taint nodes --all node-role.kubernetes.io/master-.

I then installed Calico using the instructions at: https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/calico#installing-with-the-kubernetes-api-datastore50-nodes-or-less .
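Condensed, the bootstrap above amounts to roughly the following (the calico.yaml path is a placeholder for whatever manifest the v3.3 "Kubernetes API datastore, 50 nodes or less" page links to):

kubeadm init --pod-network-cidr=192.168.0.0/16
export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl taint nodes --all node-role.kubernetes.io/master-
kubectl apply -f calico.yaml   # manifest downloaded from the docs page above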

Calico pods start, and once the CNI config is installed other pods start up as well.

However, no pods can talk to any other pods, or to the internet. Packets flow correctly out of the container and onto the host, but never flow back out from there.
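To illustrate, this is roughly how the asymmetry shows up with tcpdump (interface names are placeholders: caliXXXXXXXXXXX stands for the host-side veth Calico creates for a pod, eth0 for the host's uplink):

tcpdump -ni caliXXXXXXXXXXX icmp   # the pod's outbound pings show up here...
tcpdump -ni eth0 icmp              # ...but nothing corresponding ever leaves the host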

After switching the OS back to Debian Stable (Stretch), Calico works flawlessly again.

Possible Solution

I suspect, although I have no proof, that the root cause is the release of iptables 1.8. See related bug kubernetes/kubernetes#71305. iptables 1.8 switches to using nf_tables in the kernel, and splits the tooling into iptables (a translation layer on top of nf_tables) and iptables-legacy (the “classic” iptables). So you end up with nf_tables in the kernel, an nf_tables-aware iptables 1.8 on the host OS, but legacy iptables 1.6 in the networking containers (including calico-node).
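A quick way to see which backend each binary speaks (assuming Debian's alternatives layout; iptables 1.8 reports its backend in --version, 1.6 does not):

iptables --version                       # on the Buster host: e.g. "iptables v1.8.2 (nf_tables)"
update-alternatives --display iptables   # shows whether the nft or legacy variant is selected
# inside calico-node and friends, iptables 1.6 only speaks the legacy x_tables interface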

A breakage in netfilter is consistent with the symptoms I’ve found in my debugging so far. I’m going to add the debugging I’ve done so far in a separate post, since it’s a lot of data and I want to keep the initial report fairly crisp.

Steps to Reproduce (for bugs)

Create a trivial k8s cluster on Debian Buster machines using kubeadm, then install Calico. Observe that pod<>pod and pod<>internet routing is broken.
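One hedged way to confirm the breakage (pod names and the busybox image are purely illustrative):

kubectl run test-a --image=busybox --restart=Never -- sleep 3600
kubectl run test-b --image=busybox --restart=Never -- sleep 3600
kubectl get pod test-b -o wide                      # note test-b's pod IP
kubectl exec test-a -- ping -c 3 <test-b pod IP>    # pod<>pod: times out on Buster hosts
kubectl exec test-a -- ping -c 3 8.8.8.8            # pod<>internet: also times out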

Context

I’m the main developer of MetalLB. I’ve been working on creating a VM-based test harness for MetalLB that cross-tests compatibility against a bunch of k8s network addons, including Calico. I’ve struggled for the past 2 days with bizarre “none of my network addons seem to work” issues, which I’ve just figured out are caused by “something that changed recently in Debian Buster” (because I have older Debian Buster clusters on which Calico worked fine).

Your Environment

  • Calico version: v3.3.1
  • Orchestrator version (e.g. kubernetes, mesos, rkt): Kubernetes 1.12.3
  • Operating System and version: Debian Buster aka Debian Testing aka “the rolling release of pretty recent versions of everything”

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 5
  • Comments: 29 (5 by maintainers)

Most upvoted comments

We’re including support in Calico v3.8.1+ which will allow Calico to run on hosts which use iptables in NFT mode.

Setting the FELIX_IPTABLESBACKEND=NFT option will tell Calico to use the nftables backend. For now, this will need to be set explicitly.
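For example, one way to set it on an existing install (a sketch, assuming the standard calico-node DaemonSet in kube-system):

kubectl -n kube-system set env daemonset/calico-node FELIX_IPTABLESBACKEND=NFT
kubectl -n kube-system rollout status daemonset/calico-node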

We have also run into this issue when upgrading our kubernetes nodes to Debian Buster with iptables 1.8

We were able to get around this issue by using

update-alternatives --set iptables /usr/sbin/iptables-legacy

https://wiki.debian.org/nftables#Current_status
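If you go this route, the companion tools have the same nft/legacy split on Buster, so you may want to switch them as well (paths as documented on the wiki page above; adjust if your distro differs):

update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
update-alternatives --set arptables /usr/sbin/arptables-legacy
update-alternatives --set ebtables /usr/sbin/ebtables-legacy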

v3.8.1 requires that you set FELIX_IPTABLESBACKEND=NFT. Automatic detection will hopefully be added in the future; we don’t know when that feature will be added.

It’s definitely iptables 1.8. After some friendly inspiration (hi @bradfitz!), I grabbed the iptables 1.6 package out of Debian Stable, and hackily overwrote all the binaries on the host OS so that the host node ends up using the same version of iptables as all the containers (1.6 from Debian Stable):

apt-get install wget binutils xz-utils
wget http://ftp.us.debian.org/debian/pool/main/i/iptables/iptables_1.6.0+snapshot20161117-6_amd64.deb
ar x iptables_1.6.0+snapshot20161117-6_amd64.deb
tar xvf data.tar.xz
cp -f ./sbin/* /sbin
cp -f ./usr/sbin/* /usr/sbin
cp -f ./usr/bin/* /usr/bin
reboot

I did a reboot to fully clear all the system state and go from a clean slate. After the reboot finishes and k8s comes back up, coredns is no longer crashlooping (meaning the pod is able to reach the upstream internet DNS resolver), and I can ping pod-to-pod just fine.

So, the root cause definitely seems to be mixing iptables 1.6 and iptables 1.8 against the same kernel. If you use all iptables 1.6, everything is fine. I’m guessing if you use only iptables 1.8 (which translates into nftables but faithfully emulates the userspace interfaces), everything would also work fine. But with the host OS using iptables 1.8 (which programs nftables) and containers like calico-node using iptables 1.6 (which programs legacy iptables), packet forwarding seems to break.

Given that, my guess as to a fix would be for calico-node to have both versions of iptables available, and pick which one to use based on what the host OS is doing, somehow (e.g. check via netlink if nftables are non-empty?). Either that, or spin separate containers and document that users have to be careful with which one they use.
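A rough sketch of that detection idea (assumes the nft CLI is available; using “any rules at all” as the threshold is just illustrative):

if nft list ruleset 2>/dev/null | grep -q . ; then
    echo "host uses nf_tables -> drive rules through the nft-aware iptables"
else
    echo "no nf_tables rules -> legacy iptables should be safe"
fi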

I wonder why it doesn’t behave like FELIX_IPTABLESBACKEND=Auto by default, without any configuration? In other words, why does https://docs.projectcalico.org/reference/felix/configuration list the default as Legacy?

Just spent 3 days debugging this issue as I’m using Debian Buster. Could we add a note to https://docs.projectcalico.org/getting-started/kubernetes/self-managed-public-cloud/gce to highlight this nuance? I believe that would save a lot of people some debugging time.

update-alternatives --set iptables /usr/sbin/iptables-legacy

This just solved two days of debugging when we tried to install Rancher on a Debian Buster cluster. Since Google did not provide any matches on the error we got, I will paste the error message below so that others googling this issue will find this thread:

 [ERROR] Failed to connect to peer wss://10.42.X.X/v3/connect [local ID=10.42.X.X]: dial tcp 10.42.X.X:443: i/o timeout

Thanks, @mrak

Just a note for anybody who hits the same problem and switches to iptables-legacy but still isn’t fixed: don’t forget to check the rules that Docker created through iptables-nft; they are still there if you don’t manage them yourself. The syntax is much the same as for iptables: iptables-nft-save. In our case Docker used iptables-nft to create its rules at startup and set the default FORWARD policy to DROP. We had covered this case with iptables but didn’t notice the iptables-nft rules. I’m using Debian 10 and Calico 3.5.1.
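For anyone checking for this, one way to inspect what was programmed through the nft backend (iptables-nft-save and nft both ship on Debian 10):

iptables-nft-save | grep ':FORWARD'   # e.g. ":FORWARD DROP [0:0]" set by Docker at startup
nft list ruleset                      # full view of everything living in nf_tables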