kubernetes: kube-proxy currently incompatible with `iptables >= 1.8`
What happened:
When creating nodes on machines with iptables >= 1.8, kube-proxy is unable to initialize and route service traffic. The following is logged:
```
kube-proxy-22hmk kube-proxy E1120 07:08:50.135017 1 proxier.go:647] Failed to ensure that nat chain KUBE-SERVICES exists: error creating chain "KUBE-SERVICES": exit status 3: iptables v1.6.0: can't initialize iptables table `nat': Table does not exist (do you need to insmod?)
kube-proxy-22hmk kube-proxy Perhaps iptables or your kernel needs to be upgraded.
```
This is a compatibility issue in iptables, which I believe is called directly from kube-proxy. It is likely due to the kernel-module reorganization that came with the iptables move to nf_tables: https://marc.info/?l=netfilter&m=154028964211233&w=2
A containerized iptables 1.8 is backwards compatible with a host on iptables 1.6, but a containerized iptables 1.6 fails on a host that has moved to iptables 1.8 (nf_tables):
```
root@vm77:~# iptables --version
iptables v1.6.1
root@vm77:~# docker run --cap-add=NET_ADMIN drags/iptables:1.6 iptables -t nat -Ln
iptables: No chain/target/match by that name.
root@vm77:~# docker run --cap-add=NET_ADMIN drags/iptables:1.8 iptables -t nat -Ln
iptables: No chain/target/match by that name.

root@vm83:~# iptables --version
iptables v1.8.1 (nf_tables)
root@vm83:~# docker run --cap-add=NET_ADMIN drags/iptables:1.6 iptables -t nat -Ln
iptables v1.6.0: can't initialize iptables table `nat': Table does not exist (do you need to insmod?)
Perhaps iptables or your kernel needs to be upgraded.
root@vm83:~# docker run --cap-add=NET_ADMIN drags/iptables:1.8 iptables -t nat -Ln
iptables: No chain/target/match by that name.
```
However, the kube-proxy image is based on debian:stretch, where iptables 1.8 may only arrive as part of stretch-backports.
How to reproduce it (as minimally and precisely as possible):
Install a node onto a host with iptables 1.8 installed (e.g. Debian Testing/Buster).
Anything else we need to know?:
I can keep these nodes in this config for a while, feel free to ask for any helpful output.
Environment:
- Kubernetes version (use `kubectl version`):
```
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.2", GitCommit:"17c77c7898218073f14c8d573582e8d2313dc740", GitTreeState:"clean", BuildDate:"2018-10-24T06:54:59Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.4", GitCommit:"bf9a868e8ea3d3a8fa53cbb22f566771b3f8068b", GitTreeState:"clean", BuildDate:"2018-10-25T19:06:30Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
```
- Cloud provider or hardware configuration:
libvirt
- OS (e.g. from `/etc/os-release`):
```
PRETTY_NAME="Debian GNU/Linux buster/sid"
NAME="Debian GNU/Linux"
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
```
- Kernel (e.g. `uname -a`):
```
Linux vm28 4.16.0-1-amd64 #1 SMP Debian 4.16.5-1 (2018-04-29) x86_64 GNU/Linux
```
- Install tools:
kubeadm
- Others:
/kind bug
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Comments: 82 (61 by maintainers)
Links to this issue
Commits related to this issue
- Pin Debian release to stretch instead of stable Debian stable is now buster, which causes breakage due to iptables update, see https://github.com/kubernetes/kubernetes/issues/71305. — committed to cloudboss/keights by rjosephwright 5 years ago
- Fix iptables/nftables issue Both iptables and nftables use netfilter framework in the kernel for packet filtering. Many distributions are moving in the direction of using nftables over iptables. Alth... — committed to sridhargaddam/submariner-charts by sridhargaddam 5 years ago
- Exec iptables from the host filesystem Depends-On: https://github.com/submariner-io/submariner-charts/pull/3 Both iptables and nftables use netfilter framework in the kernel for packet filtering. Ma... — committed to sridhargaddam/submariner by sridhargaddam 5 years ago
- Add instructions for switching to iptables-legacy See https://github.com/kubernetes/kubernetes/issues/71305 for more context. Instructions were taken from https://wiki.debian.org/iptables. — committed to praseodym/kubernetes-website by praseodym 5 years ago
- Use iptables-legacy. Ref. https://github.com/kubernetes/kubernetes/issues/71305#issuecomment-479558920 — committed to raynix/ansible-kubeadm by deleted user 5 years ago
- Downgrade centos base image to centos7 to dowgrade iptables from 1.8 to 1.4 See https://github.com/kubernetes/kubernetes/issues/71305#issuecomment-448052889 — committed to chenchun/galaxy by chenchun 5 years ago
- Downgrade centos base image to centos7 to dowgrade iptables from 1.8 to 1.4 See https://github.com/kubernetes/kubernetes/issues/71305#issuecomment-448052889 — committed to tkestack/galaxy by chenchun 5 years ago
- https://github.com/kubernetes/kubernetes/issues/71305#issuecomment-509394152 — committed to alvistack/ansible-role-kube_kubelet by hswong3i 4 years ago
- [neutron] Pin build to centos 7 iptables in RHEL 8 containers is at version 1.8, which is not compatible with a host system at RHEL 7 using iptables 1.4. https://github.com/kubernetes/kubernetes/iss... — committed to ChameleonCloud/kolla-containers by deleted user 3 years ago
this works for me:
```sh
update-alternatives --set iptables /usr/sbin/iptables-legacy
```
There are 2 sets of modules for packet filtering in the kernel: ip_tables and nf_tables. Until recently, you controlled the ip_tables ruleset with the `iptables` family of tools, and nf_tables with the `nft` tools.

In iptables 1.8, the maintainers have "deprecated" the classic ip_tables: the `iptables` tool now does userspace translation from the legacy UI/UX, and uses nf_tables under the hood. So, the commands look and feel the same, but they're now programming a different kernel subsystem.

The problem arises when you mix and match invocations of iptables 1.6 (the previous stable) and 1.8 on the same machine, because although they look identical, they're programming different kernel subsystems. The problem is that at least Docker does some stuff with `iptables` on the host (uncontained), and so you end up with some rules in nf_tables and some rules (including those programmed by kube-proxy and most CNI addons) in legacy ip_tables.

Empirically, this causes weird and wonderful things to happen - things like: if you trace a packet coming from a pod, you see it flowing through both ip_tables and nf_tables, but even if both accept the packet, it then vanishes entirely and never gets forwarded (this is the failure mode I reported to Calico and Weave - bug links upthread - after trying to run k8s on debian testing, which now has iptables 1.8 on the host).
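To make the split-brain concrete, here is one rough way to inspect both subsystems side by side; it assumes a host whose iptables is >= 1.8 (so `iptables-legacy-save` exists) and that has the `nft` tool installed:

```sh
# Dump what each kernel subsystem currently holds. Seeing kube-proxy's
# KUBE-* chains in one listing and Docker's rules in the other is exactly
# the mixed-mode state described above.
iptables-legacy-save | head -n 20   # rules in legacy ip_tables
nft list ruleset | head -n 20       # rules in nf_tables (incl. iptables-nft's)
```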
Bottom line: the networking containers on a machine have to be using the same minor version of the `iptables` binary as exists on the host.

As a preface, one thing to note: iptables 1.8 ships two binaries, `iptables` and `iptables-legacy`. The latter always programs ip_tables. So, there's fortunately no need to bundle two versions of iptables into a container; you can bundle just iptables 1.8 and be judicious about which binary you invoke… At least until the -legacy binary gets deleted, presumably in a future release.

Here's some requirements I think an ideal solution would have; chief among them: it should keep working even if the host's iptables changes underneath us (e.g. via an `apt-get upgrade` in the background).

So far I've only thought up crappy options for dealing with this. I'll throw them out in the hopes that it leads to better ideas.
- At startup, probe both `iptables` and `iptables-legacy` for the presence of rules installed by the host. Hopefully, there will be rules in only one of the two, and that can tell kube-proxy which one to use. This is subject to race conditions, and is fragile to host mutations that happen after kube-proxy startup (e.g. an `apt-get upgrade` that upgrades iptables and restarts the docker daemon, shifting its rules over to nf_tables). Can solve it with periodic reconciling (i.e. "oops, host seems to have switched to nf_tables, wipe all ip_tables rules and reinstall them in nf_tables!").
- Add an explicit setting to the `KubeProxyConfiguration` cluster object. IOW, just document that "it's your responsibility to correctly tell kube-proxy which version of iptables you're using, or things will break." Relies on humans to get things right, which I predict will cause a rash of broken clusters. If we do this, we should absolutely also wire something into node-problem-detector that fires when both ip_tables and nf_tables have rules programmed.
- Switch to the `nft` tools, and mandate that host OSes for k8s must do everything in nf_tables, no ip_tables allowed. Likely intractable given the variety of addons and non-k8s software that does stuff to the firewall (same reason `iptables` has endured all these years even though nftables is measurably better in every way).

Of all of these, I think "probe with both binaries and try to conform to whatever is already there" is the most tractable if kube-proxy were the only problem pod… But given the ecosystem of CNI addons and other third-party things, I foresee never-ending duels of controllers flapping between ip_tables and nf_tables endlessly, all trying to vaguely converge on a single stack, but never succeeding.
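For illustration, a minimal sketch of the "probe and conform" idea; it assumes the container bundles iptables >= 1.8.2 (which provides both `iptables-legacy` and `iptables-nft` binaries) and uses rule counts as a crude signal for which stack the host is actually using:

```sh
#!/bin/sh
# Sketch: pick whichever mode already holds more rules, then hand off to
# the matching binary. The race conditions and post-startup host changes
# discussed above are deliberately not handled here.
legacy=$(iptables-legacy-save 2>/dev/null | grep -c '^-A')
nft=$(iptables-nft-save 2>/dev/null | grep -c '^-A')
if [ "$legacy" -gt "$nft" ]; then
    mode=legacy
else
    mode=nft
fi
exec "iptables-$mode" "$@"
```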
I can confirm that updating the host to use iptables-legacy works on Raspbian 10 (arm) and Debian 10 (amd64) to resolve the iptables mismatch issue.
For completeness it may be beneficial to update all of the network tools to use the legacy versions to avoid issues. These commands may or may not be pertinent depending upon specific host configuration but will avoid mixing legacy and nft modes if invoked from outside docker/kubernetes.
```sh
update-alternatives --set iptables /usr/sbin/iptables-legacy
update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
update-alternatives --set arptables /usr/sbin/arptables-legacy
update-alternatives --set ebtables /usr/sbin/ebtables-legacy
```
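After switching, the version banner should confirm legacy mode (the exact version string will vary by distro; this shape is from Debian buster):

```sh
iptables --version
# iptables v1.8.2 (legacy)
```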
@danwinship @thockin Looks like it would be good to document any workarounds for 1.16 release notes since folks are hitting this already? (see https://github.com/kubernetes/kubernetes/issues/82361 for example)
The official kubernetes packages, and in particular kubeadm-based installs, are fixed as of 1.17. Other distributions of kubernetes may have been fixed earlier or might not be fixed yet.
When using nf_tables mode, rules are added indefinitely to the `KUBE-FIREWALL` chain, in both proxy-mode ipvs and iptables.
Kind of a creepy idea, but you could use nsenter to run the iptables command on the host, in the host's environment.
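For what it's worth, a minimal sketch of that nsenter idea; it assumes a privileged, hostPID pod, so that PID 1 is in the host's namespaces:

```sh
# Run the host's own iptables in the host's mount and network namespaces,
# so the userspace version and mode always match the host.
nsenter --target 1 --mount --net -- iptables -t nat -L KUBE-SERVICES
```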
`kube-proxy` itself seems compatible with iptables >= 1.8, so the slogan in this issue is somewhat misleading. I have made basic tests and see no problems when using the correct version of the user-space `iptables` (and, for ipv6, `ip6tables`) and the supporting libs. I don't think this problem can be fixed by altering some code in kube-proxy.

Tested versions: iptables v1.8.2, linux 4.19.3.
The problem seems to be that the `iptables` user-space program (and libs) is (and has always been) dependent on the kernel version on the host. When the `iptables` user-space program is in a container with an old version, this problem is bound to happen sooner or later, and it will happen again.

The kernel/user-space dependency is one of the problems that `nft` is supposed to fix. A long-term solution may be to replace `iptables` with nft or bpf.

To get rid of that libvirt error, my permanent workaround in Debian 11 (as a host) with the libvirtd daemon is to block the loading of iptables-related modules:
Create a file in `/etc/modprobe.d/nft-only.conf`; a sketch of possible contents is below. The `libvirtd` daemon now starts without any error.
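A sketch of what such a file might contain; the exact module list is an assumption, so check `lsmod` for what is actually loaded on your host:

```
# /etc/modprobe.d/nft-only.conf (illustrative): keep the legacy xtables
# modules from loading so that only nf_tables is used.
blacklist ip_tables
blacklist ip6_tables
blacklist arp_tables
blacklist ebtables
```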
Post-analysis: apparently, I had the `iptables` module loaded alongside many nft-related modules; once `iptables` was gone, the pesky error message went away.

We should be able to give containers a working set of iptables-legacy or iptables-nft binaries directly rather than needing a proxy. Just give them an entire chroot rather than just the binaries. (I.e., build a Debian container image containing only the `iptables` package and the packages it depends on (e.g. glibc), and then mount that somewhere in the pod.) Then instead of overwriting their `/usr/sbin/iptables` with a proxy binary, you overwrite it with a shell script that does `chroot /iptables-binary-volume-sadkjf -- iptables "$@"`, etc. That also works with the `hostBinaries` volume idea; the volume would just contain the chroot within it in addition to the wrapper scripts.

Every single container that uses iptables in the root network namespace. It's fine for, e.g., istio to use whatever iptables mode it wants in the pod namespace. (Though if you have multiple sidecars in a pod, they all need to use the same mode…)
Probably as long as we care about people running Kubernetes on RHEL/CentOS 7. (People will probably be running RHEL 7 longer than people are running CentOS 7, but we might care about those users less. Either way, by the time we stop caring about that, everyone else should be using nft mode.)
That is correct. Pods need to be hostNetwork and either privileged or `CAP_NET_ADMIN` for them to matter.

Your main point stands, but the Debian (and Ubuntu, and …) packages aren't fine-grained: `iptables` contains all the `(arp|eb|ip|ip6|x)tables` tools, and the kernel package contains all the modules. There are package splits in the `iptables` source, but they split out library packages, not tool packages, and the `iptables` package depends on them all anyway.

@thockin Debian Buster is the main one (as Debian is used as the default distro by many of the k8s components), Ubuntu 19.04, RHEL 8 (and the upcoming CentOS 8 by extension), Alpine 3.10, Fedora >= 29.
A built-in "figure it out" mode seems right. This is frankly ridiculous. This is what APIs are for, and like it or not, `exec iptables` is an API. Forcing all parties to coordinate and use the same binaries is ridonculous and clearly not workable.

It sounds like openshift has implemented a "figure it out" mode on its own. But I know many customers who are not going to be happy hostPath-mounting / into kube-proxy. Do we REALLY need the host's binaries, or can we install iptables 1.8+ and call our own iptables.sh which does the same detection?
What distros are known to have 1.8 available so I can do some playing?
The two modes are supposed to be equivalent in terms of behavior. (The advantage of using iptables in nft mode is that it lets other parts of the system use nft directly, and get nft’s advantages, and their rules will interoperate correctly with the iptables-nft rules. Whereas if you use iptables-legacy, the iptables rules and nft rules would conflict with each other in complicated ways.)
So anyway, the two modes are supposed to be equivalent, so we shouldn’t have to test against both modes, and if we did, and something in kubernetes didn’t work right in one mode, that would indicate an iptables or kernel bug, not a kubernetes bug. It’s possible we might end up wanting to add workarounds to kubernetes for a bug in one or the other mode at some point, if someone discovers such a bug, but I don’t think we need to be testing against both modes continuously.
(And in practice, OCP on RHEL 8 using nft mode works just fine, other than possibly one problem with a `-j REJECT` mysteriously not actually rejecting and behaving like it was `-j DROP`.)

Our approach in OpenShift is to have the relevant pods mount the entire host filesystem, and in the corresponding image we install wrapper scripts in /usr/sbin that chroot to the host filesystem and exec the copy of iptables there.
These images then work on any system regardless of whether it has old or new iptables, and in the latter case, whether that iptables is configured to use “legacy” or “nft” mode. (In particular, these images work on both RHEL 7, using legacy iptables, and RHEL 8, using new iptables in nft mode.)
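A rough sketch of such a wrapper; the `/host` mount path and the exact layout are assumptions here, not OpenShift's literal implementation:

```sh
#!/bin/sh
# Installed as /usr/sbin/iptables inside the image; the pod mounts the
# host's root filesystem at /host. chroot runs the host's own iptables
# with the host's libraries, so its mode always matches the host.
exec chroot /host /usr/sbin/iptables "$@"
```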
But I’m thinking we should get iptables upstream to add a new “figure it out” mode to the client binaries, which would internally do something along the lines of what @dcbw suggested above to figure out if the system iptables was using nft mode or legacy mode, and then it would just use the same mode. Then we just tell everyone “make sure your containers are using the
iptables-for-containers
package from iptables version 1.8.whatever or later” and they don’t have to worry beyond that.Just wanted to post another confirmation here that when installing / running a K8s cluster on Raspberry Pis with Raspbian 10 / Buster, I had to run:
Otherwise I was getting lots of errors with networking from various non-k8s-core pods (e.g. coredns, ingress, kube-proxy, kube-apiserver were seemingly fine, but flannel, metrics-server, nfs-client-provisioner were crashlooping).
I ran the above command on each node and rebooted all nodes, and everything quickly switched to `Running` status.

@danwinship the symlink itself will actually tell you what the binary is. Following the 'iptables' symlink will lead either to the legacy binary or to the nft one:
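For example, on a Debian-style alternatives setup (the paths are distro-specific):

```sh
readlink -f /usr/sbin/iptables
# /usr/sbin/xtables-nft-multi     -> iptables is in nft mode
# /usr/sbin/xtables-legacy-multi  -> iptables is in legacy mode
```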
Even that’s a bit more complicated. Possibly the best choice here is to simply accept that if a container wants to modify the host OS then it may need to run tools provided by the host OS and not blindly assume that stuff it ships internally can always be used. Mount the host bin/lib/etc into /host and then have your internal /usr/sbin/iptables be a chroot wrapper into those dirs so that when kube-proxy calls iptables it actually runs the wrapper and does the right thing.
I remain unconvinced that containers that wish to modify the host OS can just blindly go about whatever they want to do and assume the host OS doesn’t matter.
I experienced the same issue in #72370. As a workaround I found this in the Oracle docs, which allowed the pods to communicate with each other, and with the outside world, again.