kubernetes: kube-proxy generates the wrong iptables DNAT rule
What happened?
kube-proxy generates the wrong iptables DNAT rule, as shown below:
[root@controller-node-1 ~]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.233.0.1 <none> 443/TCP 3h18m
my-dep ClusterIP 10.233.60.220 <none> 80/TCP 26m
my-svc ClusterIP 10.233.48.3 <none> 80/TCP 26m
[root@worker1 ~]# iptables-save -t nat | grep '10.233.0.1'
-A KUBE-SERVICES -d 10.233.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SVC-NPX46M4PTMTKRN6Y ! -s 10.233.64.0/18 -d 10.233.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
[root@worker1 ~]# iptables-save -t nat | grep KUBE-SVC-NPX46M4PTMTKRN6Y
:KUBE-SVC-NPX46M4PTMTKRN6Y - [0:0]
-A KUBE-SERVICES -d 10.233.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SVC-NPX46M4PTMTKRN6Y ! -s 10.233.64.0/18 -d 10.233.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https -> 10.6.214.21:6443" -j KUBE-SEP-CC3HXZSKU6BR4DDB
[root@worker1 ~]# iptables-save -t nat | grep KUBE-SEP-CC3HXZSKU6BR4DDB
:KUBE-SEP-CC3HXZSKU6BR4DDB - [0:0]
-A KUBE-SEP-CC3HXZSKU6BR4DDB -s 10.6.214.21/32 -m comment --comment "default/kubernetes:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-CC3HXZSKU6BR4DDB -p tcp -m comment --comment "default/kubernetes:https" -m tcp -j DNAT --to-destination :0 --persistent --to-destination :0 --persistent --to-destination 0.0.0.0 --persistent
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https -> 10.6.214.21:6443" -j KUBE-SEP-CC3HXZSKU6BR4DDB
This incorrect iptables rule prevents the worker node from reaching the apiserver.
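As a quick sanity check (a minimal sketch, not part of kube-proxy; the helper name is made up for illustration), one can scan `iptables-save` output for DNAT rules whose `--to-destination` target lacks a usable `address:port`:

```python
import re

# Two rules as printed above: the broken one from worker1 and the
# correct one from controller-node-1.
RULES = [
    '-A KUBE-SEP-CC3HXZSKU6BR4DDB -p tcp -m comment --comment "default/kubernetes:https" -m tcp -j DNAT --to-destination :0 --persistent --to-destination :0 --persistent --to-destination 0.0.0.0 --persistent',
    '-A KUBE-SEP-CC3HXZSKU6BR4DDB -p tcp -m comment --comment "default/kubernetes:https" -m tcp -j DNAT --to-destination 10.6.214.21:6443',
]

def malformed_dnat_targets(rules):
    """Return DNAT rules where no --to-destination value has a valid IPv4:port."""
    bad = []
    for rule in rules:
        if "-j DNAT" not in rule:
            continue
        targets = re.findall(r"--to-destination\s+(\S+)", rule)
        # A well-formed target looks like 10.6.214.21:6443;
        # ":0" or a bare "0.0.0.0" cannot DNAT traffic anywhere useful.
        if targets and not any(
            re.fullmatch(r"\d+\.\d+\.\d+\.\d+:\d+", t) for t in targets
        ):
            bad.append(rule)
    return bad

for rule in malformed_dnat_targets(RULES):
    print("malformed:", rule)
```

Running this over the dumps above flags only the worker1 rule.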
What did you expect to happen?
kube-proxy should generate correct iptables rules.
How can we reproduce it (as minimally and precisely as possible)?
- Create a cluster via kubespray. Everything works.
[root@controller-node-1 ~]# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
controller-node-1 Ready control-plane 129m v1.25.3 10.6.214.12 <none> CentOS Linux 7 (Core) 5.19.10-1.el7.elrepo.x86_64 containerd://1.6.8
- Join a node
[root@controller-node-1 ~]# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
controller-node-1 Ready control-plane 129m v1.25.3 10.6.214.12 <none> CentOS Linux 7 (Core) 5.19.10-1.el7.elrepo.x86_64 containerd://1.6.8
worker1 NotReady <none> 85m v1.25.3 10.6.214.13 <none> CentOS Linux 7 (Core) 5.4.197-1.el7.elrepo.x86_64 containerd://1.6.8
I then found that some hostNetwork Pods on worker1 failed to start because they could not reach the apiserver; I verified that the firewall was disabled.
I found that kube-proxy had generated an incorrect iptables rule:
[root@worker1 ~]# iptables-save -t nat | grep KUBE-SEP-CC3HXZSKU6BR4DDB
:KUBE-SEP-CC3HXZSKU6BR4DDB - [0:0]
-A KUBE-SEP-CC3HXZSKU6BR4DDB -s 10.6.214.21/32 -m comment --comment "default/kubernetes:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-CC3HXZSKU6BR4DDB -p tcp -m comment --comment "default/kubernetes:https" -m tcp -j DNAT --to-destination :0 --persistent --to-destination :0 --persistent --to-destination 0.0.0.0 --persistent
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https -> 10.6.214.21:6443" -j KUBE-SEP-CC3HXZSKU6BR4DDB
Note the DNAT target:
--to-destination :0 --persistent --to-destination :0 --persistent --to-destination 0.0.0.0 --persistent
The destination address and port are missing entirely.
On controller-node-1, it works fine.
[root@controller-node-1 ~]# iptables-save -t nat | grep KUBE-SEP-CC3HXZSKU6BR4DDB
:KUBE-SEP-CC3HXZSKU6BR4DDB - [0:0]
-A KUBE-SEP-CC3HXZSKU6BR4DDB -s 10.6.214.21/32 -m comment --comment "default/kubernetes:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-CC3HXZSKU6BR4DDB -p tcp -m comment --comment "default/kubernetes:https" -m tcp -j DNAT --to-destination 10.6.214.21:6443
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https -> 10.6.214.21:6443" -j KUBE-SEP-CC3HXZSKU6BR4DDB
I tried to create a new service, but the iptables rules generated by kube-proxy have the same problem.
[root@controller-node-1 ~]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.233.0.1 <none> 443/TCP 3h18m
my-dep ClusterIP 10.233.60.220 <none> 80/TCP 26m
my-svc ClusterIP 10.233.48.3 <none> 80/TCP 26m
[root@worker1 ~]# iptables-save -t nat | grep 10.233.60.220
-A KUBE-SERVICES -d 10.233.60.220/32 -p tcp -m comment --comment "default/my-dep cluster IP" -m tcp --dport 80 -j KUBE-SVC-YIDRKHK4K7YFNT5I
-A KUBE-SVC-YIDRKHK4K7YFNT5I ! -s 10.233.64.0/18 -d 10.233.60.220/32 -p tcp -m comment --comment "default/my-dep cluster IP" -m tcp --dport 80 -j KUBE-MARK-MASQ
[root@worker1 ~]# iptables-save -t nat | grep KUBE-SVC-YIDRKHK4K7YFNT5I
:KUBE-SVC-YIDRKHK4K7YFNT5I - [0:0]
-A KUBE-SERVICES -d 10.233.60.220/32 -p tcp -m comment --comment "default/my-dep cluster IP" -m tcp --dport 80 -j KUBE-SVC-YIDRKHK4K7YFNT5I
-A KUBE-SVC-YIDRKHK4K7YFNT5I ! -s 10.233.64.0/18 -d 10.233.60.220/32 -p tcp -m comment --comment "default/my-dep cluster IP" -m tcp --dport 80 -j KUBE-MARK-MASQ
-A KUBE-SVC-YIDRKHK4K7YFNT5I -m comment --comment "default/my-dep -> 10.233.74.77:80" -j KUBE-SEP-CKROCXU3WMRQYCUN
[root@worker1 ~]# iptables-save -t nat | grep KUBE-SEP-CKROCXU3WMRQYCUN
:KUBE-SEP-CKROCXU3WMRQYCUN - [0:0]
-A KUBE-SEP-CKROCXU3WMRQYCUN -s 10.233.74.77/32 -m comment --comment "default/my-dep" -j KUBE-MARK-MASQ
-A KUBE-SEP-CKROCXU3WMRQYCUN -p tcp -m comment --comment "default/my-dep" -m tcp -j DNAT --to-destination :0 --persistent --to-destination :0 --persistent --to-destination
-A KUBE-SVC-YIDRKHK4K7YFNT5I -m comment --comment "default/my-dep -> 10.233.74.77:80" -j KUBE-SEP-CKROCXU3WMRQYCUN
[root@worker1 ~]# curl 10.233.60.220
Anything else we need to know?
No response
Kubernetes version
[root@controller-node-1 ~]# kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.3", GitCommit:"434bfd82814af038ad94d62ebe59b133fcb50506", GitTreeState:"clean", BuildDate:"2022-10-12T10:57:26Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.3", GitCommit:"434bfd82814af038ad94d62ebe59b133fcb50506", GitTreeState:"clean", BuildDate:"2022-10-12T10:49:09Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider
OS version
[root@worker1 ~]# cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
[root@controller-node-1 ~]# uname -a
Linux controller-node-1 5.19.10-1.el7.elrepo.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Sep 17 11:34:40 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux
[root@worker1 ~]# uname -a
Linux worker1 5.4.197-1.el7.elrepo.x86_64 #1 SMP Sat Jun 4 08:43:19 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux
Install tools
kubespray
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, …) and versions (if applicable)
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 27 (27 by maintainers)
The kernel developers suggest that this is purely an iptables 1.4 display problem: they believe the kernel holds the correct representation of the rule (as shown by the fact that iptables 1.8 consistently displays it correctly); iptables 1.4 simply isn't rendering it correctly.
So in that case, whatever bug you're hitting is somewhere else, and is unrelated to this particular iptables rule…
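Given that explanation, a quick check is to compare the iptables userspace versions on the two nodes: anything in the legacy 1.4 line may mis-render rules that a 1.8 binary shows correctly. A minimal sketch (the version strings below are illustrative; run `iptables --version` on each node to get the real ones):

```python
import re

def is_legacy_iptables(version_output: str) -> bool:
    """Parse `iptables --version` output and flag releases older than 1.8."""
    m = re.search(r"v(\d+)\.(\d+)", version_output)
    if not m:
        raise ValueError("unrecognized iptables version string")
    major, minor = int(m.group(1)), int(m.group(2))
    return (major, minor) < (1, 8)

# Illustrative version strings, not captured from this cluster.
print(is_legacy_iptables("iptables v1.4.21"))             # True  (display bug possible)
print(is_legacy_iptables("iptables v1.8.4 (nf_tables)"))  # False
```

If the node showing the garbled DNAT target has a 1.4-series binary while a 1.8-series binary on the same node (or `nft list ruleset`, where available) shows the rule intact, that supports the display-only diagnosis above.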