kubernetes: Unable to use localhost:<nodeport> with IPVS
/kind bug /sig network /area ipvs
What happened: Connecting to a service using a nodeport and the loopback address does not work
What you expected to happen: Being able to connect to a nodeport using the loopback address (it works with iptables)
How to reproduce it (as minimally and precisely as possible): Create a nodeport service and try to connect to it using localhost:<nodeport>
Anything else we need to know?: The IPVS configuration looked fine:
sudo ipvsadm -Ln -t 127.0.0.1:32116 --stats
Prot LocalAddress:Port Conns InPkts OutPkts InBytes OutBytes
-> RemoteAddress:Port
TCP 127.0.0.1:32116 1 2 0 120 0
-> 10.x.0.99:5000 0 0 0 0 0
-> 10.x.0.115:5000 1 2 0 120 0
-> 10.x.0.130:5000 0 0 0 0 0
The masquerading mark is applied properly
Chain KUBE-MARK-MASQ (3 references)
pkts bytes target prot opt in out source destination
2 120 MARK all -- * * 0.0.0.0/0 0.0.0.0/0 MARK or 0x4000
Chain KUBE-NODE-PORT (1 references)
pkts bytes target prot opt in out source destination
2 120 KUBE-MARK-MASQ all -- * * 0.0.0.0/0 0.0.0.0/0
Chain KUBE-SERVICES (2 references)
pkts bytes target prot opt in out source destination
2 120 KUBE-NODE-PORT all -- * * 0.0.0.0/0 0.0.0.0/0 /* Kubernetes nodeport TCP port for masquerade purpose */ match-set KUBE-NODE-PORT-TCP dst
However, the MASQUERADE rule in the KUBE-POSTROUTING chain is never reached:
Chain KUBE-POSTROUTING (1 references)
pkts bytes target prot opt in out source destination
0 0 MASQUERADE all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes service traffic requiring SNAT */ mark match 0x4000/0x4000
tcpdump does not show any traffic targeting port 5000. It looks like the kernel is dropping connections with source 127.0.0.1 and an external destination before reaching the POSTROUTING chain.
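The counters quoted above can also be checked programmatically. The following is a minimal sketch, assuming the listing was captured with exact numeric counters (as in the output quoted in this report) rather than abbreviated ones such as 12K; the helper names rule_counters and unhit_targets are hypothetical and only serve the illustration.

```python
def rule_counters(listing):
    """Return (target, packets, bytes) for every rule line in an
    `iptables -L -v -n` style listing. Chain and column header lines
    are skipped because their first two fields are not numeric."""
    rules = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
            rules.append((fields[2], int(fields[0]), int(fields[1])))
    return rules


def unhit_targets(listing):
    """Return the targets of rules that have not matched a single packet."""
    return [target for target, pkts, _ in rule_counters(listing) if pkts == 0]
```

Applied to the chains in this report, MARK shows 2 packets while MASQUERADE shows none, which matches the observation that the packets are marked but never reach POSTROUTING.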
Changing the local route table fixed the issue:
sudo ip route change 127.0.0.1 dev lo proto kernel scope host src <node ip> table local
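One way to observe the effect of this route change is to ask the kernel which source address it selects for a loopback destination. The sketch below is an assumption about how to check this, not part of the original report: it connects a UDP socket (which sends no packets) and reads back the local address chosen by the route lookup. The function name source_address_for and the default port are hypothetical.

```python
import socket


def source_address_for(dest_ip, port=9):
    """Return the source IP the kernel picks for traffic to dest_ip.

    connect() on a UDP socket only performs the route lookup and fixes
    the local address; nothing is sent on the wire.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((dest_ip, port))
        return s.getsockname()[0]


if __name__ == "__main__":
    print(source_address_for("127.0.0.1"))
```

With an unmodified local route table this should print 127.0.0.1. After the route change above it would be expected to print the node IP instead, though that depends on the kernel honouring the src hint for this route.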
Environment:
- kubelet version: v1.10.5
- kube-proxy version: v1.11.1
- OS (e.g. from /etc/os-release): ubuntu 1804
- Kernel: 4.15
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 4
- Comments: 48 (30 by maintainers)
Commits related to this issue
- Address registry mirror on docker bridge interface localhost does not work on IPVS when used with NodePort - https://github.com/kubernetes/kubernetes/issues/67730 — committed to utilitywarehouse/tf_kube_ignition by george-angel 4 years ago
After digging a little into the kernel, the failure to connect to localhost:<nodeport> can be explained.
Assume we are visiting http://127.0.0.1:<nodeport>. Every packet first passes through ip_vs_nat_xmit. After some checks, it reaches https://github.com/torvalds/linux/blob/v4.18/net/netfilter/ipvs/ip_vs_xmit.c#L756. __ip_vs_get_out_rt is used to look up the route to the remote server, which is a k8s pod in our case. Looking more closely at __ip_vs_get_out_rt: after validating the route cache or finding a route via do_output_route4, it calls crosses_local_route_boundary to decide whether the route it found passes the cross-local-route-boundary check.
Let us look at crosses_local_route_boundary in more detail.
For the NAT mode of IPVS, rt_mode is set to IP_VS_RT_MODE_LOCAL | IP_VS_RT_MODE_NON_LOCAL | IP_VS_RT_MODE_RDR, as https://github.com/torvalds/linux/blob/v4.18/net/netfilter/ipvs/ip_vs_xmit.c#L757-L759 indicates, and new_rt_is_local is 0 due to https://github.com/torvalds/linux/blob/v4.18/net/netfilter/ipvs/ip_vs_xmit.c#L363. source_is_loopback will be true because the source address ip_hdr(skb)->saddr is 127.0.0.1. The five booleans defined in crosses_local_route_boundary will all be true in this case. So we finally fall into the failing branch and never pass the cross-local-route-boundary check.
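The check described above can be modelled in a few lines. This is a simplified user-space sketch reconstructed from the description in this comment, not a verbatim copy of the kernel source: the IPv6 path is omitted, the numeric flag values are illustrative assumptions (only their distinctness matters here), and old_rt_is_local is passed in as a plain parameter.

```python
import ipaddress

# Names follow the kernel's IP_VS_RT_MODE_* flags; the numeric values
# are illustrative assumptions for this model.
IP_VS_RT_MODE_LOCAL = 1
IP_VS_RT_MODE_NON_LOCAL = 2
IP_VS_RT_MODE_RDR = 4

# rt_mode used for IPVS NAT mode, as described in the comment above.
NAT_RT_MODE = IP_VS_RT_MODE_LOCAL | IP_VS_RT_MODE_NON_LOCAL | IP_VS_RT_MODE_RDR


def crosses_local_route_boundary(rt_mode, new_rt_is_local, saddr,
                                 old_rt_is_local=True):
    """Return True when the packet must be rejected (check fails)."""
    allow_local = bool(rt_mode & IP_VS_RT_MODE_LOCAL)
    allow_non_local = bool(rt_mode & IP_VS_RT_MODE_NON_LOCAL)
    allow_redirect = bool(rt_mode & IP_VS_RT_MODE_RDR)
    source_is_loopback = ipaddress.ip_address(saddr).is_loopback

    if new_rt_is_local:
        # New route points at this host.
        if not allow_local:
            return True
        if not allow_redirect and not old_rt_is_local:
            return True
    else:
        # New route leaves the host (for example, towards a pod IP).
        if not allow_non_local:
            return True
        if source_is_loopback:
            # A loopback source may not cross to a non-local route.
            return True
    return False


if __name__ == "__main__":
    # Client used localhost:<nodeport>, so the source is 127.0.0.1.
    print(crosses_local_route_boundary(NAT_RT_MODE, False, "127.0.0.1"))
    # Client used <node ip>:<nodeport>; 10.0.0.5 is a made-up node IP.
    print(crosses_local_route_boundary(NAT_RT_MODE, False, "10.0.0.5"))
```

In this model the loopback source is the only thing that makes the NAT-mode case fail, which is consistent with the packet never reaching POSTROUTING.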
This also explains why @lbernail's workaround of modifying the local route table works: the source address of the generated packet is no longer 127.0.0.1, so it passes the cross-local-route-boundary check and reaches the pods.
@miaoshixuan We can work together on it if you really want to see the issue fixed, though I suspect few people will use localhost for nodeport in a production environment.
@lbernail I changed my local route and everything works well. But when I restart kubelet, it keeps crashing. Here is my kubelet config
log here
Environment:
@miaoshixuan Interesting. I never tested this fix in depth; I just tried to find a way to make it work. Maybe changing the route for 127.0.0.1 prevents binding on localhost? That sounds surprising. I'll try to reproduce it next week.
/area kube-proxy
Checking on whether PR https://github.com/kubernetes/kubernetes/pull/69206 will be merged to solve this issue.
We are facing this exact same issue: when running kube-proxy in IPVS mode, localhost:node_port is not accessible, whereas node_ip:node_port is accessible. In iptables mode, both methods work.
The workaround suggested when the issue was filed works, but we are looking for a more formal way to solve this problem.
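The difference between the two addresses can be checked with a small TCP probe. This is a minimal sketch, not an official diagnostic: the function name can_connect is hypothetical, and the node IP and port in the usage example are placeholders that must be replaced with real values.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable routes.
        return False


if __name__ == "__main__":
    node_port = 32116        # nodeport from the report; adjust as needed
    node_ip = "192.0.2.10"   # placeholder, replace with the real node IP
    print("localhost:", can_connect("127.0.0.1", node_port))
    print("node ip:  ", can_connect(node_ip, node_port))
```

On a node affected by this issue, the first probe would be expected to time out in IPVS mode while the second succeeds.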