kubernetes: Unable to use localhost:<nodeport> with IPVS

/kind bug /sig network /area ipvs

What happened: Connecting to a NodePort service via the loopback address does not work when kube-proxy runs in IPVS mode.

What you expected to happen: Being able to connect to a NodePort using the loopback address (it works with iptables).

How to reproduce it (as minimally and precisely as possible): Create a NodePort service and try to connect to it using localhost:<nodeport>.
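
For example, a minimal reproduction (the image and service name here are arbitrary; substitute the node port that gets allocated):

kubectl create deployment echo --image=nginx
kubectl expose deployment echo --type=NodePort --port=80
kubectl get svc echo                  # note the allocated node port
curl http://<node ip>:<nodeport>/     # works
curl http://127.0.0.1:<nodeport>/     # hangs with kube-proxy in IPVS mode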

Anything else we need to know?: The IPVS configuration looked fine:

sudo ipvsadm -Ln -t 127.0.0.1:32116 --stats
Prot LocalAddress:Port               Conns   InPkts  OutPkts  InBytes OutBytes
  -> RemoteAddress:Port
TCP  127.0.0.1:32116                     1        2        0      120        0
  -> 10.x.0.99:5000                     0        0        0        0        0
  -> 10.x.0.115:5000                    1        2        0      120        0
  -> 10.x.0.130:5000                    0        0        0        0        0

The masquerading mark is applied properly:

Chain KUBE-MARK-MASQ (3 references)
    pkts      bytes target     prot opt in     out     source               destination
       2      120 MARK       all  --  *      *       0.0.0.0/0            0.0.0.0/0            MARK or 0x4000

Chain KUBE-NODE-PORT (1 references)
    pkts      bytes target     prot opt in     out     source               destination
       2      120 KUBE-MARK-MASQ  all  --  *      *       0.0.0.0/0            0.0.0.0/0

Chain KUBE-SERVICES (2 references)
    pkts      bytes target     prot opt in     out     source               destination
       2      120 KUBE-NODE-PORT  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* Kubernetes nodeport TCP port for masquerade purpose */ match-set KUBE-NODE-PORT-TCP dst

However, the MASQUERADE rule in the KUBE-POSTROUTING chain is never reached:

Chain KUBE-POSTROUTING (1 references)
    pkts      bytes target     prot opt in     out     source               destination
       0        0 MASQUERADE  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes service traffic requiring SNAT */ mark match 0x4000/0x4000
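
For reference, these chains and their packet counters can be dumped with something like (and similarly for the other KUBE-* chains):

sudo iptables -t nat -vnL KUBE-POSTROUTING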

tcpdump does not show any traffic targeting port 5000. It looks like the kernel is dropping packets with source 127.0.0.1 and a non-local destination before they reach the POSTROUTING chain.
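
This can be observed along these lines (32116 and 5000 are the ports from the outputs above):

# terminal 1: watch for traffic to the endpoints
sudo tcpdump -ni any tcp port 5000
# terminal 2: trigger a connection through the node port
curl http://127.0.0.1:32116/

Nothing arrives on port 5000, whereas the same request to <node ip>:32116 does show packets.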

Changing the local route table fixed the issue:

sudo ip route change 127.0.0.1 dev lo proto kernel scope host src <node ip> table local
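
The effect can be verified with ip route get (exact output varies with the iproute2 version):

ip route get 127.0.0.1
# before: local 127.0.0.1 dev lo src 127.0.0.1 ...
# after:  local 127.0.0.1 dev lo src <node ip> ...

With the node IP as the preferred source address, locally generated packets to the node port no longer carry a 127.0.0.1 source, which is exactly what matters in the kernel analysis below.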

Environment:

  • kubelet version: v1.10.5
  • kube-proxy version: v1.11.1
  • OS (e.g. from /etc/os-release): Ubuntu 18.04
  • Kernel: 4.15


Most upvoted comments

After a little digging into the kernel, the failure to connect to localhost:<nodeport> can be explained.

Assume we are visiting http://127.0.0.1:<nodeport>. Every packet first passes through ip_vs_nat_xmit, and after some checks it reaches https://github.com/torvalds/linux/blob/v4.18/net/netfilter/ipvs/ip_vs_xmit.c#L756. __ip_vs_get_out_rt is used to look up the route to the remote server, a Kubernetes pod in our case. Taking a deeper look at __ip_vs_get_out_rt: after validating the route cache or finding a route via do_output_route4, it calls crosses_local_route_boundary to decide whether the chosen route passes the cross-local-route-boundary check.

Let's copy the code of crosses_local_route_boundary here and go deeper.

static inline bool crosses_local_route_boundary(int skb_af, struct sk_buff *skb,
						int rt_mode,
						bool new_rt_is_local)
{
	bool rt_mode_allow_local = !!(rt_mode & IP_VS_RT_MODE_LOCAL);
	bool rt_mode_allow_non_local = !!(rt_mode & IP_VS_RT_MODE_NON_LOCAL);
	bool rt_mode_allow_redirect = !!(rt_mode & IP_VS_RT_MODE_RDR);
	bool source_is_loopback;
	bool old_rt_is_local;

#ifdef CONFIG_IP_VS_IPV6
	/* omit ipv6 */
#endif
	{
		source_is_loopback = ipv4_is_loopback(ip_hdr(skb)->saddr);
		old_rt_is_local = skb_rtable(skb)->rt_flags & RTCF_LOCAL;
	}
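	/* For curl http://127.0.0.1:<nodeport> in NAT mode:
	 *   source_is_loopback = true (saddr is 127.0.0.1)
	 *   old_rt_is_local    = true (the route for 127.0.0.1 is RTCF_LOCAL)
	 * and all three rt_mode_allow_* flags are set (see below).
	 */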

	if (unlikely(new_rt_is_local)) {
		if (!rt_mode_allow_local)
			return true;
		if (!rt_mode_allow_redirect && !old_rt_is_local)
			return true;
	} else {
		if (!rt_mode_allow_non_local)
			return true;
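		/* NAT to a non-local real server, but the source is loopback:
		 * this is the branch a localhost:<nodeport> packet dies on. */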
		if (source_is_loopback)
			return true;
	}
	return false;
}

For IPVS NAT mode, rt_mode is assigned IP_VS_RT_MODE_LOCAL | IP_VS_RT_MODE_NON_LOCAL | IP_VS_RT_MODE_RDR, as https://github.com/torvalds/linux/blob/v4.18/net/netfilter/ipvs/ip_vs_xmit.c#L757-L759 shows, and new_rt_is_local is 0 due to https://github.com/torvalds/linux/blob/v4.18/net/netfilter/ipvs/ip_vs_xmit.c#L363.

source_is_loopback will be true because the source address ip_hdr(skb)->saddr is 127.0.0.1. In fact, all five booleans defined in crosses_local_route_boundary will be true in this case. Since new_rt_is_local is 0, we take the else branch, pass the rt_mode_allow_non_local check, and land on

if (source_is_loopback)
	return true;

and never pass the cross-local-route-boundary check.

This also explains why @lbernail's workaround of modifying the local route table helps: the src of the generated packet is no longer 127.0.0.1, so it passes the cross-local-route-boundary check and reaches the pods.

@miaoshixuan

We can work together on it if you really want to see the issue fixed, though I suspect few people use localhost for NodePort access in a production environment.

@lbernail I changed my local route and everything works well. But when I restart kubelet, it keeps crashing. Here is my kubelet config:

kubelet --logtostderr=false --log-dir=/var/log/kubernetes/kubelet --v=2 \
--address=192.168.1.103 --node-ip=192.168.1.103 --hostname-override=192.168.1.103 --read-only-port=0 \
--allow-privileged=true --cgroup-driver=systemd \
--anonymous-auth=false --client-ca-file=/etc//ssl/ca.pem --tls-cert-file=/etc/ssl/kube/kubelet.pem --tls-private-key-file=/etc/ssl/kube/kubelet-key.pem \
--cluster-dns=192.168.0.100 \
--kubeconfig=/etc/kubernetes/kubelet.kubeconfig

The log:

I1228 17:17:40.616552   31553 docker_service.go:271] Setting cgroupDriver to systemd
I1228 17:17:40.616729   31553 kubelet.go:682] Starting the GRPC server for the docker CRI shim.
I1228 17:17:40.616901   31553 docker_server.go:59] Start dockershim grpc server
F1228 17:17:40.618414   31553 docker_service.go:408] Streaming server stopped unexpectedly: listen tcp 127.0.0.1:0: bind: cannot assign requested address
goroutine 48 [running]:
k8s.io/kubernetes/vendor/github.com/golang/glog.stacks(0xc00000e001, 0xc000e2e000, 0x57c, 0x2710)
	/home/xh/k8scode/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/github.com/golang/glog/glog.go:766 +0xd4
k8s.io/kubernetes/vendor/github.com/golang/glog.(*loggingT).output(0x6a6eaa0, 0xc000000003, 0xc000da0000, 0x65b8cb2, 0x11, 0x198, 0x0)
	/home/xh/k8scode/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/github.com/golang/glog/glog.go:720 +0x18e
k8s.io/kubernetes/vendor/github.com/golang/glog.(*loggingT).printf(0x6a6eaa0, 0x3, 0x3c078e5, 0x29, 0xc00007ffb8, 0x1, 0x1)
	/home/xh/k8scode/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/github.com/golang/glog/glog.go:655 +0x14b
k8s.io/kubernetes/vendor/github.com/golang/glog.Fatalf(0x3c078e5, 0x29, 0xc00007ffb8, 0x1, 0x1)
	/home/xh/k8scode/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/github.com/golang/glog/glog.go:1145 +0x67
k8s.io/kubernetes/pkg/kubelet/dockershim.(*dockerService).Start.func1(0xc00098e8c0)
	/home/xh/k8scode/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/dockershim/docker_service.go:408 +0x99
created by k8s.io/kubernetes/pkg/kubelet/dockershim.(*dockerService).Start
	/home/xh/k8scode/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/dockershim/docker_service.go:406 +0x7d

Environment:

  • Kubernetes version: 1.11.2
  • OS: CentOS 7.4
  • Kernel: 3.10.0-693.el7.x86_64

@miaoshixuan Interesting, I never tested this fix in depth, I just tried to find a way to make it work. Maybe changing the route for 127.0.0.1 prevents binding on localhost? Sounds surprising. I'll try to reproduce next week.

/area kube-proxy

Checking on whether PR https://github.com/kubernetes/kubernetes/pull/69206 will be merged to solve this issue.

We are facing this exact same issue: when running kube-proxy in IPVS mode, localhost:node_port is not accessible, whereas node_ip:node_port is accessible. In iptables mode, both methods work.

The workaround suggested when the issue was filed works, but we are looking for a more formal way to solve this problem.