kubernetes: kube-proxy ipvs mode cannot access clusterip:port after node reboot
What happened:
Because of the known issue https://github.com/kubernetes/kubernetes/issues/71071, we updated kube-proxy to 1.12.5 while running Kubernetes 1.12.3. That did solve the problem of kube-proxy getting stuck after the cluster had been running normally for a while. However, when a node reboots, we sometimes cannot access any clusterip:port. It does not happen every time, but it has happened in many clusters. If I manually restart kube-proxy, everything recovers and works fine until the node reboots again.
For example:
kubectl get svc | grep kubernetes
kubernetes ClusterIP 192.168.0.1 <none> 443/TCP 51d
curl 192.168.0.1:443
shows
Failed connect to 192.168.0.1:443; Connection refused
However, curl {masterip}:6443 works fine.
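For reference, restarting kube-proxy on the affected node is the manual workaround mentioned above. A minimal sketch, assuming kube-proxy is deployed as a DaemonSet in kube-system (as kubespray does); the pod name below is a placeholder:
# find the kube-proxy pod running on the rebooted node
kubectl -n kube-system get pods -o wide | grep kube-proxy
# delete it; the DaemonSet recreates the pod and it resyncs the IPVS rules
kubectl -n kube-system delete pod kube-proxy-xxxxx
# once the new pod is Running, the ClusterIP answers again
curl 192.168.0.1:443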
Detailed information:
kube-proxy in ipvs mode
lsmod | grep -e ipvs -e nf_conntrack_ipv4
output:
nf_conntrack_ipv4 16384 261
nf_defrag_ipv4 16384 1 nf_conntrack_ipv4
nf_conntrack 135168 11 xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_ipv6,nf_conntrack_ipv4,nf_nat,nf_nat_ipv6,ipt_MASQUERADE,nf_nat_ipv4,xt_nat,nf_conntrack_netlink,ip_vs
cut -f1 -d " " /proc/modules | grep -e ip_vs -e nf_conntrack_ipv4
output:
nf_conntrack_ipv4
ip_vs_sh
ip_vs_wrr
ip_vs_rr
ip_vs
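One thing worth ruling out on a rebooted node is whether these modules are available early enough at boot. A minimal sketch for pinning them with systemd-modules-load on CentOS 7 (the file name is arbitrary, and in normal operation kube-proxy loads these modules itself):
# persist the IPVS-related modules across reboots
cat <<'EOF' > /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack_ipv4
EOF
systemctl restart systemd-modules-load
lsmod | grep -e ip_vs -e nf_conntrack_ipv4   # confirm they are loaded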
The clusterip:port does exist in the IPVS table:
ipvsadm -ln |grep 192.168.0.1 -C 2
output:
TCP {masterip}:21177 rr
TCP {masterip}:25598 rr
TCP 192.168.0.1:443 rr
-> {masterip}:6443 Masq 1 10 0
TCP 192.168.0.3:53 rr
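Since the virtual server is present yet connections are refused, another check that may be worth doing (an assumption on my part, not something captured in this report) is whether the ClusterIP is still bound to the kube-ipvs0 dummy interface and present in the ipsets that kube-proxy maintains in IPVS mode:
# kube-proxy in IPVS mode binds every ClusterIP to the kube-ipvs0 dummy device
ip addr show dev kube-ipvs0 | grep 192.168.0.1
# the ClusterIP should also appear in the KUBE-CLUSTER-IP ipset
ipset list KUBE-CLUSTER-IP | grep 192.168.0.1
# and the KUBE-SERVICES chain kube-proxy installs should exist in the nat table
iptables -t nat -L KUBE-SERVICES -n | head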
kube-proxy logs:
server_others.go:189] Using ipvs Proxier.
proxier.go:314] missing br-netfilter module or unset sysctl br-nf-call-iptables; proxy may not work as intended
proxier.go:368] IPVS scheduler not specified, use rr by default
server_others.go:216] Tearing down inactive rules.
server.go:447] Version: v1.12.5
conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_max' to 917504
conntrack.go:52] Setting nf_conntrack_max to 917504
conntrack.go:83] Setting conntrack hashsize to 229376
conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
config.go:102] Starting endpoints config controller
controller_utils.go:1027] Waiting for caches to sync for endpoints config controller
config.go:202] Starting service config controller
controller_utils.go:1027] Waiting for caches to sync for service config controller
controller_utils.go:1034] Caches are synced for endpoints config controller
controller_utils.go:1034] Caches are synced for service config controller
graceful_termination.go:160] Trying to delete rs: 192.168.13.138:44134/TCP/192.168.111.49:44134
graceful_termination.go:174] Deleting rs: 192.168.13.138:44134/TCP/192.168.111.49:44134
graceful_termination.go:160] Trying to delete rs: 192.168.0.3:53/TCP/192.168.111.43:53
graceful_termination.go:171] Not deleting, RS 192.168.0.3:53/TCP/192.168.111.43:53: 0 ActiveConn, 1 InactiveConn
graceful_termination.go:160] Trying to delete rs: 192.168.0.3:9153/TCP/192.168.111.43:9153
graceful_termination.go:174] Deleting rs: 192.168.0.3:9153/TCP/192.168.111.43:9153
What you expected to happen: curl {clusterip}:{port} should work.
How to reproduce it (as minimally and precisely as possible): it happens intermittently when a node reboots; it has occurred in many clusters.
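A crude way to try to reproduce it is to reboot the node in a loop and probe the ClusterIP after each boot. A sketch, where NODE is a placeholder for the affected node:
for i in $(seq 1 20); do
  ssh root@NODE reboot || true
  sleep 120   # wait for the node and kube-proxy to come back up
  # exit code 7 (connection refused) reproduces the bug; 0 means the ClusterIP answered
  ssh root@NODE "curl -m 5 -sS -o /dev/null 192.168.0.1:443; echo attempt $i exit \$?"
done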
Environment:
- Kubernetes version: 1.12.3 (only kube-proxy is at 1.12.5)
- OS: centos 7.5.1804
- Kernel: 4.17.11-1
- Install tools: kubespray
- Network plugin and version (if this is a network-related bug): calico v3.1.3
@zh168654
Unfortunately, this is not the case for me. I recently upgraded my cluster to v1.15.9 and the issue still exists.