kubernetes: kube-proxy/IPVS: frequently cannot simultaneously access ClusterIP and endpoint IP (real server) from a pod

What happened: I use the Calico CNI with kube-proxy in IPVS mode. I found that connections are unstable when I simultaneously access the apiserver from a pod using both the ClusterIP and the endpoint IP (real server).

  • 172.31.0.1 is the cluster IP
  • 172.28.49.135 is the pod IP
  • 10.7.210.11 is the endpoint IP (real server)


sh-4.2# cat /proc/net/nf_conntrack|grep 40008
ipv4     2 tcp      6 2 CLOSE src=172.28.49.135 dst=172.31.0.1 sport=40008 dport=443 src=172.31.0.1 dst=172.28.49.135 sport=443 dport=40008 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=2
ipv4     2 tcp      6 63 SYN_SENT src=172.28.49.135 dst=10.7.210.11 sport=40008 dport=16443 [UNREPLIED] src=10.7.210.11 dst=172.28.49.135 sport=16443 dport=40008 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=2

What you expected to happen:

Connections remain stable when simultaneously accessing the apiserver from a pod using both the ClusterIP and the endpoint IP (real server).

How to reproduce it (as minimally and precisely as possible):

Enter a pod and run these two commands in parallel:

1. while true; do curl -k https://{api-server-clusterip}:{api-server-service-port}/healthz; done
2. while true; do curl -k https://{api-server-realserver-ip}:{api-server-realserver-port}/healthz; done
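For convenience, a minimal reproduction sketch; the two URLs below use the addresses from this report (ClusterIP 172.31.0.1:443, endpoint 10.7.210.11:16443), so substitute your own:

#!/bin/sh
# Addresses taken from this report; replace with your apiserver
# Service ClusterIP and its endpoint (real server) address.
CLUSTER_IP_URL="https://172.31.0.1:443/healthz"
REAL_SERVER_URL="https://10.7.210.11:16443/healthz"

# Hit both addresses in parallel; in IPVS mode, requests start to hang
# once the two loops happen to reuse the same local source port.
while true; do curl -sk -m 5 "$CLUSTER_IP_URL"; echo; done &
while true; do curl -sk -m 5 "$REAL_SERVER_URL"; echo; done &
wait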

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.18
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release): CentOS 7
  • Kernel (e.g. uname -a): 3.10.0-957.el7.x86_64
  • Install tools:
  • Network plugin and version (if this is a network-related bug): calico
  • Others:

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 52 (26 by maintainers)

Most upvoted comments

Hi guys!

I reproduced this problem with kernel 5.9 and Kubernetes 1.23. A client using the same source port can’t access the Virtual Server and the Real Server at the same time in IPVS mode. See below:

[root@master ~]# uname -a
Linux master 5.9.8-1.el8.elrepo.x86_64 #1 SMP Wed Nov 11 09:27:50 EST 2020 x86_64 x86_64 x86_64 GNU/Linux
[root@master ~]# kubectl version --short
Client Version: v1.23.6
Server Version: v1.23.7
[root@master ~]# kubectl get po -o wide
NAME     READY   STATUS    RESTARTS      AGE   IP               NODE     NOMINATED NODE   READINESS GATES
client   1/1     Running   1 (22h ago)   38h   192.168.219.77   master   <none>           <none>
ep       1/1     Running   0             18h   192.168.219.79   master   <none>           <none>
[root@master ~]# kubectl get svc
NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
kubernetes   ClusterIP   10.96.0.1        <none>        443/TCP    16d
vip1         ClusterIP   10.108.122.180   <none>        8080/TCP   36h

The pod “ep” is the Real Server (RS) of vip1.

[root@master ~]# kubectl exec -it client bash
# client-bash1:
root@client:~# nc -p 5566 10.108.122.180 8080

Then we can see the connection become ESTABLISHED:

[root@master ~]# conntrack -E -o ktimestamp -s 192.168.219.77
    [NEW] tcp      6 120 SYN_SENT src=192.168.219.77 dst=10.108.122.180 sport=5566 dport=8080 [UNREPLIED] src=192.168.219.79 dst=192.168.219.77 sport=8080 dport=5566
 [UPDATE] tcp      6 60 SYN_RECV src=192.168.219.77 dst=10.108.122.180 sport=5566 dport=8080 src=192.168.219.79 dst=192.168.219.77 sport=8080 dport=5566
 [UPDATE] tcp      6 432000 ESTABLISHED src=192.168.219.77 dst=10.108.122.180 sport=5566 dport=8080 src=192.168.219.79 dst=192.168.219.77 sport=8080 dport=5566 [ASSURED]

Then open a new terminal for the client pod and access the Real Server directly with the same source port:

[root@master ~]# kubectl exec -it client bash
root@client:~# nc -p 5566 192.168.219.79 8080
Ncat: TIMEOUT.

After a while, Ncat reports TIMEOUT. Looking up the conntrack entries on the master:

[NEW] tcp 6 120 SYN_SENT src=192.168.219.77 dst=192.168.219.79 sport=5566 dport=8080 [UNREPLIED] src=192.168.219.79 dst=192.168.219.77 sport=8080 dport=3663
[UPDATE] tcp 6 60 SYN_RECV src=192.168.219.77 dst=192.168.219.79 sport=5566 dport=8080 src=192.168.219.79 dst=192.168.219.77 sport=8080 dport=3663
[UPDATE] tcp 6 60 SYN_RECV src=192.168.219.77 dst=192.168.219.79 sport=5566 dport=8080 src=192.168.219.79 dst=192.168.219.77 sport=8080 dport=3663
[UPDATE] tcp 6 60 SYN_RECV src=192.168.219.77 dst=192.168.219.79 sport=5566 dport=8080 src=192.168.219.79 dst=192.168.219.77 sport=8080 dport=3663
[UPDATE] tcp 6 60 SYN_RECV src=192.168.219.77 dst=192.168.219.79 sport=5566 dport=8080 src=192.168.219.79 dst=192.168.219.77 sport=8080 dport=3663
[DESTROY] tcp 6 src=192.168.219.77 dst=192.168.219.79 sport=5566 dport=8080 src=192.168.219.79 dst=192.168.219.77 sport=8080 dport=3663

The RS retransmits its SYN-ACK four times (the conntrack entry stays in SYN_RECV) with no response, and then the connection is destroyed. I notice that the dport of the reply tuple is 3663 instead of the expected 5566. I don’t know why the port was changed; I checked the kernel source code and didn’t find the logic that modifies the port (maybe I missed it).
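My speculation (not confirmed): the reply tuple src=192.168.219.79 dst=192.168.219.77 sport=8080 dport=5566 is already owned by the first (VIP) connection, so netfilter’s NAT clash resolution would have to allocate a different reply port for the second connection. If conntrack-tools is installed, this can be checked by filtering on the reply tuple:

# Look for an existing entry that already claims the 8080->5566 reply tuple;
# if the VIP connection holds it, the direct connection must be assigned a
# different reply port (3663 here).
conntrack -L -p tcp --reply-src 192.168.219.79 --reply-port-src 8080 --reply-port-dst 5566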

On the client side (client pod), the connection stays in SYN_SENT:

root@client:~# netstat -putan
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      1 192.168.219.77:5566     192.168.219.79:8080     SYN_SENT    9615/nc
tcp        0      0 192.168.219.77:5566     10.108.122.180:8080     ESTABLISHED 9602/nc
tcp6       0      0 :::8080                 :::*                    LISTEN      1/./usr/bin/main

On the server side (RS), the connection stays in SYN_RECV:

root@ep:~# netstat -putan
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp6       0      0 :::8080                 :::*                    LISTEN      1/./usr/bin/main
tcp        0      0 192.168.219.79:8080     192.168.219.77:3663     SYN_RECV    -
tcp        0      0 192.168.219.79:8080     192.168.219.77:5566     ESTABLISHED 1/./usr/bin/main

I tried to find more information about this by enabling net.netfilter.nf_conntrack_log_invalid:

[root@master ~]# sysctl -w net.netfilter.nf_conntrack_log_invalid=6
net.netfilter.nf_conntrack_log_invalid = 6

I found out that the kernel considers these packets invalid, so it drops them:

[94416.421511] nf_ct_proto_6: invalid packet ignored in state SYN_RECV  IN= OUT= SRC=192.168.219.77 DST=192.168.219.79 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=7877 DF PROTO=TCP SPT=5566 DPT=8080 SEQ=4068803588 ACK=0 WINDOW=64860 RES=0x00 SYN URGP=0 OPT (020405820402080A3EF6B3600000000001030307) MARK=0x40000
[94416.421652] nf_ct_proto_6: invalid packet ignored in state SYN_RECV  IN= OUT= SRC=192.168.219.79 DST=192.168.219.77 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=8080 DPT=3663 SEQ=678911481 ACK=4068803589 WINDOW=64308 RES=0x00 ACK SYN URGP=0 OPT (020405820402080AC88C1AF63EF6AF4B01030307) MARK=0x40000
[94418.469510] nf_ct_proto_6: invalid packet ignored in state SYN_RECV  IN= OUT= SRC=192.168.219.79 DST=192.168.219.77 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=8080 DPT=3663 SEQ=678911481 ACK=4068803589 WINDOW=64308 RES=0x00 ACK SYN URGP=0 OPT (020405820402080AC88C22F63EF6AF4B01030307) MARK=0x40000
[94418.469558] nf_ct_proto_6: invalid packet ignored in state SYN_RECV  IN= OUT= SRC=192.168.219.77 DST=192.168.219.79 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=7878 DF PROTO=TCP SPT=5566 DPT=8080 SEQ=4068803588 ACK=0 WINDOW=64860 RES=0x00 SYN URGP=0 OPT (020405820402080A3EF6BB600000000001030307) MARK=0x40000
[94422.502506] nf_ct_proto_6: invalid packet ignored in state SYN_RECV  IN= OUT= SRC=192.168.219.77 DST=192.168.219.79 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=7879 DF PROTO=TCP SPT=5566 DPT=8080 SEQ=4068803588 ACK=0 WINDOW=64860 RES=0x00 SYN URGP=0 OPT (020405820402080A3EF6CB210000000001030307) MARK=0x40000
[94422.502654] nf_ct_proto_6: invalid packet ignored in state SYN_RECV  IN= OUT= SRC=192.168.219.79 DST=192.168.219.77 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=8080 DPT=3663 SEQ=678911481 ACK=4068803589 WINDOW=64308 RES=0x00 ACK SYN URGP=0 OPT (020405820402080AC88C32B73EF6AF4B01030307) MARK=0x40000

But the strange thing is: if I access the RS first and then the VS, it works. Both connections are successfully established:

# In RS-bash:
root@ep:~# netstat -putan
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp6       0      0 :::8080                 :::*                    LISTEN      1/./usr/bin/main
tcp6       0      0 192.168.219.79:8080     192.168.219.77:22284    ESTABLISHED 1/./usr/bin/main
tcp6       0      0 192.168.219.79:8080     192.168.219.77:5566     ESTABLISHED 1/./usr/bin/main

#  On master:

[NEW] tcp 6 120 SYN_SENT src=192.168.219.77 dst=192.168.219.79 sport=5566 dport=8080 [UNREPLIED] src=192.168.219.79 dst=192.168.219.77 sport=8080 dport=5566
[UPDATE] tcp 6 59 SYN_RECV src=192.168.219.77 dst=192.168.219.79 sport=5566 dport=8080 src=192.168.219.79 dst=192.168.219.77 sport=8080 dport=5566
[UPDATE] tcp 6 432000 ESTABLISHED src=192.168.219.77 dst=192.168.219.79 sport=5566 dport=8080 src=192.168.219.79 dst=192.168.219.77 sport=8080 dport=5566 [ASSURED]

[NEW] tcp 6 120 SYN_SENT src=192.168.219.77 dst=10.108.122.180 sport=5566 dport=8080 [UNREPLIED] src=192.168.219.79 dst=192.168.219.77 sport=8080 dport=22284
[UPDATE] tcp 6 60 SYN_RECV src=192.168.219.77 dst=10.108.122.180 sport=5566 dport=8080 src=192.168.219.79 dst=192.168.219.77 sport=8080 dport=22284
[UPDATE] tcp 6 432000 ESTABLISHED src=192.168.219.77 dst=10.108.122.180 sport=5566 dport=8080 src=192.168.219.79 dst=192.168.219.77 sport=8080 dport=22284 [ASSURED]

I’m quite confused. I agree this is a kernel bug rather than a kube-proxy bug, but I don’t know the root cause.

NOTE: iptables mode works well.
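For anyone who just needs the workaround, a minimal sketch of switching kube-proxy to iptables mode on a kubeadm-style cluster (the ConfigMap name and namespace below are the kubeadm defaults; adjust for your installer):

# Set mode: "iptables" in the KubeProxyConfiguration section.
kubectl -n kube-system edit configmap kube-proxy

# Restart kube-proxy so it picks up the new mode.
kubectl -n kube-system rollout restart daemonset kube-proxy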

@uablrek Thanks for the troubleshooting! proxy-mode=iptables is GOOD for us, but we want to find the root cause, because we have the same problem when we use a load balancer based on IPVS NAT (e.g. keepalived).

Does IPVS NAT not support simultaneous access to the VIP and the real server IP from the same client port?

I speculate the answer is no (I tested it with ‘nc’ and ‘ncat’; the connections failed). iptables mode is fine because it NATs the source port.
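To check this outside Kubernetes, a sketch with plain ipvsadm in NAT (masquerading) mode; all addresses below are hypothetical:

# On the director: one TCP virtual service in NAT mode, one real server.
ipvsadm -A -t 10.0.0.100:8080 -s rr
ipvsadm -a -t 10.0.0.100:8080 -r 192.168.1.10:8080 -m

# On a client, open two connections from the same source port,
# one via the VIP and one straight to the real server (second terminal):
nc -p 5566 10.0.0.100 8080
nc -p 5566 192.168.1.10 8080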

@SerialVelocity

This sounds a bit like something I’ve debugged before. Can you try setting --local-port 40008 in curl for both loops?

curl may not support two clients using the same local port.
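For reference, curl does have a --local-port option (it accepts a single port or a range), so the two loops could be pinned like this, reusing the placeholders from the reproduction steps; whether the second curl can bind the port while the first connection holds it depends on the kernel allowing the port to be shared:

# Pin both loops to local port 40008 to make any collision deterministic.
while true; do curl -k --local-port 40008 https://{api-server-clusterip}:{api-server-service-port}/healthz; done
while true; do curl -k --local-port 40008 https://{api-server-realserver-ip}:{api-server-realserver-port}/healthz; done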
