cilium: kube-proxy replacement: service with multiple externalIPs problems
Bug report
General Information
- Cilium version (run cilium version)
Client: 1.7.4 c7ee6d62b 2020-05-15T16:07:35+02:00 go version go1.13.10 linux/amd64
Daemon: 1.7.4 c7ee6d62b 2020-05-15T16:07:35+02:00 go version go1.13.10 linux/amd64
- Kernel version (run uname -a)
Linux api1 5.4.0-31-generic #35-Ubuntu SMP Thu May 7 20:20:34 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
- Orchestration system version in use (e.g. kubectl version, Mesos, …)
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.3", GitCommit:"2e7996e3e2712684bc73f0dec0200d64eec7fe40", GitTreeState:"clean", BuildDate:"2020-05-20T12:52:00Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.3", GitCommit:"2e7996e3e2712684bc73f0dec0200d64eec7fe40", GitTreeState:"clean", BuildDate:"2020-05-20T12:43:34Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
- Link to relevant artifacts (policies, deployment scripts, …)
helm template cilium cilium/cilium --namespace kube-system --set global.k8sServiceHost=10.2.5.130 --set global.k8sServicePort=6443 --set global.kubeProxyReplacement=strict --set global.tunnel=disabled --set global.autoDirectNodeRoutes=true --set global.nodePort.mode=dsr > cilium.yaml
kubectl apply -f cilium.yaml
- Upload a system dump (run curl -sLO https://github.com/cilium/cilium-sysdump/releases/latest/download/cilium-sysdump.zip && python cilium-sysdump.zip and then attach the generated zip file)
How to reproduce the issue
- A 3-node k8s cluster:
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
api1 Ready master 49m v1.18.3 10.2.5.130 <none> Ubuntu 20.04 LTS 5.4.0-31-generic docker://19.3.8
api2 Ready <none> 48m v1.18.3 10.2.5.131 <none> Ubuntu 20.04 LTS 5.4.0-31-generic docker://19.3.8
api3 Ready <none> 48m v1.18.3 10.2.5.132 <none> Ubuntu 20.04 LTS 5.4.0-31-generic docker://19.3.8
- Add IP address 10.2.5.140/32 to the node api2, and 10.2.5.141/32 to api3.
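A minimal sketch of how the addresses could be assigned (the interface name ens160 is an assumption; use each node's actual uplink interface):
# on api2
ip addr add 10.2.5.140/32 dev ens160
# on api3
ip addr add 10.2.5.141/32 dev ens160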
- Publish a service with 2 externalIPs:
kubectl create deployment nginx --image nginx
kubectl expose deployment nginx --port=80 --target-port=80 --type=NodePort
Then run kubectl edit svc nginx and add this section to the service spec:
externalIPs:
- 10.2.5.140
- 10.2.5.141
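For reference, the resulting Service can also be applied in one step; a sketch assuming the app=nginx selector created by the commands above:
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: NodePort
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
  externalIPs:
  - 10.2.5.140
  - 10.2.5.141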
- From the client VM, test the service:
ab -c 10 -n 10000 10.2.5.141/
ab -c 10 -n 10000 10.2.5.140/
In my case 10.2.5.141 works as expected (with strange delays, but tolerable), while 10.2.5.140 does not.
ab -c 10 -n 10000 10.2.5.140/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 10.2.5.140 (be patient)
apr_pollset_poll: The timeout specified has expired (70007)
In the tcpdump output I see:
08:56:35.441515 IP 10.2.5.68.54088 > 10.2.5.140.80: Flags [S], seq 4187745149, win 64240, options [mss 1460,sackOK,TS val 2815165716 ecr 0,nop,wscale 7], length 0
08:56:35.441747 IP 10.2.5.68.54090 > 10.2.5.140.80: Flags [S], seq 3356518998, win 64240, options [mss 1460,sackOK,TS val 2815165717 ecr 0,nop,wscale 7], length 0
08:56:35.441876 IP 10.2.5.68.54092 > 10.2.5.140.80: Flags [S], seq 1265403744, win 64240, options [mss 1460,sackOK,TS val 2815165717 ecr 0,nop,wscale 7], length 0
08:56:35.441957 IP 10.2.5.141.80 > 10.2.5.68.54088: Flags [S.], seq 2030348502, ack 4187745150, win 65160, options [mss 1460,sackOK,TS val 1975248262 ecr 2815165716,nop,wscale 7], length 0
08:56:35.441968 IP 10.2.5.68.54088 > 10.2.5.141.80: Flags [R], seq 4187745150, win 0, length 0
08:56:35.441976 IP 10.2.5.141.80 > 10.2.5.68.54090: Flags [S.], seq 2394654131, ack 3356518999, win 65160, options [mss 1460,sackOK,TS val 1975248262 ecr 2815165717,nop,wscale 7], length 0
08:56:35.441978 IP 10.2.5.68.54090 > 10.2.5.141.80: Flags [R], seq 3356518999, win 0, length 0
08:56:35.442030 IP 10.2.5.141.80 > 10.2.5.68.54092: Flags [S.], seq 1011437172, ack 1265403745, win 65160, options [mss 1460,sackOK,TS val 1975248262 ecr 2815165717,nop,wscale 7], length 0
08:56:35.442036 IP 10.2.5.68.54092 > 10.2.5.141.80: Flags [R], seq 1265403745, win 0, length 0
08:56:35.442208 IP 10.2.5.68.54094 > 10.2.5.140.80: Flags [S], seq 3949975832, win 64240, options [mss 1460,sackOK,TS val 2815165717 ecr 0,nop,wscale 7], length 0
08:56:35.442303 IP 10.2.5.68.54096 > 10.2.5.140.80: Flags [S], seq 3746109519, win 64240, options [mss 1460,sackOK,TS val 2815165717 ecr 0,nop,wscale 7], length 0
08:56:35.442398 IP 10.2.5.68.54098 > 10.2.5.140.80: Flags [S], seq 901884488, win 64240, options [mss 1460,sackOK,TS val 2815165717 ecr 0,nop,wscale 7], length 0
08:56:35.442434 IP 10.2.5.141.80 > 10.2.5.68.54094: Flags [S.], seq 2702955072, ack 3949975833, win 65160, options [mss 1460,sackOK,TS val 1975248262 ecr 2815165717,nop,wscale 7], length 0
08:56:35.442440 IP 10.2.5.68.54094 > 10.2.5.141.80: Flags [R], seq 3949975833, win 0, length 0
08:56:35.442447 IP 10.2.5.141.80 > 10.2.5.68.54096: Flags [S.], seq 3086612402, ack 3746109520, win 65160, options [mss 1460,sackOK,TS val 1975248262 ecr 2815165717,nop,wscale 7], length 0
08:56:35.442449 IP 10.2.5.68.54096 > 10.2.5.141.80: Flags [R], seq 3746109520, win 0, length 0
08:56:35.442475 IP 10.2.5.141.80 > 10.2.5.68.54098: Flags [S.], seq 2698710660, ack 901884489, win 65160, options [mss 1460,sackOK,TS val 1975248262 ecr 2815165717,nop,wscale 7], length 0
08:56:35.442478 IP 10.2.5.68.54098 > 10.2.5.141.80: Flags [R], seq 901884489, win 0, length 0
08:56:35.442630 IP 10.2.5.68.54100 > 10.2.5.140.80: Flags [S], seq 2038584870, win 64240, options [mss 1460,sackOK,TS val 2815165718 ecr 0,nop,wscale 7], length 0
08:56:35.442717 IP 10.2.5.141.80 > 10.2.5.68.54100: Flags [S.], seq 2617589878, ack 2038584871, win 65160, options [mss 1460,sackOK,TS val 1975248262 ecr 2815165718,nop,wscale 7], length 0
08:56:35.442725 IP 10.2.5.68.54100 > 10.2.5.141.80: Flags [R], seq 2038584871, win 0, length 0
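For debugging, the datapath's view of the service can be inspected from inside the cilium agent pod on the affected node; a sketch (cilium-xxxxx is a placeholder for the actual pod name):
kubectl -n kube-system exec -it cilium-xxxxx -- cilium service list
kubectl -n kube-system exec -it cilium-xxxxx -- cilium bpf lb list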
We are facing the same issue. Our setup has a service with 2 externalIPs mapped to the same backend, and when the client reuses the same source port to access that backend via the two VIPs, the second request times out because the response to the request made to VIP 2 still comes from VIP 1.
E.g. say we have two VIPs (a service with two externalIPs mapped to the same backend). VIP1: 1.1.1.1, VIP2: 2.2.2.2, Backend: 3.3.3.3, Client: 4.4.4.4.
The first request, sent to VIP1, works fine: 4.4.4.4:27182 -> 1.1.1.1:80 -> 3.3.3.3:80
The second request, sent to VIP2 from the same client IP and source port, fails: the response is sent with VIP1 as the source IP, so the client resets, initiates a new SYN, and goes through the same flow until it times out. 4.4.4.4:27182 -> 2.2.2.2:80 -> 3.3.3.3:80
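A minimal sketch to reproduce the source-port reuse case from the client, using the example addresses above (curl's --local-port pins the client source port; 27182 is just the example value):
# first request to VIP1 works
curl --local-port 27182 http://1.1.1.1/
# reusing the same source port towards VIP2 should then time out in this scenario,
# because the SYN-ACK comes back with VIP1 as its source address
curl --local-port 27182 --max-time 5 http://2.2.2.2/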
@borkmann yes, I also checked DSR mode; the issue is still there.