calico: Connection errors when running Calico on eBPF mode with more than one backend pod
Following the discussion on Slack about running Calico on eBPF: https://calicousers.slack.com/archives/CUKP5S64R/p1601550116016000
When running Calico in eBPF mode with a Kubernetes Service exposed on a NodePort and two backend pods, I experienced 43 connection failures out of around 250 thousand requests.
Running the same scenario, but with one backend pod, resulted in absolutely no connection errors.
The same setup was also tested with Calico in iptables mode, and there were no connection errors regardless of the number of backend pods.
Setup
Kubernetes v1.19.2 (self-managed, the hard way) running on AWS with one master node and two worker nodes. Nodes run on Ubuntu 20.04.1 LTS with Kernel 5.4.0-1029-aws and Docker-runtime 19.3.8.
Calico v3.17.1 in eBPF mode, with the flags FELIX_BPFENABLED=true, CALICO_IPV4POOL_IPIP=Never, CALICO_IPV4POOL_VXLAN=Never, and CALICO_IPV4POOL_NAT_OUTGOING=true.
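For reference, these flags are plain environment variables on the calico-node container in the DaemonSet; which values are actually applied can be checked with something like the following (an illustrative command, its output is not included here):
# kubectl --context instapro.calico --namespace kube-system get daemonset calico-node --output jsonpath='{.spec.template.spec.containers[*].env}'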
Taurus was configured to run 20 concurrent requests for 10 minutes against both nodes on the exposed NodePort, so each node received around 125 thousand requests.
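The exact Taurus config file is not attached, but the runs looked roughly like this sketch (the scenario name is illustrative):
execution:
- concurrency: 20
  hold-for: 10m
  scenario: echoserver-nodeport
scenarios:
  echoserver-nodeport:
    requests:
    - http://10.209.0.203:30080
    - http://10.209.2.205:30080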
There was no conversion from eBPF to iptables or vice versa; the cluster was created from scratch with this specific setup.
- Client IP = 10.199.1.4
- Node one
  - Node = 10.209.0.203:30080
  - Pod = 10.210.76.132:8080
- Node two
  - Node = 10.209.2.205:30080
  - Pod = 10.210.79.5:8080
Running only the bare minimum pods.
# kubectl --context instapro.calico get pods --all-namespaces
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
echoserver    echoserver-588cbfb4d6-4zqdq                1/1     Running   0          39m    <- Scaled up and down during the runs
kube-system   calico-kube-controllers-85764cbd48-jpzhj   1/1     Running   0          3h8m
kube-system   calico-node-qb6rn                          1/1     Running   0          104m
kube-system   calico-node-w2p2h                          1/1     Running   0          104m
kube-system   calico-node-zs7x4                          1/1     Running   0          104m
kube-system   coredns-65f6755d5c-f4vmf                   1/1     Running   0          120m
kube-system   coredns-65f6755d5c-jgst9                   1/1     Running   0          3h6m
kube-system   coredns-65f6755d5c-sh68p                   1/1     Running   0          120m
Service with NodePort
# kubectl --context instapro.calico --namespace echoserver get service echoserver --output yaml
apiVersion: v1
kind: Service
metadata:
  name: echoserver
  namespace: echoserver
spec:
  clusterIP: 10.211.138.184
  externalTrafficPolicy: Cluster
  ports:
  - name: http
    nodePort: 30080
    port: 80
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/name: echoserver
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}
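The pod IPs listed in the setup above are simply this service's endpoints; the NodePort-to-pod mapping can be double-checked with the following (illustrative command, output not included):
# kubectl --context instapro.calico --namespace echoserver get endpoints echoserver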
Deployment
# kubectl --context instapro.calico --namespace echoserver get deployment echoserver --output yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echoserver
  namespace: echoserver
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: echoserver
  template:
    metadata:
      labels:
        app.kubernetes.io/name: echoserver
    spec:
      containers:
      - name: echoserver
        image: gcr.io/google_containers/echoserver:1.10
        ports:
        - name: http
          containerPort: 8080
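The number of backend pods was varied between runs by scaling this deployment, for example (the replica count here is just an example; how pods were spread across specific nodes is not shown):
# kubectl --context instapro.calico --namespace echoserver scale deployment echoserver --replicas=2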
One pod, no failures
I have run it multiple times. It hasn’t failed so far.
18:43:29 INFO: Test duration: 0:10:13
18:43:29 INFO: Samples count: 309985, 0.00% failures
18:43:29 INFO: Average times: total 0.038, latency 0.038, connect 0.001
18:43:29 INFO: Percentiles:
┌───────────────┬───────────────┐
│ Percentile, % │ Resp. Time, s │
├───────────────┼───────────────┤
│ 0.0 │ 0.025 │
│ 50.0 │ 0.034 │
│ 90.0 │ 0.045 │
│ 95.0 │ 0.062 │
│ 99.0 │ 0.089 │
│ 99.9 │ 0.269 │
│ 100.0 │ 1.95 │
└───────────────┴───────────────┘
18:43:29 INFO: Request label stats:
┌───────────────────────────┬────────┬─────────┬────────┬───────┐
│ label │ status │ succ │ avg_rt │ error │
├───────────────────────────┼────────┼─────────┼────────┼───────┤
│ http://10.209.0.203:30080 │ OK │ 100.00% │ 0.038 │ │
│ http://10.209.2.205:30080 │ OK │ 100.00% │ 0.038 │ │
└───────────────────────────┴────────┴─────────┴────────┴───────┘
One pod on each node, 43 failures
The tcpdump captures are available here: https://www.dropbox.com/sh/hydbxyrlo9qdvwa/AACMxFb9YlbwSo1NzZLNGdyHa
I have run it many times. The number of failures varies between runs, but it always fails.
18:59:49 INFO: Test duration: 0:13:04
18:59:49 INFO: Samples count: 258401, 0.02% failures
18:59:49 INFO: Average times: total 0.050, latency 0.036, connect 0.001
18:59:49 INFO: Percentiles:
┌───────────────┬───────────────┐
│ Percentile, % │ Resp. Time, s │
├───────────────┼───────────────┤
│ 0.0 │ 0.024 │
│ 50.0 │ 0.033 │
│ 90.0 │ 0.041 │
│ 95.0 │ 0.056 │
│ 99.0 │ 0.076 │
│ 99.9 │ 0.233 │
│ 100.0 │ 256.64 │
└───────────────┴───────────────┘
18:59:49 INFO: Request label stats:
┌───────────────────────────┬────────┬────────┬────────┬────────────────────────────────────────────────┐
│ label │ status │ succ │ avg_rt │ error │
├───────────────────────────┼────────┼────────┼────────┼────────────────────────────────────────────────┤
│ http://10.209.0.203:30080 │ FAIL │ 99.98% │ 0.048 │ Non HTTP response message: Connection reset │
│ │ │ │ │ Non HTTP response message: Operation timed out │
│ http://10.209.2.205:30080 │ FAIL │ 99.98% │ 0.051 │ Non HTTP response message: Connection reset │
│ │ │ │ │ Non HTTP response message: Operation timed out │
└───────────────────────────┴────────┴────────┴────────┴────────────────────────────────────────────────┘
tcpdump commands
Node one
# tcpdump -i any port 30080 and src host 10.199.1.4 -nlvv -w node-10.209.0.203-30080.tcpdump
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
277387 packets captured
277387 packets received by filter
0 packets dropped by kernel
# tcpdump -i calibc48453fc8e src host 10.199.1.4 -nlvv -w node-10.209.0.203-pod.tcpdump
tcpdump: listening on calibc48453fc8e, link-type EN10MB (Ethernet), capture size 262144 bytes
278440 packets captured
278440 packets received by filter
0 packets dropped by kernel
Node two
# tcpdump -i any port 30080 and src host 10.199.1.4 -nlvv -w node-10.209.2.205-30080.tcpdump
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
277433 packets captured
277433 packets received by filter
0 packets dropped by kernel
# tcpdump -i cali8d5c3169ef1 src host 10.199.1.4 -nlvv -w node-10.209.2.205-pod.tcpdump
tcpdump: listening on cali8d5c3169ef1, link-type EN10MB (Ethernet), capture size 262144 bytes
276258 packets captured
276258 packets received by filter
0 packets dropped by kernel
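To correlate the "Connection reset" failures with the captures, the RST packets can be pulled back out of a pcap with a read filter, for example (same filename as above):
# tcpdump -nn -r node-10.209.0.203-30080.tcpdump 'tcp[tcpflags] & tcp-rst != 0'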
Other runs
- 2 pods, all on node one = 19 failures
- 2 pods, all on node two = 22 failures
- 10 pods, all on node one = 31 failures
- 10 pods, all on node two = 26 failures
- 10 pods, 5 on each node = 24 failures
- 50 pods, 25 on each node = 38 failures
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 19 (8 by maintainers)
From the comments: "Yes, I just got back to that and added a test so I'm just waiting on review now."