cilium: Requests fail on all nodes every 5 minutes and when k8s Service metadata is updated.
Bug reports
Hi again, thank you for #4510. Since then, I have tested a lot and found some problems 😃.
I tested sending requests from a pod to a cluster IP and monitored the results: requests failed even though I was not changing anything deliberately. This happened irregularly on all nodes, roughly every 5 minutes, and each time the resourceVersion of the Kubernetes resource had increased. Furthermore, whenever I updated the k8s Service there were always failures as well (even when I only changed the metadata's annotations or labels).
General Information
- Cilium version (run `cilium version`): I have tested many versions of Cilium.
  - Versions with the problem: 1.2.0, 1.2.1, latest (https://github.com/cilium/cilium/commit/a270d52ef158371fa2be06db53df39866af0080c)
  - Versions without the problem: 1.0.9 (sorry, I didn't test 1.1)
- Kernel version (run `uname -a`): 4.16.14-1.20180611.el7.centos.x86_64
- Orchestration system version in use (e.g. `kubectl version`, Mesos, ...): Kubernetes 1.11
- Link to relevant artifacts (policies, deployment scripts, ...):
simple nodejs app
nodejs_deployment.yaml
```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nodejs-simple-deployment
spec:
  replicas: 10
  template:
    metadata:
      labels:
        app: nodejs-simple-server
    spec:
      containers:
      - name: nodejs-simple-server
        image: heroku/nodejs-hello-world
        imagePullPolicy: Always
        ports:
        - containerPort: 8000
```
nodejs_simple_service.yaml
```yaml
apiVersion: v1
kind: Service
metadata:
  name: nodejs-simple-server
  labels:
    app: nodejs-simple-server
spec:
  selector:
    app: nodejs-simple-server
  ports:
  - port: 3000
    protocol: TCP
```
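As a quick sanity check (plain kubectl, nothing Cilium-specific), the backend pod IPs behind this ClusterIP can be listed from the Endpoints object; these are the same IPs that later appear as lbBackend entries in the agent debug logs:

```bash
# Show the Service and the pod IP:port pairs it load-balances to.
kubectl describe svc nodejs-simple-server
kubectl get endpoints nodejs-simple-server -o wide
```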
locust_value.yaml
```yaml
Name: locust
image:
  repository: quay.io/honestbee/locust
  tag: 0.7.5
  pullPolicy: IfNotPresent
service:
  name: master-web
  type: NodePort
  externalPort: 8089
  internalPort: 8089
  nodePort: 0
  annotations: {}
  extraLabels: {}
master:
  config:
    target-host: http://10.108.193.145:3000
resources:
  limits:
    cpu: 100m
    memory: 128Mi
  requests:
    cpu: 100m
    memory: 128Mi
worker:
  config:
    # all files from tasks folder are mounted under `/locust-tasks`
    locust-script: "/locust-tasks/tasks.py"
```
/charts/locust/tasks/task.py
```python
from locust import HttpLocust, TaskSet, task

class ElbTasks(TaskSet):
    @task
    def status(self):
        self.client.get("/")

class ElbWarmer(HttpLocust):
    task_set = ElbTasks
    min_wait = 1000
    max_wait = 3000
```
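If you want to run the same task file by hand instead of through the helm chart, something like this should work (flags assume the locust 0.7.x CLI that ships in the image above):

```bash
# Run the task set headless against the ClusterIP: 50 simulated clients,
# hatching 10 per second (-c and -r are locust 0.7.x flags).
locust -f task.py --host=http://10.108.193.145:3000 --no-web -c 50 -r 10
```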
locust -> nodejs_simple_service(cluster-ip:10.108.193.145:3000) request test
test app(locust): https://github.com/helm/charts/tree/master/stable/locust
How to reproduce the issue
- Create a simple app (in my case, nodejs) using nodejs_deployment.yaml and nodejs_simple_service.yaml:

  ```
  $ kubectl apply -f nodejs_deployment.yaml
  $ kubectl apply -f nodejs_simple_service.yaml
  $ kubectl get svc
  NAME                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
  kubernetes             ClusterIP   10.96.0.1        <none>        443/TCP    9h
  nodejs-simple-server   ClusterIP   10.108.193.145   <none>        3000/TCP   21m
  # service cluster ip: 10.108.193.145
  ```

- Create locust (https://github.com/helm/charts/tree/master/stable/locust):

  ```
  helm install --name nodejs-tester -f locust_values.yaml <locust helm chart location>
  # please check /charts/locust/tasks/task.py
  ```

- Send requests using locust to the ClusterIP (10.108.193.145).

- You will see request failures roughly every 5 minutes (slightly irregular).

- In addition, failures occur reliably when the Service metadata is updated while HTTP requests are being sent (a scripted variant is sketched right after these steps):

  ```
  $ kubectl edit svc nodejs-simple-server
  ```
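A scripted version of the reproduction above, as a sketch: the `repro-bump` annotation key is arbitrary and only serves to force a Service update without touching any load-balancing fields.

```bash
#!/usr/bin/env bash
# Curl the ClusterIP in the background and print only the failures, while
# repeatedly bumping a Service annotation to force metadata-only updates.
(
  while true; do
    curl -sS -o /dev/null --max-time 2 http://10.108.193.145:3000/ \
      || echo "$(date +%T) request failed"
    sleep 0.2
  done
) &

while true; do
  kubectl annotate svc nodejs-simple-server repro-bump="$(date +%s)" --overwrite
  sleep 10
done
```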
I have an idea about these symptoms. I assume that the Kubernetes Service watch, because of the resync interval, re-delivers all Services roughly every 5 minutes, and that this may affect Cilium because Cilium may also be updating its state at that time (see the check sketched below).
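One way to check that cadence, sketched with plain kubectl (the pod name is a placeholder, and the agent must be running with debug logging, as in the logs attached below): the "Received service update" debug message should appear on the resync interval even when the Service itself has not changed.

```bash
# Timestamp every re-delivery of the Service to one cilium-agent pod.
kubectl -n kube-system logs -f <cilium-agent-pod> \
  | grep --line-buffered 'Received service update' \
  | while read -r line; do echo "$(date +%T) ${line}"; done
```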
```
$ cilium status --all-controllers
KVStore: Ok etcd: 3/3 connected: http://10.127.146.91:2379 - 3.2.22 (Leader); http://10.127.157.99:2379 - 3.2.22; http://10.127.158.106:2379 - 3.2.22
ContainerRuntime: Ok containerD events watcher: Ok; cri-containerd client: Ok - &StatusResponse{Status:&RuntimeStatus{Conditions:[&RuntimeCondition{Type:RuntimeReady,Status:true,Reason:,Message:,} &RuntimeCondition{Type:NetworkReady,Status:true,Reason:,Message:,}],},Info:map[string]string{},}
Kubernetes: Ok 1.11 (v1.11.2) [linux/amd64]
Kubernetes APIs: ["core/v1::Endpoint", "core/v1::Pods", "core/v1::Node", "core/v1::Namespace", "CustomResourceDefinition", "cilium/v2::CiliumNetworkPolicy", "networking.k8s.io/v1::NetworkPolicy", "core/v1::Service"]
Cilium: Ok OK
NodeMonitor: Disabled
Cilium health daemon: Ok
IPv4 address pool: 5/127 allocated
IPv6 address pool: 4/65535 allocated
Controller Status: 24/24 healthy
  Name Last success Last error Count Message
  cilium-health-ep 12s ago never 0 no error
  dns-poller 5s ago never 0 no error
  etcd-lease-keepalive-0xc4215a4dc0 1m17s ago never 0 no error
  ipcache-bpf-garbage-collection 1m15s ago never 0 no error
  kvstore-etcd-session-renew never never 0 no error
  kvstore-sync-store-cilium/state/nodes/v1 16s ago never 0 no error
  lxcmap-bpf-host-sync 5s ago never 0 no error
  metricsmap-bpf-prom-sync 4s ago never 0 no error
  resolve-identity-21092 1m13s ago never 0 no error
  sync-IPv4-identity-mapping (14120) 1m13s ago never 0 no error
  sync-IPv4-identity-mapping (21092) 1m13s ago never 0 no error
  sync-IPv4-identity-mapping (40104) 1m13s ago never 0 no error
  sync-IPv6-identity-mapping (14120) 1m13s ago never 0 no error
  sync-IPv6-identity-mapping (21092) 1m13s ago never 0 no error
  sync-IPv6-identity-mapping (40104) 1m13s ago never 0 no error
  sync-identity-to-k8s-pod (14120) 12s ago never 0 no error
  sync-identity-to-k8s-pod (21092) 13s ago never 0 no error
  sync-identity-to-k8s-pod (40104) 13s ago never 0 no error
  sync-policymap-14120 13s ago never 0 no error
  sync-policymap-21092 13s ago never 0 no error
  sync-policymap-40104 13s ago never 0 no error
  sync-to-k8s-ciliumendpoint (14120) 8s ago never 0 no error
  sync-to-k8s-ciliumendpoint (40104) 8s ago never 0 no error
  sync-to-k8s-ciliumendpoint-gc (kube-ing-node003-nucleo-al7-jp2v-dev) 12s ago never 0 no error
Proxy Status: OK, ip 192.168.5.1, port-range 10000-20000
Cluster health: 11/11 reachable (2018-08-30T02:41:05Z)
```

```
sync-IPv4-identity-mapping (14120) 1m13s ago never 0 no error
sync-IPv4-identity-mapping (21092) 1m13s ago never 0 no error
sync-IPv4-identity-mapping (40104) 1m13s ago never 0 no error
```
I'm not sure whether these sync-IPv4-identity-mapping controllers for the endpoints are doing something relevant at the moment the failures occur.
I attach the failure log below, captured when I updated the Service metadata. (The log from the resync-triggered case is too large to attach.)

```
2018-08-30T10:51:26.251674603+09:00 level=debug msg="Received service update" k8sApiVersion= k8sNamespace=default k8sNamespace.old=default k8sSvcName=nodejs-simple-server k8sSvcName.old=nodejs-simple-server k8sSvcType=ClusterIP k8sSvcType.old=ClusterIP subsys=daemon
2018-08-30T10:51:26.25219497+09:00 level=debug msg="created new L3n4Addr" ipAddr="{10.108.193.145 {TCP 3000}}" subsys=loadbalancer
2018-08-30T10:51:26.252481502+09:00 level=debug msg="Resolving service" l3n4Addr="{IP:10.108.193.145 L4Addr:{Protocol:TCP Port:3000}}" subsys=service
2018-08-30T10:51:26.252546353+09:00 level=debug msg="Got feAddr ID for service" k8sNamespace=default k8sSvcName=nodejs-simple-server obj="{ServiceName:nodejs-simple-server Namespace:default}" serviceID=28 serviceName= subsys=daemon
2018-08-30T10:51:26.253541481+09:00 level=debug msg="created new L3n4Addr" ipAddr="{10.108.193.145 {TCP 3000}}" subsys=loadbalancer
2018-08-30T10:51:26.2544059+09:00 level=debug msg="adding service" obj="[{L3n4Addr:{IP:192.168.2.20 L4Addr:{Protocol:TCP Port:3000}} Weight:0} {L3n4Addr:{IP:192.168.2.226 L4Addr:{Protocol:TCP Port:3000}} Weight:0} {L3n4Addr:{IP:192.168.2.246 L4Addr:{Protocol:TCP Port:3000}} Weight:0} {L3n4Addr:{IP:192.168.3.110 L4Addr:{Protocol:TCP Port:3000}} Weight:0} {L3n4Addr:{IP:192.168.3.200 L4Addr:{Protocol:TCP Port:3000}} Weight:0} {L3n4Addr:{IP:192.168.3.79 L4Addr:{Protocol:TCP Port:3000}} Weight:0} {L3n4Addr:{IP:192.168.2.27 L4Addr:{Protocol:TCP Port:3000}} Weight:0} {L3n4Addr:{IP:192.168.3.201 L4Addr:{Protocol:TCP Port:3000}} Weight:0} {L3n4Addr:{IP:192.168.3.208 L4Addr:{Protocol:TCP Port:3000}} Weight:0} {L3n4Addr:{IP:192.168.3.58 L4Addr:{Protocol:TCP Port:3000}} Weight:0}]" serviceID="10.108.193.145:3000" subsys=daemon
2018-08-30T10:51:26.255550825+09:00 level=debug msg="converting Cilium load-balancer service (frontend -> backend(s)) into BPF service" lbBackend="[{{192.168.2.20 {TCP 3000}} 0} {{192.168.2.226 {TCP 3000}} 0} {{192.168.2.246 {TCP 3000}} 0} {{192.168.3.110 {TCP 3000}} 0} {{192.168.3.200 {TCP 3000}} 0} {{192.168.3.79 {TCP 3000}} 0} {{192.168.2.27 {TCP 3000}} 0} {{192.168.3.201 {TCP 3000}} 0} {{192.168.3.208 {TCP 3000}} 0} {{192.168.3.58 {TCP 3000}} 0}]" lbFrontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.255590375+09:00 level=debug msg="converting L3n4Addr to ServiceKey" l3n4AddrID="{{10.108.193.145 {TCP 3000}} 28}" subsys=map-lb
2018-08-30T10:51:26.25665206+09:00 level=debug msg="associating frontend -> backend" lbBackend="192.168.2.20:3000 (28)" lbFrontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.257316963+09:00 level=debug msg="associating frontend -> backend" lbBackend="192.168.2.226:3000 (28)" lbFrontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.257938189+09:00 level=debug msg="associating frontend -> backend" lbBackend="192.168.2.246:3000 (28)" lbFrontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.258599388+09:00 level=debug msg="associating frontend -> backend" lbBackend="192.168.3.110:3000 (28)" lbFrontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.258627241+09:00 level=debug msg="associating frontend -> backend" lbBackend="192.168.3.200:3000 (28)" lbFrontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.258668114+09:00 level=debug msg="associating frontend -> backend" lbBackend="192.168.3.79:3000 (28)" lbFrontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.258951796+09:00 level=debug msg="associating frontend -> backend" lbBackend="192.168.2.27:3000 (28)" lbFrontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.260034233+09:00 level=debug msg="associating frontend -> backend" lbBackend="192.168.3.201:3000 (28)" lbFrontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.26018491+09:00 level=debug msg="associating frontend -> backend" lbBackend="192.168.3.208:3000 (28)" lbFrontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.261117951+09:00 level=debug msg="associating frontend -> backend" lbBackend="192.168.3.58:3000 (28)" lbFrontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.261236444+09:00 level=debug msg="converted LBSVC (frontend -> backend(s)), to Service Key and Value" lbBackend="[{{192.168.2.20 {TCP 3000}} 0} {{192.168.2.226 {TCP 3000}} 0} {{192.168.2.246 {TCP 3000}} 0} {{192.168.3.110 {TCP 3000}} 0} {{192.168.3.200 {TCP 3000}} 0} {{192.168.3.79 {TCP 3000}} 0} {{192.168.2.27 {TCP 3000}} 0} {{192.168.3.201 {TCP 3000}} 0} {{192.168.3.208 {TCP 3000}} 0} {{192.168.3.58 {TCP 3000}} 0}]" lbFrontend="10.108.193.145:3000" obj="[192.168.2.20:3000 (28) 192.168.2.226:3000 (28) 192.168.2.246:3000 (28) 192.168.3.110:3000 (28) 192.168.3.200:3000 (28) 192.168.3.79:3000 (28) 192.168.2.27:3000 (28) 192.168.3.201:3000 (28) 192.168.3.208:3000 (28) 192.168.3.58:3000 (28)]" serviceID="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.261605181+09:00 level=debug msg="adding service to BPF maps" serviceName="10.108.193.145:3000" subsys=daemon
2018-08-30T10:51:26.261704926+09:00 level=debug msg="deleting service from BPF maps" serviceName="10.108.193.145:3000" subsys=daemon
2018-08-30T10:51:26.262970532+09:00 level=debug msg="deleting backend # for slave ServiceKey" idx.backend=10 key="10.108.193.145:3000" subsys=daemon
2018-08-30T10:51:26.263020875+09:00 level=debug msg="deleting backend # for slave ServiceKey" idx.backend=9 key="10.108.193.145:3000" subsys=daemon
2018-08-30T10:51:26.263190085+09:00 level=debug msg="deleting backend # for slave ServiceKey" idx.backend=8 key="10.108.193.145:3000" subsys=daemon
2018-08-30T10:51:26.263214564+09:00 level=debug msg="deleting backend # for slave ServiceKey" idx.backend=7 key="10.108.193.145:3000" subsys=daemon
2018-08-30T10:51:26.264299165+09:00 level=debug msg="deleting backend # for slave ServiceKey" idx.backend=6 key="10.108.193.145:3000" subsys=daemon
2018-08-30T10:51:26.264515412+09:00 level=debug msg="deleting backend # for slave ServiceKey" idx.backend=5 key="10.108.193.145:3000" subsys=daemon
2018-08-30T10:51:26.264586494+09:00 level=debug msg="deleting backend # for slave ServiceKey" idx.backend=4 key="10.108.193.145:3000" subsys=daemon
2018-08-30T10:51:26.266019287+09:00 level=debug msg="deleting backend # for slave ServiceKey" idx.backend=3 key="10.108.193.145:3000" subsys=daemon
2018-08-30T10:51:26.266081834+09:00 level=debug msg="deleting backend # for slave ServiceKey" idx.backend=2 key="10.108.193.145:3000" subsys=daemon
2018-08-30T10:51:26.266096899+09:00 level=debug msg="deleting backend # for slave ServiceKey" idx.backend=1 key="10.108.193.145:3000" subsys=daemon
2018-08-30T10:51:26.266407345+09:00 level=debug msg="done deleting service slaves, now deleting master service" serviceID=28 subsys=daemon
2018-08-30T10:51:26.266426521+09:00 level=debug msg="deleting service from loadbalancer" serviceName="10.108.193.145:3000" sha=e3d018bbcc14ecb10372879557e2ddae5197339945947a94ac84eba3d03e2dd9 subsys=loadbalancer
2018-08-30T10:51:26.26750007+09:00 level=debug msg="adding frontend for backend to BPF maps" backend="192.168.2.20:3000 (28)" frontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.267731167+09:00 level=debug msg="adding frontend for backend to BPF maps" backend="192.168.2.226:3000 (28)" frontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.268145215+09:00 level=debug msg="Controller func execution time: 38.324922ms" name="sync-to-k8s-ciliumendpoint (54767)" subsys=controller uuid=b9ab5bbd-abf5-11e8-a21f-fa163eed3276
2018-08-30T10:51:26.268186103+09:00 level=debug msg="adding frontend for backend to BPF maps" backend="192.168.2.246:3000 (28)" frontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.268764817+09:00 level=debug msg="adding frontend for backend to BPF maps" backend="192.168.3.110:3000 (28)" frontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.269315069+09:00 level=debug msg="adding frontend for backend to BPF maps" backend="192.168.3.200:3000 (28)" frontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.270186567+09:00 level=debug msg="adding frontend for backend to BPF maps" backend="192.168.3.79:3000 (28)" frontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.270231496+09:00 level=debug msg="adding frontend for backend to BPF maps" backend="192.168.2.27:3000 (28)" frontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.270247677+09:00 level=debug msg="adding frontend for backend to BPF maps" backend="192.168.3.201:3000 (28)" frontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.270285331+09:00 level=debug msg="adding frontend for backend to BPF maps" backend="192.168.3.208:3000 (28)" frontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.270294253+09:00 level=debug msg="adding frontend for backend to BPF maps" backend="192.168.3.58:3000 (28)" frontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.270300317+09:00 level=debug msg="adding revNat to lbmap" bpfMapKey=28 bpfMapValue="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.270312506+09:00 level=debug msg="adding frontend for backend to BPF maps" backend="0.0.0.0:0 (0)" frontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.270319363+09:00 level=debug msg="adding service to RevNATMap" serviceName="10.108.193.145:3000" subsys=daemon
2018-08-30T10:51:26.270326234+09:00 level=debug msg="adding service to loadbalancer" serviceName="10.108.193.145:3000" sha=e3d018bbcc14ecb10372879557e2ddae5197339945947a94ac84eba3d03e2dd9 subsys=loadbalancer
2018-08-30T10:51:26.338106017+09:00 level=debug msg="created new L3n4Addr" ipAddr="{10.108.193.145 {TCP 3000}}" subsys=loadbalancer
2018-08-30T10:51:26.338195762+09:00 level=debug msg="adding service" obj="[{L3n4Addr:{IP:192.168.2.226 L4Addr:{Protocol:TCP Port:3000}} Weight:0} {L3n4Addr:{IP:192.168.3.201 L4Addr:{Protocol:TCP Port:3000}} Weight:0} {L3n4Addr:{IP:192.168.3.208 L4Addr:{Protocol:TCP Port:3000}} Weight:0} {L3n4Addr:{IP:192.168.2.20 L4Addr:{Protocol:TCP Port:3000}} Weight:0} {L3n4Addr:{IP:192.168.2.246 L4Addr:{Protocol:TCP Port:3000}} Weight:0} {L3n4Addr:{IP:192.168.2.27 L4Addr:{Protocol:TCP Port:3000}} Weight:0} {L3n4Addr:{IP:192.168.3.110 L4Addr:{Protocol:TCP Port:3000}} Weight:0} {L3n4Addr:{IP:192.168.3.200 L4Addr:{Protocol:TCP Port:3000}} Weight:0} {L3n4Addr:{IP:192.168.3.58 L4Addr:{Protocol:TCP Port:3000}} Weight:0} {L3n4Addr:{IP:192.168.3.79 L4Addr:{Protocol:TCP Port:3000}} Weight:0}]" serviceID="10.108.193.145:3000" subsys=daemon
2018-08-30T10:51:26.339749706+09:00 level=debug msg="converting Cilium load-balancer service (frontend -> backend(s)) into BPF service" lbBackend="[{{192.168.2.226 {TCP 3000}} 0} {{192.168.3.201 {TCP 3000}} 0} {{192.168.3.208 {TCP 3000}} 0} {{192.168.2.20 {TCP 3000}} 0} {{192.168.2.246 {TCP 3000}} 0} {{192.168.2.27 {TCP 3000}} 0} {{192.168.3.110 {TCP 3000}} 0} {{192.168.3.200 {TCP 3000}} 0} {{192.168.3.58 {TCP 3000}} 0} {{192.168.3.79 {TCP 3000}} 0}]" lbFrontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.339788476+09:00 level=debug msg="converting L3n4Addr to ServiceKey" l3n4AddrID="{{10.108.193.145 {TCP 3000}} 28}" subsys=map-lb
2018-08-30T10:51:26.340858808+09:00 level=debug msg="associating frontend -> backend" lbBackend="192.168.2.226:3000 (28)" lbFrontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.340878376+09:00 level=debug msg="associating frontend -> backend" lbBackend="192.168.3.201:3000 (28)" lbFrontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.340947532+09:00 level=debug msg="associating frontend -> backend" lbBackend="192.168.3.208:3000 (28)" lbFrontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.341053407+09:00 level=debug msg="associating frontend -> backend" lbBackend="192.168.2.20:3000 (28)" lbFrontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.342233022+09:00 level=debug msg="associating frontend -> backend" lbBackend="192.168.2.246:3000 (28)" lbFrontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.342456951+09:00 level=debug msg="associating frontend -> backend" lbBackend="192.168.2.27:3000 (28)" lbFrontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.343052373+09:00 level=debug msg="associating frontend -> backend" lbBackend="192.168.3.110:3000 (28)" lbFrontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.343225834+09:00 level=debug msg="associating frontend -> backend" lbBackend="192.168.3.200:3000 (28)" lbFrontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.34425476+09:00 level=debug msg="associating frontend -> backend" lbBackend="192.168.3.58:3000 (28)" lbFrontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.344278839+09:00 level=debug msg="associating frontend -> backend" lbBackend="192.168.3.79:3000 (28)" lbFrontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.345098522+09:00 level=debug msg="converted LBSVC (frontend -> backend(s)), to Service Key and Value" lbBackend="[{{192.168.2.226 {TCP 3000}} 0} {{192.168.3.201 {TCP 3000}} 0} {{192.168.3.208 {TCP 3000}} 0} {{192.168.2.20 {TCP 3000}} 0} {{192.168.2.246 {TCP 3000}} 0} {{192.168.2.27 {TCP 3000}} 0} {{192.168.3.110 {TCP 3000}} 0} {{192.168.3.200 {TCP 3000}} 0} {{192.168.3.58 {TCP 3000}} 0} {{192.168.3.79 {TCP 3000}} 0}]" lbFrontend="10.108.193.145:3000" obj="[192.168.2.226:3000 (28) 192.168.3.201:3000 (28) 192.168.3.208:3000 (28) 192.168.2.20:3000 (28) 192.168.2.246:3000 (28) 192.168.2.27:3000 (28) 192.168.3.110:3000 (28) 192.168.3.200:3000 (28) 192.168.3.58:3000 (28) 192.168.3.79:3000 (28)]" serviceID="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.345132353+09:00 level=debug msg="adding service to BPF maps" serviceName="10.108.193.145:3000" subsys=daemon
2018-08-30T10:51:26.345254135+09:00 level=debug msg="deleting service from BPF maps" serviceName="10.108.193.145:3000" subsys=daemon
2018-08-30T10:51:26.345344651+09:00 level=debug msg="deleting backend # for slave ServiceKey" idx.backend=10 key="10.108.193.145:3000" subsys=daemon
2018-08-30T10:51:26.346302941+09:00 level=debug msg="deleting backend # for slave ServiceKey" idx.backend=9 key="10.108.193.145:3000" subsys=daemon
2018-08-30T10:51:26.346445159+09:00 level=debug msg="deleting backend # for slave ServiceKey" idx.backend=8 key="10.108.193.145:3000" subsys=daemon
2018-08-30T10:51:26.347350062+09:00 level=debug msg="deleting backend # for slave ServiceKey" idx.backend=7 key="10.108.193.145:3000" subsys=daemon
2018-08-30T10:51:26.347397172+09:00 level=debug msg="deleting backend # for slave ServiceKey" idx.backend=6 key="10.108.193.145:3000" subsys=daemon
2018-08-30T10:51:26.347771784+09:00 level=debug msg="deleting backend # for slave ServiceKey" idx.backend=5 key="10.108.193.145:3000" subsys=daemon
2018-08-30T10:51:26.347927173+09:00 level=debug msg="deleting backend # for slave ServiceKey" idx.backend=4 key="10.108.193.145:3000" subsys=daemon
2018-08-30T10:51:26.348671538+09:00 level=debug msg="deleting backend # for slave ServiceKey" idx.backend=3 key="10.108.193.145:3000" subsys=daemon
2018-08-30T10:51:26.348726393+09:00 level=debug msg="deleting backend # for slave ServiceKey" idx.backend=2 key="10.108.193.145:3000" subsys=daemon
2018-08-30T10:51:26.348751481+09:00 level=debug msg="deleting backend # for slave ServiceKey" idx.backend=1 key="10.108.193.145:3000" subsys=daemon
2018-08-30T10:51:26.348989148+09:00 level=debug msg="done deleting service slaves, now deleting master service" serviceID=28 subsys=daemon
2018-08-30T10:51:26.349724567+09:00 level=debug msg="deleting service from loadbalancer" serviceName="10.108.193.145:3000" sha=e3d018bbcc14ecb10372879557e2ddae5197339945947a94ac84eba3d03e2dd9 subsys=loadbalancer
2018-08-30T10:51:26.349749492+09:00 level=debug msg="adding frontend for backend to BPF maps" backend="192.168.2.226:3000 (28)" frontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.349755895+09:00 level=debug msg="adding frontend for backend to BPF maps" backend="192.168.3.201:3000 (28)" frontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.349780394+09:00 level=debug msg="adding frontend for backend to BPF maps" backend="192.168.3.208:3000 (28)" frontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.349785501+09:00 level=debug msg="adding frontend for backend to BPF maps" backend="192.168.2.20:3000 (28)" frontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.349795768+09:00 level=debug msg="adding frontend for backend to BPF maps" backend="192.168.2.246:3000 (28)" frontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.349851317+09:00 level=debug msg="adding frontend for backend to BPF maps" backend="192.168.2.27:3000 (28)" frontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.349857436+09:00 level=debug msg="adding frontend for backend to BPF maps" backend="192.168.3.110:3000 (28)" frontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.349866389+09:00 level=debug msg="adding frontend for backend to BPF maps" backend="192.168.3.200:3000 (28)" frontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.3498754+09:00 level=debug msg="adding frontend for backend to BPF maps" backend="192.168.3.58:3000 (28)" frontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.349913218+09:00 level=debug msg="adding frontend for backend to BPF maps" backend="192.168.3.79:3000 (28)" frontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.349936102+09:00 level=debug msg="adding revNat to lbmap" bpfMapKey=28 bpfMapValue="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.349945553+09:00 level=debug msg="adding frontend for backend to BPF maps" backend="0.0.0.0:0 (0)" frontend="10.108.193.145:3000" subsys=map-lb
2018-08-30T10:51:26.34997483+09:00 level=debug msg="adding service to RevNATMap" serviceName="10.108.193.145:3000" subsys=daemon
2018-08-30T10:51:26.349985865+09:00 level=debug msg="adding service to loadbalancer" serviceName="10.108.193.145:3000" sha=e3d018bbcc14ecb10372879557e2ddae5197339945947a94ac84eba3d03e2dd9 subsys=loadbalancer
```
Thank you for your fast reply last time. I have high expectations for Cilium and am ready to keep testing and using it! If you have any questions or need more tests, please tell me anytime.
Commits related to this issue
- lbmap: Guarantee order of backends while scaling service. This commits resolves inconsistency when updating loadbalancer BPF map entries. The datapath relies on a consistent mapping of backend index t... — committed to cilium/cilium by tgraf 6 years ago
- lbmap: Guarantee order of backends while scaling service [ upstream commit 7e2f9149ccdcf8300959f4b53eeb0aa6ae567f3e ] This commits resolves inconsistency when updating loadbalancer BPF map entries. ... — committed to cilium/cilium by tgraf 6 years ago
Awesome. Thanks for testing!
@aanm @tgraf @ianvernon
Hi, first of all, thanks for a great project. I'm testing it with @eldin77. We're running a test with commit https://github.com/cilium/cilium/commit/05e99f3124e81e1c697a547327908bcb6c33c5d8, which contains #5469 and #5475. As we continue to test with locust, I think the problem is narrowing down, so I want to share our updated test results.
Unfortunately, connection failures still occur every five minutes. The connection failure message looks like this:
But now I think the k8s Service-related problem is solved. I mean, the thing described below was a problem, but it is not anymore: we had been updating the `metadata.annotations` of the Service by hand to reproduce this. We are testing under the assumption that the load-balancer content (e.g. feip, feport, beip, beport) does not change while the Service is still updated; that is, a way to trigger the service-update handling without actually changing the Service. With the current build there are no connection failures anymore from this.

Since the Service side no longer has a problem, the remaining problem stands out more clearly. Previously, it seemed that Service resync and Endpoints resync were executed at the same time and could not be distinguished. I think the problem below is what is left: if you update `metadata.annotations` in the same way (see the sketch below), you will be able to reproduce the same connection failures by hand. I hope this comment helps solve this problem. Thank you very much.
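A rough kubectl equivalent of this kind of no-op update (a sketch, not the exact call used in our test code; the `repro-bump` annotation key is an arbitrary placeholder), applied to the Endpoints object:

```bash
# Force an Endpoints update that changes no addresses or ports, only metadata.
kubectl annotate endpoints nodejs-simple-server repro-bump="$(date +%s)" --overwrite
```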
@leoh0 @eldin77 #5502 is providing a fix for the issue. We are testing this with a streaming connection while updating the annotations in parallel. It would be great if you could confirm that this fixes the issue for you as well. We have built a custom dev image, `cilium/cilium-dev:v1.2.2-rc2`, to include this fix.
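One hypothetical way to roll that dev image out for testing (assuming the agent runs as a DaemonSet named `cilium` in `kube-system` with a container named `cilium-agent`; adjust the names to your own manifests):

```bash
# Swap the agent image in place and wait for the DaemonSet rollout to finish.
kubectl -n kube-system set image daemonset/cilium cilium-agent=cilium/cilium-dev:v1.2.2-rc2
kubectl -n kube-system rollout status daemonset/cilium
```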
@tgraf I attached links to the results generated when I updated the `metadata.annotations` on the k8s nodejs-simple-server endpoint. Currently all workers send HTTP GET requests to http://nodejs-simple-server.pt.svc.cluster.local:3000 (which is 10.108.27.179:3000). I think you can check the RST packets in the `cilium monitor` gist linked below.

- whole `cilium monitor` command log: https://gist.github.com/leoh0/46e00a0e1b2e3249a8bcba0c6b8d9776
- whole `tcpdump -ennvv` log inside cilium-agent (I did not know which interface to capture with which options, so I captured everything, but I did not see the RST packet): https://gist.github.com/leoh0/3bd88d3cac7d4e71d19dc3d1e549a487

Additional resources include:

- `cilium-agent` log: https://gist.github.com/leoh0/14e1a2a580fbe55b86cd300110d61d2c
- `cilium debuginfo` command output: https://gist.github.com/leoh0/d440fd0c5911674dc43d424cf2a8857b
- k8s service, deployment, endpoint yaml: https://gist.github.com/leoh0/27221081f4d9ca6fef0bda3750e0c7ae
- cilium-agent daemonset yaml: https://gist.github.com/leoh0/d8e57bb4f967bccade8ab90b35925220

@tgraf Great! I checked all the existing problems for a few hours and confirmed that `cilium/cilium-dev:v1.2.2-rc2` has no problems at all.