cilium: Conntrack entry mismatch leads to policy enforcement on reply packet for service loopback case

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

Steps to reproduce:

minikube start --network-plugin=cni --cni=false --memory=4096 --kubernetes-version=v1.22.8

cilium install --version=v1.10.5

kubectl apply -f - <<EOF
apiVersion: v1
kind: ReplicationController
metadata:
  name: guestbook
  labels:
    k8s-app.guestbook: web
spec:
  replicas: 1
  selector:
    k8s-app.guestbook: web
  template:
    metadata:
      labels:
        k8s-app.guestbook: web
    spec:
      containers:
      - image: gcr.io/google-samples/gb-frontend:v6
        name: guestbook
        ports:
        - containerPort: 80
          name: http-server
          protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: guestbook
  labels:
    k8s-app.guestbook: web
spec:
  ports:
  - port: 81
    protocol: TCP
    targetPort: http-server
  selector:
    k8s-app.guestbook: web
  type: ClusterIP
EOF

kubectl apply -f - <<EOF
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-ingress-on-tcp-80
specs:
- endpointSelector:
    matchLabels:
      k8s-app.guestbook: web
  ingress:
  - toPorts:
    - ports:
      - port: "80"
        protocol: TCP
EOF

It is important that the service port (81) differs from the container port (80); the issue does not occur when they are the same.
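To make the repro easier to follow, here is a small sketch (illustration only, not Cilium's actual datapath code; the function name and logic are made up) of the two translations the service loopback path performs, using the IPs/ports from this reproduction. `169.254.42.1` is the loopback SNAT address used when a pod's service traffic is load-balanced back to itself:

```python
# Hypothetical sketch of the NAT steps for the pod-to-self-via-ClusterIP case.
# Addresses and ports are taken from the reproduction above.

def loopback_translate(pkt):
    src_ip, src_port, dst_ip, dst_port = pkt
    # 1. Service DNAT: ClusterIP:81 -> backend podIP:80
    if (dst_ip, dst_port) == ("10.98.138.60", 81):
        dst_ip, dst_port = "10.0.0.208", 80
    # 2. Loopback SNAT: the source pod is also the backend pod,
    #    so the source is rewritten to the loopback address.
    if src_ip == dst_ip:
        src_ip = "169.254.42.1"
    return (src_ip, src_port, dst_ip, dst_port)

original = ("10.0.0.208", 55518, "10.98.138.60", 81)
print(loopback_translate(original))
# -> ('169.254.42.1', 55518, '10.0.0.208', 80)
```

The result matches the SYN seen in the monitor output below: `169.254.42.1:55518 -> 10.0.0.208:80`.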

$ kubectl get svc guestbook
NAME        TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
guestbook   ClusterIP   10.98.138.60   <none>        81/TCP    42m
$ kubectl get pods -o wide --selector=k8s-app.guestbook=web
NAME              READY   STATUS    RESTARTS   AGE   IP           NODE       NOMINATED NODE   READINESS GATES
guestbook-8b46v   1/1     Running   0          43m   10.0.0.208   minikube   <none>           <none>

Send traffic to itself via cluster IP:

$ kubectl exec guestbook-8b46v -- curl 10.98.138.60:81

Cilium monitor output:

$ cilium monitor --related-to 1221
Press Ctrl-C to quit
level=info msg="Initializing dissection cache..." subsys=monitor
-> endpoint 1221 flow 0x95218252 identity 22887->22887 state established ifindex lxccbaae0c0d570 orig-ip 169.254.42.1: 169.254.42.1:55518 -> 10.0.0.208:80 tcp SYN
Policy verdict log: flow 0xfef8e31c local EP ID 1221, remote ID 22887, proto 6, ingress, action deny, match none, 10.98.138.60:81 -> 10.0.0.208:55518 tcp SYN, ACK
xx drop (Policy denied) flow 0xfef8e31c to endpoint 1221, identity 22887->22887: 10.98.138.60:81 -> 10.0.0.208:55518 tcp SYN, ACK
Policy verdict log: flow 0x8caf00f9 local EP ID 1221, remote ID 22887, proto 6, ingress, action deny, match none, 10.98.138.60:81 -> 10.0.0.208:55518 tcp SYN, ACK

As per my understanding, a pod sending packets to itself via the cluster IP should skip policy enforcement altogether. However, while the first packet (pod -> itself via cluster IP) does skip policy enforcement, the reply packet does not (it is incorrectly subjected to policy enforcement). This happens because the reply packet does not match any conntrack entry.

$ cilium bpf ct list global | grep 10.0.0.208
TCP IN 169.254.42.1:55522 -> 10.0.0.208:80 expires=17529096 RxPackets=3 RxBytes=222 RxFlagsSeen=0x02 LastRxReport=17529035 TxPackets=6 TxBytes=444 TxFlagsSeen=0x12 LastTxReport=17529035 Flags=0x0008 [ LBLoopback ] RevNAT=7 SourceSecurityID=22887 IfIndex=0
TCP OUT 10.0.0.208:55522 -> 10.98.138.60:80 expires=17529096 RxPackets=0 RxBytes=0 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=3 TxBytes=222 TxFlagsSeen=0x02 LastTxReport=17529035 Flags=0x0008 [ LBLoopback ] RevNAT=7 SourceSecurityID=22887 IfIndex=0
TCP OUT 10.98.138.60:81 -> 10.0.0.208:55522 service expires=17529096 RxPackets=0 RxBytes=7 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=0 TxBytes=0 TxFlagsSeen=0x02 LastTxReport=17529035 Flags=0x0000 [ ] RevNAT=7 SourceSecurityID=0 IfIndex=0
ICMP OUT 10.0.0.208:0 -> 10.98.138.60:0 related expires=17529093 RxPackets=0 RxBytes=0 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=1 TxBytes=74 TxFlagsSeen=0x02 LastTxReport=17529035 Flags=0x0018 [ LBLoopback SeenNonSyn ] RevNAT=7 SourceSecurityID=22887 IfIndex=0

As per my understanding, the second entry is causing the problem: 10.0.0.208:55522 -> 10.98.138.60:80 - the destination IP is the cluster IP, but the port is the container port (80) instead of the service port (81).

The reply packet, on the other hand, carries 10.98.138.60:81 -> 10.0.0.208:55518, which does not match that conntrack entry and is therefore incorrectly subjected to network policy enforcement.
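The mismatch can be demonstrated with a tiny model of a conntrack lookup (a hypothetical simplification for illustration, not Cilium's actual BPF code): a reply is matched by reversing its tuple and looking it up in the table, and the mis-keyed OUT entry (port 80 instead of 81) guarantees a miss:

```python
# Minimal, hypothetical model of a conntrack lookup on the reply path.
# Entries mirror the forward-direction tuples from `cilium bpf ct list` above.

def ct_key(src_ip, src_port, dst_ip, dst_port):
    return (src_ip, src_port, dst_ip, dst_port)

conntrack = {
    ct_key("169.254.42.1", 55518, "10.0.0.208", 80),   # TCP IN (loopback SNAT)
    ct_key("10.0.0.208", 55518, "10.98.138.60", 80),   # TCP OUT: keyed on port 80!
}

def reply_matches(src_ip, src_port, dst_ip, dst_port):
    # A reply matches when its reversed tuple is present in the table.
    return ct_key(dst_ip, dst_port, src_ip, src_port) in conntrack

# Reply from the service: 10.98.138.60:81 -> 10.0.0.208:55518 -> miss,
# so the packet falls through to policy enforcement and is dropped.
assert not reply_matches("10.98.138.60", 81, "10.0.0.208", 55518)

# Had the OUT entry been keyed on the service port (81), it would match:
conntrack.add(ct_key("10.0.0.208", 55518, "10.98.138.60", 81))
assert reply_matches("10.98.138.60", 81, "10.0.0.208", 55518)
```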

Cilium Version

1.10.5

Kernel Version

x86_64

Kubernetes Version

1.22.8

Sysdump

cilium-sysdump-20220411-230655.zip

Relevant log output

No response

Anything else?

No response

Code of Conduct

  • I agree to follow this project’s Code of Conduct

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 18 (15 by maintainers)

Most upvoted comments

The issue is resolved in Cilium 1.12.15, so it was most likely fixed by https://github.com/cilium/cilium/pull/27798

-> endpoint 2244 flow 0xabe2de97 , identity 43665->43665 state new ifindex lxc54076bf7fdd8 orig-ip 169.254.42.1: 169.254.42.1:51850 -> 10.244.1.210:8080 tcp SYN
-> endpoint 2244 flow 0xabe2de97 , identity 43665->43665 state established ifindex lxc54076bf7fdd8 orig-ip 169.254.42.1: 169.254.42.1:51850 -> 10.244.1.210:8080 tcp ACK
-> endpoint 2244 flow 0xabe2de97 , identity 43665->43665 state established ifindex lxc54076bf7fdd8 orig-ip 169.254.42.1: 169.254.42.1:51850 -> 10.244.1.210:8080 tcp ACK
-> endpoint 2244 flow 0xabe2de97 , identity 43665->43665 state established ifindex lxc54076bf7fdd8 orig-ip 169.254.42.1: 169.254.42.1:51850 -> 10.244.1.210:8080 tcp ACK
-> endpoint 2244 flow 0xabe2de97 , identity 43665->43665 state established ifindex lxc54076bf7fdd8 orig-ip 169.254.42.1: 169.254.42.1:51850 -> 10.244.1.210:8080 tcp ACK
-> endpoint 2244 flow 0xabe2de97 , identity 43665->43665 state established ifindex lxc54076bf7fdd8 orig-ip 169.254.42.1: 169.254.42.1:51850 -> 10.244.1.210:8080 tcp ACK, FIN
-> endpoint 2244 flow 0xabe2de97 , identity world->43665 state established ifindex 0 orig-ip 169.254.42.1: 169.254.42.1:51850 -> 10.244.1.210:8080 tcp ACK, FIN

If I understand the issue correctly, we already have a test case to verify this scenario - https://github.com/cilium/cilium/blob/master/test/k8s/services.go#L170-L170.

It is important that the service port (81) is different from the container port (80). Issue does not occur if they are the same.

If this is indeed relevant, we should extend the test case.

@skmatti Here is context for how we skip policy enforcement in the service loopback + policy enforcement case - https://github.com/cilium/cilium/commit/52cd6da139c1ac5d67de65a821f953c936034f2e. Hopefully this helps with debugging/fixing the issue further.
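For anyone picking this up, the skip referenced in that commit works roughly like this (a hypothetical Python rendering of the idea; the real logic lives in the BPF datapath and these names are made up): when a packet matches a conntrack entry carrying the loopback flag, ingress policy is bypassed. The bug above means the reply matches no entry at all, so the skip never applies:

```python
# Hypothetical rendering of the loopback policy-skip decision.

LB_LOOPBACK = 0x0008  # flag value as shown in `cilium bpf ct list` output

def should_enforce_ingress_policy(ct_entry_flags):
    # No matching CT entry -> enforce policy; entry with the loopback
    # flag -> skip enforcement.
    return ct_entry_flags is None or not (ct_entry_flags & LB_LOOPBACK)

# Forward packet matched the CT entry with Flags=0x0008 [ LBLoopback ]:
assert should_enforce_ingress_policy(0x0008) is False
# The reply missed the CT table entirely, so policy was enforced -> drop:
assert should_enforce_ingress_policy(None) is True
```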