cilium: Graceful shutdown not working when externalTrafficPolicy=Local or when replica count is 1

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

  1. Run Cilium in kube-proxy replacement strict mode with graceful termination enabled (a configuration check sketch follows below this list).
  2. Create a Deployment and a LoadBalancer Service as per the examples below:
    • the Service must either have externalTrafficPolicy=Local, with at most one pod allowed on a node at any given time,
    • or, if set to Cluster, the Deployment must run only one replica.
  3. To simplify testing and debugging, we connect directly to the NodePort of the Service and bypass the LB (a reproduction sketch follows the manifests):
    • send a message to verify connectivity.
  4. Delete the pod gracefully.
  5. Send another message over the existing connection while the pod is in the Terminating state.
  6. The client gets a TCP RST. A tcpdump shows the RST originating from nodeIP:nodePort.

I’ve looked at Hubble and cilium monitor output; neither shows the packet that triggers the RST, nor are any drops reported.
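
For reference, the agent configuration this corresponds to can be confirmed from the Cilium ConfigMap (a sketch; the option names are from memory and should be verified against the 1.12 documentation):

// expected: kube-proxy-replacement set to strict, and
// enable-k8s-terminating-endpoint "true" if it was set explicitly (it defaults to enabled in 1.12)
>> kubectl -n kube-system get configmap cilium-config -o yaml | grep -E 'kube-proxy-replacement|terminating'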

---
apiVersion: v1
kind: Service
metadata:
  name: tcp-echo-service
  namespace: default
  labels:
    app: test-echo-server
spec:
  externalTrafficPolicy: Local 
  type: LoadBalancer
  ports:
    - port: 443
      targetPort: 5001
      protocol: TCP
      name: tcp
  selector:
    app: test-echo-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-echo-server
  namespace: default
  labels:
    app: test-echo-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-echo-server
  template:
    metadata:
      labels:
        app: test-echo-server
    spec:
      terminationGracePeriodSeconds: 3600
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: kubernetes.io/hostname
            labelSelector:
              matchLabels:
                app: test-echo-server
      containers:
        - name: echo-server
          image: "vhiribarren/echo-server:latest"
          ports:
            - name: tcp
              containerPort: 5001
              protocol: TCP
          lifecycle:
            preStop:
              exec: 
                command:
                  ['/bin/sh', '-c', 'echo preStop executing && sleep 3600']
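
To reproduce steps 3–6 from a client machine, something like the following is sufficient (a sketch; the NodePort 4300 and the node IP 10.241.93.59 are placeholders taken from the log output further down):

// open a long-lived connection to the NodePort and verify the echo works
>> nc 10.241.93.59 4300
// in a second terminal, start graceful termination of the pod
>> kubectl delete pod -l app=test-echo-server --wait=false
// back in the nc session, send another line while the pod is Terminating;
// instead of an echo, the connection is reset (RST from nodeIP:nodePort)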

Cilium Version

Client: 1.12.0 9447cd1 2022-07-19T12:22:00+02:00 go version go1.18.4 linux/amd64
Daemon: 1.12.0 9447cd1 2022-07-19T12:22:00+02:00 go version go1.18.4 linux/amd64

Kernel Version

Linux 5.15.0-46-generic #49~20.04.1-Ubuntu SMP Thu Aug 4 19:15:44 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

1.23.5

Sysdump

I can provide selective output from sysdump on request.

Relevant log output

The following output and the attached log files were gathered after starting termination of the test pod and before sending the packet that triggers the RST.

// NodePort=4300
>> kubectl exec -it -n kube-system cilium-xg5nt -- cilium service list | grep 4300

Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), wait-for-node-init (init), clean-cilium-state (init)
431   10.241.93.59:4300      NodePort       1 => 172.18.2.158:5001 (terminating)
433   172.31.248.52:4300     NodePort       1 => 172.18.2.158:5001 (terminating)
435   0.0.0.0:4300           NodePort       1 => 172.18.2.158:5001 (terminating)
--------
// PodIP=172.18.2.158
>> kubectl exec -it -n kube-system cilium-xg5nt -- cilium bpf ct list global | grep 172.18.2.158

Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), wait-for-node-init (init), clean-cilium-state (init)
TCP IN 10.148.82.115:62451 -> 172.18.2.158:5001 expires=7827090 RxPackets=8 RxBytes=555 RxFlagsSeen=0x1a LastRxReport=7805489 TxPackets=7 TxBytes=629 TxFlagsSeen=0x1a LastTxReport=7805489 Flags=0x0030 [ SeenNonSyn NodePort ] RevNAT=0 SourceSecurityID=2 IfIndex=0
TCP OUT 10.148.82.115:62451 -> 172.18.2.158:5001 expires=7827090 RxPackets=7 RxBytes=629 RxFlagsSeen=0x1a LastRxReport=7805489 TxPackets=8 TxBytes=555 TxFlagsSeen=0x1a LastTxReport=7805489 Flags=0x0030 [ SeenNonSyn NodePort ] RevNAT=431 SourceSecurityID=2 IfIndex=3
--------
// NodePort=4300
>> kubectl exec -it -n kube-system cilium-xg5nt -- cilium bpf ct list global | grep 4300

Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), wait-for-node-init (init), clean-cilium-state (init)
TCP OUT 10.241.93.59:4300 -> 10.148.82.115:62451 service expires=7827090 RxPackets=0 RxBytes=1444 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=0 TxBytes=0 TxFlagsSeen=0x1a LastTxReport=7805489 Flags=0x0010 [ SeenNonSyn ] RevNAT=431 SourceSecurityID=0 IfIndex=0
TCP OUT 172.31.248.52:34300 -> 172.31.248.74:4240 expires=7827599 RxPackets=236902 RxBytes=19188898 RxFlagsSeen=0x1a LastRxReport=7805999 TxPackets=142137 TxBytes=11844706 TxFlagsSeen=0x1a LastTxReport=7805999 Flags=0x0010 [ SeenNonSyn ] RevNAT=0 SourceSecurityID=0 IfIndex=0
UDP OUT 172.20.0.3:53 -> 172.18.2.130:43007 service expires=7805992 RxPackets=0 RxBytes=162 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=0 TxBytes=0 TxFlagsSeen=0x00 LastTxReport=7805932 Flags=0x0000 [ ] RevNAT=98 SourceSecurityID=0 IfIndex=0
UDP OUT 172.18.2.130:43007 -> 172.18.0.47:53 expires=7805992 RxPackets=1 RxBytes=172 RxFlagsSeen=0x00 LastRxReport=7805932 TxPackets=1 TxBytes=79 TxFlagsSeen=0x00 LastTxReport=7805932 Flags=0x0000 [ ] RevNAT=98 SourceSecurityID=59251 IfIndex=0
--------
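
Not captured above, but one further check that may help narrow this down is whether Kubernetes itself publishes the terminating condition for the endpoint (a sketch; the Service name matches the manifest above, and the serving/terminating conditions require the EndpointSliceTerminatingCondition feature gate, which is beta in 1.23):

// with the pod in the Terminating state, the endpoint conditions should show
// ready: false, serving: true, terminating: true
>> kubectl get endpointslices -l kubernetes.io/service-name=tcp-echo-service -o yaml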

Anything else?

Attached: endpoint.log and agent-log.log, filtered on endpointID=27.

Code of Conduct

  • I agree to follow this project’s Code of Conduct

About this issue

  • State: closed
  • Created a year ago
  • Comments: 26 (12 by maintainers)

Most upvoted comments

For visibility, I think this may be solved by https://github.com/cilium/cilium/pull/24174, but I don’t have time to test it. I can confirm that graceful shutdown is working fine with my patch, but I don’t know if there are edge cases.

Thanks @aojea, I’m planning to test this soon, hopefully before the end of the week.

I’m parking it in my queue. @jonahmurphy FYI: I don’t have cycles at the moment, but I’ll try to get to reproducing the issue when I have time.