kubernetes: ipvs proxier doesn't respect graceful termination

Is this a BUG REPORT or FEATURE REQUEST?: /kind bug

What happened: Upon removing an endpoint, the ipvs proxier immediately deletes the ipvs real server, causing all connections to get dropped.

What you expected to happen: It should allow the terminating pod to gracefully close connections, just like the iptables proxier.

How to reproduce it (as minimally and precisely as possible):

1. Enable the ipvs proxier.
2. Create a keepalive / long-lived connection to a pod (e.g. while :; do echo -e "GET / HTTP/1.1\nhost: $host\n\n"; sleep 5; echo; done | telnet $serviceip 80).
3. Delete that pod and observe that the connection gets closed immediately; further requests will fail. With the iptables proxier, it continues to work (until the pod itself stops or closes the connection).

Anything else we need to know?: The ipvs proxier should instead be setting weight to 0, then reaping the stale real servers after some time period (that should be greater than any pod’s graceful termination time). This may also fix the existing bug around UDP connections getting dropped prematurely (https://github.com/kubernetes/kubernetes/issues/45976).

Environment:

Kubernetes version (use kubectl version): tested on 1.8, but same issue in 1.9 afaict
Cloud provider or hardware configuration: AWS
OS (e.g. from /etc/os-release): Ubuntu 16.04
Kernel (e.g. uname -a): 4.4.0
Install tools:
Others:

About this issue

State: closed
Created 6 years ago
Comments: 32 (26 by maintainers)

Most upvoted comments

I think it will break a lot of people’s expectations if IPVS doesn’t support graceful termination with at least TCP - this is the current behavior with iptables and userspace mode.

+10

jsravn on Jun 18, 2018

@jhorwit2 at this point, I don’t think this should block GA. However, we should prioritize releasing a fix for this in a patch release.

On Jun 5, 2018, at 10:02 PM, DuJun notifications@github.com wrote:

@jhorwit2

I don’t think this issue should be a blocker for IPVS GA. Per discussion, the graduation criteria are:

a) CIs are green

b) the necessary documentation is available

Please note that iptables is still the default mode even though IPVS has become GA.

Anyway, my team will take a look at this issue.

cc @Lion-Wei @islinwb @stewart-yu


rramkumar1 on Jun 6, 2018

#64947?

m1093782566 on Jul 9, 2018

@jsravn @m1093782566 @rramkumar1 @jhorwit2 Hi, guys. I have recently been testing this issue and got some results. Here are my test steps:

Create a pod with lifecycle/preStop specified and terminationGracePeriodSeconds=300. Expose this pod through a service.

apiVersion: v1
kind: Pod
...
    lifecycle:
      preStop:
        exec:
          command:
          - /bin/sh
          - -c
          - sleep 300
...
  terminationGracePeriodSeconds: 300

Check this pod and service.

# k get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
sourceip     ClusterIP   10.0.0.175   <none>        8080/TCP   8s
# k get pod -owide
NAME           READY     STATUS    RESTARTS   AGE       IP           NODE
sourceip       1/1       Running   0          45s       172.17.0.4   127.0.0.1

Use telnet to test the connection to the service clusterIP.

# time telnet 10.0.0.133 8080
Trying 10.0.0.133...
Connected to 10.0.0.133.
Escape character is '^]'.
Connection closed by foreign host.

Run kubectl delete pod. Given the preStop hook we specified, this pod should stay in Terminating status for 300s.

# k delete pod sourceip
pod "sourceip" deleted
# k get pod
NAME           READY     STATUS        RESTARTS   AGE
sourceip       1/1       Terminating   0          3m

Check ipvs rules.

# the ipvs real server has been deleted
# ipvsadm -l
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.0.0.1:https rr
  -> 10.162.199.82:6443           Masq    1      1          0
TCP  10.0.0.10:domain rr
  -> 172.17.0.4:domain            Masq    1      0          0
TCP  10.0.0.133:http-alt rr
UDP  10.0.0.10:domain rr
  -> 172.17.0.4:domain            Masq    1      0          0
# check ipvs connections
# ipvsadm -lnc
IPVS connection entries
pro expire state       source             virtual            destination
TCP 14:06  ESTABLISHED 10.0.0.133:58441   10.0.0.133:8080    172.17.0.5:80

We can see that the ipvs real server has been deleted, but telnet is still connected, and the connection is in state ESTABLISHED with an expire time of 14m6s.

After 5 minutes, the container is deleted and the telnet connection closes. Check the ipvs connections again.

# telnet returned
# time telnet 10.0.0.133 8080
Trying 10.0.0.133...
Connected to 10.0.0.133.
Escape character is '^]'.
Connection closed by foreign host.

real    5m12.514s
user    0m0.000s
sys     0m0.003s

# ipvs connection in FIN_WAIT status, with expire time.
# ipvsadm -lnc
IPVS connection entries
pro expire state       source             virtual            destination
TCP 03:56  FIN_WAIT    10.0.0.133:59648   10.0.0.133:8080    172.17.0.5:80

So, according to my test, I think the ipvs proxier should have graceful termination for long-lived connections. If you have any questions or suggestions about the test process, please let me know. I'd also like people to suggest other test procedures.
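
To repeat the connection check in this test without reading the table by eye, the `ipvsadm -lnc` output can be filtered. This is a minimal sketch that assumes the six-column layout shown above (pro, expire, state, source, virtual, destination); `count_established` is a hypothetical helper name, not part of ipvsadm.

```shell
#!/bin/sh
# Sketch: count ESTABLISHED IPVS connections to one destination.
# Reads `ipvsadm -lnc` output on stdin; assumes columns are
# pro, expire, state, source, virtual, destination.

# count_established RIP:PORT
count_established() {
    dest="$1"
    # Header lines never match because their third field is not ESTABLISHED.
    awk -v dest="$dest" '
        $3 == "ESTABLISHED" && $6 == dest { n++ }
        END { print n + 0 }
    '
}
```

Usage would look like `ipvsadm -lnc | count_established 172.17.0.5:80`. A result of 0 suggests no established connections to that real server remain in the connection table, although entries in other states such as FIN_WAIT may still be present until they expire.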

Lion-Wei on Jul 3, 2018

Hopefully you can fix this for UDP connections too, since the iptables proxier suffers from a bug where it drops the UDP connection immediately (causing errors when kube-dns is restarted, for instance…). It’d be nice if the ipvs proxier could handle that better.

jsravn on Jan 5, 2018