k3s: Service with no backing application takes approx. 120s to fail (w/ weave CNI)

Environmental Info: K3s Version: 1.22.6 and 1.23.6 (on CentOS 7). Support tested with 1.23.6 on Ubuntu 20.04

Node(s) CPU architecture, OS, and Version: CentOS 7 / Ubuntu 20.04

Cluster Configuration: Occurs on multinode and single node clusters

Describe the bug: A K3s Service with no backing application takes approximately 120 seconds to fail when connecting to it with netcat

Steps To Reproduce: Create Service:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  type: NodePort
  ports:
    - name: "1313"
      port: 1313
      targetPort: 1313
      nodePort: 31111
  selector:
    app: nginx

Run busybox container:

kubectl run -i --tty --rm debug --image=busybox --restart=Never -- sh

Run check:

date; nc nginx 1313; echo $?; date

Expected behavior:

date; nc my-service 1313; echo $?; date
Thu May 12 14:00:47 UTC 2022
1
Thu May 12 14:00:48 UTC 2022

Actual behavior:

date; nc my-service 1313; echo $?; date
Thu May 12 13:35:34 UTC 2022
1
Thu May 12 13:37:42 UTC 2022

Workaround: Is a workaround available and implemented? Yes

What is the workaround:

“I have a workaround, using the ‘-w 1’ argument to nc, but it seems like this problem can affect anything that uses a Service to connect to something, and if we aren’t being extremely careful, the timeout issue could pop up all over the place unexpectedly.”
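The same idea can be applied to clients that have no timeout flag of their own. This is a minimal sketch, assuming GNU coreutils `timeout` is available (the `timeout` applet in the busybox image may behave differently); `with_deadline` is a hypothetical helper name, not something provided by K3s or busybox:

```shell
# Hypothetical helper: run a command, but give up after a fixed number of seconds.
# Assumes GNU coreutils `timeout`, which exits with status 124 when the deadline is hit.
with_deadline() {
    secs="$1"
    shift
    timeout "$secs" "$@"
}

# Example use with the check from this report (not executed here):
#   date; with_deadline 1 nc nginx 1313; echo $?; date
```

This only bounds how long the client waits. It does not change why the connection is not rejected immediately.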

Additional Info: When testing on RKE2 1.22.7, there is NO 120-second lag.

About this issue

  • State: closed
  • Created a year ago
  • Comments: 32 (15 by maintainers)

Most upvoted comments

I think I found where the issue is. It is in the FORWARD chain. With K3s we have the following chain:

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
    7   628 KUBE-ROUTER-FORWARD  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kube-router netpol - TEMCG2JMHZYE7H7T */
    0     0 FLANNEL-FWD  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* flanneld forward */
    0     0 KUBE-PROXY-FIREWALL  all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate NEW /* kubernetes load balancer firewall */
    0     0 KUBE-FORWARD  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes forwarding rules */
    0     0 KUBE-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate NEW /* kubernetes service portals */
    0     0 KUBE-EXTERNAL-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate NEW /* kubernetes externally-visible service portals */

The reject rule is in the KUBE-SERVICES chain, but if a rule in one of the earlier chains accepts the packet, the rules that follow are not evaluated. On K3s, every forwarded packet with a pod source IP is accepted by KUBE-ROUTER-FORWARD. One option would be to disable the network policy controller with --disable-network-policy, but flannel has the same issue, because FLANNEL-FWD also accepts packets from pods. The rules need to be reordered. I can fix it in flannel, but for kube-router I have to check how it is done.
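To see whether a node has this ordering, one can compare where the jumps appear in the FORWARD chain. The sketch below assumes the rules are dumped with `iptables -S FORWARD` (run as root on the node); `jump_position` is a hypothetical helper written for this illustration, not part of K3s, flannel, or kube-router:

```shell
# Print the 1-based position of the first FORWARD rule that jumps to the given target.
# Reads `iptables -S FORWARD` style output on stdin; prints nothing if no rule matches.
jump_position() {
    awk -v target="$1" '
        $1 == "-A" && $2 == "FORWARD" {
            n++
            for (i = 3; i < NF; i++)
                if ($i == "-j" && $(i + 1) == target) { print n; exit }
        }'
}

# Example use on a node (not executed here):
#   rules=$(iptables -S FORWARD)
#   printf '%s\n' "$rules" | jump_position KUBE-ROUTER-FORWARD
#   printf '%s\n' "$rules" | jump_position FLANNEL-FWD
#   printf '%s\n' "$rules" | jump_position KUBE-SERVICES
```

If KUBE-SERVICES prints a larger number than KUBE-ROUTER-FORWARD or FLANNEL-FWD, pod traffic can be accepted before the reject rule is reached, which matches the behavior described above.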