k3s: Unable to communicate between pods on different nodes if master node is behind NAT

Environmental Info:

K3s Version:

k3s -v
k3s version v1.23.6+k3s1 (418c3fa8)
go version go1.17.5

Node(s) CPU architecture, OS, and Version:

Master:

uname -a
Linux ip-172-31-12-196 5.15.0-1008-aws #10-Ubuntu SMP Wed May 18 17:28:39 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Worker:

uname -a
Linux <hostname> 5.4.0-105-generic #119-Ubuntu SMP Mon Mar 7 18:49:24 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:

1 master node, 1 worker node

Describe the bug:

Steps To Reproduce:

  • Installed K3s: install k3s on the master node:
sudo curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--tls-san <ip> --node-external-ip <ip> --kube-apiserver-arg external-hostname=<ip>" sh -

Install k3s on the worker node:

sudo curl -sfL https://get.k3s.io | K3S_URL=https://<master ip>:6443 K3S_NODE_NAME=worker K3S_TOKEN=<token> INSTALL_K3S_EXEC='--node-label worker=true --flannel-iface eth0 --debug -v 3' sh -

The master node is behind NAT. It has a public IP, but the IP on its eth0 interface is different (172.31.12.196).
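To see which internal/external IPs the nodes registered with k3s, and which public IP flannel picked up, a quick check (the flannel.alpha.coreos.com/public-ip annotation name is the standard flannel one, but worth verifying on your version):

kubectl get nodes -o wide
kubectl get node ip-172-31-12-196 -o jsonpath='{.metadata.annotations.flannel\.alpha\.coreos\.com/public-ip}'

If the master's flannel public-ip annotation shows the private 172.31.12.196 address, the worker will try to send its vxlan traffic to an IP it cannot reach from outside AWS.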

I deploy two pods and services, one on each node:

apiVersion: v1
kind: Pod
metadata:
  name: echo-devel
  namespace: echo
  labels:
    app: echo-devel
spec:
  nodeSelector:
    worker: "true"
  containers:
    - name: echo
      image: ealen/echo-server
      ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: devel
  namespace: echo
spec:
  selector:
    app: echo-devel
  ports:
    - port: 80
      name: "80"
  type: NodePort
---
apiVersion: v1
kind: Pod
metadata:
  name: echo-master
  namespace: echo
  labels:
    app: echo-master
spec:
  nodeSelector:
    node-role.kubernetes.io/master: "true"
  containers:
    - name: echo
      image: ealen/echo-server
      ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: master
  namespace: echo
spec:
  selector:
    app: echo-master
  ports:
    - port: 80
      name: "80"
  type: NodePort

Then I execute the following commands from a container on the master node:

telnet master.echo.svc.cluster.local 80
Connected to master.echo.svc.cluster.local
^C

telnet devel.echo.svc.cluster.local 80
^C

As you can see, it’s unable to connect to the pod on the other (worker) node.
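To separate a Service/DNS problem from an overlay-network problem, one option (a sketch; pod and namespace names taken from the manifests above) is to look up the worker pod’s IP and telnet to it directly from the same container on the master node that was used above:

kubectl -n echo get pod echo-devel -o wide
telnet <worker pod ip> 80

If the pod IP (10.42.1.x) also times out, the problem is in the flannel vxlan path between the nodes rather than in CoreDNS or the Services.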

Then I execute the following command in the worker node’s pod:

nslookup devel.echo.svc.cluster.local
;; connection timed out; no servers could be reached 

It doesn’t connect to the k3s DNS server.

Expected behavior:

Pods can connect to each other, whether they are on the same node or not.

Actual behavior:

Pods can’t connect to each other if they are on different nodes.

Additional context / logs:

I tried adding the following iptables rules:

iptables -I INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT
iptables -I INPUT 1 -i cni0 -s 10.43.0.0/16 -j ACCEPT

But it didn’t help.
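To check whether the encapsulated traffic actually crosses the NAT, one test is to capture flannel’s vxlan traffic (UDP 8472 by default) on both nodes while repeating the failing telnet:

sudo tcpdump -ni eth0 udp port 8472    # run on master and worker at the same time

If the packets leave one node but never arrive on the other (or replies never come back), the NAT in front of the master is dropping or mangling the vxlan/UDP traffic.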

Master routes:

ip route
default via 172.31.0.1 dev eth0 proto dhcp src 172.31.12.196 metric 100
10.42.0.0/24 dev cni0 proto kernel scope link src 10.42.0.1
10.42.1.0/24 via 10.42.1.0 dev flannel.1 onlink
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
172.31.0.0/20 dev eth0 proto kernel scope link src 172.31.12.196 metric 100
172.31.0.1 dev eth0 proto dhcp scope link src 172.31.12.196 metric 100
172.31.0.2 dev eth0 proto dhcp scope link src 172.31.12.196 metric 100

Worker routes:

ip route
default via 172.31.1.1 dev eth0 proto dhcp src <worker-external-ip> metric 100
10.42.0.0/24 via 10.42.0.0 dev flannel.1 onlink
10.42.1.0/24 dev cni0 proto kernel scope link src 10.42.1.1
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
172.19.0.0/16 dev br-9c77cd708ba2 proto kernel scope link src 172.19.0.1
172.31.1.1 dev eth0 proto dhcp scope link src <worker-external-ip> metric 100

I also tried to launch a new cluster where the master node is not behind NAT (its external IP equals the IP on the eth0 interface), keeping the same worker node, and everything worked fine. But I have to use an AWS server, which is behind NAT.

All firewalls (both on master and worker nodes) are disabled.
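For completeness, a quick check that the ports k3s needs between nodes (per the k3s networking requirements: 6443/tcp for the API server, 10250/tcp for the kubelet, 8472/udp for flannel vxlan) are reachable through the NAT, run from the worker:

nc -zv <master ip> 6443
nc -zv <master ip> 10250
nc -zvu <master ip> 8472    # UDP: "open" here only means no ICMP rejection was seen

Port 6443 clearly works (the worker joined the cluster), so 8472/udp is the interesting one here.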

Backporting

  • Needs backporting to older releases

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 23 (8 by maintainers)

Most upvoted comments

Demo ready: same init script, then:

kubectl apply -f https://raw.githubusercontent.com/UrielCh/dyn-ingress/main/demo.yml

This config contains:

  • Lots of comments 😄
  • A DaemonSet of docker.io/traefik/whoami:v1.8
  • A strip-prefix Middleware for traefik
  • An empty Ingress with a dummy path /fake-url to make it valid.
  • A ServiceAccount / Role / RoleBinding to allow dyn-ingress to populate the dummy Ingress with routes to all docker.io/traefik/whoami:v1.8
  • My hand-crafted-with-love dyn-ingress sources.

Once deployed, accessing the k3s cluster with a browser will list all nodes on an HTML page, or return a JSON list of pods if requested from a script.

The dyn-ingress pod will output a nice activity summary in its logs, with ANSI colors, but kubectl logs seems to drop all my colors 😞

Now let’s try more tests.

You might try out the new flag that was recently added: https://github.com/k3s-io/k3s/pull/6321
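If the flag added in that PR is --flannel-external-ip (treat the flag name here as an assumption and verify it against the linked PR), the server install from the reproduction steps would roughly become:

sudo curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--tls-san <ip> --node-external-ip <ip> --flannel-external-ip --kube-apiserver-arg external-hostname=<ip>" sh -

This makes flannel build its tunnels against the nodes’ external IPs instead of the internal ones.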

You might try using the wireguard (wireguard-native) flannel backend. vxlan, host-gw, and udp are all unlikely to transit properly across the internet or other networks that might mangle the packets (as well as being wildly insecure for use across the internet).
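As a sketch only (assuming a k3s release that ships the wireguard-native flannel backend; UDP 51820 must be reachable between the nodes), the server install would become something like:

sudo curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--flannel-backend=wireguard-native --tls-san <ip> --node-external-ip <ip> --kube-apiserver-arg external-hostname=<ip>" sh -

WireGuard both encrypts the inter-node traffic and tends to survive NAT better than plain vxlan.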