k3s: Pods in different networks are unable to communicate with each other.
This might be related to the discussion on https://github.com/rancher/k3s/pull/881, but with some slight differences.
Version:

```
k3s -v
k3s version v1.18.2+k3s1 (698e444a)
```
K3s arguments:

```
curl -sfL https://get.k3s.io | K3S_TOKEN="${token}" K3S_URL="https://${endpoint}:6443/" sh -s - $(scw-userdata k3s_node_labels) --node-external-ip="$(scw-metadata --cached PUBLIC_IP_ADDRESS)"
```
Describe the bug

I am trying to run a cluster which has some nodes hosted on Scaleway VPS instances and some in another network/provider.
On Scaleway the nodes are assigned private IPs, but they also have a 1:1-mapped routable public IP, and by passing the `--node-external-ip` flag I am able to get proper internal and external IPs assigned to the nodes, i.e.:
```
NAME             STATUS   ROLES    AGE    VERSION        INTERNAL-IP     EXTERNAL-IP     OS-IMAGE                       KERNEL-VERSION   CONTAINER-RUNTIME
k3s-server-001   Ready    <none>   23m    v1.18.2+k3s1   10.64.212.xx    212.47.252.xx   Ubuntu 20.04 LTS               5.4.0-1011-kvm   containerd://1.3.3-k3s2
k3s-server-002   Ready    <none>   39m    v1.18.2+k3s1   10.69.66.xx     51.158.109.xx   Ubuntu 20.04 LTS               5.4.0-1011-kvm   containerd://1.3.3-k3s2
k3s-online-01    Ready    master   4h3m   v1.18.2+k3s1   62.210.202.xx   <none>          Debian GNU/Linux 10 (buster)   4.19.0-9-amd64   containerd://1.3.3-k3s2
```
(Above, the third node is the master and the two Scaleway nodes are workers; this was a test to see whether having the master on a public IP solves the issue or not. The normal setup is the reverse: the two Scaleway nodes are masters and k3s-online-01 is a worker.)
The third node, hosted at a different provider, has only a public IP address.
For test purposes all nodes have been wiped clean; they also had the same OS version (Debian buster), and no firewall is set up.
The issue is that as long as I stay within the Scaleway realm, everything works as expected. When I add an external node from a different network, it shows up as Ready in the cluster, and I am able to deploy pods to it and exec into a shell. However, from those pods I am unable to reach anything on the other network, DNS resolution doesn't work for any address (Kubernetes-internal or the outside world), and I am only able to ping resources on the public internet.
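A typical way to reproduce the symptom from a pod on the external node (the busybox image is illustrative, and this assumes the pod actually lands on that node, e.g. via a nodeSelector):

```
# DNS resolution fails for both cluster-internal and external names...
kubectl run dnstest --rm -it --image=busybox:1.31 --restart=Never -- \
  nslookup kubernetes.default
# ...yet pinging a public IP directly still works:
kubectl run pingtest --rm -it --image=busybox:1.31 --restart=Never -- \
  ping -c 3 1.1.1.1
```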
Is there something I am missing? I have tried flannel with the default, ipsec, and wireguard backends, with no success so far.
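For completeness, a sketch of how those backends would be selected, assuming the standard k3s `--flannel-backend` server flag (my actual invocations also included the env vars shown above):

```
# vxlan is the default; these are the other backends I tried on the server:
curl -sfL https://get.k3s.io | sh -s - server --flannel-backend=ipsec
curl -sfL https://get.k3s.io | sh -s - server --flannel-backend=wireguard
```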
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 22 (7 by maintainers)
For now I've put up a deployable workaround (https://github.com/alekc-go/flannel-fixer) which launches a listener deployment that fixes this annotation on existing nodes and on any new node joining the cluster. I will try to chase it down and debug it on the flannel side of things.
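For the curious, a minimal shell sketch of the idea behind flannel-fixer, assuming flannel honours a `public-ip-overwrite` annotation (the real project is a proper in-cluster controller; the names here are illustrative, not its actual code):

```
# Watch for node events and copy each node's ExternalIP into flannel's
# public-ip-overwrite annotation so cross-provider traffic uses the public IP.
kubectl get nodes -o name --watch | while read -r node; do
  ext_ip=$(kubectl get "$node" \
    -o jsonpath='{.status.addresses[?(@.type=="ExternalIP")].address}')
  [ -n "$ext_ip" ] && kubectl annotate --overwrite "$node" \
    "flannel.alpha.coreos.com/public-ip-overwrite=${ext_ip}"
done
```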
After a bit of investigation I am able to shed more light on this issue.
So, it turns out the problem is indeed in flannel. Even though a node has its private and public IPs correctly reported in `kubectl get nodes -o wide`, in the node annotations flannel reports a `public-ip` which contains the node's private IP.
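The mismatch can be seen by dumping the node's annotations; a quick check, assuming flannel's standard `flannel.alpha.coreos.com/*` annotation keys (node name taken from the table above):

```
kubectl get node k3s-server-001 -o yaml | grep flannel.alpha.coreos.com
# flannel.alpha.coreos.com/public-ip shows 10.64.212.xx (the private IP)
# instead of the node's public 212.47.252.xx
```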
After adding an additional annotation to the node and restarting the k3s agent, I can see in the logs that the new address is picked up, and after that everything begins to work again. I need to do some further testing, but so far the results are encouraging.
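A sketch of that manual fix, assuming the extra annotation is flannel's `public-ip-overwrite` key (the exact annotation and log output were elided above; the IP is from the node table):

```
# Tell flannel to advertise the node's real public IP instead of the private one
kubectl annotate node k3s-server-001 --overwrite \
  flannel.alpha.coreos.com/public-ip-overwrite=212.47.252.xx
# Restart the agent so flannel re-reads the node annotations
systemctl restart k3s-agent
```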