k3s: cluster networking is broken?

The helm-install job never succeeds; it seems it is not possible to reach the DNS server.

alpine:/home/alpine/k3s/dist/artifacts# ./k3s kubectl  get all -n kube-system 
NAME                             READY   STATUS             RESTARTS   AGE
pod/coredns-7748f7f6df-tp7fq     1/1     Running            1          104m
pod/helm-install-traefik-g5rmk   0/1     CrashLoopBackOff   21         104m

NAME               TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
service/kube-dns   ClusterIP   10.43.0.10   <none>        53/UDP,53/TCP,9153/TCP   104m

NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/coredns   1/1     1            1           104m

NAME                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/coredns-7748f7f6df   1         1         1       104m

NAME                             COMPLETIONS   DURATION   AGE
job.batch/helm-install-traefik   0/1           104m       104m

./k3s kubectl   -n kube-system logs -f pod/helm-install-traefik-g5rmk
+ export HELM_HOST=127.0.0.1:44134
+ tiller --listen=127.0.0.1:44134 --storage=secret
+ HELM_HOST=127.0.0.1:44134
+ helm init --client-only
[main] 2019/02/08 20:48:52 Starting Tiller v2.12.3 (tls=false)
[main] 2019/02/08 20:48:52 GRPC listening on 127.0.0.1:44134
[main] 2019/02/08 20:48:52 Probes listening on :44135
[main] 2019/02/08 20:48:52 Storage driver is Secret
[main] 2019/02/08 20:48:52 Max history per release is 0
Creating /root/.helm 
Creating /root/.helm/repository 
Creating /root/.helm/repository/cache 
Creating /root/.helm/repository/local 
Creating /root/.helm/plugins 
Creating /root/.helm/starters 
Creating /root/.helm/cache/archive 
Creating /root/.helm/repository/repositories.yaml 
Adding stable repo with URL: https://kubernetes-charts.storage.googleapis.com 
Error: Looks like "https://kubernetes-charts.storage.googleapis.com" is not a valid chart repository or cannot be reached: Get https://kubernetes-charts.storage.googleapis.com/index.yaml: dial tcp: lookup kubernetes-charts.storage.googleapis.com on 10.43.0.10:53: read udp 10.42.0.4:39333->10.43.0.10:53: i/o timeout

Verify by running a busybox pod:

alpine:/home/alpine/k3s/dist/artifacts# ./k3s kubectl run -i --tty busybox --image=busybox --restart=Never -- sh
If you don't see a command prompt, try pressing enter.
/ # 
/ # ping 10.43.0.10
PING 10.43.0.10 (10.43.0.10): 56 data bytes
^C
--- 10.43.0.10 ping statistics ---
7 packets transmitted, 0 packets received, 100% packet loss
/ # ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
3: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue 
    link/ether 32:03:33:52:8c:19 brd ff:ff:ff:ff:ff:ff
    inet 10.42.0.6/24 brd 10.42.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::3003:33ff:fe52:8c19/64 scope link 
       valid_lft forever preferred_lft forever
/ # ping 10.43.0.10
PING 10.43.0.10 (10.43.0.10): 56 data bytes
^C
--- 10.43.0.10 ping statistics ---
6 packets transmitted, 0 packets received, 100% packet loss
/ # ping 10.42.0.6
PING 10.42.0.6 (10.42.0.6): 56 data bytes
64 bytes from 10.42.0.6: seq=0 ttl=64 time=0.109 ms
64 bytes from 10.42.0.6: seq=1 ttl=64 time=0.108 ms
64 bytes from 10.42.0.6: seq=2 ttl=64 time=0.106 ms
^C
--- 10.42.0.6 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.106/0.107/0.109 ms
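
A quick way to test DNS itself, rather than ICMP, is an nslookup against the cluster DNS service from the same busybox pod (a diagnostic sketch, assuming the pod is still running):

/ # nslookup kubernetes.default 10.43.0.10

A timeout here points at the same dropped pod-to-service traffic that the helm-install log shows, whereas a successful answer would mean DNS itself is fine.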

Most upvoted comments

On Fedora 29, these fixed both the CoreDNS and Traefik installs for me:

firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 1 -s 10.42.0.0/15 -j ACCEPT
firewall-cmd --reload

It might be possible to further narrow down or optimise the /15.
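
To confirm the direct rules were actually applied after the reload, firewall-cmd can list them (a verification step, not part of the fix):

firewall-cmd --direct --get-all-rules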

Pings to 10.43.0.0/16 addresses aren’t going to get a response, because those service IPs are ‘virtual’ and only really exist within iptables. If your DNS requests are getting to the CoreDNS pod, then cluster networking looks like it’s working. Your issue may be related to https://github.com/rancher/k3s/issues/53 (how on earth did a DNS problem get issue number 53…)

I’ve not used firewalld before, but essentially you need to add a rule equivalent to this iptables rule:

iptables -I INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT

This rule says, permit incoming packets from interface (bridge) cni0, with source in range 10.42.0.0/16. If you wanted to be more granular, you could individually open each port, rather than accept any.
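
As a sketch of that more granular variant: assuming you only want to open the ports the kube-dns service above exposes (53/UDP, 53/TCP, 9153/TCP), the blanket ACCEPT could be replaced with per-port rules like these:

# hypothetical per-port alternative to the blanket ACCEPT above
iptables -I INPUT 1 -i cni0 -s 10.42.0.0/16 -p udp --dport 53 -j ACCEPT
iptables -I INPUT 2 -i cni0 -s 10.42.0.0/16 -p tcp --dport 53 -j ACCEPT
iptables -I INPUT 3 -i cni0 -s 10.42.0.0/16 -p tcp --dport 9153 -j ACCEPT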

I would just like to summarize this thread by clearly stating the two iptables rules, taken from above, which fixed my broken fresh install of k3s in (a fraction of) a second, after several days of struggling with it:

sudo iptables -I INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT
sudo iptables -I FORWARD 1 -s 10.42.0.0/15 -j ACCEPT
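
Keep in mind that rules inserted with iptables -I live only in the running kernel and are gone after a reboot. Persisting them is distro-specific; on Debian/Ubuntu with the iptables-persistent package, for example, something like this would save them (the path is that package's convention, adjust for your distro):

sudo sh -c 'iptables-save > /etc/iptables/rules.v4'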

Many, many thanks to you all for your contributions!

In case you are having a similar issue: I noticed there were rules related to docker in my chains (I was using containerd). The steps I followed:

  1. Stop cluster. (systemctl stop k3s on master, systemctl stop k3s-agent on agents)
  2. Delete all iptables rules in your chains, as described here: https://serverfault.com/a/200658/455081 (see the sketch after this list)
  3. Start cluster again.
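
For reference, the flush in step 2 boils down to something like the following (a sketch based on the linked answer; it opens the default policies and empties every chain, so only run it if you can afford to momentarily drop all firewall state):

iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT
iptables -t nat -F
iptables -t mangle -F
iptables -F
iptables -X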

This comment was the fix for me:

iptables -I INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT

Getting the same issue with multi-node k3s. With only one node, everything works like a charm. When adding a new node:

  • it shows up with kubectl get nodes
  • when running new pods, they start properly on this node

I’m testing with this command: kubectl run -it --rm --restart=Never busybox --image=busybox sh. When the CoreDNS and busybox pods are not on the same host, they can’t talk. But when they are on the same node, they can…

My config

Two fresh CentOS 7 instances launched on GCP, with no firewall filtering between them. The k3s cluster is launched with these commands:

server: /usr/local/bin/k3s server --docker
node: /usr/local/bin/k3s agent --docker --server https://server:6443 --token "TOKEN"
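
One thing worth checking in a setup like this: k3s defaults to flannel with the VXLAN backend, which tunnels pod-to-pod traffic between nodes over UDP port 8472. If a host firewall on either node drops that port, pods on different nodes cannot talk while pods on the same node can, which matches the symptom above. A hedged example of opening it on both machines:

iptables -I INPUT 1 -p udp --dport 8472 -j ACCEPT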

In fact, CoreDNS isn’t able to reach out to 1.1.1.1:53. I changed it according to #53 and now it’s working!!

Thanks!

BTW: you’re right on the issue number. What a nice coincidence!
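
For anyone looking for the concrete steps behind that fix: the change discussed in #53 comes down to editing the CoreDNS Corefile so it forwards to a resolver the node can actually reach. A sketch (the exact upstream line depends on your Corefile):

kubectl -n kube-system edit configmap coredns
# in the Corefile, replace the unreachable upstream, e.g.
#   forward . 1.1.1.1
# with the host's resolver:
#   forward . /etc/resolv.conf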

I’m seeing this with VMware Photon 3.0. Adding @aaliddell’s snippet to /etc/systemd/scripts/ip4save has done the trick.

@maci0, whoops, you’re right; I edited my response above. Sorry for the delay, @jbutler992.

Is this the ufw equivalent of the iptables rule above, or do I have it the other way around:

sudo ufw allow in on cni0 from 10.42.0.0/16 comment "K3s rule : https://github.com/rancher/k3s/issues/24#issuecomment-469759329"
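
Either way, after adding the rule, ufw can show whether it took effect:

sudo ufw status verbose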

To anyone running into this issue on Fedora, the proper command to add the iptables rule is:

firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT

and then a firewall-cmd --reload fixed it for me.

Still having issues with DNS resolution, though.

@liyimeng, can you ensure the br_netfilter module is loaded? The agent is supposed to load this module, but it seems to not always work. I’m troubleshooting that now.
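
For anyone wanting to check this themselves, a quick sketch of verifying and loading the module by hand, plus the sysctl it enables:

lsmod | grep br_netfilter                 # is the module loaded?
sudo modprobe br_netfilter                # load it manually if not
sysctl net.bridge.bridge-nf-call-iptables # should be 1, so bridged pod traffic traverses iptables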