k3s: Can't reach internet from pod / container

Environmental Info: K3s Version:

k3s -v
k3s version v1.22.7+k3s1 (8432d7f2)
go version go1.16.10

Host OS Version:

cat /etc/os-release 
NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.4 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

IP Forwarding:

# sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1

Node(s) CPU architecture, OS, and Version:

Linux ansible-awx 5.4.0-105-generic #119-Ubuntu SMP Mon Mar 7 18:49:24 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration: Single node.

# k3s kubectl get nodes -o wide
NAME          STATUS   ROLES                  AGE     VERSION        INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
ansible-awx   Ready    control-plane,master   5d10h   v1.22.7+k3s1   10.164.12.6   <none>        Ubuntu 20.04.4 LTS   5.4.0-105-generic   containerd://1.5.9-k3s1

Describe the bug: I cannot connect to the internet from within the pod / container:

# time curl https://www.google.de
curl: (7) Failed to connect to www.google.de port 443: Connection timed out

real    2m11.892s
user    0m0.005s
sys     0m0.005s

Steps To Reproduce: Install a single-node k3s cluster with curl -sfL https://get.k3s.io | sh on an Ubuntu 20.04 VM. Set up a simple workload (in my case AWX - https://github.com/ansible/awx-operator#basic-install-on-existing-cluster). Enter a container and try to access the internet (for example with curl against a public address).
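
A minimal reproduction that does not need AWX, as a sketch assuming a plain Alpine test pod is enough to show the problem (pod name and target URL are arbitrary):

curl -sfL https://get.k3s.io | sh -
kubectl run nettest --image=alpine --restart=Never --command -- sleep 3600
kubectl exec -it nettest -- wget -O /dev/null https://www.google.de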

Expected behavior: Accessing the internet should work the same way it does from the host.

Actual behavior: No connectivity to the internet from the pod / container at all.

Additional context / logs:

# cat /etc/resolv.conf 
search awx.svc.cluster.local svc.cluster.local cluster.local mydomain.com
nameserver 10.43.0.10
options ndots:5
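
To narrow down whether this is a DNS problem or a general routing/NAT problem, a rough check (pod name is a placeholder; 1.1.1.1 is just a well-known public IP) is to compare a name lookup with a request to a raw IP from inside the pod:

kubectl exec -it <pod> -- nslookup www.google.de
kubectl exec -it <pod> -- wget -O /dev/null https://1.1.1.1

If the raw-IP request also times out, the problem is connectivity/NAT rather than DNS.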

Most upvoted comments

I have had a similar issue. On my server, tcpdump showed only outgoing SYN packets without a SYN ACK on the public interface, but not every time. The cause was a firewall that I can configure for the server machine in the provider's firewall (not in the local system). One of the rules allowed outgoing traffic from ports 32768-65535 only. Most outgoing connections use this port range, but I don't know why k3s doesn't.

What you described turned out to be the reason for my issue as well! My provider (Hetzner) had a rule in the default template for incoming TCP ports ranging from 32768 to 65535 with the ack flag, and that template was applied to my server. After changing the start of the port range from 32768 to 0, the connection tests worked reliably. This perfectly explains why some attempts worked before while others didn't: if the randomly selected port was in the upper (not blocked) range, it worked; for ports below 32768 it did not. This has never been an issue for me before. It still seems strange that the VM running k3s, the host and all other VMs use the upper half of the available ports, while k3s seems to use the full range of possible ports.

In any case, thank you all for your help and support!

I’m really happy that this issue is solved.
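
For anyone hitting the same symptom: a rough way to confirm it on the host is to watch the source ports of outgoing SYNs on the public interface while running the failing request from a pod (eth0 is an assumption; use your actual uplink interface):

tcpdump -ni eth0 'tcp[tcpflags] & tcp-syn != 0 and dst port 443'

SYNs leaving with source ports below 32768 that never get a SYN/ACK back point at a provider-side filter like the one described above.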

Hi @apiening, were you able to fix this issue? Please let us know if you were. I'm facing the same issue as well.

Thanks

This is really interesting. The Linux kernel by default uses the range 32768 to 60999 for client TCP connections (check cat /proc/sys/net/ipv4/ip_local_port_range). However, iptables, when using the --random or --random-fully flag, replaces the source TCP port during SNAT with whatever port is unassigned, and that port does not have to be in that range. Flannel uses that flag to do SNAT. I wonder how other CNI plugins handle it... but at least we should document this to avoid more users hitting this issue.
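
A quick way to look at both halves of this on the node, as a sketch (chain names and rule order vary between k3s/flannel versions):

cat /proc/sys/net/ipv4/ip_local_port_range
iptables -t nat -S | grep -i masquerade

If the MASQUERADE rules carry --random-fully, the SNATed source port is not restricted to ip_local_port_range and can end up well below 32768, which is exactly what trips provider firewall rules like the one above.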

I also have a perhaps-similar issue with my k3s worker node in a VM at Contabo. It shows up when doing a POST to gitlab.com, but based on this issue it will happen with any outgoing network access:

ERROR: Registering runner... failed                 runner=GR134894 status=couldn't execute POST against https://gitlab.com/api/v4/runners: Post "https://gitlab.com/api/v4/runners": dial tcp: i/o timeout
PANIC: Failed to register the runner. 

Using k3s v1.22.7.

This bug only happens with pods on the node in the Contabo VM. However, if I reschedule the pod to the other node, which is a Hetzner instance, all networking works fine.
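
To confirm it is node-specific, one option (a sketch; node and pod names are placeholders) is to pin a throwaway test pod to a given node and repeat the request there:

kubectl run nettest --image=alpine --restart=Never --overrides='{"kind":"Pod", "apiVersion":"v1", "spec":{"nodeName":"contabo-node"}}' --command -- sleep 3600
kubectl exec -it nettest -- wget -O /dev/null https://gitlab.com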

I ran into the same issue. Just to clarify: should ports 0-65535 be open on the firewall?

Disabling firewalld works for me.
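
For reference, the k3s requirements docs recommend either stopping firewalld or trusting the pod and service CIDRs (defaults 10.42.0.0/16 and 10.43.0.0/16). A sketch, assuming firewalld is the active firewall:

systemctl disable --now firewalld

Alternatively, to keep firewalld running, trust the default pod and service CIDRs:

firewall-cmd --permanent --zone=trusted --add-source=10.42.0.0/16
firewall-cmd --permanent --zone=trusted --add-source=10.43.0.0/16
firewall-cmd --reload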

In my case, after rebooting my machine, pods have internet access again.

  • k3s v1.25.4+k3s1
  • Ubuntu 22.04 as host OS.
  • Docker version 20.10.22

@apiening I sidestepped the problem

  1. Previously it was a multi-cloud cluster: Hetzner had the control plane and a worker, and Contabo was worker only.
  2. Now I created a standalone cluster in Contabo where the control plane is also the worker.

The current configuration works. So something is wrong with flannel when a node joins a cluster in a different cloud.
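
If the cross-cloud setup is ever revisited: one commonly suggested configuration (an assumption on my part, not something verified in this thread) is to run flannel over WireGuard and advertise the nodes' public IPs explicitly, since plain vxlan between clouds over the public internet is fragile. On newer k3s releases the backend is called wireguard-native; older releases used wireguard:

k3s server --flannel-backend=wireguard-native --node-external-ip=<server-public-ip>
k3s agent --server https://<server-public-ip>:6443 --token <token> --node-external-ip=<agent-public-ip>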

Thanks again @manuelbuil, that’s a good plan.

I tried to bring up a pod with hostNetwork: true with the following command:

kubectl run busybox2 --image=alpine --overrides='{"kind":"Pod", "apiVersion":"v1", "spec": {"hostNetwork": true}}' --command -- sh -c 'echo Hello K3S! && sleep 3600'
kubectl exec -it busybox2 -- sh

I entered the pod/container and verified that I do in fact have host networking. Then I did the same wget test with 100 tries:

SUCCESS=0; for i in `seq 1 100`; do wget "https://www.heise.de" -O - &> /dev/null && SUCCESS=$((SUCCESS+1)); sleep 1; done; echo $SUCCESS
100

So with hostNetwork: true all requests pass without issues, the same way they do from the VM and the host.

So this issue must somehow be related to flannel one way or the other. Maybe it is a configuration issue, or a bug that only happens under specific circumstances.
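
For comparison, the same loop in a regular (flannel-networked) pod should show the intermittent failures, i.e. a count well below 100. A sketch, reusing an Alpine pod without the hostNetwork override (pod name is arbitrary; the redirection is written for busybox sh):

kubectl run busybox3 --image=alpine --restart=Never --command -- sh -c 'sleep 3600'
kubectl exec -it busybox3 -- sh -c 'SUCCESS=0; for i in `seq 1 100`; do wget "https://www.heise.de" -O - >/dev/null 2>&1 && SUCCESS=$((SUCCESS+1)); sleep 1; done; echo $SUCCESS'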