k3s: Worker agent: "failed to get CA certs: EOF"
Version:
k3s version v1.17.2+k3s1 (cdab19b0)
- Server was installed with:
  curl -sfL https://get.k3s.io | sh -
- Agent was installed with:
  curl -sfL https://get.k3s.io | K3S_URL=https://controllerpi.local:6443 K3S_TOKEN=... sh -
Simple build here: two brand new RPi4’s running Raspbian Buster Lite, apt update/upgrade were run before I installed k3s. I also updated the RPi’s eeprom (with rpi-eeprom-update). Server has hostname controllerpi and worker has hostname workerpi1.
Worker won’t connect to server. Log is full of:
level=info msg="Starting k3s agent v1.17.2+k3s1 (cdab19b0)"
level=info msg="module overlay was already loaded"
level=info msg="module br_netfilter was already loaded"
level=info msg="Running load balancer 127.0.0.1:41461 -> [controllerpi.local:6443]"
level=error msg="failed to get CA certs at https://127.0.0.1:41461/cacerts: Get https://127.0.0.1:41461/cacerts: read tcp 127.0.0.1:48874->127.0.0.1:41461: read: co
level=error msg="failed to get CA certs at https://127.0.0.1:41461/cacerts: Get https://127.0.0.1:41461/cacerts: EOF"
level=error msg="failed to get CA certs at https://127.0.0.1:41461/cacerts: Get https://127.0.0.1:41461/cacerts: EOF"
level=error msg="failed to get CA certs at https://127.0.0.1:41461/cacerts: Get https://127.0.0.1:41461/cacerts: EOF"
level=error msg="failed to get CA certs at https://127.0.0.1:41461/cacerts: Get https://127.0.0.1:41461/cacerts: EOF"
level=error msg="failed to get CA certs at https://127.0.0.1:41461/cacerts: Get https://127.0.0.1:41461/cacerts: EOF"
I’ve tried using curl against the load balancer URL (https://127.0.0.1:41461/cacerts), the server’s hostname (https://controllerpi.local:6443/cacerts), and the server’s IP (https://192.168.1.X:6443/cacerts). They all fail with:
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
If I add the -k flag to curl it works in all cases (load balancer, hostname, and IP). I’m wondering if that has something to do with it.
Thanks!
About this issue
- State: closed
- Created 4 years ago
- Comments: 29 (11 by maintainers)
After trying many things, I eventually figured out the issue: when I installed the agent, I used
K3S_URL=https://controllerpi.local:6443. This seemed to work fine - I can resolve controllerpi.local everywhere on my network. But it appears the k3s agent load balancer cannot resolve that hostname. When I edited the /etc/systemd/system/k3s-agent.service.env file to change the K3S_URL to https://192.168.1.168:6443 and then restarted the k3s-agent service, it worked fine.

One thing to note about the .local domain is that it is usually served by mDNS (aka Bonjour). While you might be able to resolve .local addresses from the host command line, k3s uses the Go native resolver, which does not support mDNS. You can use hosts file entries, a local DNS server, or IP addresses instead.
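In env-file form, the fix described above amounts to something like the following (the IP is the example from the comment, the token is elided, and the file path is the one the agent installer writes on this setup):

```
# /etc/systemd/system/k3s-agent.service.env
# Use the server's IP (or a name the Go resolver can handle)
# instead of an mDNS-only .local name:
K3S_URL=https://192.168.1.168:6443
K3S_TOKEN=...
```

After editing, a `sudo systemctl restart k3s-agent` picks up the change.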
@bmatcuk do your pi’s have the correct time or were the certs possibly created when they didn’t have the correct time.
I came across this when hitting the same issue relying on mDNS for host resolution. I was able to keep using DHCP and work around the issue as follows.
#K3S_URL=https://raspi-k3-controller.local:6443
#K3S_URL=https://192.168.2.70:6443
K3S_URL=https://$(getent hosts raspi-k3-controller.local | awk '{print $1}' | head -1):6443
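The pipeline in that workaround extracts the first address column from `getent hosts` output (so the .local name is resolved once, by the host's NSS/mDNS stack, before k3s ever sees it). A quick sketch with simulated getent output, since the real command depends on mDNS being live:

```shell
# Simulate "getent hosts raspi-k3-controller.local" output to show
# what the awk/head pipeline extracts (the first address field):
echo "192.168.2.70    raspi-k3-controller.local" | awk '{print $1}' | head -1
# prints 192.168.2.70
```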
@bmatcuk - Thanks for posting your solution here; I had a similar problem when setting up a Pi cluster today. I had changed the hostname of the master node after installing K3s, but forgot that the hostname is embedded in the agent nodes’ k3s service file config. So I had to update all those configs and make sure they pointed at the right DNS name/IP of the master, and they all connected pretty much immediately after a service restart!
If you’re running a traefik reverse proxy as your external load balancer in a HA config, this is what did the trick for me:
add a loadbalancer server port label:
traefik.http.services.k3s.loadbalancer.server.port=6443
add an entrypoint:
add a TCP router and service:
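The entrypoint and TCP router/service snippets did not survive in this copy of the thread. As a sketch only (not the commenter's original config), a traefik v2 file-provider setup for this might look like the following; the entrypoint name, router name, and backend address are assumptions, and TLS passthrough is used because the k3s API serves its own certificates:

```
# Static config (e.g. traefik.yml): an entrypoint listening on 6443
entryPoints:
  k3s:
    address: ":6443"

# Dynamic config: TCP router + service forwarding raw TLS to the k3s server
tcp:
  routers:
    k3s:
      entryPoints:
        - k3s
      rule: HostSNI(`*`)
      tls:
        passthrough: true
      service: k3s
  services:
    k3s:
      loadBalancer:
        servers:
          - address: "192.168.1.10:6443"
```

The equivalent can be expressed as container labels (as the `traefik.http.services.k3s.loadbalancer.server.port` label above suggests the commenter did).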
working now!! I guess it does not automatically label nodes with the worker role?
kubectl get nodes    centos8-vm1: Tue Mar 23 14:51:29 2021

NAME          STATUS   ROLES                  AGE    VERSION
centos8-vm1   Ready    control-plane,master   44h    v1.20.4+k3s1
rpi5          Ready    <none>                 19s    v1.20.4+k3s1
rpi4          Ready    <none>                 19s    v1.20.4+k3s1
rpi3          Ready    <none>                 18s    v1.20.4+k3s1
rpi2          Ready    <none>                 18s    v1.20.4+k3s1
rpi0          Ready    <none>                 6h5m   v1.20.4+k3s1
rpi1          Ready    <none>                 18s    v1.20.4+k3s1
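Correct - k3s agents join with no role label, which is why ROLES shows `<none>`. If you want the column populated, you can label the nodes yourself; node names below are examples from the output above, and the label key follows the upstream `node-role.kubernetes.io/<role>` convention:

```shell
# Label an agent node so "kubectl get nodes" shows ROLES = worker
# (requires a reachable cluster; node name is an example)
kubectl label node rpi4 node-role.kubernetes.io/worker=worker
```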
oh, should there be :6443 at end of the server URL?
That’s not the correct port.
Thanks for putting up your solution, @bmatcuk . There was an important hint in here, namely that the error could be caused by a node not being able to talk to the master. In my case I had an incorrect firewall policy that prevented some of the nodes from reaching the master, but the error thrown left me clueless. You likely saved me a lot of time today.