k3s: Unable to connect to the server: x509: certificate signed by unknown authority - inconsistent behavior
Environmental Info:
K3s Version:
# k3s -v
k3s version v1.19.7+k3s1 (5a00e38d)
Node(s) CPU architecture, OS, and Version:
- Linux k3s-ya-1 3.10.0-1127.el7.x86_64 #1 SMP Tue Mar 31 23:36:51 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
- Linux k3s-ya-2 3.10.0-1127.el7.x86_64 #1 SMP Tue Mar 31 23:36:51 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
- Linux k3s-ya-3 3.10.0-1127.el7.x86_64 #1 SMP Tue Mar 31 23:36:51 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration:
3 masters
Describe the bug:
# kubectl get nodes
Unable to connect to the server: x509: certificate signed by unknown authority
[root@k3s-ya-1 ~]# k3s kubectl get nodes
Unable to connect to the server: x509: certificate signed by unknown authority
[root@k3s-ya-1 ~]#
However, the result is inconsistent. Sometimes the first master node works, but the 2nd and 3rd nodes return "Unable to connect to the server: x509: certificate signed by unknown authority".
Steps To Reproduce:
- Installed K3s:
etcd certs are copied into /root
First node - k3s-ya-1
k3s-uninstall.sh
export INSTALL_K3S_VERSION=v1.19.7+k3s1
export K3S_DATASTORE_CAFILE=/root/ca.crt
export K3S_DATASTORE_CERTFILE=/root/apiserver-etcd-client.crt
export K3S_DATASTORE_KEYFILE=/root/apiserver-etcd-client.key
export K3S_KUBECONFIG_OUTPUT=/root/kube.confg
export K3S_DATASTORE_ENDPOINT=https://etcd1.k8s:2379,https://etcd2.k8s,https://etcd3.k8s:2379
k3s.install server
# kubectl get nodes
Unable to connect to the server: x509: certificate signed by unknown authority
^^^ this result is inconsistent - sometimes works, sometimes not
cat /var/lib/rancher/k3s/server/node-token to get the token for use with the additional nodes.
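Not part of the original steps, but before installing the additional servers it can help to confirm that the copied etcd client certs actually authenticate against every endpoint configured above. A rough sketch using curl and the paths/endpoints from the exports; each /health call should report the member as healthy:
for ep in https://etcd1.k8s:2379 https://etcd2.k8s:2379 https://etcd3.k8s:2379; do
  echo "== $ep =="
  curl -s --cacert /root/ca.crt --cert /root/apiserver-etcd-client.crt --key /root/apiserver-etcd-client.key "$ep/health"
  echo
done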
2nd node - k3s-ya-2
k3s-uninstall.sh
export INSTALL_K3S_VERSION=v1.19.7+k3s1
export K3S_DATASTORE_ENDPOINT=https://etcd1.k8s:2379,https://etcd2.k8s,https://etcd3.k8s:2379
export K3S_DATASTORE_CAFILE=/root/ca.crt
export K3S_DATASTORE_CERTFILE=/root/apiserver-etcd-client.crt
export K3S_DATASTORE_KEYFILE=/root/apiserver-etcd-client.key
export K3S_TOKEN=--from first node--
export K3S_URL=https://k3s:6443
export K3S_KUBECONFIG_OUTPUT=/root/kube.confg
k3s.install server
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k3s-ya-1 Ready control-plane,master 2d5h v1.19.7+k3s1
k3s-ya-2 Ready control-plane,master 36h v1.19.7+k3s1
^^^ this time it worked - in the last 3 attempts the 2nd node didn't work but the 1st node did - go figure.
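Not from the original steps: since K3S_KUBECONFIG_OUTPUT writes the admin kubeconfig to /root/kube.confg, one way to rule out a stale or default kubeconfig is to point kubectl at that file (or at /etc/rancher/k3s/k3s.yaml) explicitly, e.g.:
kubectl --kubeconfig /root/kube.confg get nodes    # bypasses whatever ~/.kube/config currently points at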
3rd node - k3s-ya-3
k3s-uninstall.sh
export INSTALL_K3S_VERSION=v1.19.7+k3s1
export K3S_DATASTORE_ENDPOINT=https://etcd1.k8s:2379,https://etcd2.k8s,https://etcd3.k8s:2379
export K3S_DATASTORE_CAFILE=/root/ca.crt
export K3S_DATASTORE_CERTFILE=/root/apiserver-etcd-client.crt
export K3S_DATASTORE_KEYFILE=/root/apiserver-etcd-client.key
export K3S_TOKEN=--from first node--
export K3S_URL=https://k3s:6443
export K3S_KUBECONFIG_OUTPUT=/root/kube.confg
k3s.install server
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k3s-ya-1 Ready control-plane,master 2d5h v1.19.7+k3s1
k3s-ya-2 Ready control-plane,master 36h v1.19.7+k3s1
k3s-ya-3 Ready master 19s v1.19.7+k3s1
^^^ closer to what was expected - about half the time this instead yields Unable to connect to the server: x509: certificate signed by unknown authority
Expected behavior:
Consistent behavior after the k3s server is installed. kubectl should work without certificate errors across all nodes.
Actual behavior:
Inconsistent. Some nodes return Unable to connect to the server: x509: certificate signed by unknown authority, while others can connect. Uninstall and repeat - different results.
Yesterday the entire cluster was working as expected, with no errors across all nodes, and with Rancher installed and running another cluster as expected.
Today, every k3s node returns Unable to connect to the server: x509: certificate signed by unknown authority.
It’s almost like the certificates are playing musical chairs.
Additional context / logs:
Samples from /var/log/messages
Feb 8 20:55:42 k3s-ya-1 k3s: time="2021-02-08T20:55:42.901658387-05:00" level=info msg="Cluster-Http-Server 2021/02/08 20:55:42 http: TLS handshake error from 10.1.0.84:43082: remote error: tls: bad certificate"
Feb 8 20:55:43 k3s-ya-1 k3s: time="2021-02-08T20:55:43.012864767-05:00" level=info msg="Cluster-Http-Server 2021/02/08 20:55:43 http: TLS handshake error from 10.42.2.175:46490: remote error: tls: bad certificate"
Feb 8 20:56:37 k3s-ya-2 k3s: time="2021-02-08T20:56:37.629125982-05:00" level=info msg="Cluster-Http-Server 2021/02/08 20:56:37 http: TLS handshake error from 10.1.0.85:35180: remote error: tls: bad certificate"
Feb 8 20:56:37 k3s-ya-2 k3s: time="2021-02-08T20:56:37.840388714-05:00" level=info msg="Cluster-Http-Server 2021/02/08 20:56:37 http: TLS handshake error from 10.1.0.83:42518: remote error: tls: bad certificate"
Feb 8 20:57:49 k3s-ya-3 k3s: E0208 20:57:49.215716 829 event.go:273] Unable to write event: 'Patch "https://127.0.0.1:6443/api/v1/namespaces/kube-system/events/helm-install-traefik-4lncd.1661f1476f8d4e12": x509: certificate signed by unknown authority' (may retry after sleeping)
Feb 8 20:57:49 k3s-ya-3 k3s: time="2021-02-08T20:57:49.361122818-05:00" level=info msg="Connecting to proxy" url="wss://10.1.0.81:6443/v1-k3s/connect"
Feb 8 20:57:49 k3s-ya-3 k3s: time="2021-02-08T20:57:49.362442817-05:00" level=error msg="Failed to connect to proxy" error="x509: certificate signed by unknown authority"
Feb 8 20:57:49 k3s-ya-3 k3s: time="2021-02-08T20:57:49.362463456-05:00" level=error msg="Remotedialer proxy error" error="x509: certificate signed by unknown authority"
Feb 8 20:57:49 k3s-ya-3 k3s: time="2021-02-08T20:57:49.367212594-05:00" level=info msg="Connecting to proxy" url="wss://10.1.0.82:6443/v1-k3s/connect"
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 20 (8 by maintainers)
I had the same error message now after uninstalling and re-installing K3S. Turns out the problem was my ~/.kube/config was still referring to the old cluster. Delete that and then cp /etc/rancher/k3s/k3s.yaml ~/.kube/config to get the new context.

You don't need to set K3S_URL (--server) when using an external datastore; this is only for use when joining agents or using embedded etcd.
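A minimal sketch of that fix (the backup filename is only illustrative):
mv ~/.kube/config ~/.kube/config.old-cluster    # move the stale kubeconfig out of the way
cp /etc/rancher/k3s/k3s.yaml ~/.kube/config     # use the kubeconfig generated by the current install
kubectl get nodes                               # should now validate against the new cluster's CA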
I am curious how you came to have two nodes with the control-plane role label. This wasn’t added until 1.20, yet your nodes are all still on 1.19. Did you upgrade temporarily, and then downgrade again?
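Not from the original comment, but an easy way to see which node-role labels are actually set (and therefore where the control-plane entry in the ROLES column comes from):
kubectl get nodes --show-labels    # the node-role.kubernetes.io/* labels explain the ROLES column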
In the past I have seen behavior like this when servers were all brought up at the same time and raced to bootstrap the cluster CA certs, or when nodes were started up with existing certs from a different cluster that they then try to use instead of the ones recognized by the rest of the cluster.
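A hedged diagnostic sketch along those lines (not taken from the thread; paths assume a default k3s install, so verify them locally): compare the CA that signed the serving cert each node presents on 6443 with the server CA each node has stored on disk. Every node should report the same CA; each bootstrap generates a distinct CA, so a mismatch points at exactly the race or leftover-cert scenario described above.
for host in k3s-ya-1 k3s-ya-2 k3s-ya-3; do
  echo "== $host =="
  openssl s_client -connect $host:6443 </dev/null 2>/dev/null | openssl x509 -noout -issuer    # CA that signed the serving cert
done
openssl x509 -noout -subject -in /var/lib/rancher/k3s/server/tls/server-ca.crt   # run on each server and compare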
It sounds like these nodes have been through some odd things. I run my personal cluster with an external etcd and haven't had any problems with it; I suspect something in the way you started up, upgraded, or grew this cluster has left it very confused about which certificates to use.