k3s: Master failing to join cluster - TLS handshake error: bad certificate

K3s Version: k3s version v1.17.2+k3s1 (cdab19b0)

Node(s) CPU architecture, OS, and Version: Linux 3.10.0-1062.12.1.el7.x86_64 #1 SMP Thu Dec 12 06:44:49 EST 2019 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration: 4 masters, 8 agents

Describe the bug: Master node dropped from the cluster and is unable to rejoin due to certificate issues.

Steps To Reproduce: Run an air-gapped installation with the service file below.

[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
Wants=network-online.target

[Install]
WantedBy=multi-user.target

[Service]
Type=notify
EnvironmentFile=/etc/systemd/system/k3s.service.env
KillMode=process
Delegate=yes
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/opt/app/mdx/MdxCache1/k3s/bin/k3s \
    server \
        '--data-dir' \
        '/opt/app/mdx/MdxCache1/k3s-data' \
        '--datastore-endpoint' \
        'https://...:2379,https://...:2379,https://...:2379,https://....:2379,https://....:2379' \
        '--datastore-cafile' \
        '/opt/app/mdx/MdxCache1/k3s/etcd/ca.crt' \
        '--datastore-certfile' \
        '/opt/app/mdx/MdxCache1/k3s/etcd/etcd-server.crt' \
        '--datastore-keyfile' \
        '/opt/app/mdx/MdxCache1/k3s/etcd/etcd-server.key' \
        '--tls-san' \
        'KUBERNETES-CA'

Installed K3s:

Expected behavior: Master node should join the cluster

Actual behavior: The master failed to join the cluster and logged the following error: http: TLS handshake error from 127.0.0.1:40404: remote error: tls: bad certificate

Additional context / logs:

Oct 06 17:06:00 lonrs13760 k3s[12866]: http: TLS handshake error from 127.0.0.1:40396: remote error: tls: bad certificate
Oct 06 17:06:00 lonrs13760 k3s[12866]: time="2020-10-06T17:06:00.382530597+01:00" level=info msg="Wrote kubeconfig /etc/rancher/k3s/k3s.yaml"
Oct 06 17:06:00 lonrs13760 k3s[12866]: time="2020-10-06T17:06:00.382558967+01:00" level=info msg="Run: k3s kubectl"
Oct 06 17:06:00 lonrs13760 k3s[12866]: time="2020-10-06T17:06:00.382571906+01:00" level=info msg="k3s is up and running"
Oct 06 17:06:00 lonrs13760 systemd[1]: Started Lightweight Kubernetes.
Oct 06 17:06:00 lonrs13760 k3s[12866]: time="2020-10-06T17:06:00.382810224+01:00" level=info msg="module overlay was already loaded"
Oct 06 17:06:00 lonrs13760 k3s[12866]: time="2020-10-06T17:06:00.382836321+01:00" level=info msg="module nf_conntrack was already loaded"
Oct 06 17:06:00 lonrs13760 k3s[12866]: time="2020-10-06T17:06:00.382851414+01:00" level=info msg="module br_netfilter was already loaded"
Oct 06 17:06:00 lonrs13760 k3s[12866]: I1006 17:06:00.385210   12866 controller.go:606] quota admission added evaluator for: addons.k3s.cattle.io
Oct 06 17:06:00 lonrs13760 k3s[12866]: http: TLS handshake error from 127.0.0.1:40404: remote error: tls: bad certificate
Oct 06 17:06:00 lonrs13760 k3s[12866]: http: TLS handshake error from 127.0.0.1:40410: remote error: tls: bad certificate
Oct 06 17:06:00 lonrs13760 k3s[12866]: time="2020-10-06T17:06:00.418355513+01:00" level=info msg="Using registry config file at /etc/rancher/k3s/registries.yaml"
Oct 06 17:06:00 lonrs13760 k3s[12866]: time="2020-10-06T17:06:00.421453594+01:00" level=info msg="Logging containerd to /opt/app/mdx/MdxCache1/k3s-data/agent/containerd/containerd.log"
Oct 06 17:06:00 lonrs13760 k3s[12866]: time="2020-10-06T17:06:00.421624206+01:00" level=info msg="Running containerd -c /opt/app/mdx/MdxCache1/k3s-data/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /opt/app/mdx/MdxCache1/k3s-data/agent/containerd"
Oct 06 17:06:00 lonrs13760 k3s[12866]: time="2020-10-06T17:06:00.421784727+01:00" level=info msg="Waiting for containerd startup: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/k3s/containerd/containerd.sock: connect: connection refused\""
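To make the pattern in the log excerpt above easier to see, here is a minimal Python sketch that counts the handshake errors by client address and reason. The function name summarize_handshake_errors and the regular expression are illustrative assumptions, not part of k3s. In Go's TLS stack, a "remote error" generally means the peer sent the alert, so these lines would suggest that a client on 127.0.0.1 is rejecting the certificate presented by this server; treat that reading as a hypothesis to verify rather than a diagnosis.

```python
import re
from collections import Counter

# Matches the Go net/http server message seen in the journal output.
# Named groups: client host, client port, and the reported reason.
HANDSHAKE_RE = re.compile(
    r"http: TLS handshake error from (?P<host>\S+):(?P<port>\d+): (?P<reason>.+)$"
)


def summarize_handshake_errors(lines):
    """Count handshake errors per (client host, reason) pair."""
    counts = Counter()
    for line in lines:
        match = HANDSHAKE_RE.search(line)
        if match:
            key = (match.group("host"), match.group("reason").strip())
            counts[key] += 1
    return counts


# Lines copied from the log excerpt in this report.
sample = [
    "Oct 06 17:06:00 lonrs13760 k3s[12866]: http: TLS handshake error from 127.0.0.1:40396: remote error: tls: bad certificate",
    'Oct 06 17:06:00 lonrs13760 k3s[12866]: time="2020-10-06T17:06:00.382571906+01:00" level=info msg="k3s is up and running"',
    "Oct 06 17:06:00 lonrs13760 k3s[12866]: http: TLS handshake error from 127.0.0.1:40404: remote error: tls: bad certificate",
    "Oct 06 17:06:00 lonrs13760 k3s[12866]: http: TLS handshake error from 127.0.0.1:40410: remote error: tls: bad certificate",
]

summary = summarize_handshake_errors(sample)
for (host, reason), count in summary.items():
    print(f"{count} x {host}: {reason}")
```

To run it against a live node, the lines could be read from the output of journalctl for the k3s unit instead of the hard-coded sample (the unit name depends on how the service file was installed).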

I have tried removing the certificates from /opt/app/mdx/MdxCache1/k3s-data/server/tls and deleting the /api/v1/namespaces/kube-system/secrets/k3s-serving secret so that new certificates are generated, then restarting the service, but there was no improvement.

About this issue

State: closed
Created 4 years ago
Reactions: 3
Comments: 18 (8 by maintainers)

Most upvoted comments

Maybe do not set the hostname.

ghost on Sep 3, 2022