k3s: Master failing to join cluster - TLS handshake error: bad certificate
K3s Version:
k3s version v1.17.2+k3s1 (cdab19b0)
Node(s) CPU architecture, OS, and Version: Linux 3.10.0-1062.12.1.el7.x86_64 #1 SMP Thu Dec 12 06:44:49 EST 2019 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration: 4 masters, 8 agents
Describe the bug: A master node dropped out of the cluster and is unable to rejoin because of certificate errors.
Steps To Reproduce: Run an airgap installation with the service file below:
[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
Wants=network-online.target
[Install]
WantedBy=multi-user.target
[Service]
Type=notify
EnvironmentFile=/etc/systemd/system/k3s.service.env
KillMode=process
Delegate=yes
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/opt/app/mdx/MdxCache1/k3s/bin/k3s \
server \
'--data-dir' \
'/opt/app/mdx/MdxCache1/k3s-data' \
'--datastore-endpoint' \
'https://...:2379,https://...:2379,https://...:2379,https://....:2379,https://....:2379' \
'--datastore-cafile' \
'/opt/app/mdx/MdxCache1/k3s/etcd/ca.crt' \
'--datastore-certfile' \
'/opt/app/mdx/MdxCache1/k3s/etcd/etcd-server.crt' \
'--datastore-keyfile' \
'/opt/app/mdx/MdxCache1/k3s/etcd/etcd-server.key' \
'--tls-san' \
'KUBERNETES-CA' \
Expected behavior: The master node should rejoin the cluster.
Actual behavior:
The master failed to rejoin the cluster and logged the following error: http: TLS handshake error from 127.0.0.1:40404: remote error: tls: bad certificate
Additional context / logs:
Oct 06 17:06:00 lonrs13760 k3s[12866]: http: TLS handshake error from 127.0.0.1:40396: remote error: tls: bad certificate
Oct 06 17:06:00 lonrs13760 k3s[12866]: time="2020-10-06T17:06:00.382530597+01:00" level=info msg="Wrote kubeconfig /etc/rancher/k3s/k3s.yaml"
Oct 06 17:06:00 lonrs13760 k3s[12866]: time="2020-10-06T17:06:00.382558967+01:00" level=info msg="Run: k3s kubectl"
Oct 06 17:06:00 lonrs13760 k3s[12866]: time="2020-10-06T17:06:00.382571906+01:00" level=info msg="k3s is up and running"
Oct 06 17:06:00 lonrs13760 systemd[1]: Started Lightweight Kubernetes.
Oct 06 17:06:00 lonrs13760 k3s[12866]: time="2020-10-06T17:06:00.382810224+01:00" level=info msg="module overlay was already loaded"
Oct 06 17:06:00 lonrs13760 k3s[12866]: time="2020-10-06T17:06:00.382836321+01:00" level=info msg="module nf_conntrack was already loaded"
Oct 06 17:06:00 lonrs13760 k3s[12866]: time="2020-10-06T17:06:00.382851414+01:00" level=info msg="module br_netfilter was already loaded"
Oct 06 17:06:00 lonrs13760 k3s[12866]: I1006 17:06:00.385210 12866 controller.go:606] quota admission added evaluator for: addons.k3s.cattle.io
Oct 06 17:06:00 lonrs13760 k3s[12866]: http: TLS handshake error from 127.0.0.1:40404: remote error: tls: bad certificate
Oct 06 17:06:00 lonrs13760 k3s[12866]: http: TLS handshake error from 127.0.0.1:40410: remote error: tls: bad certificate
Oct 06 17:06:00 lonrs13760 k3s[12866]: time="2020-10-06T17:06:00.418355513+01:00" level=info msg="Using registry config file at /etc/rancher/k3s/registries.yaml"
Oct 06 17:06:00 lonrs13760 k3s[12866]: time="2020-10-06T17:06:00.421453594+01:00" level=info msg="Logging containerd to /opt/app/mdx/MdxCache1/k3s-data/agent/containerd/containerd.log"
Oct 06 17:06:00 lonrs13760 k3s[12866]: time="2020-10-06T17:06:00.421624206+01:00" level=info msg="Running containerd -c /opt/app/mdx/MdxCache1/k3s-data/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /opt/app/mdx/MdxCache1/k3s-data/agent/containerd"
Oct 06 17:06:00 lonrs13760 k3s[12866]: time="2020-10-06T17:06:00.421784727+01:00" level=info msg="Waiting for containerd startup: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/k3s/containerd/containerd.sock: connect: connection refused\""
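To see at a glance where the handshake failures come from, the journal output can be tallied per client address. The following is only a minimal sketch: the function name count_tls_errors is made up for illustration, and it assumes the log lines have the same shape as the excerpt above.

```shell
# Tally "TLS handshake error" lines per client address.
# Reads journal-style log text on stdin; the client port is stripped so that
# repeated connections from the same host are grouped together.
# (count_tls_errors is a hypothetical helper name, not part of k3s.)
count_tls_errors() {
    grep 'TLS handshake error from' \
        | sed -E 's/.*TLS handshake error from ([^ ]+):[0-9]+: .*/\1/' \
        | sort | uniq -c | sort -rn
}
```

It could be fed from the journal with something like `journalctl -u k3s --no-pager | count_tls_errors`, assuming the unit is named k3s. For the excerpt above it would report three errors, all from 127.0.0.1, which suggests the rejected connections originate on the node itself.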
I have tried removing the certs from /opt/app/mdx/MdxCache1/k3s-data/server/tls, deleting the /api/v1/namespaces/kube-system/secrets/k3s-serving secret so that new certificates are generated, and restarting k3s, but there was no improvement.
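For reference, the recovery attempt described above amounts to roughly the following. This is a sketch rather than a recommended procedure: the paths are the ones from the service file, the unit name k3s is inferred from the k3s.service.env file name, and the function names, the DRY_RUN switch, the ordering of the steps and the choice to move the TLS directory aside instead of deleting it are all assumptions made for illustration.

```shell
# Paths taken from the service file in this report; override as needed.
DATA_DIR="${DATA_DIR:-/opt/app/mdx/MdxCache1/k3s-data}"
K3S_BIN="${K3S_BIN:-/opt/app/mdx/MdxCache1/k3s/bin/k3s}"

# Print the command when DRY_RUN=1, otherwise execute it.
# (run and regenerate_serving_certs are hypothetical helper names.)
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

regenerate_serving_certs() {
    # Delete the serving secret while an API server is still reachable.
    # This may have to be run from a healthy master instead.
    run "$K3S_BIN" kubectl -n kube-system delete secret k3s-serving
    # Stop the server before touching its certificate directory.
    run systemctl stop k3s
    # Move the TLS directory aside (not deleted) so it can be restored.
    run mv "$DATA_DIR/server/tls" "$DATA_DIR/server/tls.bak.$(date +%Y%m%d%H%M%S)"
    # Start the server again so it can generate fresh certificates.
    run systemctl start k3s
}
```

Calling `DRY_RUN=1 regenerate_serving_certs` only prints the commands, which makes it possible to review them before running anything on the affected node. As noted above, these steps did not resolve the problem in this case.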
About this issue
- State: closed
- Created 4 years ago
- Reactions: 3
- Comments: 18 (8 by maintainers)
Maybe try not setting the hostname.