rancher: [BUG] Certs and directory /var/lib/rancher/rke2/server/tls/kube-scheduler and /var/lib/rancher/rke2/server/tls/kube-controller-manager missing

Rancher Server Setup

  • Rancher version: 2.7.3
  • Installation option (Docker install/Helm Chart): HELM
    • K3S v1.25.9+k3s1

Information about the Cluster

  • Kubernetes version: v1.25.9+rke2r1
  • Cluster Type (Downstream):
    • If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider):
    • Custom
    • Node VM OS Ubuntu 22.04

User Information

  • What is the role of the user logged in? Admin

Describe the bug: When creating a new custom cluster via Rancher’s UI, the nodes that I try to add are stuck in the “reconciling” state.

To Reproduce

  1. Create a new cluster in the Rancher UI, following the documentation: https://ranchermanager.docs.rancher.com/reference-guides/cluster-configuration/rancher-server-configuration/rke2-cluster-configuration. Leave all parameters at their defaults except:
     • Kubernetes Version: v1.25.9+rke2r1
     • Container Network: Calico
     • NGINX Ingress: selected or not, I had the same problem either way

  2. Register a node:
     • Keep all roles selected in step 1
     • Select “Insecure” in step 2
     • Copy and paste the command onto the first node

Result: The node stays in the “reconciling” state with this message: “Waiting for probes: kube-apiserver, kube-controller-manager, kube-scheduler”

Command to obtain logs on the node: sudo journalctl -u rancher-system-agent.service -f

Logs obtained:

May 24 13:26:56 s001kubed01 rancher-system-agent[16705]: time="2023-05-24T13:26:56Z" level=error msg="error loading CA cert for probe (kube-scheduler) /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt: open /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt: no such file or directory"
May 24 13:26:56 s001kubed01 rancher-system-agent[16705]: time="2023-05-24T13:26:56Z" level=error msg="error while appending ca cert to pool for probe kube-scheduler"
May 24 13:26:57 s001kubed01 rancher-system-agent[16705]: time="2023-05-24T13:26:57Z" level=error msg="error encountered during parsing of last run time: parsing time \"\" as \"Mon Jan _2 15:04:05 MST 2006\": cannot parse \"\" as \"Mon\""
May 24 13:26:57 s001kubed01 rancher-system-agent[16705]: time="2023-05-24T13:26:57Z" level=error msg="error loading CA cert for probe (kube-controller-manager) /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.crt: open /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.crt: no such file or directory"
May 24 13:26:57 s001kubed01 rancher-system-agent[16705]: time="2023-05-24T13:26:57Z" level=error msg="error while appending ca cert to pool for probe kube-controller-manager"
May 24 13:26:57 s001kubed01 rancher-system-agent[16705]: time="2023-05-24T13:26:57Z" level=error msg="error loading CA cert for probe (kube-scheduler) /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt: open /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt: no such file or directory"
May 24 13:26:57 s001kubed01 rancher-system-agent[16705]: time="2023-05-24T13:26:57Z" level=error msg="error while appending ca cert to pool for probe kube-scheduler"
May 24 13:27:01 s001kubed01 rancher-system-agent[16705]: time="2023-05-24T13:27:01Z" level=error msg="error encountered during parsing of last run time: parsing time \"\" as \"Mon Jan _2 15:04:05 MST 2006\": cannot parse \"\" as \"Mon\""

Directories “/var/lib/rancher/rke2/server/tls/kube-controller-manager” and “/var/lib/rancher/rke2/server/tls/kube-scheduler” are indeed missing, and so are the .crt files themselves (they are not just misplaced; they cannot be found anywhere on the node).
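
A minimal way to double-check this on the node (shown here only as a sketch; the paths are taken from the probe errors above):

# List what RKE2 actually generated under its TLS directory
sudo ls -la /var/lib/rancher/rke2/server/tls/
# Search the whole RKE2 state directory in case the certificates were written somewhere else
sudo find /var/lib/rancher/rke2 -name kube-scheduler.crt -o -name kube-controller-manager.crt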

Expected Result: The node gets registered automatically.

Thanks for your help 😄

About this issue

  • State: open
  • Created a year ago
  • Reactions: 11
  • Comments: 28 (1 by maintainers)

Most upvoted comments

same issue on Rancher 2.7.9, Calico, 1.26.8+rke2r1

Hi, good news: we have found the root cause.

We had the “noexec” option on the /var mount point in /etc/fstab, so some scripts could not be executed.

/dev/mapper/rootvg-var /var ext4 defaults,noexec,nosuid,nodev 1 2

This is because we are using an already-hardened RHEL ISO.
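
For anyone hitting the same thing, a minimal sketch of the fix under that assumption (the device name and the remaining mount options come from the fstab line above; adjust them to your system):

# Remount /var without the noexec option for the current boot
sudo mount -o remount,exec /var
# Make it persistent by dropping noexec from the /var entry in /etc/fstab, e.g.:
#   /dev/mapper/rootvg-var /var ext4 defaults,nosuid,nodev 1 2
# then restart the agent so provisioning retries:
sudo systemctl restart rancher-system-agent.service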

I fixed the issue by disabling IPv6 addresses on the master nodes. When the first node was bootstrapped, it tried to connect to the two other nodes over IPv6 and failed. After disabling IPv6, everything works fine.
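
As a sketch, this is one way to disable IPv6 system-wide with sysctl (assuming disabling it node-wide is acceptable in your environment; the drop-in file name is arbitrary):

# Disable IPv6 on all interfaces for the running system
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1
# Persist the setting across reboots
printf 'net.ipv6.conf.all.disable_ipv6 = 1\nnet.ipv6.conf.default.disable_ipv6 = 1\n' | sudo tee /etc/sysctl.d/99-disable-ipv6.conf
sudo sysctl --system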

The problem still occurs and it is not an fstab error. This is a pretty urgent issue if so many people are running into it.

open /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.crt: no such file or directory
open /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt: no such file or directory
error while appending ca cert to pool for probe kube-scheduler
root@master-1:/var/lib/rancher/rke2/server/tls# ll
total 148
drwx------ 4 root root 4096 Sep 26 22:03 ./
drwxr-xr-x 7 root root 4096 Sep 26 22:03 ../
-rw-r--r-- 1 root root 1177 Sep 26 22:03 client-admin.crt
-rw------- 1 root root  227 Sep 26 22:03 client-admin.key
-rw-r--r-- 1 root root 1186 Sep 26 22:03 client-auth-proxy.crt
-rw------- 1 root root  227 Sep 26 22:03 client-auth-proxy.key
-rw-r--r-- 1 root root  570 Sep 26 22:03 client-ca.crt
-rw------- 1 root root  227 Sep 26 22:03 client-ca.key
-rw-r--r-- 1 root root  570 Sep 26 22:33 client-ca.nochain.crt
-rw-r--r-- 1 root root 1165 Sep 26 22:03 client-controller.crt
-rw------- 1 root root  227 Sep 26 22:03 client-controller.key
-rw-r--r-- 1 root root 1181 Sep 26 22:03 client-kube-apiserver.crt
-rw------- 1 root root  227 Sep 26 22:03 client-kube-apiserver.key
-rw-r--r-- 1 root root 1149 Sep 26 22:03 client-kube-proxy.crt
-rw------- 1 root root  227 Sep 26 22:03 client-kube-proxy.key
-rw------- 1 root root  227 Sep 26 22:03 client-kubelet.key
-rw-r--r-- 1 root root 1165 Sep 26 22:03 client-rke2-cloud-controller.crt
-rw------- 1 root root  227 Sep 26 22:03 client-rke2-cloud-controller.key
-rw-r--r-- 1 root root 1153 Sep 26 22:03 client-rke2-controller.crt
-rw------- 1 root root  227 Sep 26 22:03 client-rke2-controller.key
-rw-r--r-- 1 root root 1153 Sep 26 22:03 client-scheduler.crt
-rw------- 1 root root  227 Sep 26 22:03 client-scheduler.key
-rw-r--r-- 1 root root 1189 Sep 26 22:03 client-supervisor.crt
-rw------- 1 root root  227 Sep 26 22:03 client-supervisor.key
-rw-r--r-- 1 root root 3019 Sep 26 22:03 dynamic-cert.json
drwxr-xr-x 2 root root 4096 Sep 26 22:03 etcd/
-rw-r--r-- 1 root root  595 Sep 26 22:03 request-header-ca.crt
-rw------- 1 root root  227 Sep 26 22:03 request-header-ca.key
-rw-r--r-- 1 root root  574 Sep 26 22:03 server-ca.crt
-rw------- 1 root root  227 Sep 26 22:03 server-ca.key
-rw-r--r-- 1 root root  574 Sep 26 22:33 server-ca.nochain.crt
-rw------- 1 root root 1679 Sep 26 22:33 service.current.key
-rw------- 1 root root 1679 Sep 26 22:03 service.key
-rw-r--r-- 1 root root 1400 Sep 26 22:03 serving-kube-apiserver.crt
-rw------- 1 root root  227 Sep 26 22:03 serving-kube-apiserver.key
-rw------- 1 root root  227 Sep 26 22:03 serving-kubelet.key
drwx------ 2 root root 4096 Sep 26 22:03 temporary-certs/

Rancher RKE2 cluster config:

  • Kubernetes version: v1.24.17+rke2r1
  • CNI: cilium
  • Default Pod Security Policy: RKE2 Default
  • Worker CIS Profile: cis-1.6
  • System services: CoreDNS, Metrics Server
  • Members: {…}
  • ETCD: auto snapshots enabled (default)
  • Cluster CIDR: 10.42.0.0/16
  • Service CIDR: 10.43.0.0/16
  • Advanced: “Raise error if kernel parameters are different than the expected kubelet defaults”: true

I fixed the issue by disabling IPv6 addresses on the master nodes. When the first node was bootstrapped, it tried to connect to the two other nodes over IPv6 and failed. After disabling IPv6, everything works fine.

In my environment, I need dual-stack IP addresses.

You can check my repo where I’ve fixed the issue (working dual stack!). If you intend to use this code as-is, you have to change the Rancher agent version to v2.7-head on line 40 (var VERSION) of the file cmd/agent/main.go.
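
For illustration only, a one-liner for that change, assuming the declaration on that line is a plain string literal of the form var VERSION = "...":

# Pin the agent version to v2.7-head (assumes a simple `var VERSION = "..."` declaration in the fork)
sed -i 's/^var VERSION = ".*"$/var VERSION = "v2.7-head"/' cmd/agent/main.go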