rancher: [BUG] Certs and directories /var/lib/rancher/rke2/server/tls/kube-scheduler and /var/lib/rancher/rke2/server/tls/kube-controller-manager missing
Rancher Server Setup
- Rancher version: 2.7.3
- Installation option (Docker install/Helm Chart): HELM
- Kubernetes cluster hosting Rancher: K3s v1.25.9+k3s1
Information about the Cluster
- Kubernetes version: v1.25.9+rke2r1
- Cluster Type: Downstream
- If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider): Custom
- Node VM OS: Ubuntu 22.04
User Information
- What is the role of the user logged in? Admin
Describe the bug
When creating a new custom cluster via Rancher's UI, the nodes that I try to add are stuck in the “reconciling” state.
To Reproduce
- Create a new cluster in the Rancher UI following the documentation: https://ranchermanager.docs.rancher.com/reference-guides/cluster-configuration/rancher-server-configuration/rke2-cluster-configuration
  All parameters left at their defaults except:
  - Kubernetes Version: v1.25.9+rke2r1
  - Container Network: Calico
  - NGINX Ingress: selected or not, I had the same problem either way
- Register a node: keep all roles selected in step 1, select “Insecure” in step 2, and copy and paste the command onto the first node.
Result
The node stays in the “reconciling” state with this message: “Waiting for probes: kube-apiserver, kube-controller-manager, kube-scheduler”
Command to obtain logs on the node:
sudo journalctl -u rancher-system-agent.service -f
Logs obtained:
May 24 13:26:56 s001kubed01 rancher-system-agent[16705]: time="2023-05-24T13:26:56Z" level=error msg="error loading CA cert for probe (kube-scheduler) /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt: open /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt: no such file or directory"
May 24 13:26:56 s001kubed01 rancher-system-agent[16705]: time="2023-05-24T13:26:56Z" level=error msg="error while appending ca cert to pool for probe kube-scheduler"
May 24 13:26:57 s001kubed01 rancher-system-agent[16705]: time="2023-05-24T13:26:57Z" level=error msg="error encountered during parsing of last run time: parsing time \"\" as \"Mon Jan _2 15:04:05 MST 2006\": cannot parse \"\" as \"Mon\""
May 24 13:26:57 s001kubed01 rancher-system-agent[16705]: time="2023-05-24T13:26:57Z" level=error msg="error loading CA cert for probe (kube-controller-manager) /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.crt: open /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.crt: no such file or directory"
May 24 13:26:57 s001kubed01 rancher-system-agent[16705]: time="2023-05-24T13:26:57Z" level=error msg="error while appending ca cert to pool for probe kube-controller-manager"
May 24 13:26:57 s001kubed01 rancher-system-agent[16705]: time="2023-05-24T13:26:57Z" level=error msg="error loading CA cert for probe (kube-scheduler) /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt: open /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt: no such file or directory"
May 24 13:26:57 s001kubed01 rancher-system-agent[16705]: time="2023-05-24T13:26:57Z" level=error msg="error while appending ca cert to pool for probe kube-scheduler"
May 24 13:27:01 s001kubed01 rancher-system-agent[16705]: time="2023-05-24T13:27:01Z" level=error msg="error encountered during parsing of last run time: parsing time \"\" as \"Mon Jan _2 15:04:05 MST 2006\": cannot parse \"\" as \"Mon\""
The directories “/var/lib/rancher/rke2/server/tls/kube-controller-manager” and “/var/lib/rancher/rke2/server/tls/kube-scheduler” are indeed missing, and so are the .crt files (they are not just misplaced somewhere else).
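To check whether another server node shows the same symptom, a minimal shell sketch can test for the two certificate files the agent's probes try to load. The paths come from the log output above; the function name `check_probe_certs` is hypothetical, and the TLS directory is taken as an optional argument only so the check can be pointed elsewhere.

```shell
# Hypothetical helper: report whether the probe certificates that
# rancher-system-agent tries to load (paths taken from the log above) exist.
# $1 optionally overrides the RKE2 server TLS directory.
check_probe_certs() {
    tls_dir="${1:-/var/lib/rancher/rke2/server/tls}"
    missing=0
    for component in kube-scheduler kube-controller-manager; do
        crt="$tls_dir/$component/$component.crt"
        if [ -f "$crt" ]; then
            echo "present: $crt"
        else
            echo "missing: $crt"
            missing=1
        fi
    done
    # Exit status is non-zero if any certificate is absent.
    return "$missing"
}
```

After sourcing it, `sudo sh -c '. ./check_probe_certs.sh; check_probe_certs'` (file name assumed) would print one line per certificate; on an affected node both lines would read `missing:`.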
Expected Result
The node gets registered automatically.
Thanks for your help 😄
About this issue
- State: open
- Created a year ago
- Reactions: 11
- Comments: 28 (1 by maintainers)
Same issue on Rancher 2.7.9, Calico, 1.26.8+rke2r1.
Hi, good news: we have found the root cause.
We had the “noexec” option on the /var mount point in /etc/fstab, so some scripts could not be executed.
/dev/mapper/rootvg-var /var ext4 defaults,noexec,nosuid,nodev 1 2
This was because we are using an already hardened RHEL ISO.
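To check whether a node is affected by the same cause, a minimal sketch can ask util-linux `findmnt` for the mount options of the filesystem that holds /var and look for `noexec`. It assumes `findmnt` is installed (it normally is on Ubuntu and RHEL); the function names `has_noexec` and `check_var_noexec` are hypothetical.

```shell
# Succeeds (exit 0) if the comma-separated mount option string in $1
# contains the exact option "noexec" (e.g. "noexecfoo" does not match).
has_noexec() {
    case ",$1," in
        *,noexec,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: print the mount options of the filesystem holding
# /var. --target resolves the containing mount, so this works whether /var
# is a separate mount point (as in the fstab line above) or part of /.
# Returns 1 if noexec is set, 2 if findmnt fails.
check_var_noexec() {
    opts=$(findmnt -n -o OPTIONS --target /var) || return 2
    if has_noexec "$opts"; then
        echo "/var is mounted noexec ($opts)"
        return 1
    fi
    echo "/var allows execution ($opts)"
}
```

If `check_var_noexec` reports `noexec`, removing that option from the /var entry in /etc/fstab and remounting (for example `sudo mount -o remount,exec /var`) would correspond to the fix described above; whether that is acceptable depends on the hardening policy the image was built for.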
I fixed the issue by disabling IPv6 addresses on the master nodes. When the first node was bootstrapped, it tried to connect to the two other nodes over IPv6 and failed. After disabling IPv6, everything works fine.
The problem still occurs, and it is not an fstab error. This is a pretty urgent issue if so many people have this problem.
Rancher RKE2 cluster config:
- KubeVersion: v1.24.17+rke2r1
- CNI: cilium
- Default Pod Security Policy: RKE2 Default
- Worker CIS Profile: cis-1.6
- System Services: {CoreDNS, Metrics Server}
- Members: {…}
- ETCD: auto snapshots enabled (default)
- Cluster CIDR: 10.42.0.0/16
- Service CIDR: 10.43.0.0/16
- Advanced: “Raise error if kernel parameters are different than the expected kubelet defaults”: true
You can check my repo, where I've fixed the issue (dual stack working!). If you intend to use this code as is, you have to change the Rancher agent version to v2.7-head on line 40 (var VERSION) of cmd/agent/main.go.