rancher: [BUG] Etcd restore does not work on an RKE2 cluster

Rancher Server Setup

  • Rancher version: 2.8-head commit id: d101c27
  • Installation option (Docker install/Helm Chart): Docker Install

Information about the Cluster

  • Kubernetes version: RKE2, v1.26.8+rke2r1 upgraded to v1.27.5+rke2r1
  • Cluster Type (Local/Downstream): AWS Node driver cluster

User Information

  • What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom) Standard

Describe the bug

Etcd restore does not work on an RKE2 cluster.

To Reproduce

  • Deploy a downstream RKE2 node driver cluster on an RKE2 1.26 version

  • Take an etcd snapshot

  • Upgrade the cluster to an RKE2 1.27 version

  • Restore the previously taken snapshot using the option that restores everything: cluster config, Kubernetes version, and etcd

  • The cluster is stuck in the Updating state with the error: [INFO ] configuring etcd node(s) rke2-backup-restore-etcd-5fb5f775c6x9hzcw-rwkwm: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd

  • Rancher provisioning logs:

[INFO ] provisioning done
--
4:52:48 pm | [INFO ] configuring bootstrap node(s) rke2-backup-restore-etcd-5fb5f775c6x9hzcw-7rkvw: waiting for plan to be applied
4:52:54 pm | [INFO ] configuring bootstrap node(s) rke2-backup-restore-etcd-5fb5f775c6x9hzcw-7rkvw: waiting for probes: kubelet
4:53:30 pm | [INFO ] configuring bootstrap node(s) rke2-backup-restore-etcd-5fb5f775c6x9hzcw-7rkvw: waiting for probes: etcd
4:53:36 pm | [INFO ] configuring bootstrap node(s) rke2-backup-restore-etcd-5fb5f775c6x9hzcw-7rkvw: waiting for kubelet to update
4:53:46 pm | [INFO ] configuring etcd node(s) rke2-backup-restore-etcd-5fb5f775c6x9hzcw-q89sr,rke2-backup-restore-etcd-5fb5f775c6x9hzcw-rwkwm
4:54:28 pm | [INFO ] configuring etcd node(s) rke2-backup-restore-etcd-5fb5f775c6x9hzcw-rwkwm: waiting for probes: kubelet
4:54:46 pm | [INFO ] configuring etcd node(s) rke2-backup-restore-etcd-5fb5f775c6x9hzcw-rwkwm: waiting for probes: etcd, kubelet
4:54:50 pm | [INFO ] configuring etcd node(s) rke2-backup-restore-etcd-5fb5f775c6x9hzcw-rwkwm: waiting for probes: etcd
4:54:56 pm | [INFO ] configuring etcd node(s) rke2-backup-restore-etcd-5fb5f775c6x9hzcw-rwkwm: waiting for kubelet to update
4:55:04 pm | [INFO ] configuring control plane node(s) rke2-backup-restore-cp-8675c69865x58z9h-b4cb4,rke2-backup-restore-cp-8675c69865x58z9h-zstx8
4:56:10 pm | [INFO ] configuring control plane node(s) rke2-backup-restore-cp-8675c69865x58z9h-zstx8: waiting for plan to be applied
4:56:16 pm | [INFO ] configuring control plane node(s) rke2-backup-restore-cp-8675c69865x58z9h-b4cb4,rke2-backup-restore-cp-8675c69865x58z9h-zstx8
4:56:20 pm | [INFO ] configuring control plane node(s) rke2-backup-restore-cp-8675c69865x58z9h-zstx8: waiting for probes: kubelet
4:56:44 pm | [INFO ] configuring control plane node(s) rke2-backup-restore-cp-8675c69865x58z9h-zstx8: waiting for probes: kube-apiserver, kubelet
4:56:54 pm | [INFO ] configuring control plane node(s) rke2-backup-restore-cp-8675c69865x58z9h-zstx8: waiting for probes: kube-apiserver, kube-controller-manager, kube-scheduler, kubelet
4:56:58 pm | [INFO ] configuring control plane node(s) rke2-backup-restore-cp-8675c69865x58z9h-zstx8: waiting for probes: kube-apiserver, kube-controller-manager, kube-scheduler
4:57:12 pm | [INFO ] configuring control plane node(s) rke2-backup-restore-cp-8675c69865x58z9h-zstx8: waiting for probes: kube-apiserver, kube-controller-manager
4:57:18 pm | [INFO ] configuring worker node(s) rke2-backup-restore-wk-56df7d58b5xb6ffp-4dhrj,rke2-backup-restore-wk-56df7d58b5xb6ffp-5kcbn,rke2-backup-restore-wk-56df7d58b5xb6ffp-ltbld
4:57:34 pm | [INFO ] configuring worker node(s) rke2-backup-restore-wk-56df7d58b5xb6ffp-5kcbn,rke2-backup-restore-wk-56df7d58b5xb6ffp-ltbld
4:57:54 pm | [INFO ] configuring worker node(s) rke2-backup-restore-wk-56df7d58b5xb6ffp-ltbld: waiting for plan to be applied
4:57:58 pm | [INFO ] configuring worker node(s) rke2-backup-restore-wk-56df7d58b5xb6ffp-ltbld: waiting for probes: kubelet
4:58:08 pm | [INFO ] configuring worker node(s) rke2-backup-restore-wk-56df7d58b5xb6ffp-ltbld: waiting for kubelet to update
4:58:46 pm | [INFO ] rke2-backup-restore-wk-56df7d58b5xb6ffp-4dhrj,rke2-backup-restore-wk-56df7d58b5xb6ffp-5kcbn,rke2-backup-restore-wk-56df7d58b5xb6ffp-ltbld
4:58:48 pm | [INFO ] provisioning done
5:01:26 pm | [INFO ] refreshing etcd restore state
5:01:28 pm | [INFO ] waiting to stop rke2 services on node [rke2-backup-restore-cp-bfd6beba-nz8l2]
5:01:30 pm | [INFO ] waiting to stop rke2 services on node [rke2-backup-restore-wk-e548aa18-h5k2c]
5:01:32 pm | [INFO ] waiting for etcd restore
5:02:16 pm | [INFO ] waiting for etcd restore probes
5:02:54 pm | [INFO ] waiting for etcd restore
5:05:02 pm | [INFO ] waiting for etcd restore probes
5:05:16 pm | [INFO ] refreshing etcd restore state
5:05:18 pm | [INFO ] configuring etcd node(s) rke2-backup-restore-etcd-5fb5f775c6x9hzcw-7rkvw,rke2-backup-restore-etcd-5fb5f775c6x9hzcw-rwkwm
5:05:34 pm | [INFO ] configuring etcd node(s) rke2-backup-restore-etcd-5fb5f775c6x9hzcw-rwkwm: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for plan to be applied
5:05:36 pm | [INFO ] configuring etcd node(s) rke2-backup-restore-etcd-5fb5f775c6x9hzcw-rwkwm: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd, kubelet
5:05:46 pm | [INFO ] configuring control plane node(s) rke2-backup-restore-cp-8675c69865x58z9h-b4cb4,rke2-backup-restore-cp-8675c69865x58z9h-zstx8
5:05:56 pm | [INFO ] configuring control plane node(s) rke2-backup-restore-cp-8675c69865x58z9h-zstx8: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for plan to be applied
5:05:58 pm | [INFO ] configuring etcd node(s) rke2-backup-restore-etcd-5fb5f775c6x9hzcw-rwkwm: Node condition MemoryPressure is Unknown. Node condition DiskPressure is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown., waiting for probes: etcd
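
When triaging a log like the one above, it can help to reduce it to the most recent status of each node that reports Unknown conditions. The following is a minimal sketch, assuming the lines keep the `<time> | [INFO ] configuring <role> node(s) <node>: <detail>` shape seen in this log. The names `LINE_RE`, `stuck_nodes`, `UNKNOWN`, and `SAMPLE` are hypothetical helpers for illustration and are not part of Rancher or RKE2.

```python
import re

# Assumed line shape, taken from the log above:
#   <time> | [INFO ] configuring <role> node(s) <node>[,<node>...][: <detail>]
LINE_RE = re.compile(
    r"^(?P<time>\d{1,2}:\d{2}:\d{2} [ap]m) \| \[INFO \] "
    r"configuring (?P<role>.+?) node\(s\) (?P<nodes>[^:\s]+)"
    r"(?:: (?P<detail>.*))?$"
)


def stuck_nodes(log_text):
    """Map each node to its most recent line that reports Unknown conditions."""
    latest = {}
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match is None or match.group("detail") is None:
            continue  # not a per-node status line
        detail = match.group("detail")
        if "is Unknown" not in detail:
            continue  # node conditions are not reported as Unknown
        # The text after the final "., " is what the provisioner is waiting on.
        waiting_on = detail.rsplit("., ", 1)[-1]
        for node in match.group("nodes").split(","):
            latest[node] = (match.group("time"), match.group("role"), waiting_on)
    return latest


# Two lines copied from the log above, with the repeated condition text factored out.
UNKNOWN = (
    "Node condition MemoryPressure is Unknown. "
    "Node condition DiskPressure is Unknown. "
    "Node condition PIDPressure is Unknown. "
    "Node condition Ready is Unknown."
)
SAMPLE = "\n".join([
    "5:05:56 pm | [INFO ] configuring control plane node(s) "
    "rke2-backup-restore-cp-8675c69865x58z9h-zstx8: "
    + UNKNOWN + ", waiting for plan to be applied",
    "5:05:58 pm | [INFO ] configuring etcd node(s) "
    "rke2-backup-restore-etcd-5fb5f775c6x9hzcw-rwkwm: "
    + UNKNOWN + ", waiting for probes: etcd",
])

for node, (time, role, waiting_on) in stuck_nodes(SAMPLE).items():
    print(f"{node} [{role}] at {time}: {waiting_on}")
```

For the two sample lines, this reports the etcd node as waiting for the etcd probe and the control plane node as waiting for its plan to be applied, which matches where the log stops after the restore.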


Note:

  • On an RKE1 cluster, this use case works with no issues.
  • On an RKE2 cluster, the upgrade from 1.26.8+rke2r1 to 1.27.5+rke2r1 works, but restoring the snapshot taken on 1.26 fails.

About this issue

  • State: closed
  • Created 9 months ago
  • Comments: 17 (16 by maintainers)

Most upvoted comments

@felipe-colussi I tested this on v2.7.8 on a v1.25.13+rke2r1 RKE2 cluster and got the same error. I also ran the same test on k3s v1.25.13+k3s1, and it worked fine.