rancher: [BUG] RKE2 Downstream Clusters Not Coming Up After Rancher Migration

Rancher Server Setup

  • Rancher version: v2.6.9 && v2.7.0
  • Installation option (Docker install/Helm Chart): Helm
    • If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc): RKE2
  • Proxy/Cert Details: byo-valid

Information about the Cluster

  • Kubernetes version: v1.24.8+rke2r1
  • Cluster Type (Local/Downstream): Downstream
    • If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider): AWS

User Information

  • What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom): Admin

Describe the bug

When migrating Rancher servers from one HA installation to another using Rancher Backups and Restore, the RKE2 downstream cluster does not come back up. Its status is stuck at Updating with the following message: Configuring bootstrap node(s) <redacted>: waiting for plan to be applied

I used RKE2 v1.24.8+rke2r1 with both Rancher versions. After changing to the default RKE2 version for each Rancher version (v1.24.4+rke2r1 && v1.24.6+rke2r1, respectively) and redeploying all workloads, the RKE2 cluster for the v2.6.9 instance came up and became Active.

The v2.7.0 RKE2 cluster is still showing the same status and message.
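
For anyone debugging the same state, one thing worth checking on the local (Rancher) cluster is whether the stuck bootstrap node's machine-plan secret ever received a plan. The snippet below is only a rough sketch assuming the provisioning-v2 conventions (machine-plan secrets named `<machine-name>-machine-plan` in the `fleet-default` namespace); adjust for your own setup.

```python
# Rough diagnostic sketch (assumption: RKE2 machine plans are stored as
# "<machine-name>-machine-plan" secrets in the "fleet-default" namespace of
# the local cluster -- adjust if your install differs).
from kubernetes import client, config

config.load_kube_config()  # kubeconfig for the local (Rancher) cluster
core = client.CoreV1Api()

for secret in core.list_namespaced_secret("fleet-default").items:
    if secret.metadata.name.endswith("-machine-plan"):
        keys = sorted((secret.data or {}).keys())
        # An empty data map suggests the planner never handed that node a plan,
        # which would be consistent with "waiting for plan to be applied".
        print(f"{secret.metadata.name}: data keys = {keys}")
```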

To Reproduce

  1. Deploy a Rancher HA instance on v2.6.9 && v2.7.0
  2. Create an AWS RKE1 downstream cluster (3 workers, 1 control plane, 1 etcd) using v1.24.8 as the RKE version
  3. Create an AWS RKE2 downstream cluster (3 workers, 2 control plane, 3 etcd) using v1.24.8+rke2r1 as the RKE2 version
  4. Wait for the downstream clusters to be Active
  5. Install the Rancher Backups chart on your local cluster
  6. Create a backup in your preferred storage (I use an AWS S3 bucket); a sketch of the Backup and Restore objects is shown after this list
  7. Bring up a new HA and point the load balancer to the new HA
  8. Use the backup to restore onto the new HA
  9. Install Rancher with the same version as the original HA
  10. Go to Cluster Management
  11. Check cluster statuses
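
Steps 5-8 above are driven by the rancher-backup operator's Backup and Restore custom resources. Below is a hedged sketch of those two objects created through the Kubernetes Python client; the apiVersion/kind come from the backup-restore-operator, while the credential secret, bucket, region, and backup file name are placeholders and the exact spec fields may differ between chart versions.

```python
# Hedged sketch of steps 5-8: create a Backup on the old HA, then a Restore on
# the new HA. Group/version/kind follow the rancher backup-restore-operator;
# bucket, region, credential secret and backup file name are placeholders.
from kubernetes import client, config

config.load_kube_config()  # point this at the old HA's local cluster first
custom = client.CustomObjectsApi()

s3 = {
    "credentialSecretName": "s3-creds",       # placeholder secret with AWS keys
    "credentialSecretNamespace": "default",   # placeholder
    "bucketName": "my-rancher-backups",       # placeholder
    "folder": "rancher",
    "region": "us-east-1",
    "endpoint": "s3.us-east-1.amazonaws.com",
}

backup = {
    "apiVersion": "resources.cattle.io/v1",
    "kind": "Backup",
    "metadata": {"name": "pre-migration"},
    "spec": {
        # Default ResourceSet installed by the rancher-backup chart.
        "resourceSetName": "rancher-resource-set",
        "storageLocation": {"s3": s3},
    },
}
# Backup/Restore objects are cluster-scoped, hence the cluster-scoped call.
custom.create_cluster_custom_object(
    group="resources.cattle.io", version="v1", plural="backups", body=backup
)

# ...later, with the kubeconfig switched to the new HA (rancher-backup chart
# installed, Rancher itself not yet installed), reference the generated tarball:
restore = {
    "apiVersion": "resources.cattle.io/v1",
    "kind": "Restore",
    "metadata": {"name": "migration-restore"},
    "spec": {
        "backupFilename": "pre-migration-<timestamp>.tar.gz",  # placeholder name
        "storageLocation": {"s3": s3},
        "prune": False,  # commonly left off for migration-style restores
    },
}
custom.create_cluster_custom_object(
    group="resources.cattle.io", version="v1", plural="restores", body=restore
)
```

The same can of course be done with plain manifests; the point is only that the Restore runs on the new HA before Rancher itself is installed (step 9).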

Result

The RKE2 downstream cluster did not come back to Active status after the restore/migration.

Expected Result

The RKE2 downstream cluster comes back to Active status after the restore/migration.

Screenshots

Showing the Status and Message: (screenshot)

Machine Pool: (screenshot)

About this issue

  • State: closed
  • Created a year ago
  • Comments: 19 (6 by maintainers)

Most upvoted comments

@eliyamlevy, duplicating an offline conversation: please roll back rancher/backup-restore-operator#293, as we don't want to back up *machine-plan-token secrets (or any secrets of type kubernetes.io/service-account-token). These are service account tokens that won't be valid if restored on a new cluster, because they are tied to a service account ID that will be different on the new cluster.

@snasovich Reverted in the most recent RCs for 2.6 and 2.7.
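
To make the explanation in the comments concrete: secrets of type kubernetes.io/service-account-token carry tokens bound to the issuing cluster's service account identity, so restoring them onto a new cluster produces credentials the new API server rejects. A small illustrative sketch (Kubernetes Python client, nothing Rancher-specific assumed) that lists the secrets in that category:

```python
# Illustrative only: enumerate service-account token secrets, i.e. the class of
# secrets (including the *machine-plan-token ones) that should not be carried
# over to a new cluster by backup/restore.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

tokens = core.list_secret_for_all_namespaces(
    field_selector="type=kubernetes.io/service-account-token"
)
for s in tokens.items:
    sa = (s.metadata.annotations or {}).get("kubernetes.io/service-account.name", "?")
    print(f"{s.metadata.namespace}/{s.metadata.name} -> serviceaccount {sa}")
```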