rancher: [BUG] RKE2 Downstream Clusters Not Coming Up After Rancher Migration

Rancher Server Setup

  • Rancher version: v2.6.9 && v2.7.0
  • Installation option (Docker install/Helm Chart): Helm
    • If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc): RKE2
  • Proxy/Cert Details: byo-valid

Information about the Cluster

  • Kubernetes version: v1.24.8+rke2r1
  • Cluster Type (Local/Downstream): Downstream
    • If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider): AWS

User Information

  • What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom): Admin

Describe the bug

When migrating Rancher servers from one HA installation to another using Rancher Backups and Restore, the RKE2 downstream cluster does not come back up. Its status is stuck at Updating with the following message: Configuring bootstrap node(s) <redacted>: waiting for plan to be applied

I used RKE2 v1.24.8+rke2r1 with both Rancher versions. After changing to the default RKE2 version for each Rancher version (v1.24.4+rke2r1 && v1.24.6+rke2r1, respectively) and redeploying all workloads, the RKE2 cluster for the v2.6.9 instance came up and became Active.

The v2.7.0 RKE2 cluster is still showing the same status and message.
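
For anyone debugging the same state, one thing worth checking on the local (Rancher) cluster is whether the stuck bootstrap node's machine-plan secret ever received a plan. The snippet below is only a rough sketch assuming the provisioning-v2 conventions (machine-plan secrets named `<machine-name>-machine-plan` in the `fleet-default` namespace); adjust for your own setup.

```python
# Rough diagnostic sketch (assumption: RKE2 machine plans are stored as
# "<machine-name>-machine-plan" secrets in the "fleet-default" namespace of
# the local cluster -- adjust if your install differs).
from kubernetes import client, config

config.load_kube_config()  # kubeconfig for the local (Rancher) cluster
core = client.CoreV1Api()

for secret in core.list_namespaced_secret("fleet-default").items:
    if secret.metadata.name.endswith("-machine-plan"):
        keys = sorted((secret.data or {}).keys())
        # An empty data map suggests the planner never handed that node a plan,
        # which would be consistent with "waiting for plan to be applied".
        print(f"{secret.metadata.name}: data keys = {keys}")
```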

To Reproduce

  1. Deploy a Rancher HA instance on v2.6.9 && v2.7.0
  2. Create an AWS RKE1 downstream cluster (3 workers, 1 control plane, 1 etcd) using v1.24.8 as the RKE version
  3. Create an AWS RKE2 downstream cluster (3 workers, 2 control plane, 3 etcd) using v1.24.8+rke2r1 as the RKE2 version
  4. Wait for the downstream clusters to be Active
  5. Install the Rancher Backups chart on your local cluster
  6. Create a backup in your preferred storage (I use an AWS S3 bucket); a sketch of the Backup and Restore objects is shown after this list
  7. Bring up a new HA and point the load balancer to the new HA
  8. Use the backup to restore onto the new HA
  9. Install Rancher with the same version as the original HA
  10. Go to Cluster Management
  11. Check cluster statuses
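
Steps 5-8 above are driven by the rancher-backup operator's Backup and Restore custom resources. Below is a hedged sketch of those two objects created through the Kubernetes Python client; the apiVersion/kind come from the backup-restore-operator, while the credential secret, bucket, region, and backup file name are placeholders and the exact spec fields may differ between chart versions.

```python
# Hedged sketch of steps 5-8: create a Backup on the old HA, then a Restore on
# the new HA. Group/version/kind follow the rancher backup-restore-operator;
# bucket, region, credential secret and backup file name are placeholders.
from kubernetes import client, config

config.load_kube_config()  # point this at the old HA's local cluster first
custom = client.CustomObjectsApi()

s3 = {
    "credentialSecretName": "s3-creds",       # placeholder secret with AWS keys
    "credentialSecretNamespace": "default",   # placeholder
    "bucketName": "my-rancher-backups",       # placeholder
    "folder": "rancher",
    "region": "us-east-1",
    "endpoint": "s3.us-east-1.amazonaws.com",
}

backup = {
    "apiVersion": "resources.cattle.io/v1",
    "kind": "Backup",
    "metadata": {"name": "pre-migration"},
    "spec": {
        # Default ResourceSet installed by the rancher-backup chart.
        "resourceSetName": "rancher-resource-set",
        "storageLocation": {"s3": s3},
    },
}
# Backup/Restore objects are cluster-scoped, hence the cluster-scoped call.
custom.create_cluster_custom_object(
    group="resources.cattle.io", version="v1", plural="backups", body=backup
)

# ...later, with the kubeconfig switched to the new HA (rancher-backup chart
# installed, Rancher itself not yet installed), reference the generated tarball:
restore = {
    "apiVersion": "resources.cattle.io/v1",
    "kind": "Restore",
    "metadata": {"name": "migration-restore"},
    "spec": {
        "backupFilename": "pre-migration-<timestamp>.tar.gz",  # placeholder name
        "storageLocation": {"s3": s3},
        "prune": False,  # commonly left off for migration-style restores
    },
}
custom.create_cluster_custom_object(
    group="resources.cattle.io", version="v1", plural="restores", body=restore
)
```

The same can of course be done with plain manifests; the point is only that the Restore runs on the new HA before Rancher itself is installed (step 9).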

Result

The RKE2 downstream cluster did not come back to Active status after the restore/migration.

Expected Result

The RKE2 downstream cluster comes back to Active status after the restore/migration.

Screenshots

Showing the Status and Message: (screenshot)

Machine Pool: (screenshot)

About this issue

  • State: closed
  • Created a year ago
  • Comments: 19 (6 by maintainers)

Most upvoted comments

@eliyamlevy, duplicating an offline conversation: please roll back rancher/backup-restore-operator#293, as we don't want to back up *machine-plan-token secrets (or any secrets of type kubernetes.io/service-account-token). These are service account tokens that won't be valid if restored on a new cluster, because they are tied to a service account ID that will be different on the new cluster.

@snasovich Reverted in the most recent RCs for 2.6 and 2.7.
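
To make the explanation in the comments concrete: secrets of type kubernetes.io/service-account-token carry tokens bound to the issuing cluster's service account identity, so restoring them onto a new cluster produces credentials the new API server rejects. A small illustrative sketch (Kubernetes Python client, nothing Rancher-specific assumed) that lists the secrets in that category:

```python
# Illustrative only: enumerate service-account token secrets, i.e. the class of
# secrets (including the *machine-plan-token ones) that should not be carried
# over to a new cluster by backup/restore.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

tokens = core.list_secret_for_all_namespaces(
    field_selector="type=kubernetes.io/service-account-token"
)
for s in tokens.items:
    sa = (s.metadata.annotations or {}).get("kubernetes.io/service-account.name", "?")
    print(f"{s.metadata.namespace}/{s.metadata.name} -> serviceaccount {sa}")
```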