rancher: Restoring Rancher from a backup on a migrated Rancher server fails with the error "unable to create new content in namespace cluster-fleet-default-anupamapostbkp because it is being terminated" while restoring
Rancher Server Setup
- Rancher version: v2.6.0-rc8
- Installation option (Docker install/Helm Chart): Helm
- If Helm Chart, Kubernetes Info:
  - Cluster Type (RKE1, RKE2, k3s, EKS, etc): RKE1
  - Node Setup: 3 nodes, all roles
  - Version: v1.21.4
- Proxy/Cert Details: Self-signed
Information about the Cluster
- Kubernetes version:
- Cluster Type (Local/Downstream): Downstream (node driver RKE1 and RKE2 clusters)
- If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider):
Describe the bug
Restoring the Rancher server from a backup that was taken on a Rancher server migrated to a new RKE cluster fails with the error "unable to create new content in namespace cluster-fleet-default-anupamapostbkp because it is being terminated".
To Reproduce
- On the Rancher server, created RKE1 and RKE2 node driver clusters [3 worker, 1 etcd, 1 cp in each cluster].
- Installed the backup charts in the local cluster, version 2.0.0+up2.0.0-rc11 (a minimal install sketch is shown after this list).
- From Continuous Delivery, created a git repo - fleet1 [url: https://github.com/rancher/fleet-examples, path: multi-cluster/helm].
- Took a backup of the Rancher server - bkp1 (a sketch of the Backup/Restore resources also follows this list).
- Created another git repo - fleet2 [url: https://github.com/rancher/fleet-examples, path: single-cluster/helm].
- Created another backup - bkp2 [will reuse after migration].
- Created a new RKE1 cluster and installed the rancher-backup charts, version 2.0.0+up2.0.0-rc11.
- Restored the backup on the new RKE cluster and migrated Rancher to it. Ref: https://rancher.com/docs/rancher/v2.x/en/backups/v2.5/migrating-rancher/
- All clusters came up fine. Verified there were no errors in the fleet logs in the downstream clusters and that the clusters in Continuous Delivery were all up and Active.
- Created a new RKE2 cluster on the new Rancher server.
- Took a backup - bkp3 - once the cluster was up.
- Performed a restore from backup bkp2 [from the steps above].
- All clusters came up fine. Verified there were no errors in the fleet logs in the downstream clusters and that the clusters in Continuous Delivery were all up and Active.
- Performed another restore with bkp3.
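For reference, a minimal sketch of how the rancher-backup chart and the Continuous Delivery repo can be set up from the CLI. In the report these steps were done through the Rancher UI; the chart repo URL, namespace, and the empty `clusterSelector` target below are documented defaults and assumptions for this sketch, not values taken from the issue.

```bash
# Install the rancher-backup operator into the local cluster
# (cattle-resources-system is the chart's default namespace).
helm repo add rancher-charts https://charts.rancher.io
helm repo update
helm install rancher-backup-crd rancher-charts/rancher-backup-crd \
  -n cattle-resources-system --create-namespace
helm install rancher-backup rancher-charts/rancher-backup \
  -n cattle-resources-system

# Register the fleet-examples repo in Continuous Delivery. GitRepo objects for
# the default workspace live in the fleet-default namespace; an empty
# clusterSelector targets all downstream clusters (assumption for this sketch).
cat <<'EOF' | kubectl apply -f -
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: fleet1
  namespace: fleet-default
spec:
  repo: https://github.com/rancher/fleet-examples
  paths:
    - multi-cluster/helm
  targets:
    - clusterSelector: {}
EOF
```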
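A sketch of the Backup and Restore custom resources corresponding to the bkp1/bkp2/bkp3 and restore steps, assuming the objects are created with kubectl rather than the UI. The backup filename is a placeholder, and `rancher-resource-set` is the default resource set shipped with the chart.

```bash
# Backup of the Rancher server (bkp1; bkp2 and bkp3 differ only in name).
cat <<'EOF' | kubectl apply -f -
apiVersion: resources.cattle.io/v1
kind: Backup
metadata:
  name: bkp1
spec:
  resourceSetName: rancher-resource-set
EOF

# Restore from an earlier backup (here bkp2). backupFilename is a placeholder;
# the real value comes from the Backup's status / storage location. prune: true
# is the default for an in-place restore; the migration docs use prune: false
# when restoring into a brand-new cluster.
cat <<'EOF' | kubectl apply -f -
apiVersion: resources.cattle.io/v1
kind: Restore
metadata:
  name: restore-bkp2
spec:
  backupFilename: bkp2-<timestamp>.tar.gz
  prune: true
EOF
```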
Result
- The restore fails with errors that are seen continuously in the logs for the backup resource rancher-backup-xxx.
- The Rancher server retries multiple times; after some time the Rancher server comes up, but the cluster that was created after migration on the new Rancher server is stuck in the `Wait Check-In` state (some commands to inspect this state are sketched after this list).
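To make the failure easier to inspect while reproducing, a few commands that show where the errors and the stuck state appear; the deployment name and namespaces are the defaults from the rancher-backup chart and Fleet, assumed here rather than quoted from the issue.

```bash
# Operator logs where the "unable to create new content in namespace ...
# because it is being terminated" errors are repeated.
kubectl -n cattle-resources-system logs deploy/rancher-backup --tail=100

# State of the Restore objects and of the fleet cluster namespace named in the error.
kubectl get restores.resources.cattle.io
kubectl get namespace cluster-fleet-default-anupamapostbkp -o jsonpath='{.status.phase}{"\n"}'

# Fleet's view of the downstream clusters (the stuck cluster shows Wait Check-In in the UI).
kubectl -n fleet-default get clusters.fleet.cattle.io
```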
Expected Result
Restore should be successful and no errors should be seen. Downstream clusters should come up Active.
Additional Info:
The issue is seen for both RKE1 and RKE2 downstream clusters.
Related: https://github.com/rancher/rancher/issues/33954
SURE-5358
About this issue
- State: closed
- Created 3 years ago
- Comments: 19 (13 by maintainers)
@anupama2501 just pushed https://github.com/rancherlabs/release-notes/pull/328/commits/c94295f8dbcac8bc79bd637452110ce61aa4328d to address your comment
cc @btat @martyav Release note for this ticket needs to be updated as well.