rancher: local cluster upgrade does not work

I edited my local cluster and selected a newer version of Kubernetes in order to upgrade it. However, the upgrade did not happen, and the cluster now constantly flashes as “upgrading” in the cluster view (GIF attached to the original issue).

I have already tried restarting the Rancher server and the node hosting Rancher itself, but nothing helps. Also, since the cluster is “upgrading”, I can’t revert to an older version.


Useful Info
Versions: Rancher v2.5.0 / UI v2.5.0
Route: undefined

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 7
  • Comments: 18

Most upvoted comments

EDIT: I did the following (see below) and currently I can no longer access my main Rancher Cluster Overview, although I can still access other parts of Rancher through various URLs. So yeah, it might be a bad idea. I see two problems in my logs: the fleet-system gitjob can’t be updated because another change happened in between, and `[ERROR] could not convert gke config to map` - so proceed with caution.

Ok I managed to sort of fix / work around it. Note: This might have unforeseen consequences down the line (maybe somebody else can judge this)…

Steps:

  1. Go to your local cluster
  2. Go to Edit cluster
  3. Set the control plane concurrency and the worker concurrency to 1
  4. Press save
  5. Now you should be back at your local cluster
  6. Go to View in API
  7. In the top right corner of the API click on Edit
  8. Edit the value of k3sConfig and set it to `{"k3supgradeStrategy":{"drainServerNodes":false,"drainWorkerNodes":false,"serverConcurrency":1,"type":"/v3/schemas/clusterUpgradeStrategy","workerConcurrency":1},"kubernetesVersion":"v1.18.8+k3s1","type":"/v3/schemas/k3sConfig"}` (or whatever version your K3s was on before the upgrade; you can still see the old version on the Dashboard).
  9. Click Show Request
  10. Click on Send Request
  11. You should get an HTTP 200 response

After that, the upgrade message will stop flashing. However, where there used to be Edit Cluster / View in API etc., I now see “No actions available”. I am not sure if I broke something by doing these steps, or if this is the intended behavior since 2.5.2.

Other things I noticed: if you press Save while editing the cluster without first setting the concurrency values to 1 (step 3), it refuses because they can’t be empty or 0. That is likely why simply editing the field in the API earlier didn’t work: it rejected the value in my k3sConfig. So you might only need steps 6 through 11, but I wrote down exactly what I did to be sure.
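
If it helps anyone, steps 6 through 11 boil down to a single PUT against the v3 API, so they can also be scripted. A rough Python sketch; the Rancher URL, API token and cluster ID below are placeholders for your own environment, and the k3sConfig body is the same one as in step 8:

```python
# Rough sketch of steps 6-11 done via the v3 API instead of the UI.
# RANCHER_URL, TOKEN and CLUSTER_ID are placeholders for your environment.
import requests

RANCHER_URL = "https://rancher.example.com"
TOKEN = "token-xxxxx:yyyyyyyy"          # an admin API key
CLUSTER_ID = "local"

headers = {"Authorization": f"Bearer {TOKEN}"}
cluster_url = f"{RANCHER_URL}/v3/clusters/{CLUSTER_ID}"

# Fetch the current cluster object so only the k3sConfig part is changed.
# verify=False only if your Rancher uses a self-signed certificate.
cluster = requests.get(cluster_url, headers=headers, verify=False).json()

# Set k3sConfig back to the version the cluster is actually running.
cluster["k3sConfig"] = {
    "k3supgradeStrategy": {
        "drainServerNodes": False,
        "drainWorkerNodes": False,
        "serverConcurrency": 1,
        "type": "/v3/schemas/clusterUpgradeStrategy",
        "workerConcurrency": 1,
    },
    "kubernetesVersion": "v1.18.8+k3s1",  # whatever version you were on before
    "type": "/v3/schemas/k3sConfig",
}

resp = requests.put(cluster_url, headers=headers, json=cluster, verify=False)
print(resp.status_code)  # expect 200, same as step 11
```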

I’m hitting the same problem. Does anybody know how to solve it?

Same here, K3S v1.18.8+k3s1

@maggieliu It appears that the fix for that isn’t foolproof. I run a single-node Docker install as well, currently on v2.5.5, and the option to upgrade was available for me… The local cluster was running K3s v1.18.8+k3s1; I tried upgrading it to v1.19.5+k3s2 since the option was there, and now it’s just blinking on and off. The API shows it as:

state": "upgrading",
"transitioning": "yes",
"transitioningMessage": "cluster is being upgraded",
"type": "cluster",
"uuid": "f42d4b29-9116-4552-8449-909xxxxxxx",

I tried editing the k3sConfig field in the API to see if I could revert it, but it won’t let me (HTTP 422 Unprocessable Entity).

Fixed it for me. I do still see the buttons though, on the latest Rancher version, v2.5.7.

Glad to hear my tinkering fixed it for you. I tested this recently in 2.5.8 and the upgrade button is no longer available.

I ran into the same issue and reverted my /var/lib/rancher directory to a pre-upgrade backup. I use a cron job to stop the Rancher container nightly and back up the /var/lib/rancher volume with restic over S3 via MinIO.
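
For reference, a rough sketch of what such a nightly job can look like; the container name, restic repository, credentials and the host path of the volume are placeholders for your own setup:

```python
# Rough sketch of a nightly backup: stop the Rancher container, back up the
# /var/lib/rancher volume with restic to an S3-compatible (MinIO) repository,
# then start the container again. All names and credentials are placeholders.
import os
import subprocess

CONTAINER = "rancher"                                         # docker container name
RESTIC_REPO = "s3:https://minio.example.com/rancher-backups"  # placeholder repo
BACKUP_PATH = "/var/lib/rancher"   # adjust to wherever the volume lives on the host

env = dict(
    os.environ,
    AWS_ACCESS_KEY_ID="minio-access-key",       # placeholder credentials
    AWS_SECRET_ACCESS_KEY="minio-secret-key",
    RESTIC_PASSWORD="restic-repo-password",
)

subprocess.run(["docker", "stop", CONTAINER], check=True)
try:
    subprocess.run(["restic", "-r", RESTIC_REPO, "backup", BACKUP_PATH],
                   env=env, check=True)
finally:
    # Always bring Rancher back up, even if the backup failed.
    subprocess.run(["docker", "start", CONTAINER], check=True)
```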

My question is: how do we upgrade Kubernetes in a Docker-container install? I assume there will eventually be a need to.

Nope. Rancher in a single-node Docker install uses K3s internally to set up a cluster. K3s is basically Kubernetes in a single binary. Upgrading it (when it’s used as an actual cluster, NOT INSIDE THIS CONTAINER*) is nothing more than stopping K3s, replacing the binary and restarting K3s. In this case you don’t have to worry about that, as new versions of the Rancher Docker image are bundled with newer versions of K3s and thus upgrade the Kubernetes version for you.

* For real, don’t go replacing the binary in this container. I tried doing this on an older version of Rancher and it didn’t go over well. (Mind you, that was a really old version of K3s, and the reason I was attempting it was that the old version didn’t have automatic certificate renewal yet; I was dealing with expired certificates and running out of options.)


As a side note: as of Rancher 2.5.x it is now supported to migrate from a single-node Docker installation to Rancher installed in a cluster. To do so, you’ll have to make a backup using the rancher-backup operator following the instructions at https://rancher.com/docs/rancher/v2.5/en/backups/back-up-rancher/ and restore it into a new cluster using https://rancher.com/docs/rancher/v2.5/en/backups/migrating-rancher/.

I plan on doing a more elaborate blog post / article on that over at the SUSE Community (https://community.suse.com) and maybe somewhere else, but that won’t be ready for a few more weeks. A couple of important notes on it though:

  • When doing this, you MUST use Rancher 2.5.x. I highly recommend 2.5.8 or above, as earlier versions in my experience had various issues installing Fleet into the downstream clusters.
  • The rancher-backup operator only has two supported storage modes out of the box: a PV or an S3 bucket. I’ve only been successful with the S3 route so far. The reason: I tried using a HostPath PV, but because Rancher runs on K3s inside Docker, HostPath means ‘inside the Docker container’. You can work around this by first bind-mounting a directory to that path inside the container (or by getting your backup out with docker cp), but I didn’t want to figure out how to do that in reverse for the restore just now. I will do that for my blog post though.
  • Very important: when setting up the new cluster your restore will live in, do NOT install Rancher first; if you do, you’ll run into issues. The docs note this, but not explicitly enough IMO.
  • When setting up the new cluster for your restore: cert-manager must be present before you run the last step of re-installing Rancher.
  • The restored cluster MUST use the same domain as the cluster had before. Changing the domain name of a Rancher installation is not currently supported. So make sure to update your DNS records to point to the new cluster.
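
To make the backup step concrete, here is a hedged sketch of creating such a Backup object from the Kubernetes Python client; the bucket, secret names and endpoint are placeholders, and the exact spec fields should be double-checked against the backup docs linked above for your operator version:

```python
# Hedged sketch: create a Backup object for the rancher-backup operator with
# S3 storage, via the Kubernetes Python client. Bucket, secret names and
# endpoint are placeholders; verify the spec fields against the Rancher docs.
from kubernetes import client, config

config.load_kube_config()  # kubeconfig pointing at the single-node install

backup = {
    "apiVersion": "resources.cattle.io/v1",
    "kind": "Backup",
    "metadata": {"name": "migration-backup"},
    "spec": {
        "resourceSetName": "rancher-resource-set",
        "storageLocation": {
            "s3": {
                "credentialSecretName": "s3-creds",       # placeholder secret
                "credentialSecretNamespace": "default",
                "bucketName": "rancher-backups",          # placeholder bucket
                "folder": "rancher",
                "endpoint": "minio.example.com",          # placeholder endpoint
            }
        },
    },
}

# Backup objects from the rancher-backup operator are cluster-scoped.
client.CustomObjectsApi().create_cluster_custom_object(
    group="resources.cattle.io",
    version="v1",
    plural="backups",
    body=backup,
)
```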

Had the same issue this morning on Rancher 2.5.6: I wanted to upgrade the local K3s cluster since it was offered in the cluster “Edit” properties… and after that, the “cluster is being upgraded” message was blinking about every 10 seconds 😦

I restored from a backup (I’m using the backup/restore feature of Rancher, scheduled to run every 6 hours), and after that the cluster-upgrade message disappeared and I was able to use Rancher normally again.