harvester: [BUG] Harvester single node upgrade will get `another operation (install/upgrade/rollback) is in progress` error
Describe the bug
After the node reboots during a single-node Harvester upgrade, the cluster hits an `another operation (install/upgrade/rollback) is in progress` error. This blocks the next Harvester upgrade or managedChart update. Potentially related to https://github.com/helm/helm/issues/8987#issuecomment-786149813
To Reproduce
Steps to reproduce the behavior:
- Install a Harvester cluster with an old version, e.g., v1.1.1
- Upgrade the Harvester cluster to a newer version, e.g., v1.1.2-head.iso
- After the upgrade completes and the upgrade status shows success, check the harvester managedChart status. It contains an error like:
conditions:
- lastUpdateTime: "2023-03-08T05:42:32Z"
  message: 'ErrApplied(1) [Cluster fleet-local/local: another operation (install/upgrade/rollback)
    is in progress]; daemonset.apps harvester-system/kube-vip [progressing] Available:
    0/1; kubevirt.kubevirt.io harvester-system/kubevirt [progressing] Deployin
Expected behavior
A single-node upgrade should not produce the above error.
Support bundle
Environment
- Harvester ISO version: v1.1.1 upgrade
- Underlying Infrastructure (e.g. Baremetal with Dell PowerEdge R630):
Additional context
For a workaround, see https://github.com/helm/helm/issues/8987#issuecomment-786149813; however, make sure you roll back to the correct revision.
About this issue
- State: closed
- Created a year ago
- Comments: 25 (14 by maintainers)
The workaround is to roll back the problematic chart.
First, we need to get the helm release name and namespace of a bundle:
We know the problematic chart is mcc-harvester-crd. Then we can get the bundle’s chart and namespace with:
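The original command is not preserved here, so the following is only a sketch of this step. It reads the Helm release name and namespace from the bundle. The `fleet-local` namespace is taken from the error message above; the field paths `.spec.helm.releaseName` and `.spec.defaultNamespace` and the helper name `bundle_release_info` are assumptions, so check them against the bundle's YAML on your cluster.

```shell
# Hypothetical helper: print the Helm release name and namespace of a Fleet bundle.
# Assumes the bundle lives in the fleet-local namespace and that the two field
# paths below exist in the Bundle spec on your cluster.
bundle_release_info() {
  kubectl get bundles.fleet.cattle.io -n fleet-local "$1" \
    -o jsonpath='{.spec.helm.releaseName}{"\n"}{.spec.defaultNamespace}{"\n"}'
}
```

For this issue it would be called as `bundle_release_info mcc-harvester-crd`. If either line comes back empty, inspect the full object with `-o yaml` instead.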
Then, check if the previous revision is sane:
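A minimal sketch of this check, assuming the release name and namespace found in the previous step. `helm history` lists every revision with its status; a latest revision stuck in a `pending-*` status is what makes Helm report that another operation is in progress. The helper name `release_history` is hypothetical.

```shell
# Hypothetical helper: list the revisions of a release so that a known-good
# (deployed or superseded) revision can be chosen as the rollback target.
release_history() {
  helm history "$1" -n "$2"
}
```

Call it as `release_history <release> <namespace>` and note the number of the last revision that was deployed successfully.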
Then roll back the chart:
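The rollback itself might look like the sketch below. The helper name `rollback_release` is hypothetical, and it deliberately requires an explicit revision, since the issue warns that the rollback target must be correct.

```shell
# Hypothetical helper: roll a release back to an explicitly chosen revision.
# The revision is mandatory on purpose: without it, helm rolls back to the
# previous release, which may not be the one you verified.
rollback_release() {
  if [ "$#" -ne 3 ]; then
    echo "usage: rollback_release <release> <namespace> <revision>" >&2
    return 2
  fi
  helm rollback "$1" "$3" -n "$2"
}
```

Call it as `rollback_release <release> <namespace> <revision>`, using the revision number verified in the history output.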
And check if the bundle becomes Ready again:
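One way to check the bundle afterwards, again as a sketch that assumes the bundle lives in the `fleet-local` namespace (the helper name `bundle_status` is hypothetical):

```shell
# Hypothetical helper: show the current state of a Fleet bundle.
# Assumes the bundle is in the fleet-local namespace.
bundle_status() {
  kubectl get bundles.fleet.cattle.io -n fleet-local "$1"
}
```

For this issue: `bundle_status mcc-harvester-crd`. Fleet may take a short while to reconcile, so repeat the check if the error is still shown right after the rollback.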
Note, you can download helm here: https://github.com/helm/helm/releases/tag/v3.11.3
We’ll use the issue https://github.com/harvester/harvester/issues/3675 in 1.2.0 to track upstream issue https://github.com/rancher/fleet/issues/637
@irishgordo https://github.com/harvester/harvester/pull/3643 reduces the chance of seeing the issue for Harvester-managed charts. The fleet-agent-local chart is still out of our control. We should have a document advising users to roll back the chart if this happens. cc @w13915984028
@irishgordo Yes, please help test with rc4. This should be quite easy to reproduce with rc3 (single node)…
Attempt to fix by scaling the fleet agent replicas to 0 before upgrading RKE2 and scaling them back to 1 afterwards: https://github.com/harvester/harvester/pull/3641
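Done by hand, the idea behind that fix might look like the sketch below. This is not the code from the pull request; the deployment name `fleet-agent`, the namespace `cattle-fleet-local-system`, and the helper name `scale_fleet_agent` are assumptions to verify on your cluster.

```shell
# Hypothetical helper: scale the local fleet agent so it cannot start a Helm
# operation while the node is being upgraded and rebooted.
# Assumed names: deployment fleet-agent in namespace cattle-fleet-local-system.
scale_fleet_agent() {
  kubectl scale deployment fleet-agent -n cattle-fleet-local-system --replicas="$1"
}
```

Run `scale_fleet_agent 0` before the RKE2 upgrade and `scale_fleet_agent 1` once the node is back. With the agent stopped, no Helm release should be left half-applied by the reboot.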