longhorn: [BUG] Upgrade failed through Rancher (2.5.5) UI from 1.2.0 -> 1.2.2

Describe the bug

The upgrade of Longhorn from version 1.2.0 to 1.2.2 failed with the following error message:

Failed to install app longhorn-system. Error: UPGRADE FAILED: kind CustomResourceDefinition with the name "backups.longhorn.io" already exists in the cluster and wasn't defined in the previous release. Before upgrading, please either delete the resource from the cluster or remove it from the chart
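For anyone hitting the same message, a quick (hedged) way to see how Helm is tracking the CRD named in the error is to inspect its ownership metadata; the CRD name below comes straight from the error above:

kubectl get crd backups.longhorn.io -o yaml | grep -E 'app.kubernetes.io/managed-by|meta.helm.sh/'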

To Reproduce

Steps to reproduce the behavior:

  1. Go to Apps
  2. Click on Longhorn Upgrade
  3. Review settings and start
  4. See error message above in the longhorn chart overview

Expected behavior

Expected that all Longhorn workloads would be updated to version 1.2.2.

Log or Support bundle

If applicable, add the Longhorn managers’ log or support bundle when the issue happens. You can generate a Support Bundle using the link at the footer of the Longhorn UI.

Environment

  • Longhorn version: 1.2.0
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Rancher Catalog App
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: RKE
    • Number of management nodes in the cluster: 7
    • Number of worker nodes in the cluster: 2
  • Node config
    • OS type and version: Ubuntu Server 20.04
    • CPU per node: 4
    • Memory per node: 64 GB
    • Disk type (e.g. SSD/NVMe): SSD/HDD/NVMe
    • Network bandwidth between the nodes: 10G
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): Baremetal/VM
  • Number of Longhorn volumes in the cluster: 26

Additional context

Add any other context about the problem here.

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 5
  • Comments: 18 (6 by maintainers)

Most upvoted comments

@ajacques Thank you, but your post relates to Rancher 2.6 and I still use the old version 2.5.5. Will your description also apply to the old Rancher version 2.5.5?

This lets you upgrade to Helm 3. It isn't actually tied to Rancher 2.6 or Rancher 2.5; I ended up doing it because Rancher 2.6 removed Helm 2, but ultimately it's a similar problem. I assumed this is your problem because it's the same error I got when I tried to switch my Helm 2 installation of Longhorn to Helm 3.

Right, but it also means that the Longhorn CRs would be deleted as well. Then the Longhorn cluster will be broken.

My proposed fix does not delete the CRDs; note that the commands do not delete any resources. They only modify the CRDs' metadata so you can upgrade in place. Following this, I was able to continue using Longhorn without deleting any volumes, and after upgrading PVCs and restarting pods, everything worked perfectly.

@ajacques Thank you so much!! With option 3 I was able to upgrade Longhorn, and after that I successfully updated Rancher to the latest version, and I can still manage everything correctly 🥳

I am not sure about that. As @ajacques mentioned in his post, I changed the labels and annotations. In his post the app is named longhorn, but my old app is named longhorn-system. If I now reinstall the app from the marketplace, I think this could end up with the application deployed twice under different names. I have not tried this yet because the cluster is in production.

If you run

helm list --namespace=longhorn-system

does it show releases named longhorn-system and longhorn-crds?
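(A hedged aside: a Helm 3 client only lists releases created by Helm 3, so an install done through the old Helm 2/Tiller path will not show up here at all. Checking across all namespaces can also help rule out a release living somewhere unexpected:)

helm list --all-namespaces | grep -i longhorn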

There are three options:

  1. The safe option - Do nothing
  2. Rename the Helm release from longhorn-system to longhorn using something like this
  3. Continue using the longhorn-system release by changing the annotate -n longhorn-system % meta.helm.sh/release-name=longhorn parts in my script to annotate -n longhorn-system % meta.helm.sh/release-name=longhorn-system

On further reading of your GitHub issue, it seems the CRDs were not created using Helm, but the Longhorn application itself was? If so, option 3 is probably your best bet: it undoes the part that is currently broken and should permit upgrades to continue:

kubectl --context=$KUBE_CONTEXT get psp longhorn-psp -o name | xargs -I % kubectl --context=$KUBE_CONTEXT annotate % meta.helm.sh/release-name=longhorn-system
kubectl --context=$KUBE_CONTEXT -n longhorn-system get configMap,service,ds,deploy,serviceaccount,role,rolebinding -o name  | xargs -I % kubectl --context=$KUBE_CONTEXT -n longhorn-system annotate % meta.helm.sh/release-name=longhorn-system
kubectl --context=$KUBE_CONTEXT get clusterrole,clusterrolebinding -o name | grep longhorn   | xargs -I % kubectl --context=$KUBE_CONTEXT annotate % meta.helm.sh/release-name=longhorn-system
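One hedged caveat: if those resources were already annotated with release-name=longhorn by the original script (shown further down in this thread), kubectl annotate will refuse to change an existing annotation to a new value unless you pass --overwrite, so the re-run would look like this (same resources, just with the extra flag):

kubectl --context=$KUBE_CONTEXT get psp longhorn-psp -o name | xargs -I % kubectl --context=$KUBE_CONTEXT annotate --overwrite % meta.helm.sh/release-name=longhorn-system
kubectl --context=$KUBE_CONTEXT -n longhorn-system get configMap,service,ds,deploy,serviceaccount,role,rolebinding -o name | xargs -I % kubectl --context=$KUBE_CONTEXT -n longhorn-system annotate --overwrite % meta.helm.sh/release-name=longhorn-system
kubectl --context=$KUBE_CONTEXT get clusterrole,clusterrolebinding -o name | grep longhorn | xargs -I % kubectl --context=$KUBE_CONTEXT annotate --overwrite % meta.helm.sh/release-name=longhorn-system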

Changing the annotation doesn’t do much other than tell Helm that yes it’s okay to touch those resources.
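For context, a hedged summary of what Helm 3 checks before it will adopt a resource that already exists: the label app.kubernetes.io/managed-by=Helm plus the annotations meta.helm.sh/release-name and meta.helm.sh/release-namespace all have to match the release being installed. You can spot-check what a resource currently carries with something like:

kubectl --context=$KUBE_CONTEXT -n longhorn-system get deploy,ds -o yaml | grep -E 'app.kubernetes.io/managed-by|meta.helm.sh/'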

If you delete the CRDs, doesn’t that mean you have to delete the K8s resources matching the CRDs too which would mean you lose PVCs?

Instead of deleting CRDs, I figured out a way to upgrade to Helm 3 in place and wrote a blog post on how I upgraded it here without deleting any data or CRDs. Ultimately, I used kubectl to change the labels and annotations, then used Helm to deploy over the top of the existing resources, and it worked. I think this should be the same fix for your case.

# kubectl context to operate on; adjust to your cluster
KUBE_CONTEXT=prod

# CRDs: mark them as managed by Helm and assign them to the longhorn-crd release
kubectl --context=$KUBE_CONTEXT get crds -o name | grep longhorn | xargs -I % kubectl --context=$KUBE_CONTEXT label --overwrite -n longhorn-system % app.kubernetes.io/managed-by=Helm
kubectl --context=$KUBE_CONTEXT get crds -o name | grep longhorn | xargs -I % kubectl --context=$KUBE_CONTEXT annotate -n longhorn-system % meta.helm.sh/release-name=longhorn-crd
kubectl --context=$KUBE_CONTEXT get crds -o name | grep longhorn | xargs -I % kubectl --context=$KUBE_CONTEXT annotate -n longhorn-system % meta.helm.sh/release-namespace=longhorn-system

# PodSecurityPolicy: same treatment, assigned to the longhorn release
kubectl --context=$KUBE_CONTEXT get psp longhorn-psp -o name | xargs -I % kubectl --context=$KUBE_CONTEXT annotate % meta.helm.sh/release-name=longhorn
kubectl --context=$KUBE_CONTEXT get psp longhorn-psp -o name | xargs -I % kubectl --context=$KUBE_CONTEXT annotate % meta.helm.sh/release-namespace=longhorn-system
kubectl --context=$KUBE_CONTEXT get psp longhorn-psp -o name | xargs -I % kubectl --context=$KUBE_CONTEXT label --overwrite -n longhorn-system % app.kubernetes.io/managed-by=Helm

# Namespaced resources in longhorn-system: label as Helm-managed and assign to the longhorn release
kubectl --context=$KUBE_CONTEXT -n longhorn-system get configMap,service,ds,deploy,serviceaccount,role,rolebinding -o name | xargs -I % kubectl --context=$KUBE_CONTEXT label --overwrite -n longhorn-system % app.kubernetes.io/managed-by=Helm
kubectl --context=$KUBE_CONTEXT -n longhorn-system get configMap,service,ds,deploy,serviceaccount,role,rolebinding -o name | xargs -I % kubectl --context=$KUBE_CONTEXT -n longhorn-system annotate % meta.helm.sh/release-name=longhorn
kubectl --context=$KUBE_CONTEXT -n longhorn-system get configMap,service,ds,deploy,serviceaccount,role,rolebinding -o name | xargs -I % kubectl --context=$KUBE_CONTEXT -n longhorn-system annotate % meta.helm.sh/release-namespace=longhorn-system

# Cluster-scoped RBAC: same treatment, assigned to the longhorn release
kubectl --context=$KUBE_CONTEXT get clusterrole,clusterrolebinding -o name | grep longhorn | xargs -I % kubectl --context=$KUBE_CONTEXT label --overwrite -n longhorn-system % app.kubernetes.io/managed-by=Helm
kubectl --context=$KUBE_CONTEXT get clusterrole,clusterrolebinding -o name | grep longhorn | xargs -I % kubectl --context=$KUBE_CONTEXT annotate % meta.helm.sh/release-namespace=longhorn-system
kubectl --context=$KUBE_CONTEXT get clusterrole,clusterrolebinding -o name | grep longhorn | xargs -I % kubectl --context=$KUBE_CONTEXT annotate % meta.helm.sh/release-name=longhorn

In your blog post, you say "deploy the Longhorn application". Do you mean that I need to redeploy the chart from the marketplace instead of updating the old Longhorn app?

Yes, you’re going to deploy it from the Marketplace since that’ll ensure it gets installed with Helm v3.
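Once that marketplace deploy finishes, a hedged sanity check that the release really is tracked by Helm 3 now and that the workloads came back up:

helm list --namespace=longhorn-system
kubectl -n longhorn-system get pods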

label validation error: key “app.kubernetes.io/managed-by” must equal “Helm”: current value is “Tiller”

This can be fixed by running (and I’ve updated the doc):

kubectl --context=$KUBE_CONTEXT get psp longhorn-psp -o name | xargs -I % kubectl --context=$KUBE_CONTEXT label --overwrite -n longhorn-system % app.kubernetes.io/managed-by=Helm
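If you want to confirm the label took before retrying the upgrade, a quick hedged spot-check:

kubectl --context=$KUBE_CONTEXT get psp longhorn-psp --show-labels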

key “meta.helm.sh/release-name” must equal “longhorn-system”: current value is “longhorn”

However, this error is interesting. AFAIK, the release-name was supposed to be ‘longhorn’, not ‘longhorn-system’, as per 1, 2, and 3. It’s easy to fix, but maybe somebody else knows whether this is correct or not?