longhorn: [BUG] Uninstall on production cluster with 1.3.0-dev accidentally installed not possible

Describe the bug

Calling helm delete longhorn -n longhorn-system does not remove Longhorn.

To Reproduce

Steps to reproduce the behavior:

  1. Install longhorn-1.3.0-dev from the main branch
  2. Remove longhorn using helm
  3. See error
time="2022-03-24T06:23:01Z" level=info msg="Found 1 engineimages remaining" controller=longhorn-uninstall
time="2022-03-24T06:23:01Z" level=warning msg="worker error" controller=longhorn-uninstall error="Failed to delete engine images: Failed to mark for deletion: admission webhook \"validator.longhorn.io\" denied the request: unable to delete the default engine image"
time="2022-03-24T06:23:01Z" level=info msg="Found 1 backuptargets remaining" controller=longhorn-uninstall
time="2022-03-24T06:23:01Z" level=info msg="Marked for deletion" controller=longhorn-uninstall cred= interval=5m0s url=
time="2022-03-24T06:23:01Z" level=info msg="Found 1 engineimages remaining" controller=longhorn-uninstall
time="2022-03-24T06:23:01Z" level=warning msg="worker error" controller=longhorn-uninstall error="Failed to delete engine images: Failed to mark for deletion: admission webhook \"validator.longhorn.io\" denied the request: unable to delete the default engine image"
time="2022-03-24T06:23:01Z" level=info msg="Found 1 backuptargets remaining" controller=longhorn-uninstall
time="2022-03-24T06:23:01Z" level=info msg="Marked for deletion" controller=longhorn-uninstall cred= interval=5m0s url=
time="2022-03-24T06:23:01Z" level=info msg="Found 1 backuptargets remaining" controller=longhorn-uninstall
time="2022-03-24T06:23:02Z" level=info msg="Marked for deletion" controller=longhorn-uninstall cred= interval=5m0s url=
time="2022-03-24T06:23:02Z" level=info msg="Found 1 engineimages remaining" controller=longhorn-uninstall
time="2022-03-24T06:23:02Z" level=warning msg="worker error" controller=longhorn-uninstall error="Failed to delete engine images: Failed to mark for deletion: admission webhook \"validator.longhorn.io\" denied the request: unable to delete the default engine image"
time="2022-03-24T06:23:02Z" level=info msg="Found 1 backuptargets remaining" controller=longhorn-uninstall
time="2022-03-24T06:23:02Z" level=info msg="Marked for deletion" controller=longhorn-uninstall cred= interval=5m0s url=
time="2022-03-24T06:23:02Z" level=info msg="Found 1 engineimages remaining" controller=longhorn-uninstall
time="2022-03-24T06:23:02Z" level=warning msg="worker error" controller=longhorn-uninstall error="Failed to delete engine images: Failed to mark for deletion: admission webhook \"validator.longhorn.io\" denied the request: unable to delete the default engine image"
time="2022-03-24T06:23:02Z" level=info msg="Found 1 backuptargets remaining" controller=longhorn-uninstall
time="2022-03-24T06:23:02Z" level=info msg="Marked for deletion" controller=longhorn-uninstall cred= interval=5m0s url=
time="2022-03-24T06:23:02Z" level=info msg="Found 1 backuptargets remaining" controller=longhorn-uninstall
time="2022-03-24T06:23:03Z" level=warning msg="worker error" controller=longhorn-uninstall error="Failed to delete backup targets: Failed to mark for deletion: backuptargets.longhorn.io \"default\" not found"
time="2022-03-24T06:23:03Z" level=info msg="Found 1 engineimages remaining" controller=longhorn-uninstall
time="2022-03-24T06:23:03Z" level=warning msg="worker error" controller=longhorn-uninstall error="Failed to delete engine images: Failed to mark for deletion: admission webhook \"validator.longhorn.io\" denied the request: unable to delete the default engine image"
time="2022-03-24T06:23:03Z" level=info msg="Found 1 backuptargets remaining" controller=longhorn-uninstall
time="2022-03-24T06:23:03Z" level=info msg="Marked for deletion" controller=longhorn-uninstall cred= interval=5m0s url=
time="2022-03-24T06:23:03Z" level=info msg="Found 1 engineimages remaining" controller=longhorn-uninstall
time="2022-03-24T06:23:03Z" level=warning msg="worker error" controller=longhorn-uninstall error="Failed to delete engine images: Failed to mark for deletion: admission webhook \"validator.longhorn.io\" denied the request: unable to delete the default engine image"
time="2022-03-24T06:23:03Z" level=info msg="Found 1 backuptargets remaining" controller=longhorn-uninstall
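The uninstall log above repeats the same few warnings in a tight loop, so it can help to collapse it into distinct errors with counts. This is only a sketch using standard grep/sed/sort/uniq; the function name summarize_uninstall_errors is made up for illustration, and it assumes the log lines end with the error="..." field, as they do above.

```shell
#!/bin/sh
# Collapse a longhorn-uninstall log into distinct warning errors with counts.
# Reads log lines on stdin; assumes each warning line ends with error="...".
summarize_uninstall_errors() {
  grep 'level=warning' \
    | sed -n 's/.* error="\(.*\)"$/\1/p' \
    | sort | uniq -c | sort -rn
}

# Example input: two warning lines copied from the log above.
summarize_uninstall_errors <<'EOF'
time="2022-03-24T06:23:01Z" level=warning msg="worker error" controller=longhorn-uninstall error="Failed to delete engine images: Failed to mark for deletion: admission webhook \"validator.longhorn.io\" denied the request: unable to delete the default engine image"
time="2022-03-24T06:23:03Z" level=warning msg="worker error" controller=longhorn-uninstall error="Failed to delete backup targets: Failed to mark for deletion: backuptargets.longhorn.io \"default\" not found"
EOF
```

Against a live cluster the same function could be fed from kubectl logs job/longhorn-uninstall -n longhorn-system, assuming the uninstall job still exists under that name.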

Expected behavior

Longhorn is removed

Environment

  • Longhorn version: 1.3.0-dev
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): helm
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: 1.22.8
    • Number of management node in the cluster: 1
    • Number of worker node in the cluster: 2
  • Node config
    • OS type and version: DGX Ubuntu
    • CPU per node: 24
    • Memory per node: 128Gb
    • Disk type(e.g. SSD/NVMe): NVMe
    • Network bandwidth between the nodes: 20Gb
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): Baremetal
  • Number of Longhorn volumes in the cluster: 3

Additional context

We were getting an error in Kubernetes 1.21.x that had been fixed in Kubernetes 1.22.8.

During the upgrade we were unable to drain the pool of Longhorn nodes, and Longhorn became unstable.

We considered uninstalling and reinstalling Longhorn.

During that process we accidentally installed part of 1.3.0-dev on top of 1.1.2. Now we cannot remove either version of Longhorn, because CRDs and webhooks are left in a state that prevents both uninstall and reinstall from working.

Ideally we would like to run Longhorn 1.2.4 with Kubernetes 1.22.8.
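For anyone trying to see which CRDs and webhook configurations are left behind in a situation like this, a read-only listing is a reasonable first step. This is a sketch, not something that was run on this cluster: the function names are hypothetical, and matching on the string "longhorn" is an assumption about how the leftover objects are named.

```shell
#!/bin/sh
# Keep only object names that mention Longhorn (assumed naming, see above).
longhorn_only() {
  grep -i 'longhorn'
}

# Read-only: list CRDs and admission webhook configurations tied to Longhorn.
list_leftovers() {
  kubectl get crd -o name | longhorn_only
  kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations -o name | longhorn_only
}

# Usage (needs kubectl access to the cluster):
#   list_leftovers
```

Nothing here deletes or modifies anything; it only shows what is still registered.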

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 22 (7 by maintainers)

Most upvoted comments

Refer to https://longhorn.io/docs/1.2.4/deploy/install/install-with-helm/ and install the stable version. 😃

All apologies

we did

kubectl delete job longhorn-uninstall -n longhorn-system

and got

job.batch "longhorn-uninstall" deleted

then we did

helm uninstall longhorn -n longhorn-system

and got

W0329 17:30:06.659750 1868628 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
release "longhorn" uninstalled

Thanks for everything, we owe you one!
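For reference, the two commands that worked here can be wrapped so they are previewed before anything is run. This is a minimal sketch: DRY_RUN, run and remove_stuck_release are names invented for illustration, and the default is to print the commands rather than execute them.

```shell
#!/bin/sh
# Sketch of the sequence reported in this comment: delete the stuck
# uninstall job, then uninstall the Helm release.
NAMESPACE="${NAMESPACE:-longhorn-system}"
RELEASE="${RELEASE:-longhorn}"

# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

remove_stuck_release() {
  run kubectl delete job longhorn-uninstall -n "$NAMESPACE" || return 1
  run helm uninstall "$RELEASE" -n "$NAMESPACE"
}

remove_stuck_release
```

Running it with DRY_RUN=0 in the environment executes the commands for real; stopping after a failed job deletion is a design choice of this sketch, not something the thread prescribes.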

Looking for a power cycle solution for now.

@samhodge-aiml

Looks like you ran into a bug that we fixed in https://github.com/longhorn/longhorn-manager/pull/1279.

You can try the following steps to uninstall Longhorn.

  1. Stop the helm uninstall process that is stuck in the error state.

  2. Clone https://github.com/longhorn/longhorn.git

  3. To upgrade and replace the old master-head image with the latest one, please add/modify chart/templates/daemonset-sa.yaml, chart/templates/deployment-webhook.yaml, chart/templates/deployment-driver.yaml and chart/templates/uninstall-job.yaml in master branch:

  4. Then upgrade your cluster with helm upgrade longhorn -n longhorn-system ./chart

  5. Uninstall Longhorn with helm delete longhorn -n longhorn-system
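The command-line parts of the steps above can be sketched as a small script. Step 3 (editing the chart templates) is manual and is deliberately not reproduced, since the exact changes are not spelled out here. The names run, fetch_chart, upgrade_then_uninstall, DRY_RUN and REPO_DIR are hypothetical, and the default mode only prints the plan.

```shell
#!/bin/sh
REPO_URL="https://github.com/longhorn/longhorn.git"
REPO_DIR="${REPO_DIR:-longhorn}"   # assumed local checkout directory
NAMESPACE="${NAMESPACE:-longhorn-system}"
RELEASE="${RELEASE:-longhorn}"

# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Step 2: clone the repository that contains the chart.
fetch_chart() {
  run git clone "$REPO_URL" "$REPO_DIR"
}

# Steps 4 and 5: upgrade with the locally edited chart, then uninstall.
upgrade_then_uninstall() {
  run helm upgrade "$RELEASE" -n "$NAMESPACE" "$REPO_DIR/chart" || return 1
  run helm delete "$RELEASE" -n "$NAMESPACE"
}

case "${1:-plan}" in
  fetch) fetch_chart ;;
  apply) upgrade_then_uninstall ;;
  plan)  DRY_RUN=1; fetch_chart; upgrade_then_uninstall ;;
esac
```

The fetch and apply modes are split so the template edits from step 3 can be made in between. In Helm 3, helm delete is an alias of helm uninstall, so either spelling works for the last step.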