longhorn: [BUG] Uninstall on production cluster with 1.3.0-dev accidentally installed not possible
Describe the bug
Calling helm delete longhorn -n longhorn-system does not remove longhorn
To Reproduce
Steps to reproduce the behavior:
- Install longhorn-1.3.0-dev from the main branch
- Remove longhorn using helm
- See error
time="2022-03-24T06:23:01Z" level=info msg="Found 1 engineimages remaining" controller=longhorn-uninstall
time="2022-03-24T06:23:01Z" level=warning msg="worker error" controller=longhorn-uninstall error="Failed to delete engine images: Failed to mark for deletion: admission webhook \"validator.longhorn.io\" denied the request: unable to delete the default engine image"
time="2022-03-24T06:23:01Z" level=info msg="Found 1 backuptargets remaining" controller=longhorn-uninstall
time="2022-03-24T06:23:01Z" level=info msg="Marked for deletion" controller=longhorn-uninstall cred= interval=5m0s url=
time="2022-03-24T06:23:01Z" level=info msg="Found 1 engineimages remaining" controller=longhorn-uninstall
time="2022-03-24T06:23:01Z" level=warning msg="worker error" controller=longhorn-uninstall error="Failed to delete engine images: Failed to mark for deletion: admission webhook \"validator.longhorn.io\" denied the request: unable to delete the default engine image"
time="2022-03-24T06:23:01Z" level=info msg="Found 1 backuptargets remaining" controller=longhorn-uninstall
time="2022-03-24T06:23:01Z" level=info msg="Marked for deletion" controller=longhorn-uninstall cred= interval=5m0s url=
time="2022-03-24T06:23:01Z" level=info msg="Found 1 backuptargets remaining" controller=longhorn-uninstall
time="2022-03-24T06:23:02Z" level=info msg="Marked for deletion" controller=longhorn-uninstall cred= interval=5m0s url=
time="2022-03-24T06:23:02Z" level=info msg="Found 1 engineimages remaining" controller=longhorn-uninstall
time="2022-03-24T06:23:02Z" level=warning msg="worker error" controller=longhorn-uninstall error="Failed to delete engine images: Failed to mark for deletion: admission webhook \"validator.longhorn.io\" denied the request: unable to delete the default engine image"
time="2022-03-24T06:23:02Z" level=info msg="Found 1 backuptargets remaining" controller=longhorn-uninstall
time="2022-03-24T06:23:02Z" level=info msg="Marked for deletion" controller=longhorn-uninstall cred= interval=5m0s url=
time="2022-03-24T06:23:02Z" level=info msg="Found 1 engineimages remaining" controller=longhorn-uninstall
time="2022-03-24T06:23:02Z" level=warning msg="worker error" controller=longhorn-uninstall error="Failed to delete engine images: Failed to mark for deletion: admission webhook \"validator.longhorn.io\" denied the request: unable to delete the default engine image"
time="2022-03-24T06:23:02Z" level=info msg="Found 1 backuptargets remaining" controller=longhorn-uninstall
time="2022-03-24T06:23:02Z" level=info msg="Marked for deletion" controller=longhorn-uninstall cred= interval=5m0s url=
time="2022-03-24T06:23:02Z" level=info msg="Found 1 backuptargets remaining" controller=longhorn-uninstall
time="2022-03-24T06:23:03Z" level=warning msg="worker error" controller=longhorn-uninstall error="Failed to delete backup targets: Failed to mark for deletion: backuptargets.longhorn.io \"default\" not found"
time="2022-03-24T06:23:03Z" level=info msg="Found 1 engineimages remaining" controller=longhorn-uninstall
time="2022-03-24T06:23:03Z" level=warning msg="worker error" controller=longhorn-uninstall error="Failed to delete engine images: Failed to mark for deletion: admission webhook \"validator.longhorn.io\" denied the request: unable to delete the default engine image"
time="2022-03-24T06:23:03Z" level=info msg="Found 1 backuptargets remaining" controller=longhorn-uninstall
time="2022-03-24T06:23:03Z" level=info msg="Marked for deletion" controller=longhorn-uninstall cred= interval=5m0s url=
time="2022-03-24T06:23:03Z" level=info msg="Found 1 engineimages remaining" controller=longhorn-uninstall
time="2022-03-24T06:23:03Z" level=warning msg="worker error" controller=longhorn-uninstall error="Failed to delete engine images: Failed to mark for deletion: admission webhook \"validator.longhorn.io\" denied the request: unable to delete the default engine image"
time="2022-03-24T06:23:03Z" level=info msg="Found 1 backuptargets remaining" controller=longhorn-uninstall
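The log above repeats the same few messages many times per second. A minimal sketch for collapsing such output into one line per distinct error, assuming the uninstall job's output is piped in or was saved to a file (the function name `summarize_errors` and the file name `uninstall.log` are hypothetical, not part of Longhorn):

```shell
#!/bin/sh
# Sketch: read logfmt-style lines on stdin and print each distinct
# error="..." value from level=warning lines, most frequent first.
# Assumes error="..." is the last field on the line, as in the log above.
summarize_errors() {
    grep 'level=warning' \
        | sed -n 's/.* error="\(.*\)"$/\1/p' \
        | sort \
        | uniq -c \
        | sort -rn
}

# Possible usage (not run here; needs access to the cluster or a saved log):
#   kubectl logs job/longhorn-uninstall -n longhorn-system | summarize_errors
#   summarize_errors < uninstall.log
```

For the log in this report, this would reduce the output to two distinct errors: the webhook refusing to delete the default engine image, and the backup target "default" not being found.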
Expected behavior
Longhorn is removed
Log or Support bundle
Environment
- Longhorn version: 1.3.0-dev
- Installation method (e.g. Rancher Catalog App/Helm/Kubectl): helm
- Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: 1.22.8
- Number of management node in the cluster: 1
- Number of worker node in the cluster: 2
- Node config
- OS type and version: DGX Ubuntu
- CPU per node: 24
- Memory per node: 128Gb
- Disk type(e.g. SSD/NVMe): NVMe
- Network bandwidth between the nodes: 20Gb
- Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): Baremetal
- Number of Longhorn volumes in the cluster: 3
Additional context
We were hitting an error in Kubernetes 1.21.x that was fixed in Kubernetes 1.22.8.
During the upgrade we were unable to drain the pool of Longhorn nodes, and Longhorn became unstable.
We considered uninstalling and reinstalling Longhorn.
During the process we accidentally installed part of 1.3.0-dev over the top of 1.1.2. Now we cannot remove either version of Longhorn, because CRDs and webhooks are left in a state that prevents both uninstall and reinstall from working.
Ideally we would like to run Longhorn 1.2.4 with Kubernetes 1.22.8.
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 22 (7 by maintainers)
Refer to https://longhorn.io/docs/1.2.4/deploy/install/install-with-helm/ and install the stable version. 😃
All apologies
we did
kubectl delete job longhorn-uninstall -n longhorn-system
and got
job.batch "longhorn-uninstall" deleted
then we did
helm uninstall longhorn -n longhorn-system
and got
Thanks for everything, we owe you one!
Looking for a power cycle solution for now.
@samhodge-aiml
Looks like you ran into a bug that we fixed in https://github.com/longhorn/longhorn-manager/pull/1279.
You can try the following steps to uninstall Longhorn.
- Stop the helm uninstall process that is in the error state.
- Clone https://github.com/longhorn/longhorn.git
- To upgrade and replace the old master-head image with the latest one, please add/modify chart/templates/daemonset-sa.yaml, chart/templates/deployment-webhook.yaml, chart/templates/deployment-driver.yaml and chart/templates/uninstall-job.yaml in the master branch:
  - replace imagePullPolicy: IfNotPresent with imagePullPolicy: Always
  - add date: "{{ now | unixEpoch }}" as described in https://github.com/helm/helm/issues/5696#issuecomment-667935723
- Then, upgrade your cluster by running
  helm upgrade longhorn -n longhorn-system ./chart
- Uninstall Longhorn by running
  helm delete longhorn -n longhorn-system
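The imagePullPolicy part of the template edit above can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name `set_pull_policy_always` is hypothetical, the chart directory is assumed to be the chart/ directory of the cloned repository, and the sketch does not add the date: "{{ now | unixEpoch }}" entry, which still has to be placed by hand as described in the linked Helm comment.

```shell
#!/bin/sh
# Sketch: switch imagePullPolicy from IfNotPresent to Always in the four
# templates named in the steps above.
# $1 is the chart directory (assumed: ./chart inside the cloned repo).
set_pull_policy_always() {
    chart_dir="$1"
    for f in daemonset-sa.yaml deployment-webhook.yaml \
             deployment-driver.yaml uninstall-job.yaml; do
        path="$chart_dir/templates/$f"
        if [ ! -f "$path" ]; then
            echo "skipping missing file: $path" >&2
            continue
        fi
        # Write to a temp file and move it back, which avoids the
        # differences between GNU and BSD "sed -i".
        sed 's/imagePullPolicy: IfNotPresent/imagePullPolicy: Always/' \
            "$path" > "$path.tmp" && mv "$path.tmp" "$path"
    done
}
```

After cloning, this would be called as set_pull_policy_always ./chart from the repository root, followed by the helm upgrade and helm delete commands from the steps above. Reviewing the result with git diff before upgrading is a cheap way to confirm that only the intended lines changed.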