longhorn: [BUG] Upgrade to 1.5.0 failed: validator.longhorn.io denied the request if having orphan resources
Describe the bug
Upgrade to v1.5.0 failed with the error:
Error starting manager: upgrade resources failed: admission webhook \"validator.longhorn.io\" denied the request: orphan orphan-024ed10f75415525327901b184c46b279fa24fdab23c89ea80e9f6ea7be50c83 spec fields are immutable
To Reproduce
Steps to reproduce the behavior:
Upgrade Longhorn from v1.4.2 to v1.5.0 via Helm
Expected behavior
Successful upgrade to v1.5.0
Log or Support bundle
Logs from longhorn-manager:
time="2023-07-07T07:04:13Z" level=info msg="Starting longhorn conversion webhook server"
W0707 07:04:13.804064 1 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
time="2023-07-07T07:04:13Z" level=info msg="Waiting for conversion webhook to become ready"
time="2023-07-07T07:04:13Z" level=warning msg="Failed to get webhook health endpoint https://localhost:9501/v1/healthz" error="Get \"https://localhost:9501/v1/healthz\": dial tcp 127.0.0.1:9501: connect: connection refused"
time="2023-07-07T07:04:13Z" level=info msg="Active TLS secret longhorn-webhook-tls (ver=2687) (count 2): map[listener.cattle.io/cn-longhorn-admission-webhook.longhorn.svc:longhorn-admission-webhook.longhorn.svc listener.cattle.io/cn-longhorn-conversion-webhook.longhorn.svc:longhorn-conversion-webhook.longhorn.svc listener.cattle.io/fingerprint:SHA1=5722A8DEA1DC17BCBDFA7D07C25EB1C0DBB6C4F3]"
time="2023-07-07T07:04:13Z" level=info msg="Listening on :9501"
time="2023-07-07T07:04:15Z" level=info msg="Starting apiextensions.k8s.io/v1, Kind=CustomResourceDefinition controller"
time="2023-07-07T07:04:15Z" level=info msg="Starting /v1, Kind=Secret controller"
time="2023-07-07T07:04:15Z" level=info msg="Starting apiregistration.k8s.io/v1, Kind=APIService controller"
time="2023-07-07T07:04:15Z" level=info msg="Building conversion rules..."
time="2023-07-07T07:04:15Z" level=info msg="Updating TLS secret for longhorn-webhook-tls (count: 2): map[listener.cattle.io/cn-longhorn-admission-webhook.longhorn.svc:longhorn-admission-webhook.longhorn.svc listener.cattle.io/cn-longhorn-conversion-webhook.longhorn.svc:longhorn-conversion-webhook.longhorn.svc listener.cattle.io/fingerprint:SHA1=5722A8DEA1DC17BCBDFA7D07C25EB1C0DBB6C4F3]"
time="2023-07-07T07:04:15Z" level=info msg="Webhook conversion is ready"
time="2023-07-07T07:04:15Z" level=warning msg="Started longhorn conversion webhook server"
time="2023-07-07T07:04:15Z" level=info msg="Starting longhorn admission webhook server"
W0707 07:04:15.815424 1 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
time="2023-07-07T07:04:15Z" level=info msg="Waiting for admission webhook to become ready"
I0707 07:04:15.817401 1 shared_informer.go:311] Waiting for caches to sync for longhorn datastore
time="2023-07-07T07:04:15Z" level=warning msg="Failed to get webhook health endpoint https://localhost:9502/v1/healthz" error="Get \"https://localhost:9502/v1/healthz\": dial tcp 127.0.0.1:9502: connect: connection refused"
I0707 07:04:17.018086 1 request.go:696] Waited for 1.198095876s due to client-side throttling, not priority and fairness, request: GET:https://10.3.0.1:443/apis/longhorn.io/v1beta2/sharemanagers?limit=500&resourceVersion=0
time="2023-07-07T07:04:17Z" level=warning msg="Failed to get webhook health endpoint https://localhost:9502/v1/healthz" error="Get \"https://localhost:9502/v1/healthz\": dial tcp 127.0.0.1:9502: connect: connection refused"
I0707 07:04:18.117793 1 shared_informer.go:318] Caches are synced for longhorn datastore
time="2023-07-07T07:04:18Z" level=info msg="Add validaton handler for nodes.longhorn.io (Node)"
time="2023-07-07T07:04:18Z" level=info msg="Add validaton handler for settings.longhorn.io (Setting)"
time="2023-07-07T07:04:18Z" level=info msg="Add validaton handler for recurringjobs.longhorn.io (RecurringJob)"
time="2023-07-07T07:04:18Z" level=info msg="Add validaton handler for backingimages.longhorn.io (BackingImage)"
time="2023-07-07T07:04:18Z" level=info msg="Add validaton handler for volumes.longhorn.io (Volume)"
time="2023-07-07T07:04:18Z" level=info msg="Add validaton handler for orphans.longhorn.io (Orphan)"
time="2023-07-07T07:04:18Z" level=info msg="Add validaton handler for snapshots.longhorn.io (Snapshot)"
time="2023-07-07T07:04:18Z" level=info msg="Add validaton handler for supportbundles.longhorn.io (SupportBundle)"
time="2023-07-07T07:04:18Z" level=info msg="Add validaton handler for systembackups.longhorn.io (SystemBackup)"
time="2023-07-07T07:04:18Z" level=info msg="Add validaton handler for systemrestores.longhorn.io (SystemRestore)"
time="2023-07-07T07:04:18Z" level=info msg="Add validaton handler for volumeattachments.longhorn.io (VolumeAttachment)"
time="2023-07-07T07:04:18Z" level=info msg="Add validaton handler for engines.longhorn.io (Engine)"
time="2023-07-07T07:04:18Z" level=info msg="Add validaton handler for replicas.longhorn.io (Replica)"
time="2023-07-07T07:04:18Z" level=info msg="Add mutation handler for backups.longhorn.io (Backup)"
time="2023-07-07T07:04:18Z" level=info msg="Add mutation handler for backingimages.longhorn.io (BackingImage)"
time="2023-07-07T07:04:18Z" level=info msg="Add mutation handler for backingImageManagers.longhorn.io (BackingImageManager)"
time="2023-07-07T07:04:18Z" level=info msg="Add mutation handler for backingimagedatasources.longhorn.io (BackingImageDataSource)"
time="2023-07-07T07:04:18Z" level=info msg="Add mutation handler for nodes.longhorn.io (Node)"
time="2023-07-07T07:04:18Z" level=info msg="Add mutation handler for volumes.longhorn.io (Volume)"
time="2023-07-07T07:04:18Z" level=info msg="Add mutation handler for engines.longhorn.io (Engine)"
time="2023-07-07T07:04:18Z" level=info msg="Add mutation handler for recurringjobs.longhorn.io (RecurringJob)"
time="2023-07-07T07:04:18Z" level=info msg="Add mutation handler for engineimages.longhorn.io (EngineImage)"
time="2023-07-07T07:04:18Z" level=info msg="Add mutation handler for orphans.longhorn.io (Orphan)"
time="2023-07-07T07:04:18Z" level=info msg="Add mutation handler for sharemanagers.longhorn.io (ShareManager)"
time="2023-07-07T07:04:18Z" level=info msg="Add mutation handler for backupvolumes.longhorn.io (BackupVolume)"
time="2023-07-07T07:04:18Z" level=info msg="Add mutation handler for snapshots.longhorn.io (Snapshot)"
time="2023-07-07T07:04:18Z" level=info msg="Add mutation handler for replicas.longhorn.io (Replica)"
time="2023-07-07T07:04:18Z" level=info msg="Add mutation handler for supportbundles.longhorn.io (SupportBundle)"
time="2023-07-07T07:04:18Z" level=info msg="Add mutation handler for systembackups.longhorn.io (SystemBackup)"
time="2023-07-07T07:04:18Z" level=info msg="Add mutation handler for volumeattachments.longhorn.io (VolumeAttachment)"
time="2023-07-07T07:04:18Z" level=info msg="Active TLS secret longhorn-webhook-tls (ver=2687) (count 2): map[listener.cattle.io/cn-longhorn-admission-webhook.longhorn.svc:longhorn-admission-webhook.longhorn.svc listener.cattle.io/cn-longhorn-conversion-webhook.longhorn.svc:longhorn-conversion-webhook.longhorn.svc listener.cattle.io/fingerprint:SHA1=5722A8DEA1DC17BCBDFA7D07C25EB1C0DBB6C4F3]"
time="2023-07-07T07:04:18Z" level=info msg="Listening on :9502"
time="2023-07-07T07:04:19Z" level=info msg="Starting apiregistration.k8s.io/v1, Kind=APIService controller"
time="2023-07-07T07:04:19Z" level=info msg="Starting /v1, Kind=Secret controller"
time="2023-07-07T07:04:19Z" level=info msg="Starting apiextensions.k8s.io/v1, Kind=CustomResourceDefinition controller"
time="2023-07-07T07:04:19Z" level=info msg="Building validation rules..."
time="2023-07-07T07:04:19Z" level=debug msg="Add rule for {Name:nodes Scope:Namespaced APIGroup:longhorn.io APIVersion:v1beta2 ObjectType:0xc002a21380 OperationTypes:[CREATE UPDATE]}"
time="2023-07-07T07:04:19Z" level=debug msg="Add rule for {Name:settings Scope:Namespaced APIGroup:longhorn.io APIVersion:v1beta2 ObjectType:0xc00263ad80 OperationTypes:[CREATE UPDATE]}"
time="2023-07-07T07:04:19Z" level=debug msg="Add rule for {Name:recurringjobs Scope:Namespaced APIGroup:longhorn.io APIVersion:v1beta2 ObjectType:0xc00267e000 OperationTypes:[CREATE UPDATE]}"
time="2023-07-07T07:04:19Z" level=debug msg="Add rule for {Name:backingimages Scope:Namespaced APIGroup:longhorn.io APIVersion:v1beta2 ObjectType:0xc00267e480 OperationTypes:[CREATE DELETE]}"
time="2023-07-07T07:04:19Z" level=debug msg="Add rule for {Name:volumes Scope:Namespaced APIGroup:longhorn.io APIVersion:v1beta2 ObjectType:0xc000214900 OperationTypes:[CREATE UPDATE]}"
time="2023-07-07T07:04:19Z" level=debug msg="Add rule for {Name:orphans Scope:Namespaced APIGroup:longhorn.io APIVersion:v1beta2 ObjectType:0xc002413a20 OperationTypes:[CREATE UPDATE]}"
time="2023-07-07T07:04:19Z" level=debug msg="Add rule for {Name:snapshots Scope:Namespaced APIGroup:longhorn.io APIVersion:v1beta2 ObjectType:0xc002eecfc0 OperationTypes:[CREATE UPDATE]}"
time="2023-07-07T07:04:19Z" level=debug msg="Add rule for {Name:supportbundles Scope:Namespaced APIGroup:longhorn.io APIVersion:v1beta2 ObjectType:0xc000ef8700 OperationTypes:[CREATE]}"
time="2023-07-07T07:04:19Z" level=debug msg="Add rule for {Name:systembackups Scope:Namespaced APIGroup:longhorn.io APIVersion:v1beta2 ObjectType:0xc002022380 OperationTypes:[CREATE]}"
time="2023-07-07T07:04:19Z" level=debug msg="Add rule for {Name:systemrestores Scope:Namespaced APIGroup:longhorn.io APIVersion:v1beta2 ObjectType:0xc00240a160 OperationTypes:[CREATE]}"
time="2023-07-07T07:04:19Z" level=debug msg="Add rule for {Name:volumeattachments Scope:Namespaced APIGroup:longhorn.io APIVersion:v1beta2 ObjectType:0xc0025c9cc0 OperationTypes:[CREATE UPDATE]}"
time="2023-07-07T07:04:19Z" level=debug msg="Add rule for {Name:engines Scope:Namespaced APIGroup:longhorn.io APIVersion:v1beta2 ObjectType:0xc0026b1800 OperationTypes:[CREATE UPDATE]}"
time="2023-07-07T07:04:19Z" level=debug msg="Add rule for {Name:replicas Scope:Namespaced APIGroup:longhorn.io APIVersion:v1beta2 ObjectType:0xc000c4edc0 OperationTypes:[CREATE UPDATE]}"
time="2023-07-07T07:04:19Z" level=info msg="Building mutation rules..."
time="2023-07-07T07:04:19Z" level=debug msg="Add rule for {Name:backups Scope:Namespaced APIGroup:longhorn.io APIVersion:v1beta2 ObjectType:0xc002431400 OperationTypes:[CREATE UPDATE]}"
time="2023-07-07T07:04:19Z" level=debug msg="Add rule for {Name:backingimages Scope:Namespaced APIGroup:longhorn.io APIVersion:v1beta2 ObjectType:0xc00267e900 OperationTypes:[CREATE UPDATE]}"
time="2023-07-07T07:04:19Z" level=debug msg="Add rule for {Name:backingImageManagers Scope:Namespaced APIGroup:longhorn.io APIVersion:v1beta2 ObjectType:0xc00226bdc0 OperationTypes:[CREATE UPDATE]}"
time="2023-07-07T07:04:19Z" level=debug msg="Add rule for {Name:backingimagedatasources Scope:Namespaced APIGroup:longhorn.io APIVersion:v1beta2 ObjectType:0xc0029e0800 OperationTypes:[CREATE UPDATE]}"
time="2023-07-07T07:04:19Z" level=debug msg="Add rule for {Name:nodes Scope:Namespaced APIGroup:longhorn.io APIVersion:v1beta2 ObjectType:0xc002a21ba0 OperationTypes:[CREATE UPDATE]}"
time="2023-07-07T07:04:19Z" level=debug msg="Add rule for {Name:volumes Scope:Namespaced APIGroup:longhorn.io APIVersion:v1beta2 ObjectType:0xc00260c900 OperationTypes:[CREATE UPDATE]}"
time="2023-07-07T07:04:19Z" level=debug msg="Add rule for {Name:engines Scope:Namespaced APIGroup:longhorn.io APIVersion:v1beta2 ObjectType:0xc0023dc300 OperationTypes:[CREATE UPDATE]}"
time="2023-07-07T07:04:19Z" level=debug msg="Add rule for {Name:recurringjobs Scope:Namespaced APIGroup:longhorn.io APIVersion:v1beta2 ObjectType:0xc00267ed80 OperationTypes:[CREATE UPDATE]}"
time="2023-07-07T07:04:19Z" level=debug msg="Add rule for {Name:engineimages Scope:Namespaced APIGroup:longhorn.io APIVersion:v1beta2 ObjectType:0xc002eaaf00 OperationTypes:[CREATE]}"
time="2023-07-07T07:04:19Z" level=debug msg="Add rule for {Name:orphans Scope:Namespaced APIGroup:longhorn.io APIVersion:v1beta2 ObjectType:0xc00240a580 OperationTypes:[CREATE]}"
time="2023-07-07T07:04:19Z" level=debug msg="Add rule for {Name:sharemanagers Scope:Namespaced APIGroup:longhorn.io APIVersion:v1beta2 ObjectType:0xc00240a9a0 OperationTypes:[CREATE]}"
time="2023-07-07T07:04:19Z" level=debug msg="Add rule for {Name:backupvolumes Scope:Namespaced APIGroup:longhorn.io APIVersion:v1beta2 ObjectType:0xc0029e0e00 OperationTypes:[CREATE]}"
time="2023-07-07T07:04:19Z" level=debug msg="Add rule for {Name:snapshots Scope:Namespaced APIGroup:longhorn.io APIVersion:v1beta2 ObjectType:0xc002382540 OperationTypes:[CREATE]}"
time="2023-07-07T07:04:19Z" level=debug msg="Add rule for {Name:replicas Scope:Namespaced APIGroup:longhorn.io APIVersion:v1beta2 ObjectType:0xc000c4f600 OperationTypes:[CREATE UPDATE]}"
time="2023-07-07T07:04:19Z" level=debug msg="Add rule for {Name:supportbundles Scope:Namespaced APIGroup:longhorn.io APIVersion:v1beta2 ObjectType:0xc002383c00 OperationTypes:[CREATE]}"
time="2023-07-07T07:04:19Z" level=debug msg="Add rule for {Name:systembackups Scope:Namespaced APIGroup:longhorn.io APIVersion:v1beta2 ObjectType:0xc0023e3c00 OperationTypes:[CREATE]}"
time="2023-07-07T07:04:19Z" level=debug msg="Add rule for {Name:volumeattachments Scope:Namespaced APIGroup:longhorn.io APIVersion:v1beta2 ObjectType:0xc0026ac140 OperationTypes:[CREATE UPDATE]}"
time="2023-07-07T07:04:19Z" level=info msg="Updating TLS secret for longhorn-webhook-tls (count: 2): map[listener.cattle.io/cn-longhorn-admission-webhook.longhorn.svc:longhorn-admission-webhook.longhorn.svc listener.cattle.io/cn-longhorn-conversion-webhook.longhorn.svc:longhorn-conversion-webhook.longhorn.svc listener.cattle.io/fingerprint:SHA1=5722A8DEA1DC17BCBDFA7D07C25EB1C0DBB6C4F3]"
time="2023-07-07T07:04:19Z" level=debug msg="DesiredSet - No change(2) admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration /longhorn-webhook-validator for longhorn/longhorn-webhook-ca"
time="2023-07-07T07:04:19Z" level=debug msg="DesiredSet - No change(2) admissionregistration.k8s.io/v1, Kind=MutatingWebhookConfiguration /longhorn-webhook-mutator for longhorn/longhorn-webhook-ca"
time="2023-07-07T07:04:19Z" level=info msg="Webhook admission is ready"
time="2023-07-07T07:04:19Z" level=warning msg="Started longhorn admission webhook server"
time="2023-07-07T07:04:19Z" level=info msg="Starting longhorn recovery-backend server"
W0707 07:04:19.829494 1 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0707 07:04:19.831098 1 shared_informer.go:311] Waiting for caches to sync for longhorn datastore
I0707 07:04:22.132427 1 shared_informer.go:318] Caches are synced for longhorn datastore
time="2023-07-07T07:04:23Z" level=info msg="Starting apiextensions.k8s.io/v1, Kind=CustomResourceDefinition controller"
time="2023-07-07T07:04:23Z" level=info msg="Starting apiregistration.k8s.io/v1, Kind=APIService controller"
time="2023-07-07T07:04:23Z" level=info msg="Started longhorn recovery-backend server"
W0707 07:04:23.187680 1 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
time="2023-07-07T07:04:23Z" level=info msg="Recovery-backend server is running at :9503"
time="2023-07-07T07:04:23Z" level=info msg="Checking if the upgrade path from v1.4.2 to v1.5.0 is supported"
I0707 07:04:23.210688 1 leaderelection.go:245] attempting to acquire leader lease longhorn/longhorn-manager-upgrade-lock...
I0707 07:04:23.261286 1 leaderelection.go:255] successfully acquired lease longhorn/longhorn-manager-upgrade-lock
time="2023-07-07T07:04:23Z" level=info msg="Start upgrading"
time="2023-07-07T07:04:23Z" level=info msg="No API version upgrade is needed"
time="2023-07-07T07:04:23Z" level=debug msg="Walking through the resource upgrade path v1.4.x to v1.5.0"
time="2023-07-07T07:04:25Z" level=warning msg="Rejected operation: Request (user: system:serviceaccount:longhorn:longhorn-service-account, longhorn.io/v1beta2, Kind=Orphan, namespace: longhorn, name: orphan-024ed10f75415525327901b184c46b279fa24fdab23c89ea80e9f6ea7be50c83, operation: UPDATE)" error="orphan orphan-024ed10f75415525327901b184c46b279fa24fdab23c89ea80e9f6ea7be50c83 spec fields are immutable" service=admissionWebhook
time="2023-07-07T07:04:25Z" level=debug msg="admit result: UPDATE longhorn.io/v1beta2, Kind=Orphan longhorn/orphan-024ed10f75415525327901b184c46b279fa24fdab23c89ea80e9f6ea7be50c83 user=system:serviceaccount:longhorn:longhorn-service-account allowed=false err=<nil>"
time="2023-07-07T07:04:25Z" level=error msg="Upgrade failed: upgrade resources failed: admission webhook \"validator.longhorn.io\" denied the request: orphan orphan-024ed10f75415525327901b184c46b279fa24fdab23c89ea80e9f6ea7be50c83 spec fields are immutable"
time="2023-07-07T07:04:25Z" level=info msg="Upgrade leader lost: worker-3.test-k8s.iamoffice.lv"
time="2023-07-07T07:04:25Z" level=fatal msg="Error starting manager: upgrade resources failed: admission webhook \"validator.longhorn.io\" denied the request: orphan orphan-024ed10f75415525327901b184c46b279fa24fdab23c89ea80e9f6ea7be50c83 spec fields are immutable"
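To see which orphan resources the validator rejected, the "Rejected operation" warnings in a log like the one above can be scanned. The following is a minimal sketch, not part of Longhorn: the function name `rejected_orphans` and the regular expression are assumptions based only on the message format shown in this log, and that format may differ in other versions.

```python
import re

# Assumed pattern, derived from the "Rejected operation" warning in the log above.
REJECTED_ORPHAN = re.compile(
    r"Rejected operation: .*Kind=Orphan, namespace: (?P<namespace>[^,]+), "
    r"name: (?P<name>[^,]+), operation: UPDATE"
)


def rejected_orphans(log_text):
    """Return sorted, de-duplicated (namespace, name) pairs of rejected Orphan updates."""
    found = set()
    for line in log_text.splitlines():
        match = REJECTED_ORPHAN.search(line)
        if match:
            found.add((match.group("namespace"), match.group("name")))
    return sorted(found)
```

Passing the saved longhorn-manager log text to `rejected_orphans` yields the namespace and name of each orphan whose UPDATE was denied.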
Environment
- Longhorn version: 1.4.2
- Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Helm
- Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: Vanilla Kubernetes v1.24.15
- Number of management nodes in the cluster: 3
- Number of worker nodes in the cluster: 3
- Node config
  - OS type and version: Flatcar Container Linux by Kinvolk 3510.2.4 (Oklo)
  - CPU per node: 12CPU
  - Memory per node: 16Gb
  - Disk type(e.g. SSD/NVMe): SSD
  - Network bandwidth between the nodes: 10Gbit, x2 bond
- Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): KVM
- Number of Longhorn volumes in the cluster: 9
Workaround
We recommend applying workaround A before upgrading. If you have already run into the issue, resolve it with workaround B.
A. Delete orphan resources before upgrade.
B.
- Delete the crash-looping longhorn-manager pods:
  kubectl -n longhorn-system delete pod -l app=longhorn-manager
- Edit the longhorn-webhook-validator validatingwebhookconfigurations:
  kubectl -n longhorn-system edit validatingwebhookconfigurations longhorn-webhook-validator
- Remove UPDATE from the orphans resources:
  ...
  - apiGroups:
    - longhorn.io
    apiVersions:
    - v1beta2
    operations:
    - CREATE
    - UPDATE
    resources:
    - orphans
  ...
- Continue the upgrade
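The manual edit in workaround B can also be expressed as a small transformation. The sketch below is only an illustration and assumes the configuration has been exported as JSON and loaded into a Python dict (for example from the output of `kubectl get validatingwebhookconfigurations longhorn-webhook-validator -o json`). The function name `drop_orphan_update` is hypothetical; it only mirrors the "remove UPDATE from orphans" step and is not an official Longhorn tool.

```python
import copy


def drop_orphan_update(webhook_config):
    """Return a copy of a ValidatingWebhookConfiguration dict with UPDATE
    removed from every rule whose resources include "orphans"."""
    patched = copy.deepcopy(webhook_config)  # leave the caller's dict untouched
    for webhook in patched.get("webhooks", []):
        for rule in webhook.get("rules", []):
            if "orphans" in rule.get("resources", []):
                rule["operations"] = [
                    op for op in rule.get("operations", []) if op != "UPDATE"
                ]
    return patched
```

The patched dict would still have to be written back to the cluster by whatever means you normally use. Rules for other resources are left unchanged.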
About this issue
- State: closed
- Created a year ago
- Comments: 42 (18 by maintainers)
Commits related to this issue
- fix(Longhorn): Fix issue upgrading from 1.4.2 to 1.5.0 https://github.com/longhorn/longhorn/issues/6246 — committed to bidluo/Home-GitOps by bidluo a year ago
@DmitryMigunov
Hit a regression issue. There is a workaround.
- Edit the longhorn-webhook-validator validatingwebhookconfigurations
- Remove UPDATE from the orphans resources
Hi @DmitryMigunov
Since v1.5.0, the webhook and recovery services are merged into longhorn-manager, but somehow the deployment templates got left in the Helm chart, so Helm installs the deployments back.
Workaround:
kubectl delete deployments.apps longhorn-admission-webhook longhorn-conversion-webhook longhorn-recovery-backend -n longhorn-system
Thanks!
To fix this one, I just set the following in my custom values.yaml. It stops the Deployments from scaling up, as only the pods are the issue, not the Deployments.
I don't think it's a related issue, but now longhorn-recovery-backend and longhorn-conversion-webhook have started crashing with errors:
@derekbit, Longhorn has been successfully upgraded with your workaround. Thank you.
@PhanLe1010
For issue 1, those who use GitOps with declarative configuration can't delete these deployments manually because the system expects them to exist by virtue of them existing in the Helm chart, so either the reconciliation of the cluster fails because the resources don't exist, or they get recreated by the system but fail to start and the reconciliation still fails.
Having other applications depend on the health of Longhorn (which is failing) means they can't start.
What's the recommended course of action?
Verified on v1.5.x-head 20230710
The test steps
https://github.com/longhorn/longhorn/issues/6246#issuecomment-1624970058
Result Passed
PSA
In this ticket, two regressions were found in v1.5.0. Workarounds are provided below, and the issues will be fixed in the upcoming v1.5.1 patch. We recommend that users hold off on upgrading to v1.5.0 and wait for the v1.5.1 release. Thanks for your understanding.
Issue 1: Longhorn v1.5.0 merged the webhook and recovery deployments into the longhorn-manager DaemonSet, but they were accidentally left in the Helm templates (due to a Helm chart syncing issue between different repos). The workaround is to set Helm values that scale down these deployments: https://github.com/longhorn/longhorn/issues/6246#issuecomment-1629855815
Issue 2: We missed updating the webhook logic for the Orphan CR, so any update to an Orphan CR is blocked. The workaround for this one is https://github.com/longhorn/longhorn/issues/6246#issuecomment-1624944396
@Starttoaster, please check the workaround in the description for the immutable field issue.
The step is tricky. You will probably need to try it multiple times. https://github.com/longhorn/longhorn/issues/6246#issue-1792948689
If you don't mind, I would recommend using the customized image I built. The steps are at https://github.com/longhorn/longhorn/discussions/6281#discussioncomment-6414724
Thanks @absentbri! That is a better workaround! Updated the PSA.
Verified on master-head 20230710
The test steps
https://github.com/longhorn/longhorn/issues/6246#issuecomment-1624970058
Result Passed
1.4.2 to master-head using Helm.
Thanks for the response. I originally did a rollback, as someone else said they were able to fix the current topic (instance managers fail in the 1.5.0 upgrade due to orphaned resources) with that method. I was experiencing that same issue and did a rollback, as stated in the linked comment/reply. I know it's documented as unsupported, but I was able to roll back, delete the orphan PVCs, and do an upgrade before experiencing the current issue.
But as you stated, this is a known behavior and should resolve itself once you detach the volumes and then the old engine will be recycled. Let me try to detach/re-attach and ensure that happens.
EDIT: In the docs, https://longhorn.io/docs/1.5.0/deploy/upgrade/upgrade-engine/, engine upgrade is actually done via the UI/manually or through automation: you should do this manually or set up Longhorn to do it automatically. This was my first Longhorn upgrade and I missed that section. As soon as I upgraded the volumes to use the new engine version, the daemonset was removed:
Deleted the comment in the other thread and will stick to this thread/issue though, thanks
It's not causing downtime currently, just leaving the cluster in an unreconciled state. Though I can remove the dependency link from my apps to Longhorn and it should be okay until v1.5.1 is released.
I think a patch release of the chart would be useful:
We have automated CI tests at https://github.com/longhorn/longhorn-tests and https://ci.longhorn.io/.
The issue was caused by a missing part (orphan resources) in the upgrade path.
Any feedback or contribution is appreciated.
When something is released it's expected to work and be properly tested beforehand (?). I don't see any automated CI testing in this repo. We should install (and/or run tests) automatically on the latest 3 stable Kubernetes versions, e.g. like https://github.com/kubernetes/ingress-nginx
Also note that the Helm chart version does not have to equal the Longhorn version, which allows you to fix the chart without releasing a new Longhorn version.
@ChanYiLin as discussed, let's create another issue to tackle the Helm chart issue. cc @longhorn/qa
longhorn-recovery-backend and longhorn-conversion-webhook should not be used and should be terminated after the upgrade. cc @ChanYiLin can you take a look?
/upgrade/v14xto150/upgrade.go#L33-L35