longhorn: [BUG] Upgrade engine --> spec.restoreVolumeRecurringJob and spec.snapshotDataIntegrity Unsupported value
Describe the bug (🐛 if you encounter this issue)
After a migration from Longhorn 1.3.2 to Longhorn 1.4.0, I am trying to upgrade the engine of my volumes and I get the following errors:
cannot upgrade engine for volume XXXX using image rancher/mirrored-longhornio-longhorn-engine:v1.4.0:
Volume.longhorn.io "XXXX" is invalid:
spec.restoreVolumeRecurringJob: Unsupported value: "": supported values: "ignored", "enabled", "disabled",
spec.snapshotDataIntegrity: Unsupported value: "": supported values: "ignored", "disabled", "enabled", "fast-check"
When I look at the Snapshot Data Integrity and Allow snapshots removal during trim parameters, the options are empty. If I try to change a value, the error rises again.
It is the same for all volumes.
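The empty values can be confirmed directly on the volume custom resources. A minimal check, assuming the default longhorn-system namespace and a placeholder volume name XXXX:
kubectl -n longhorn-system get volumes.longhorn.io XXXX -o jsonpath='{.spec.snapshotDataIntegrity} {.spec.restoreVolumeRecurringJob}'
On an affected volume, both fields come back empty.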
To Reproduce
Steps to reproduce the behavior:
- In Rancher, go to Cluster Tools and click Edit on the Longhorn package
- Change the version from 1.3.2 to 1.4.0
- Click Next
- Click Update
- Wait until the update process finishes without errors
- Wait a while until all pods have restarted on all nodes
- In the Longhorn UI, try to upgrade the Engine Image of each volume to rancher/mirrored-longhornio-longhorn-engine:v1.4.0
- The following error rises:
cannot upgrade engine for volume XXXX using image rancher/mirrored-longhornio-longhorn-engine:v1.4.0: Volume.longhorn.io "XXXX" is invalid: [spec.snapshotDataIntegrity: Unsupported value: "": supported values: "ignored", "disabled", "enabled", "fast-check", spec.restoreVolumeRecurringJob: Unsupported value: "": supported values: "ignored", "enabled", "disabled"]
Expected behavior
The engine of the volume is upgraded to v1.4.0 without errors.
Environment
- Longhorn version: 1.4.0
- Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Rancher Catalog App
- Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: v1.24.9+k3s2
- Number of management node in the cluster: 3
- Number of worker node in the cluster: 5
- Node config
- OS type and version: Debian 11
- CPU per node: AMD and ARM
- Memory per node: 16 GB / 8 GB
- Disk type(e.g. SSD/NVMe): SSD
- Network bandwidth between the nodes: 10 Gbps
- Number of Longhorn volumes in the cluster: 8
Workaround
https://github.com/longhorn/longhorn/issues/5485#issuecomment-1499639915
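For reference, a minimal sketch of the kind of patch described in the linked comment: it sets both empty fields to a supported value (here "ignored", which the validator lists as accepted) on an affected volume. The namespace and value choice are assumptions, not the exact commands from the comment:
kubectl -n longhorn-system patch volumes.longhorn.io XXXX --type=merge -p '{"spec":{"snapshotDataIntegrity":"ignored","restoreVolumeRecurringJob":"ignored"}}'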
Additional context
After a rollback to version 1.3.2 (using the Rancher Catalog app), everything returns to a stable state.
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 29 (15 by maintainers)
For example:
@innobead Moved. Sorry for neglecting the state change. Yeah, https://github.com/longhorn/longhorn/issues/5762 is the root cause, and fixing that issue also makes the upgrade path more complete.
@innobead This is my original thought.
restoreVolumeRecurringJob are empty. But you’re right, the issue happened in the upgrade path, so the root cause should be
I would not close this as I think it’s a genuine bug. Those fields should either be added automatically upon upgrade, or at least the validator should pretend they’re there with the default values, instead of breaking.
Great!!! I applied the patch to all my PVCs, it works like a charm.
A big thanks
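If the patch needs to be applied to every volume instead of one at a time, a hedged sketch of a loop over all volume CRs (assuming the same merge patch and namespace as above):
for v in $(kubectl -n longhorn-system get volumes.longhorn.io -o name); do
  kubectl -n longhorn-system patch "$v" --type=merge -p '{"spec":{"snapshotDataIntegrity":"ignored","restoreVolumeRecurringJob":"ignored"}}'
done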
Derek means retrying reproduction step 8 for one volume and then generating a Longhorn support bundle.
I think the webhook is enabled; how can I confirm it? (The pods are running and there are no errors in the logs.) I deleted the admission pods and they were recreated, but the issue is still there.
Do you need the result of
kubectl get MutatingWebhookConfiguration longhorn-webhook-mutator -o yaml
to help?
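Beyond dumping the MutatingWebhookConfiguration, one way to confirm the webhook is actually reachable is to check the admission webhook deployment, its endpoints, and its logs. This assumes the separate longhorn-admission-webhook deployment that these Longhorn versions ship with:
kubectl -n longhorn-system get deploy longhorn-admission-webhook
kubectl -n longhorn-system get endpoints longhorn-admission-webhook
kubectl -n longhorn-system logs deploy/longhorn-admission-webhook --tail=50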