longhorn: [BUG] Restoring backup fails when validating volume spec

Describe the bug (🐛 if you encounter this issue)

I have periodic backups to S3. Today I thought I’d try restoring one, but it fails when validating the volume spec:

unable to create volume: unable to create volume foo: Volume.longhorn.io "foo" is invalid: [spec.snapshotDataIntegrity: Unsupported value: "": supported values: "ignored", "disabled", "enabled", "fast-check", spec.unmapMarkSnapChainRemoved: Unsupported value: "": supported values: "ignored", "disabled", "enabled", spec.dataLocality: Unsupported value: "": supported values: "disabled", "best-effort", "strict-local", spec.replicaAutoBalance: Unsupported value: "": supported values: "ignored", "disabled", "least-effort", "best-effort"]

As far as I can tell, I cannot control these values when restoring in the UI, so I’m assuming the UI passes an empty string to mean “inherit”, but the validation rejects that.
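For reference, here is a minimal sketch of what the restored volume spec would need to contain for validation to pass, with the four fields from the error set to their documented defaults instead of empty strings. This is illustrative only; the `fromBackup` URL is hypothetical and the default values are taken from the "supported values" lists in the error message:

```yaml
# Illustrative sketch, not my actual manifest: a Longhorn Volume restored
# from backup with the four rejected fields set explicitly.
apiVersion: longhorn.io/v1beta2
kind: Volume
metadata:
  name: foo
  namespace: longhorn-system
spec:
  fromBackup: "s3://backup-bucket@us-east-1/?backup=backup-xxxx&volume=foo"  # hypothetical URL
  snapshotDataIntegrity: ignored
  unmapMarkSnapChainRemoved: ignored
  dataLocality: disabled
  replicaAutoBalance: ignored
```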

To Reproduce

Steps to reproduce the behavior:

  1. Create a periodic backup job (screenshot attached)

  2. Go to the backups list for a volume and attempt to restore one (screenshot attached)

  3. Restore config I used: (screenshot attached)

  4. Observe the backup restore failure (screenshot attached)

Expected behavior

The backup to be restored.

Log or Support bundle

If applicable, add the Longhorn managers’ log or support bundle when the issue happens. You can generate a Support Bundle using the link at the footer of the Longhorn UI.

Environment

  • Longhorn version: 1.4.0
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Kubectl
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: kubeadm cluster, self-hosted
    • Number of management nodes in the cluster: 3
    • Number of worker nodes in the cluster: 4
  • Node config
    • OS type and version: Ubuntu 20.04.5 LTS
    • CPU per node: 10
    • Memory per node: 24GB
    • Disk type (e.g. SSD/NVMe): NVMe
    • Network bandwidth between the nodes: 10 Gbit
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): KVM
  • Number of Longhorn volumes in the cluster: 11

Additional context

Backups are stored via Minio/S3.

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 19 (12 by maintainers)

Most upvoted comments

Got it. It looks like the default values of the newly added spec fields in LH v1.4.0 are not set correctly.

spec.snapshotDataIntegrity
spec.unmapMarkSnapChainRemoved
spec.dataLocality

The message also complains about a wrong value for spec.replicaAutoBalance, which was added prior to v1.4.0. 😕

spec.replicaAutoBalance

Yep, I deleted the admission pods and they were recreated, and now it works. Now I feel like an idiot 😄 sorry for the bother, friends.
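For anyone hitting the same thing, the workaround described above amounts to something like the following. The label selector is an assumption for a default v1.4.0 install; check the actual pod labels in your cluster first:

```
# Restart the admission webhook pods so they pick up the v1.4.0 defaults.
# The label below is an assumption; verify with:
#   kubectl -n longhorn-system get pods --show-labels
kubectl -n longhorn-system delete pod -l app=longhorn-admission-webhook

# Confirm they were recreated before retrying the restore.
kubectl -n longhorn-system get pods -l app=longhorn-admission-webhook
```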

@derekbit I do remember we have default values covered in the mutating webhooks, so what’s missing here?

Yes, we mutate the values if they are empty strings. https://github.com/longhorn/longhorn-manager/blob/v1.4.0/webhook/resources/volume/mutator.go#L101
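The defaulting the mutator performs is essentially "replace an empty string with the field's default". A tiny shell sketch of that logic, for illustration only (the function name and defaults here are mine, not Longhorn code):

```shell
#!/bin/sh
# Illustrative sketch of webhook-style defaulting: return the default
# when the current value is an empty string, otherwise keep the value.
default_field() {
  value="$1"
  default="$2"
  if [ -z "$value" ]; then
    echo "$default"
  else
    echo "$value"
  fi
}

default_field ""            "ignored"   # -> ignored
default_field "best-effort" "ignored"   # -> best-effort
```

If the webhook pods are stale or never saw the request, this defaulting simply never runs, and the empty strings reach validation unchanged.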

Update: I created a volume and backed it up in v1.3.2, then successfully restored it in v1.4.0.

It fails even with backups created by 1.4.0, as in this case they are backups of a volume originally created by 1.3.2 (or possibly older; hard to verify).

No, the last upgrade was only the kubectl apply script as above. Not sure why the pods were not regenerated honestly. Maybe it’s time to rebuild my control plane.

Ah that I can tell you as I keep the upgrade script in version control

#!/bin/bash

# Deploy Longhorn
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.4.0/deploy/longhorn.yaml

# Set LongHorn as default storage class (Remember to do this on each upgrade too!)
kubectl patch storageclass longhorn -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

# Add a custom ingress to allow external prometheus to scrape metrics
kubectl apply -f metrics-ingress.yaml

Just did a quick test. The ValidatingWebhookConfiguration and MutatingWebhookConfiguration are regenerated after the upgrade. Not sure why your environment didn’t do the regeneration. 😕
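If you want to check whether your cluster regenerated them, something like this should show creation timestamps newer than the upgrade. The exact resource names are assumptions; use the first command to see what they are actually called in your install:

```
# List Longhorn's webhook configurations (names may differ per install).
kubectl get mutatingwebhookconfigurations,validatingwebhookconfigurations | grep -i longhorn

# Check when a configuration was (re)created; the name here is an assumption.
kubectl get mutatingwebhookconfiguration longhorn-webhook-mutator \
  -o jsonpath='{.metadata.creationTimestamp}'
```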


I also tried to reproduce the issue, but I couldn’t reproduce it either.

The test steps

  1. Deploy Longhorn v1.3.2
  2. Create and attach 1 volume
  3. Set up the S3 backup target
  4. Create a volume backup
  5. Create a periodic backup job
  6. Upgrade Longhorn to v1.4.0
  7. Do a live upgrade for the volume
  8. Restore the volume backup

supportbundle_4dbfe34e-2d8a-49f3-a3f1-46d63149b27b_2023-01-30T02-57-20Z.zip