longhorn: [BUG] Can not delete type=`bi` VolumeSnapshot if related backing image not exist
Describe the bug (๐ if you encounter this issue)
Delete type=bi VolumeSnapshot will stuck if backing image not exist
To Reproduce
Steps to reproduce the behavior:
-
Deploy the CSI snapshot CRDs and related Controller v6.2.1
-
Deploy Longhorn
-
Create VolumeSnapshotClass
longhorn-snapshot-vscmanifest
kind: VolumeSnapshotClass apiVersion: snapshot.storage.k8s.io/v1 metadata: name: longhorn-snapshot-vsc driver: driver.longhorn.io deletionPolicy: Delete parameters: type: bi
-
Create VolumeSnapshotContent
test-on-demand-backingmanifest
apiVersion: snapshot.storage.k8s.io/v1 kind: VolumeSnapshotContent metadata: name: test-on-demand-backing spec: volumeSnapshotClassName: longhorn-snapshot-vsc driver: driver.longhorn.io deletionPolicy: Delete source: snapshotHandle: bi://backing?backingImageDataSourceType=download&backingImage=test-bi&url=https%3A%2F%2Flonghorn-backing-image.s3-us-west-1.amazonaws.com%2Fparrot.qcow2&backingImageChecksum=bd79ab9e6d45abf4f3f0adf552a868074dd235c4698ce7258d521160e0ad79ffe555b94e7d4007add6e1a25f4526885eb25c53ce38f7d344dd4925b9f2cb5d3b # NOQA volumeSnapshotRef: name: test-snapshot-on-demand-backing namespace: default -
Create VolumeSnapshot
test-snapshot-on-demand-backingmanifest
apiVersion: snapshot.storage.k8s.io/v1 kind: VolumeSnapshot metadata: name: test-snapshot-on-demand-backing spec: volumeSnapshotClassName: longhorn-snapshot-vsc source: volumeSnapshotContentName: test-on-demand-backing -
Delete VolumeSnapshot
test-snapshot-on-demand-backing, the deletion stuck. (If Longhorn had a backing image namedtest-bibefore execute step 6, the delete will success.)
Expected behavior
Delete should success no matter backing image exist
Log or Support bundle
longhorn-manager
time="2023-07-10T02:03:00Z" level=warning msg="HTTP handling error" error="failed to get backing image 'test-bi': backingimage.longhorn.io \"test-bi\" not found"
time="2023-07-10T02:03:00Z" level=error msg="Error in request: failed to get backing image 'test-bi': backingimage.longhorn.io \"test-bi\" not found"
10.42.0.31 - - [10/Jul/2023:02:03:00 +0000] "GET /v1/backingimages/test-bi HTTP/1.1" 500 248 "" "Go-http-client/1.1"
time="2023-07-10T02:05:08Z" level=warning msg="HTTP handling error" error="failed to get backing image 'test-bi': backingimage.longhorn.io \"test-bi\" not found"
time="2023-07-10T02:05:08Z" level=error msg="Error in request: failed to get backing image 'test-bi': backingimage.longhorn.io \"test-bi\" not found"
csi-snapshotter
E0710 01:55:06.791510 1 snapshot_controller_base.go:283] could not sync content "test-on-demand-backing": failed to delete snapshot "test-on-demand-backing", err: failed to delete snapshot content test-on-demand-backing: "rpc error: code = Internal desc = Bad response statusCode [500]. Status [500 Internal Server Error]. Body: [detail=, message=failed to get backing image 'test-bi': backingimage.longhorn.io \"test-bi\" not found, code=Server Error] from [http://longhorn-backend:9500/v1/backingimages/test-bi]"
supportbundle_f16395ed-43c1-480c-bbb3-eec929c91479_2023-07-10T02-07-09Z.zip
Environment
- Longhorn version: master, v1.5.x, v1.4.x
- Installation method (e.g. Rancher Catalog App/Helm/Kubectl): kubectl
- Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: v1.27.3+k3s1
- Node config
- OS type and version: Ubuntu 22.04
- Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): AWS
Additional context
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 1
- Comments: 23 (22 by maintainers)
@innobead https://github.com/longhorn/longhorn/issues/6291
Oh I got it Yeah
I think this solution is good enough for current issue and give the right status code (404) for all not found error For other errors, as you said, we are currently restricted by the lib.
I think we can create another issue for handling other errors
@ChanYiLin It seems PRs merged. Please remember to move it to ready for testing if all done.
@ChanYiLin Sounds good, please help create the issue to track it later by self-assigning, also I am not sure if the go-rancher has any new version w/ a related improvement. or not or even we can contribute back to it.
@ChanYiLin What I mean is not client side but server side, like add your change in the below code in general to handle for all endpoint functions. WDYT?
/api/router.go#L16-L24
Hi @innobead ,
Sorry I tested on
v1.4.3-rc1this morning, VolumeSnpshotClass, VolumeSnapshotContent, VolumeSnapshot, can create without problem, but I just notice that create PVC and backing image from VolumeSnapshot not support, I am sorry for the misleading information.Hi @innobead there is no need to backport to 1.3.X and 1.4.X since backing image with csi feature is only introduced in
v1.5.xthanks!It seems backup type snapshot will have the same issue If someone delete the backupVolume which is created by the volumesnapshot through UI Then one canโt delete the volumesnapshot after as well
Oh I found the reason why it failed
generated_clientwe do check if it is not found (404) and return nil, and should bypass the deletion process
So the csi failed to process because of the error.
Removed from 1.5.1 and 1.4.3 which will be released in the near future, so this can be tackled in the following release instead.