democratic-csi: Snapshot size `0` breaks smart clones with OpenShift Virtualization
Summary
When using the `freenas-nfs` driver with OpenShift 4.8+ and OpenShift Virtualization 4.8+, offloaded "smart" clones do not work. This appears to be a result of how the Containerized Data Importer (CDI) uses the snapshot's reported size to determine the size of the newly created volume.
I would expect this to affect KubeVirt on other Kubernetes distributions as well.
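For context, this is roughly the shape of the smart-clone sizing step: the temporary clone PVC's storage request is taken from the snapshot's reported restore size. A minimal sketch under assumed, simplified logic (the helper name is hypothetical; this is not CDI's actual code):

```go
// Minimal sketch (assumed logic, not CDI's actual implementation) of the
// smart-clone sizing step: the temporary clone PVC's storage request is
// taken directly from the snapshot's reported restore size. If the driver
// reported 0, the apiserver rejects the PVC and the clone stalls.
package smartclone

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// clonePVCFromSnapshot builds the PVC that restores a VolumeSnapshot.
// The access mode here just mirrors the reproduction below.
func clonePVCFromSnapshot(name, snapshotName string, restoreSize resource.Quantity) *corev1.PersistentVolumeClaim {
	apiGroup := "snapshot.storage.k8s.io"
	return &corev1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: corev1.PersistentVolumeClaimSpec{
			AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteMany},
			DataSource: &corev1.TypedLocalObjectReference{
				APIGroup: &apiGroup,
				Kind:     "VolumeSnapshot",
				Name:     snapshotName,
			},
			Resources: corev1.ResourceRequirements{
				// A restore size of "0" lands here unchanged, producing
				// `Invalid value: "0": must be greater than zero`.
				Requests: corev1.ResourceList{corev1.ResourceStorage: restoreSize},
			},
		},
	}
}
```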
Recreating the issue

- Deploy OpenShift 4.8 or later.
- Deploy OpenShift Virtualization 4.8 or later.
- Configure and deploy the `freenas-nfs` driver, with snapshot support.
- Add a VM template disk image, using the OpenShift Virtualization GUI or by creating a PVC with a template disk inside.
```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: tmpl-fedora-34
spec:
  pvc:
    accessModes:
      - ReadWriteMany
    resources:
      requests:
        storage: 10Gi
    volumeMode: Filesystem
    storageClassName: lab-nfs
  source:
    http:
      url: >-
        https://url/to/Fedora-Cloud-Base-34-1.2.x86_64.qcow2
```
The PVC and PV are successfully created and populated with data.
```
# oc get pvc
NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
tmpl-fedora-34   Bound    pvc-1e1ffa6c-eef6-41b4-a188-7c513f5b2deb   10Gi       RWX            lab-nfs        9m52s

# oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                    STORAGECLASS   REASON   AGE
pvc-1e1ffa6c-eef6-41b4-a188-7c513f5b2deb   10Gi       RWX            Delete           Bound    default/tmpl-fedora-34   lab-nfs                 10m
```
- Create a new `DataVolume`, using the previous one as the source.
```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: "fedora-clone"
spec:
  source:
    pvc:
      name: tmpl-fedora-34
      namespace: default
  pvc:
    accessModes:
      - ReadWriteMany
    resources:
      requests:
        storage: 10Gi
    volumeMode: Filesystem
    storageClassName: lab-nfs
```
The `VolumeSnapshot` and `VolumeSnapshotContent` objects are successfully created; however, no ZFS snapshot exists on the backing storage.
```
# oc get volumesnapshot
NAME           READYTOUSE   SOURCEPVC        SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS   SNAPSHOTCONTENT                                    CREATIONTIME   AGE
fedora-clone   true         tmpl-fedora-34                           0             lab-nfs         snapcontent-04eb9941-b22d-452f-8db8-fc36dca31c7f   5m41s          5m41s

# oc get volumesnapshotcontents
NAME                                               READYTOUSE   RESTORESIZE   DELETIONPOLICY   DRIVER                   VOLUMESNAPSHOTCLASS   VOLUMESNAPSHOT   VOLUMESNAPSHOTNAMESPACE   AGE
snapcontent-04eb9941-b22d-452f-8db8-fc36dca31c7f   true         0             Delete           org.democratic-csi.nfs   lab-nfs               fedora-clone     default                   5m57s
```
Note that the `RESTORESIZE` field for both is `0`.
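For reference, `restoreSize` is populated by the CSI external-snapshotter from the `size_bytes` field of the driver's `CreateSnapshot` response, so the `0` here traces back to the driver. A hypothetical defensive helper (not CDI code) would interpret it like this:

```go
// Hypothetical helper (not CDI code): interpret status.restoreSize, which
// the external-snapshotter copies from the CSI CreateSnapshot response's
// size_bytes. A nil value means "unknown"; this driver reported a literal 0.
package snapshotsize

import "k8s.io/apimachinery/pkg/api/resource"

// usableRestoreSize returns a size suitable for a clone PVC request,
// falling back to the source PVC's request when the snapshot's reported
// size is missing or zero.
func usableRestoreSize(restoreSize *resource.Quantity, sourcePVCRequest resource.Quantity) resource.Quantity {
	if restoreSize == nil || restoreSize.Sign() <= 0 {
		return sourcePVCRequest
	}
	return *restoreSize
}
```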
- The new `DataVolume` stays in the `SnapshotForSmartCloneInProgress` phase indefinitely.
```
# oc describe dv fedora-clone
Name:         fedora-clone
Namespace:    default
Labels:       <none>
Annotations:  cdi.kubevirt.io/cloneType: snapshot
              cdi.kubevirt.io/storage.clone.token:
                eyJhbGciOiJQUzI1NiIsImtpZCI6IiJ9.eyJleHAiOjE2MzU3OTMyMzgsImlhdCI6MTYzNTc5MjkzOCwiaXNzIjoiY2RpLWFwaXNlcnZlciIsIm5hbWUiOiJ0bXBsLWZlZG9yYS0zN...
API Version:  cdi.kubevirt.io/v1beta1
Kind:         DataVolume
Metadata:
  Creation Timestamp:  2021-11-01T18:55:38Z
  Generation:          2
  Resource Version:    456717
  UID:                 461f9b88-22bd-4504-946c-42412f0cd120
Spec:
  Pvc:
    Access Modes:
      ReadWriteMany
    Resources:
      Requests:
        Storage:         10Gi
    Storage Class Name:  lab-nfs
    Volume Mode:         Filesystem
  Source:
    Pvc:
      Name:       tmpl-fedora-34
      Namespace:  default
Status:
  Conditions:
    Last Heartbeat Time:   2021-11-01T18:55:38Z
    Last Transition Time:  2021-11-01T18:55:38Z
    Message:               No PVC found
    Reason:                NotFound
    Status:                Unknown
    Type:                  Bound
    Last Heartbeat Time:   2021-11-01T18:55:38Z
    Last Transition Time:  2021-11-01T18:55:38Z
    Status:                False
    Type:                  Ready
    Last Heartbeat Time:   2021-11-01T18:55:38Z
    Last Transition Time:  2021-11-01T18:55:38Z
    Status:                False
    Type:                  Running
  Phase:                   SnapshotForSmartCloneInProgress
Events:
  Type    Reason                           Age   From                   Message
  ----    ------                           ----  ----                   -------
  Normal  SnapshotForSmartCloneInProgress  57m   datavolume-controller  Creating snapshot for smart-clone is in progress (for pvc default/tmpl-fedora-34)
  Normal  NotFound                         57m   datavolume-controller  No PVC found
```
The logs for the CDI deployment Pod (`oc logs cdi-deployment-<identifier> -n openshift-cnv`) contain these messages:

```json
{
"level": "info",
"ts": 1635791578.1069758,
"logger": "controller.smartclone-controller",
"msg": "reconciling smart clone",
"VolumeSnapshot/PersistentVolumeClaim": "openshift-virtualization-os-images/cdi-tmp-bcd9870e-fd40-4e8b-a966-9d02d1bcfe34"
}
{
"level": "info",
"ts": 1635791578.1070423,
"logger": "controller.smartclone-controller",
"msg": "Reconciling snapshot",
"VolumeSnapshot/PersistentVolumeClaim": "openshift-virtualization-os-images/cdi-tmp-bcd9870e-fd40-4e8b-a966-9d02d1bcfe34",
"snapshot.Name": "cdi-tmp-bcd9870e-fd40-4e8b-a966-9d02d1bcfe34",
"snapshot.Namespace": "openshift-virtualization-os-images"
}
{
"level": "error",
"ts": 1635791578.184192,
"logger": "controller.smartclone-controller",
"msg": "error creating pvc from snapshot",
"VolumeSnapshot/PersistentVolumeClaim": "openshift-virtualization-os-images/cdi-tmp-bcd9870e-fd40-4e8b-a966-9d02d1bcfe34",
"error": "PersistentVolumeClaim \"cdi-tmp-bcd9870e-fd40-4e8b-a966-9d02d1bcfe34\" is invalid: spec.resources[storage]: Invalid value: \"0\": must be greater than zero",
"stacktrace": "github.com/go-logr/zapr.(*zapLogger).Error\n\t/remote-source/app/vendor/github.com/go-logr/zapr/zapr.go:132\nkubevirt.io/containerized-data-importer/pkg/controller.(*SmartCloneReconciler).reconcileSnapshot\n\t/remote-source/app/pkg/controller/smart-clone-controller.go:247\nkubevirt.io/containerized-data-importer/pkg/controller.(*SmartCloneReconciler).Reconcile\n\t/remote-source/app/pkg/controller/smart-clone-controller.go:151\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:216\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:99"
}
{
"level": "error",
"ts": 1635791578.18426,
"logger": "controller-runtime.manager.controller.smartclone-controller",
"msg": "Reconciler error",
"name": "cdi-tmp-bcd9870e-fd40-4e8b-a966-9d02d1bcfe34",
"namespace": "openshift-virtualization-os-images",
"error": "PersistentVolumeClaim \"cdi-tmp-bcd9870e-fd40-4e8b-a966-9d02d1bcfe34\" is invalid: spec.resources[storage]: Invalid value: \"0\": must be greater than zero",
"stacktrace": "github.com/go-logr/zapr.(*zapLogger).Error\n\t/remote-source/app/vendor/github.com/go-logr/zapr/zapr.go:132\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:302\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:216\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:99"
}
```
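Before getting to the cause: the rejection itself is just the apiserver's rule that storage requests must be positive. A standalone sketch (an assumed simplification of that validation, not the apiserver's actual code) reproduces the message:

```go
// Standalone repro of the sizing failure (assumed simplification of the
// apiserver's validation, which rejects non-positive storage requests).
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// What ended up in the temporary clone PVC's storage request. Note
	// that "0" parses fine; the failure happens at admission, not parsing.
	restoreSize := resource.MustParse("0")

	if restoreSize.Sign() <= 0 {
		fmt.Printf("spec.resources[storage]: Invalid value: %q: must be greater than zero\n", restoreSize.String())
	}
}
```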
I believe this is the result of this line of code, which statically sets the snapshot size to 0.
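For illustration, here is the driver-side contract in terms of the Go CSI bindings; democratic-csi itself is Node.js, so this sketch only shows where `size_bytes` comes from, not the project's actual code:

```go
// Illustrative driver-side sketch using the Go CSI bindings (democratic-csi
// itself is Node.js; this only shows the contract, not the project's code).
package driver

import (
	"github.com/container-storage-interface/spec/lib/go/csi"
)

// createSnapshotResponse shows the field at issue: size_bytes. Returning a
// hard-coded 0 is what breaks CDI's smart clone; reporting the parent
// volume's size gives consumers a usable restore size.
func createSnapshotResponse(snapshotID, sourceVolumeID string, sourceVolumeBytes int64) *csi.CreateSnapshotResponse {
	return &csi.CreateSnapshotResponse{
		Snapshot: &csi.Snapshot{
			SnapshotId:     snapshotID,
			SourceVolumeId: sourceVolumeID,
			SizeBytes:      sourceVolumeBytes, // instead of the static 0
			ReadyToUse:     true,
		},
	}
}
```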
Expected result
The `VolumeSnapshot` and `VolumeSnapshotContent` objects should report a valid restore size, allowing the `DataVolume` clone operation to succeed.
About this issue

- State: closed
- Comments: 21 (13 by maintainers)
Yes, understood. We had some good discussion about the matter on Slack: https://kubernetes.slack.com/archives/C8EJ01Z46/p1635807374020000

In any case, the result of the conversation was this:

- the intent of the snapshot size field is ambiguous (reporting the space actually consumed by the snapshot, similar to `GetCapacity`, vs being used as a minimal value to be used as the size for derived volumes)
- the field is optional and should be left out entirely unless `0` is relevant or nothing better can be assumed
- CDI treats the value as the size for volumes derived from the snapshot, so `0` is no good
- CDI could fall back to the size of the source `PVC` if still present

In any case, I'll be working on revisiting the logic, but it will take me a little bit. It's actually more nuanced than it seems because I support `ListVolumes`, so I need to store the size historically in case the parent PVC grows, etc. I will however try to retool the driver based on the conclusions of the conversation on Slack.

I'm glad you brought it up because I otherwise would never have known 😃
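To illustrate the historical-size point above, here is a hypothetical sketch of recording the parent volume's size at snapshot-creation time (names and the storage mechanism are assumptions, not democratic-csi's actual code):

```go
// Hypothetical sketch: persist the parent volume's size when the snapshot
// is taken, so later CSI responses can report a stable restore size even
// if the parent volume grows afterward. Not democratic-csi's actual code.
package snapsize

import "sync"

// SnapshotRecord captures what the driver knew at creation time.
type SnapshotRecord struct {
	SnapshotID string
	// SizeBytes is the parent volume's provisioned size at the moment the
	// snapshot was created; derived volumes must be at least this large.
	SizeBytes int64
}

// Store keeps creation-time sizes keyed by snapshot ID. A real driver
// would persist this (e.g. in ZFS user properties), not in memory.
type Store struct {
	mu      sync.Mutex
	records map[string]SnapshotRecord
}

func NewStore() *Store {
	return &Store{records: make(map[string]SnapshotRecord)}
}

// Record is called from CreateSnapshot with the parent's current size.
func (s *Store) Record(snapshotID string, parentSizeBytes int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.records[snapshotID] = SnapshotRecord{SnapshotID: snapshotID, SizeBytes: parentSizeBytes}
}

// RestoreSize returns the historical size for ListSnapshots/clone sizing,
// and false if the driver has nothing better than "unknown" to report.
func (s *Store) RestoreSize(snapshotID string) (int64, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	rec, ok := s.records[snapshotID]
	return rec.SizeBytes, ok
}
```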
Apologies for the delay, that works exactly as expected, thank you!
OK thanks for giving it a try. I’ll work on returning a value, hopefully won’t take too long.