harvester: [BUG] VM with unschedulable disks doesn't show a clear warning message

Describe the bug A VM with unschedulable disks doesn't show a clear warning message.

To Reproduce Steps to reproduce the behavior:

  1. Create a VM whose disks exceed the available storage capacity, e.g. 100T
  2. The VM fails to start, but no error message is given to the user
  3. The only available information in the events is:
| Reason | Resource | Message | Date |
| --- | --- | --- | --- |
| FailedMount | Pod virt-launcher-test-nzg65 | Unable to attach or mount volumes: unmounted volumes=[disk-0], unattached volumes=[container-disks sockets disk-0 cloudinitdisk-ndata public ephemeral-disks hotplug-disks libvirt-runtime cloudinitdisk-udata private]: timed out waiting for the condition | 1.3 mins ago |
| FailedAttachVolume | Pod virt-launcher-test-nzg65 | AttachVolume.Attach failed for volume "pvc-db775cbb-a5a2-4479-83f6-d7d9af85bfb7" : rpc error: code = Aborted desc = volume pvc-db775cbb-a5a2-4479-83f6-d7d9af85bfb7 is not ready for workloads | 1.3 mins ago |
| FailedMount | Pod virt-launcher-test-nzg65 | Unable to attach or mount volumes: unmounted volumes=[disk-0], unattached volumes=[ephemeral-disks private libvirt-runtime cloudinitdisk-udata public container-disks hotplug-disks cloudinitdisk-ndata disk-0 sockets]: timed out waiting for the condition | |

Expected behavior The VM should show a clear warning to the user, such as insufficient storage.

Environment:

  • Harvester ISO version: v1.0.0
  • Underlying Infrastructure (e.g. Baremetal with Dell PowerEdge R630): any

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 24 (18 by maintainers)

Most upvoted comments

Another related issue: https://github.com/harvester/harvester/issues/1346

It looks like we need to add some kind of enhancement to reflect the real status of the PVC/PV and to control the usage from the UI. @johnliu55tw @WuJun2016 please also take a look at 1#695
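For background on why this needs an enhancement: a Pending/Bound PVC phase alone doesn't carry the scheduling failure, which only surfaces in events and in the Longhorn volume status. A minimal client-go sketch, assuming a reachable kubeconfig; the namespace and claim name (default/disk-0) are illustrative:

```go
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// The PVC phase alone can look healthy even when Longhorn cannot
	// schedule replicas for the backing volume.
	pvc, err := client.CoreV1().PersistentVolumeClaims("default").
		Get(context.TODO(), "disk-0", metav1.GetOptions{})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("PVC phase:", pvc.Status.Phase)

	// The actionable detail (FailedAttachVolume, "not ready for
	// workloads", ...) lives in the events, which is what a UI
	// enhancement would need to aggregate and display.
	events, err := client.CoreV1().Events("default").List(context.TODO(),
		metav1.ListOptions{FieldSelector: "involvedObject.name=" + pvc.Name})
	if err != nil {
		log.Fatal(err)
	}
	for _, e := range events.Items {
		fmt.Printf("%s: %s\n", e.Reason, e.Message)
	}
}
```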

On the Longhorn side we should prevent the volume creation if there is no node capable of hosting it at the time of the API volume-creation call. For this case we could return the Out of Range error during the CreateVolume CSI call. This evaluation should be done by the backend API creation call.

ref: https://github.com/container-storage-interface/spec/blob/master/spec.md#createvolume-errors
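A minimal sketch of what that could look like in a Go CSI controller; the controllerServer type and the maxSchedulableBytes helper are hypothetical stand-ins, not Longhorn's actual code:

```go
package driver

import (
	"context"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// controllerServer stands in for the CSI controller service.
type controllerServer struct{}

// maxSchedulableBytes is a hypothetical helper that would ask the backend
// for the largest volume any node can currently host (capacity minus
// reserved and already-committed space, times the overcommit factor).
func maxSchedulableBytes(ctx context.Context) (int64, error) {
	// ... query node/disk capacity from the backend API ...
	return 0, nil
}

func (cs *controllerServer) CreateVolume(ctx context.Context, req *csi.CreateVolumeRequest) (*csi.CreateVolumeResponse, error) {
	requested := req.GetCapacityRange().GetRequiredBytes()

	max, err := maxSchedulableBytes(ctx)
	if err != nil {
		return nil, status.Error(codes.Internal, err.Error())
	}
	if requested > max {
		// Per the CSI spec, OUT_OF_RANGE signals an unsupported
		// capacity_range, so the caller knows retrying with the
		// same size is pointless.
		return nil, status.Errorf(codes.OutOfRange,
			"requested %d bytes exceeds maximum schedulable size %d", requested, max)
	}

	// ... proceed with normal volume creation ...
	return &csi.CreateVolumeResponse{}, nil
}
```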

Verified fixed on master-b0d883ce-head (11/22). Closing this issue.

Result

Case 1 (PASS)

  1. When we create a volume exceeding the max acceptable size, creation is blocked with the prompt message Exceed maximum size 999999999 Gi!

Case 2 (PASS)

  1. When we create a volume that does not exceed the max acceptable size but is larger than the total disk capacity (plus overcommit),
  2. We can create the volume, but the status of the volume is NotReady
  3. The error message should be insufficient storage
  4. The link at the top of the page will be visible
  5. Click the link to open a new tab with the embedded Longhorn UI

Case 3 single node (PASS)

When we create a VM that has an OS image volume with replica scheduling failed and another volume with insufficient disk:

  1. The statuses of the volumes should be Degraded and Not ready
  2. The error messages of the volumes should be replica scheduling failed and insufficient disk
  3. The link in the volume section will be visible

Case 4 multiple nodes (PASS)

Same result as the single node case, but in a multi-node cluster.

Test Information

  • Test Environment: a 3-node Harvester cluster on bare-metal machines and a 1-node local KVM machine
  • Harvester version: master-b0d883ce-head (11/22)

Verify Steps

Case 1

  1. Create a volume with a 9999999999 Gi storage size in both the harvester-longhorn and longhorn storage classes
  2. The volume should not be created successfully.
  3. The error message should be Exceed maximum size 999999999 Gi!

Case 2

  1. Create a volume with a 99999999 Gi storage size in both the harvester-longhorn and longhorn storage classes (see the sketch after this list)
  2. Go back to the volume list
  3. The status of the volume should be NotReady
  4. The error message should be insufficient storage
  5. Go to the preferences page and enable DEV mode
  6. Go to the volume detail page
  7. The link at the top of the page should be visible
  8. Click the link to open a new tab with the embedded Longhorn UI
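For reference, a rough programmatic equivalent of step 1, written against client-go; the namespace, claim name, and access mode are illustrative, and only the harvester-longhorn class is shown. The backing Longhorn volume should then settle in the NotReady / insufficient storage state described above.

```go
package main

import (
	"context"
	"log"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	sc := "harvester-longhorn"
	pvc := &corev1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{Name: "oversized-test", Namespace: "default"},
		Spec: corev1.PersistentVolumeClaimSpec{
			AccessModes:      []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
			StorageClassName: &sc,
			// Request far more than the cluster can host; note that
			// newer client-go versions rename this field to
			// VolumeResourceRequirements.
			Resources: corev1.ResourceRequirements{
				Requests: corev1.ResourceList{
					corev1.ResourceStorage: resource.MustParse("99999999Gi"),
				},
			},
		},
	}
	if _, err := client.CoreV1().PersistentVolumeClaims("default").
		Create(context.TODO(), pvc, metav1.CreateOptions{}); err != nil {
		log.Fatal(err)
	}
	log.Println("PVC created; the Longhorn volume should become NotReady")
}
```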

Case 3 (Single node)

  1. Go to create a VM
  2. Give the OS image volume 30 Gi and add another 1600 Gi volume of the harvester-longhorn storage class
  3. Go to vm detail or vm edit page
  4. Switch to volume tab
  5. The statuses of the volumes should be Degraded and Not ready
  6. The error messages of the volumes should be replica scheduling failed and insufficient disk
  7. The link in the volume section will be visible

Case 4 (multi nodes)

  1. Disable node scheduling on node 3 in the Longhorn UI

  2. Give the OS image volume 30 Gi and add another 1600 Gi volume of the harvester-longhorn storage class

  3. The OS image disk displays Degraded while the other disk displays Not ready with the corresponding error