kubevirt: domain Paused (I/O error) after hotplugging the second volume to a VM
What happened:
The domain paused due to an I/O error after a second volume was hotplugged to the VM.
Reproduction steps:
- create a vmi from the following yaml (an apply/verify sketch follows the manifest):
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: vmi-001
spec:
  domain:
    devices:
      disks:
      - disk:
          bus: virtio
        name: containerdisk
      - disk:
          bus: virtio
        name: cloudinitdisk
      rng: {}
    memory:
      guest: "4096Mi"
    cpu:
      cores: 2
  terminationGracePeriodSeconds: 0
  volumes:
  - containerDisk:
      image: kubevirt/fedora-with-test-tooling-container-disk:devel
    name: containerdisk
  - cloudInitNoCloud:
      userData: |-
        #cloud-config
        password: fedora
        chpasswd: { expire: False }
    name: cloudinitdisk
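A minimal sketch of applying and verifying the VMI; the file name vmi-001.yaml is only an assumption, any name works:
```sh
# Assumption: the manifest above is saved as vmi-001.yaml
kubectl apply -f vmi-001.yaml
# Wait until the VMI reports Running before hotplugging any volume
kubectl get vmi vmi-001 -w
```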
- create a pvc as the first hotplug disk (an apply/verify sketch follows the manifest):
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: cxl-hp-1
  namespace: default
  annotations:
    pv.kubernetes.io/bind-completed: 'yes'
    pv.kubernetes.io/bound-by-controller: 'yes'
    volume.beta.kubernetes.io/storage-provisioner: rbd.csi.ceph.com
    volume.kubernetes.io/storage-provisioner: rbd.csi.ceph.com
  finalizers:
  - kubernetes.io/pvc-protection
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  volumeName: pvc-242f649f-6dad-419b-a1c1-aa5c22f3e432
  storageClassName: csi-rbd-sc
  volumeMode: Block
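Again only a sketch; the file name cxl-hp-1-pvc.yaml is an assumption:
```sh
# Assumption: the PVC manifest above is saved as cxl-hp-1-pvc.yaml
kubectl apply -f cxl-hp-1-pvc.yaml
# The claim should be Bound with volumeMode Block before hotplugging it
kubectl get pvc cxl-hp-1 -o wide
```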
- hotplug the first disk to the vmi using the following command (a status-check sketch follows the command)
virtctl addvolume vmi-001 --volume-name=cxl-hp-1
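To confirm the hotplug took effect before touching the guest, the VMI volume status and the attachment pod can be inspected; this is only a sketch and was not part of the original reproduction:
```sh
# The hotplugged volume should appear in the VMI's volumeStatus
kubectl get vmi vmi-001 -o jsonpath='{.status.volumeStatus}{"\n"}'
# A hotplug attachment pod (hp-volume-*) is created next to the virt-launcher pod
kubectl get pods -l kubevirt.io=hotplug-disk
```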
- log into the vm and write some data on the hotplugged disk cxl-hp-1 (a guest-side check is sketched after these commands)
virtctl console vmi-001
fdisk /dev/sda
mkfs.ext4 /dev/sda
mkdir /disk1
mount /dev/sda /disk1
echo "this is the first hotplugged disk" > /disk1/a.txt
- create another pvc as the second hotplug disk
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: cxl-hp-2
  namespace: default
  annotations:
    pv.kubernetes.io/bind-completed: 'yes'
    pv.kubernetes.io/bound-by-controller: 'yes'
    volume.beta.kubernetes.io/storage-provisioner: rbd.csi.ceph.com
    volume.kubernetes.io/storage-provisioner: rbd.csi.ceph.com
  finalizers:
  - kubernetes.io/pvc-protection
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  volumeName: pvc-1b570192-5088-490d-93d3-dc7b1f89afa4
  storageClassName: csi-rbd-sc
  volumeMode: Block
- hotplug the second disk to the vmi using the following command (an attachment-pod capture sketch follows the command)
virtctl addvolume vmi-001 --volume-name=cxl-hp-2
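It may be worth capturing the attachment pod right after the second addvolume, since the pod dump under "Additional context" shows both PVCs under spec.volumes but only cxl-hp-2 under the container's volumeDevices. A sketch, not part of the original reproduction:
```sh
# Save the attachment pod spec right after the second addvolume for later comparison
kubectl get pods -l kubevirt.io=hotplug-disk -o yaml > hp-volume-after-second-addvolume.yaml
kubectl get vmi vmi-001 -o jsonpath='{.status.volumeStatus}{"\n"}'
```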
- log into the vm again and write more data to the first hotplugged disk cxl-hp-1
virtctl console vmi-001
echo "test" > /disk1/b.txt
- the error occurs!
At this time, the vmi status is still "Running", but the domain state is "paused" due to an I/O error (additional ways to observe this are sketched after the output below)
~# kubectl exec -it virt-launcher-vmi-001-8q25h -- virsh domstate 1 --reason
Authorization not available. Check if polkit service is running or see debug message for more information.
paused (I/O error)
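A few more ways to observe the same state from outside the guest; this is a sketch using the object names from the reproduction, and which conditions or events actually appear may vary by KubeVirt version:
```sh
# The VMI may report a Paused condition / IOError reason in its status and events
kubectl get vmi vmi-001 -o jsonpath='{.status.conditions}{"\n"}'
kubectl get events --field-selector involvedObject.name=vmi-001
# virt-launcher logs also record the transition ("compute" is the usual main container name)
kubectl logs virt-launcher-vmi-001-8q25h -c compute | grep -i ioerror
```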
- remove the error disk with virtctl removevolume, and the vmi recovers from paused to running (spelled out in the sketch below)
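Spelled out, the recovery step looks roughly like this; the report does not say which of the two hotplugged volumes was removed, so &lt;error-volume&gt; is a placeholder:
```sh
# <error-volume> is whichever hotplugged volume is being removed (cxl-hp-1 or cxl-hp-2)
virtctl removevolume vmi-001 --volume-name=<error-volume>
# The domain state should go back to "running"
kubectl exec -it virt-launcher-vmi-001-8q25h -- virsh domstate 1 --reason
```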
- log into the vmi using virtctl console and check dmesg:
[Sun May 7 10:21:03 2023] ISO 9660 Extensions: Microsoft Joliet Level 3
[Sun May 7 10:21:03 2023] ISO 9660 Extensions: RRIP_1991A
[Sun May 7 10:22:35 2023] scsi 6:0:0:0: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
[Sun May 7 10:22:35 2023] scsi 6:0:0:0: Attached scsi generic sg0 type 0
[Sun May 7 10:22:35 2023] sd 6:0:0:0: Power-on or device reset occurred
[Sun May 7 10:22:35 2023] sd 6:0:0:0: [sda] 20971520 512-byte logical blocks: (10.7 GB/10.0 GiB)
[Sun May 7 10:22:35 2023] sd 6:0:0:0: [sda] Write Protect is off
[Sun May 7 10:22:35 2023] sd 6:0:0:0: [sda] Mode Sense: 63 00 00 08
[Sun May 7 10:22:35 2023] sd 6:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[Sun May 7 10:22:35 2023] sd 6:0:0:0: [sda] Attached SCSI disk
[Sun May 7 10:24:43 2023] sda: sda1
[Sun May 7 10:37:09 2023] sda: sda1
[Sun May 7 10:37:09 2023] sda: sda1
[Sun May 7 10:37:37 2023] EXT4-fs (sda): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[Sun May 7 10:39:52 2023] scsi 6:0:0:1: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
[Sun May 7 10:39:52 2023] sd 6:0:0:1: Power-on or device reset occurred
[Sun May 7 10:39:52 2023] sd 6:0:0:1: Attached scsi generic sg1 type 0
[Sun May 7 10:39:52 2023] sd 6:0:0:1: [sdb] 20971520 512-byte logical blocks: (10.7 GB/10.0 GiB)
[Sun May 7 10:39:52 2023] sd 6:0:0:1: [sdb] Write Protect is off
[Sun May 7 10:39:52 2023] sd 6:0:0:1: [sdb] Mode Sense: 63 00 00 08
[Sun May 7 10:39:52 2023] sd 6:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[Sun May 7 10:39:52 2023] sd 6:0:0:1: [sdb] Attached SCSI disk
[Sun May 7 11:36:41 2023] sd 6:0:0:0: [sda] tag#12 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=0s
[Sun May 7 11:36:41 2023] sd 6:0:0:0: [sda] tag#12 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[Sun May 7 11:36:41 2023] blk_update_request: I/O error, dev sda, sector 8651008 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Sun May 7 11:36:41 2023] Aborting journal on device sda-8.
[Sun May 7 11:36:41 2023] blk_update_request: I/O error, dev sda, sector 8650752 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Sun May 7 11:36:41 2023] blk_update_request: I/O error, dev sda, sector 8650752 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Sun May 7 11:36:41 2023] Buffer I/O error on dev sda, logical block 1081344, lost sync page write
[Sun May 7 11:36:41 2023] JBD2: Error -5 detected when updating journal superblock for sda-8.
[Sun May 7 11:36:41 2023] sd 6:0:0:0: [sda] Synchronizing SCSI cache
[Sun May 7 11:36:41 2023] sd 6:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[Sun May 7 11:36:41 2023] sd 6:0:0:0: [sda] Sense Key : Illegal Request [current]
[Sun May 7 11:36:41 2023] sd 6:0:0:0: [sda] Add. Sense: Invalid command operation code
What you expected to happen:
The domain should keep running normally after the second volume is hotplugged.
Additional context:
Partial virt-launcher pod log:
{"component":"virt-launcher","level":"warning","msg":"Domain id=1 name='default_vmi-001' uuid=97f6b2c4-dff6-45fe-9952-5c296f8927c8 is tainted: custom-ga-command","pos":"qemuDomainObjTaintMsg:6618","subcomponent":"libvirt","thread":"32","timestamp":"2023-05-06T16:11:36.167000Z"}
{"component":"virt-launcher","level":"warning","msg":"Domain id=1 name='default_vmi-001' uuid=97f6b2c4-dff6-45fe-9952-5c296f8927c8 is tainted: custom-ga-command","pos":"qemuDomainObjTaintMsg:6618","subcomponent":"libvirt","thread":"31","timestamp":"2023-05-06T16:11:36.172000Z"}
{"component":"virt-launcher","level":"warning","msg":"Domain id=1 name='default_vmi-001' uuid=97f6b2c4-dff6-45fe-9952-5c296f8927c8 is tainted: custom-ga-command","pos":"qemuDomainObjTaintMsg:6618","subcomponent":"libvirt","thread":"30","timestamp":"2023-05-06T16:11:36.176000Z"}
{"component":"virt-launcher","level":"warning","msg":"Domain id=1 name='default_vmi-001' uuid=97f6b2c4-dff6-45fe-9952-5c296f8927c8 is tainted: custom-ga-command","pos":"qemuDomainObjTaintMsg:6618","subcomponent":"libvirt","thread":"33","timestamp":"2023-05-06T16:11:36.179000Z"}
{"component":"virt-launcher","level":"warning","msg":"Domain id=1 name='default_vmi-001' uuid=97f6b2c4-dff6-45fe-9952-5c296f8927c8 is tainted: custom-ga-command","pos":"qemuDomainObjTaintMsg:6618","subcomponent":"libvirt","thread":"29","timestamp":"2023-05-06T16:11:41.106000Z"}
{"component":"virt-launcher","level":"warning","msg":"Domain id=1 name='default_vmi-001' uuid=97f6b2c4-dff6-45fe-9952-5c296f8927c8 is tainted: custom-ga-command","pos":"qemuDomainObjTaintMsg:6618","subcomponent":"libvirt","thread":"32","timestamp":"2023-05-06T16:11:46.105000Z"}
{"component":"virt-launcher","level":"warning","msg":"Domain id=1 name='default_vmi-001' uuid=97f6b2c4-dff6-45fe-9952-5c296f8927c8 is tainted: custom-ga-command","pos":"qemuDomainObjTaintMsg:6618","subcomponent":"libvirt","thread":"31","timestamp":"2023-05-06T16:11:46.108000Z"}
{"component":"virt-launcher","level":"warning","msg":"Domain id=1 name='default_vmi-001' uuid=97f6b2c4-dff6-45fe-9952-5c296f8927c8 is tainted: custom-ga-command","pos":"qemuDomainObjTaintMsg:6618","subcomponent":"libvirt","thread":"32","timestamp":"2023-05-06T16:11:51.106000Z"}
{"component":"virt-launcher","level":"warning","msg":"Domain id=1 name='default_vmi-001' uuid=97f6b2c4-dff6-45fe-9952-5c296f8927c8 is tainted: custom-ga-command","pos":"qemuDomainObjTaintMsg:6618","subcomponent":"libvirt","thread":"29","timestamp":"2023-05-06T16:11:56.106000Z"}
{"component":"virt-launcher","level":"warning","msg":"Domain id=1 name='default_vmi-001' uuid=97f6b2c4-dff6-45fe-9952-5c296f8927c8 is tainted: custom-ga-command","pos":"qemuDomainObjTaintMsg:6618","subcomponent":"libvirt","thread":"33","timestamp":"2023-05-06T16:11:56.109000Z"}
{"component":"virt-launcher","level":"warning","msg":"Domain id=1 name='default_vmi-001' uuid=97f6b2c4-dff6-45fe-9952-5c296f8927c8 is tainted: custom-ga-command","pos":"qemuDomainObjTaintMsg:6618","subcomponent":"libvirt","thread":"31","timestamp":"2023-05-06T16:12:01.106000Z"}
{"component":"virt-launcher","level":"info","msg":"DomainLifecycle event 3 with reason 2 received","pos":"client.go:463","timestamp":"2023-05-06T16:12:04.063170Z"}
{"component":"virt-launcher","level":"info","msg":"kubevirt domain status: Paused(3):IOError(5)","pos":"client.go:295","timestamp":"2023-05-06T16:12:04.065371Z"}
{"component":"virt-launcher","level":"info","msg":"Domain name event: default_vmi-001","pos":"client.go:422","timestamp":"2023-05-06T16:12:04.155506Z"}
{"component":"virt-launcher","level":"info","msg":"DomainLifecycle event 4 with reason 0 received","pos":"client.go:463","timestamp":"2023-05-06T16:12:04.161337Z"}
{"component":"virt-launcher","level":"info","msg":"kubevirt domain status: Running(1):Unknown(5)","pos":"client.go:295","timestamp":"2023-05-06T16:12:04.163823Z"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Domain unpaused.","name":"vmi-001","namespace":"default","pos":"manager.go:908","timestamp":"2023-05-06T16:12:04.163858Z","uid":"06ebcc37-451a-4ad0-b5fc-462bd3de09b8"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Synced vmi","name":"vmi-001","namespace":"default","pos":"server.go:191","timestamp":"2023-05-06T16:12:04.165197Z","uid":"06ebcc37-451a-4ad0-b5fc-462bd3de09b8"}
{"component":"virt-launcher","level":"info","msg":"DomainLifecycle event 3 with reason 2 received","pos":"client.go:463","timestamp":"2023-05-06T16:12:04.166045Z"}
{"component":"virt-launcher","level":"info","msg":"Domain name event: default_vmi-001","pos":"client.go:422","timestamp":"2023-05-06T16:12:04.166549Z"}
{"component":"virt-launcher","level":"info","msg":"kubevirt domain status: Paused(3):IOError(5)","pos":"client.go:295","timestamp":"2023-05-06T16:12:04.168446Z"}
{"component":"virt-launcher","level":"info","msg":"Domain name event: default_vmi-001","pos":"client.go:422","timestamp":"2023-05-06T16:12:04.250000Z"}
{"component":"virt-launcher","level":"info","msg":"DomainLifecycle event 4 with reason 0 received","pos":"client.go:463","timestamp":"2023-05-06T16:12:04.252235Z"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Domain unpaused.","name":"vmi-001","namespace":"default","pos":"manager.go:908","timestamp":"2023-05-06T16:12:04.254182Z","uid":"06ebcc37-451a-4ad0-b5fc-462bd3de09b8"}
{"component":"virt-launcher","level":"info","msg":"DomainLifecycle event 3 with reason 2 received","pos":"client.go:463","timestamp":"2023-05-06T16:12:04.258532Z"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Synced vmi","name":"vmi-001","namespace":"default","pos":"server.go:191","timestamp":"2023-05-06T16:12:04.259668Z","uid":"06ebcc37-451a-4ad0-b5fc-462bd3de09b8"}
...
The hp-volume- (hotplug attachment) pod (an inspection sketch follows the dump):
apiVersion: v1
kind: Pod
metadata:
  annotations:
    container.seccomp.security.alpha.kubernetes.io/hotplug-disk: runtime/default
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "kube-ovn",
          "ips": [
              "100.67.82.208"
          ],
          "default": true,
          "dns": {}
      }]
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "kube-ovn",
          "ips": [
              "100.67.82.208"
          ],
          "default": true,
          "dns": {}
      }]
    ovn.kubernetes.io/allocated: "true"
    ovn.kubernetes.io/cidr: 100.67.0.0/16
    ovn.kubernetes.io/gateway: 100.67.0.1
    ovn.kubernetes.io/ip_address: 100.67.82.208
    ovn.kubernetes.io/logical_router: ovn-cluster
    ovn.kubernetes.io/logical_switch: ovn-default
    ovn.kubernetes.io/mac_address: 00:00:00:B5:49:E3
    ovn.kubernetes.io/pod_nic_type: veth-pair
    ovn.kubernetes.io/routed: "true"
  creationTimestamp: "2023-05-06T16:10:34Z"
  generateName: hp-volume-
  labels:
    kubevirt.io: hotplug-disk
  name: hp-volume-bkwkl
  namespace: default
  ownerReferences:
  - apiVersion: v1
    blockOwnerDeletion: true
    controller: true
    kind: Pod
    name: virt-launcher-vmi-001-8q25h
    uid: 6be02461-d523-4f68-8a3e-02896256ff1f
  resourceVersion: "30191198"
  uid: dfd0eec5-64ea-4452-8add-ddaebf9bb0da
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - kcp02
  containers:
  - command:
    - /bin/sh
    - -c
    - /usr/bin/container-disk --copy-path /path/hp
    image: <my-private-registry>/virt-launcher:v0.59.0
    imagePullPolicy: IfNotPresent
    name: hotplug-disk
    resources:
      limits:
        cpu: 100m
        memory: 80M
      requests:
        cpu: 10m
        memory: 2M
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 107
      seLinuxOptions:
        level: s0
      seccompProfile:
        type: RuntimeDefault
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeDevices:
    - devicePath: /path/cxl-hp-2/1b570192-5088-490d-93d3-dc7b1f89afa4
      name: cxl-hp-2
    volumeMounts:
    - mountPath: /path
      mountPropagation: HostToContainer
      name: hotplug-disks
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-f8w5d
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: kcp02
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 0
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - emptyDir: {}
    name: hotplug-disks
  - name: cxl-hp-1
    persistentVolumeClaim:
      claimName: cxl-hp-1
  - name: cxl-hp-2
    persistentVolumeClaim:
      claimName: cxl-hp-2
  - name: kube-api-access-f8w5d
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-05-06T16:10:34Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-05-06T16:10:46Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-05-06T16:10:46Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-05-06T16:10:34Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://7247401b244f73038db56693a7c3811ba16a29be7372ddd9b2e5ff98b1c6eaa9
    image: <my-private-registry>/virt-launcher:v0.59.0
    imageID: <my-private-registry>/virt-launcher@sha256:dad34a6589d20f65ee370607936afefb0510e1f9c66e1ad63fe74c16d2937463
    lastState: {}
    name: hotplug-disk
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2023-05-06T16:10:45Z"
  hostIP: 172.16.33.132
  phase: Running
  podIP: 100.67.82.208
  podIPs:
  - ip: 100.67.82.208
  qosClass: Burstable
  startTime: "2023-05-06T16:10:34Z"
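One thing the dump above makes easy to check: spec.volumes lists both cxl-hp-1 and cxl-hp-2, while the hotplug-disk container's volumeDevices only contains cxl-hp-2. A sketch of how to extract that comparison (jq is assumed to be available; the pod name is taken from the dump above):
```sh
# Compare which PVCs the attachment pod exposes as block devices vs. which it merely references
kubectl get pod hp-volume-bkwkl -o json | \
  jq '{volumes: [.spec.volumes[].name],
       volumeDevices: [.spec.containers[].volumeDevices[]?.devicePath]}'
```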
Environment:
- KubeVirt version (use virtctl version): 0.59.0
- Kubernetes version (use kubectl version): v1.24.7
- VM or VMI specifications: N/A
- Cloud provider or hardware configuration: N/A
- OS (e.g. from /etc/os-release): Ubuntu 20.04.6 LTS
- Kernel (e.g. uname -a): Linux x86_64
- Install tools: N/A
- Others: N/A
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 23 (13 by maintainers)
This sounds similar to https://github.com/kubevirt/kubevirt/issues/9263. There is a PR with a potential fix: https://github.com/kubevirt/kubevirt/pull/9269. Though, this needs to be checked for the nbd case.