kubevirt: domain Paused (I/O Error) after hotplugging the second volume to the VM

What happened:

The domain paused due to an I/O error after the second volume was hotplugged to the VM.

Steps to reproduce:

  1. Create a VMI from the following YAML:
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: vmi-001
spec:
  domain:
    devices:
      disks:
      - disk:
          bus: virtio
        name: containerdisk
      - disk:
          bus: virtio
        name: cloudinitdisk
      rng: {}
    memory:
      guest: "4096Mi"
    cpu:
      cores: 2
  terminationGracePeriodSeconds: 0
  volumes:
  - containerDisk:
      image: kubevirt/fedora-with-test-tooling-container-disk:devel
    name: containerdisk
  - cloudInitNoCloud:
      userData: |-
        #cloud-config
        password: fedora
        chpasswd: { expire: False }
    name: cloudinitdisk
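
For reference, the manifest above can be applied and the VMI waited on like this (a minimal sketch; the file name is an assumption):
kubectl apply -f vmi-001.yaml   # assumed file name for the VMI manifest above
kubectl wait vmi vmi-001 --for=condition=Ready --timeout=300s   # wait until the VMI is Ready before hotplugging volumes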
  2. Create a PVC to serve as the first hotplug disk:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: cxl-hp-1
  namespace: default
  annotations:
    pv.kubernetes.io/bind-completed: 'yes'
    pv.kubernetes.io/bound-by-controller: 'yes'
    volume.beta.kubernetes.io/storage-provisioner: rbd.csi.ceph.com
    volume.kubernetes.io/storage-provisioner: rbd.csi.ceph.com
  finalizers:
    - kubernetes.io/pvc-protection
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  volumeName: pvc-242f649f-6dad-419b-a1c1-aa5c22f3e432
  storageClassName: csi-rbd-sc
  volumeMode: Block
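
Similarly, a sketch of applying this PVC and confirming it is Bound before hotplugging it (file name assumed):
kubectl apply -f cxl-hp-1-pvc.yaml   # assumed file name for the PVC manifest above
kubectl get pvc cxl-hp-1 -n default  # STATUS should be Bound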
  3. Hotplug the first disk into the VMI with the following command:
virtctl addvolume vmi-001 --volume-name=cxl-hp-1
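
To confirm the hotplug was accepted, the volume should appear in the VMI's volumeStatus (a verification sketch; output formatting may differ between versions):
kubectl get vmi vmi-001 -o jsonpath='{.status.volumeStatus}'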
  4. Log in to the VM and write some data to the hotplugged disk cxl-hp-1:
virtctl console vmi-001 
fdisk /dev/sda
mkfs.ext4 /dev/sda
mkdir /disk1
mount /dev/sda /disk1
echo "this is the first hotplugged disk" > /disk1/a.txt
  5. Create another PVC to serve as the second hotplug disk:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: cxl-hp-2
  namespace: default
  annotations:
    pv.kubernetes.io/bind-completed: 'yes'
    pv.kubernetes.io/bound-by-controller: 'yes'
    volume.beta.kubernetes.io/storage-provisioner: rbd.csi.ceph.com
    volume.kubernetes.io/storage-provisioner: rbd.csi.ceph.com
  finalizers:
    - kubernetes.io/pvc-protection
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  volumeName: pvc-1b570192-5088-490d-93d3-dc7b1f89afa4
  storageClassName: csi-rbd-sc
  volumeMode: Block
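
As with the first PVC, a sketch of applying it and checking that both hotplug PVCs are Bound (file name assumed):
kubectl apply -f cxl-hp-2-pvc.yaml   # assumed file name for the PVC manifest above
kubectl get pvc cxl-hp-1 cxl-hp-2 -n default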
  6. Hotplug the second disk into the VMI with the following command:
virtctl addvolume vmi-001 --volume-name=cxl-hp-2
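
After this step the second disk should be visible both in the VMI's volumeStatus and inside the guest as /dev/sdb (a verification sketch):
kubectl get vmi vmi-001 -o jsonpath='{.status.volumeStatus}'
# inside the guest (virtctl console vmi-001):
lsblk   # /dev/sdb should now appear alongside /dev/sda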
  7. Log in to the VM again and write some more data to the first hotplugged disk cxl-hp-1:
virtctl console vmi-001
echo "test" > /disk1/b.txt
  8. The I/O error occurs.

At this point the VMI status is still "Running", but the domain state is "paused" due to an I/O error:

~# kubectl exec -it virt-launcher-vmi-001-8q25h -- virsh domstate 1 --reason
Authorization not available. Check if polkit service is running or see debug message for more information.
paused (I/O error)
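
The device behind the I/O error can be narrowed down from inside the virt-launcher pod; virsh domblkerror lists block devices currently in an error state (a sketch; the same polkit warning may be printed):
kubectl exec -it virt-launcher-vmi-001-8q25h -- virsh domblklist 1
kubectl exec -it virt-launcher-vmi-001-8q25h -- virsh domblkerror 1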
  1. Removing the failing disk with virtctl removevolume restores the VMI from paused to running (see the sketch below for the exact commands).
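For reference, a sketch of the commands for this step; the failing disk is assumed here to be cxl-hp-1, since the guest reports the I/O errors on /dev/sda:
virtctl removevolume vmi-001 --volume-name=cxl-hp-1
kubectl exec -it virt-launcher-vmi-001-8q25h -- virsh domstate 1 --reason   # should report "running" again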

  2. Logging in to the VMI with virtctl console and checking dmesg shows:

[Sun May  7 10:21:03 2023] ISO 9660 Extensions: Microsoft Joliet Level 3
[Sun May  7 10:21:03 2023] ISO 9660 Extensions: RRIP_1991A
[Sun May  7 10:22:35 2023] scsi 6:0:0:0: Direct-Access     QEMU     QEMU HARDDISK    2.5+ PQ: 0 ANSI: 5
[Sun May  7 10:22:35 2023] scsi 6:0:0:0: Attached scsi generic sg0 type 0
[Sun May  7 10:22:35 2023] sd 6:0:0:0: Power-on or device reset occurred
[Sun May  7 10:22:35 2023] sd 6:0:0:0: [sda] 20971520 512-byte logical blocks: (10.7 GB/10.0 GiB)
[Sun May  7 10:22:35 2023] sd 6:0:0:0: [sda] Write Protect is off
[Sun May  7 10:22:35 2023] sd 6:0:0:0: [sda] Mode Sense: 63 00 00 08
[Sun May  7 10:22:35 2023] sd 6:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[Sun May  7 10:22:35 2023] sd 6:0:0:0: [sda] Attached SCSI disk
[Sun May  7 10:24:43 2023]  sda: sda1
[Sun May  7 10:37:09 2023]  sda: sda1
[Sun May  7 10:37:09 2023]  sda: sda1
[Sun May  7 10:37:37 2023] EXT4-fs (sda): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[Sun May  7 10:39:52 2023] scsi 6:0:0:1: Direct-Access     QEMU     QEMU HARDDISK    2.5+ PQ: 0 ANSI: 5
[Sun May  7 10:39:52 2023] sd 6:0:0:1: Power-on or device reset occurred
[Sun May  7 10:39:52 2023] sd 6:0:0:1: Attached scsi generic sg1 type 0
[Sun May  7 10:39:52 2023] sd 6:0:0:1: [sdb] 20971520 512-byte logical blocks: (10.7 GB/10.0 GiB)
[Sun May  7 10:39:52 2023] sd 6:0:0:1: [sdb] Write Protect is off
[Sun May  7 10:39:52 2023] sd 6:0:0:1: [sdb] Mode Sense: 63 00 00 08
[Sun May  7 10:39:52 2023] sd 6:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[Sun May  7 10:39:52 2023] sd 6:0:0:1: [sdb] Attached SCSI disk
[Sun May  7 11:36:41 2023] sd 6:0:0:0: [sda] tag#12 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=0s
[Sun May  7 11:36:41 2023] sd 6:0:0:0: [sda] tag#12 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[Sun May  7 11:36:41 2023] blk_update_request: I/O error, dev sda, sector 8651008 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Sun May  7 11:36:41 2023] Aborting journal on device sda-8.
[Sun May  7 11:36:41 2023] blk_update_request: I/O error, dev sda, sector 8650752 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Sun May  7 11:36:41 2023] blk_update_request: I/O error, dev sda, sector 8650752 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Sun May  7 11:36:41 2023] Buffer I/O error on dev sda, logical block 1081344, lost sync page write
[Sun May  7 11:36:41 2023] JBD2: Error -5 detected when updating journal superblock for sda-8.
[Sun May  7 11:36:41 2023] sd 6:0:0:0: [sda] Synchronizing SCSI cache
[Sun May  7 11:36:41 2023] sd 6:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[Sun May  7 11:36:41 2023] sd 6:0:0:0: [sda] Sense Key : Illegal Request [current]
[Sun May  7 11:36:41 2023] sd 6:0:0:0: [sda] Add. Sense: Invalid command operation code

What you expected to happen:

I expected the domain to keep running normally after the second volume was hotplugged.

Additional context:

Partial virt-launcher pod log:

{"component":"virt-launcher","level":"warning","msg":"Domain id=1 name='default_vmi-001' uuid=97f6b2c4-dff6-45fe-9952-5c296f8927c8 is tainted: custom-ga-command","pos":"qemuDomainObjTaintMsg:6618","subcomponent":"libvirt","thread":"32","timestamp":"2023-05-06T16:11:36.167000Z"}
{"component":"virt-launcher","level":"warning","msg":"Domain id=1 name='default_vmi-001' uuid=97f6b2c4-dff6-45fe-9952-5c296f8927c8 is tainted: custom-ga-command","pos":"qemuDomainObjTaintMsg:6618","subcomponent":"libvirt","thread":"31","timestamp":"2023-05-06T16:11:36.172000Z"}
{"component":"virt-launcher","level":"warning","msg":"Domain id=1 name='default_vmi-001' uuid=97f6b2c4-dff6-45fe-9952-5c296f8927c8 is tainted: custom-ga-command","pos":"qemuDomainObjTaintMsg:6618","subcomponent":"libvirt","thread":"30","timestamp":"2023-05-06T16:11:36.176000Z"}
{"component":"virt-launcher","level":"warning","msg":"Domain id=1 name='default_vmi-001' uuid=97f6b2c4-dff6-45fe-9952-5c296f8927c8 is tainted: custom-ga-command","pos":"qemuDomainObjTaintMsg:6618","subcomponent":"libvirt","thread":"33","timestamp":"2023-05-06T16:11:36.179000Z"}
{"component":"virt-launcher","level":"warning","msg":"Domain id=1 name='default_vmi-001' uuid=97f6b2c4-dff6-45fe-9952-5c296f8927c8 is tainted: custom-ga-command","pos":"qemuDomainObjTaintMsg:6618","subcomponent":"libvirt","thread":"29","timestamp":"2023-05-06T16:11:41.106000Z"}
{"component":"virt-launcher","level":"warning","msg":"Domain id=1 name='default_vmi-001' uuid=97f6b2c4-dff6-45fe-9952-5c296f8927c8 is tainted: custom-ga-command","pos":"qemuDomainObjTaintMsg:6618","subcomponent":"libvirt","thread":"32","timestamp":"2023-05-06T16:11:46.105000Z"}
{"component":"virt-launcher","level":"warning","msg":"Domain id=1 name='default_vmi-001' uuid=97f6b2c4-dff6-45fe-9952-5c296f8927c8 is tainted: custom-ga-command","pos":"qemuDomainObjTaintMsg:6618","subcomponent":"libvirt","thread":"31","timestamp":"2023-05-06T16:11:46.108000Z"}
{"component":"virt-launcher","level":"warning","msg":"Domain id=1 name='default_vmi-001' uuid=97f6b2c4-dff6-45fe-9952-5c296f8927c8 is tainted: custom-ga-command","pos":"qemuDomainObjTaintMsg:6618","subcomponent":"libvirt","thread":"32","timestamp":"2023-05-06T16:11:51.106000Z"}
{"component":"virt-launcher","level":"warning","msg":"Domain id=1 name='default_vmi-001' uuid=97f6b2c4-dff6-45fe-9952-5c296f8927c8 is tainted: custom-ga-command","pos":"qemuDomainObjTaintMsg:6618","subcomponent":"libvirt","thread":"29","timestamp":"2023-05-06T16:11:56.106000Z"}
{"component":"virt-launcher","level":"warning","msg":"Domain id=1 name='default_vmi-001' uuid=97f6b2c4-dff6-45fe-9952-5c296f8927c8 is tainted: custom-ga-command","pos":"qemuDomainObjTaintMsg:6618","subcomponent":"libvirt","thread":"33","timestamp":"2023-05-06T16:11:56.109000Z"}
{"component":"virt-launcher","level":"warning","msg":"Domain id=1 name='default_vmi-001' uuid=97f6b2c4-dff6-45fe-9952-5c296f8927c8 is tainted: custom-ga-command","pos":"qemuDomainObjTaintMsg:6618","subcomponent":"libvirt","thread":"31","timestamp":"2023-05-06T16:12:01.106000Z"}
{"component":"virt-launcher","level":"info","msg":"DomainLifecycle event 3 with reason 2 received","pos":"client.go:463","timestamp":"2023-05-06T16:12:04.063170Z"}
{"component":"virt-launcher","level":"info","msg":"kubevirt domain status: Paused(3):IOError(5)","pos":"client.go:295","timestamp":"2023-05-06T16:12:04.065371Z"}
{"component":"virt-launcher","level":"info","msg":"Domain name event: default_vmi-001","pos":"client.go:422","timestamp":"2023-05-06T16:12:04.155506Z"}
{"component":"virt-launcher","level":"info","msg":"DomainLifecycle event 4 with reason 0 received","pos":"client.go:463","timestamp":"2023-05-06T16:12:04.161337Z"}
{"component":"virt-launcher","level":"info","msg":"kubevirt domain status: Running(1):Unknown(5)","pos":"client.go:295","timestamp":"2023-05-06T16:12:04.163823Z"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Domain unpaused.","name":"vmi-001","namespace":"default","pos":"manager.go:908","timestamp":"2023-05-06T16:12:04.163858Z","uid":"06ebcc37-451a-4ad0-b5fc-462bd3de09b8"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Synced vmi","name":"vmi-001","namespace":"default","pos":"server.go:191","timestamp":"2023-05-06T16:12:04.165197Z","uid":"06ebcc37-451a-4ad0-b5fc-462bd3de09b8"}
{"component":"virt-launcher","level":"info","msg":"DomainLifecycle event 3 with reason 2 received","pos":"client.go:463","timestamp":"2023-05-06T16:12:04.166045Z"}
{"component":"virt-launcher","level":"info","msg":"Domain name event: default_vmi-001","pos":"client.go:422","timestamp":"2023-05-06T16:12:04.166549Z"}
{"component":"virt-launcher","level":"info","msg":"kubevirt domain status: Paused(3):IOError(5)","pos":"client.go:295","timestamp":"2023-05-06T16:12:04.168446Z"}
{"component":"virt-launcher","level":"info","msg":"Domain name event: default_vmi-001","pos":"client.go:422","timestamp":"2023-05-06T16:12:04.250000Z"}
{"component":"virt-launcher","level":"info","msg":"DomainLifecycle event 4 with reason 0 received","pos":"client.go:463","timestamp":"2023-05-06T16:12:04.252235Z"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Domain unpaused.","name":"vmi-001","namespace":"default","pos":"manager.go:908","timestamp":"2023-05-06T16:12:04.254182Z","uid":"06ebcc37-451a-4ad0-b5fc-462bd3de09b8"}
{"component":"virt-launcher","level":"info","msg":"DomainLifecycle event 3 with reason 2 received","pos":"client.go:463","timestamp":"2023-05-06T16:12:04.258532Z"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Synced vmi","name":"vmi-001","namespace":"default","pos":"server.go:191","timestamp":"2023-05-06T16:12:04.259668Z","uid":"06ebcc37-451a-4ad0-b5fc-462bd3de09b8"}
...
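
These entries were collected from the virt-launcher pod; a sketch of the command used, assuming the default container name "compute":
kubectl logs virt-launcher-vmi-001-8q25h -c compute | grep -E 'Paused|IOError|unpaused'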

hp-volume- (hotplug attachment) pod:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    container.seccomp.security.alpha.kubernetes.io/hotplug-disk: runtime/default
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "kube-ovn",
          "ips": [
              "100.67.82.208"
          ],
          "default": true,
          "dns": {}
      }]
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "kube-ovn",
          "ips": [
              "100.67.82.208"
          ],
          "default": true,
          "dns": {}
      }]
    ovn.kubernetes.io/allocated: "true"
    ovn.kubernetes.io/cidr: 100.67.0.0/16
    ovn.kubernetes.io/gateway: 100.67.0.1
    ovn.kubernetes.io/ip_address: 100.67.82.208
    ovn.kubernetes.io/logical_router: ovn-cluster
    ovn.kubernetes.io/logical_switch: ovn-default
    ovn.kubernetes.io/mac_address: 00:00:00:B5:49:E3
    ovn.kubernetes.io/pod_nic_type: veth-pair
    ovn.kubernetes.io/routed: "true"
  creationTimestamp: "2023-05-06T16:10:34Z"
  generateName: hp-volume-
  labels:
    kubevirt.io: hotplug-disk
  name: hp-volume-bkwkl
  namespace: default
  ownerReferences:
  - apiVersion: v1
    blockOwnerDeletion: true
    controller: true
    kind: Pod
    name: virt-launcher-vmi-001-8q25h
    uid: 6be02461-d523-4f68-8a3e-02896256ff1f
  resourceVersion: "30191198"
  uid: dfd0eec5-64ea-4452-8add-ddaebf9bb0da
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - kcp02
  containers:
  - command:
    - /bin/sh
    - -c
    - /usr/bin/container-disk --copy-path /path/hp
    image: <my-private-registry>/virt-launcher:v0.59.0
    imagePullPolicy: IfNotPresent
    name: hotplug-disk
    resources:
      limits:
        cpu: 100m
        memory: 80M
      requests:
        cpu: 10m
        memory: 2M
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 107
      seLinuxOptions:
        level: s0
      seccompProfile:
        type: RuntimeDefault
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeDevices:
    - devicePath: /path/cxl-hp-2/1b570192-5088-490d-93d3-dc7b1f89afa4
      name: cxl-hp-2
    volumeMounts:
    - mountPath: /path
      mountPropagation: HostToContainer
      name: hotplug-disks
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-f8w5d
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: kcp02
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 0
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - emptyDir: {}
    name: hotplug-disks
  - name: cxl-hp-1
    persistentVolumeClaim:
      claimName: cxl-hp-1
  - name: cxl-hp-2
    persistentVolumeClaim:
      claimName: cxl-hp-2
  - name: kube-api-access-f8w5d
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-05-06T16:10:34Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-05-06T16:10:46Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-05-06T16:10:46Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-05-06T16:10:34Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://7247401b244f73038db56693a7c3811ba16a29be7372ddd9b2e5ff98b1c6eaa9
    image: <my-private-registry>/virt-launcher:v0.59.0
    imageID: <my-private-registry>/virt-launcher@sha256:dad34a6589d20f65ee370607936afefb0510e1f9c66e1ad63fe74c16d2937463
    lastState: {}
    name: hotplug-disk
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2023-05-06T16:10:45Z"
  hostIP: 172.16.33.132
  phase: Running
  podIP: 100.67.82.208
  podIPs:
  - ip: 100.67.82.208
  qosClass: Burstable
  startTime: "2023-05-06T16:10:34Z"
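
The spec above is a dump of the hotplug attachment pod; it can be retrieved via the kubevirt.io=hotplug-disk label from its metadata (a sketch):
kubectl get pod -n default -l kubevirt.io=hotplug-disk -o yaml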

Environment:

  • KubeVirt version (use virtctl version): 0.59.0
  • Kubernetes version (use kubectl version): v1.24.7
  • VM or VMI specifications: N/A
  • Cloud provider or hardware configuration: N/A
  • OS (e.g. from /etc/os-release): Ubuntu 20.04.6 LTS
  • Kernel (e.g. uname -a): Linux x86_64
  • Install tools: N/A
  • Others: N/A

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 23 (13 by maintainers)

Most upvoted comments

This sounds similar to https://github.com/kubevirt/kubevirt/issues/9263. There is a PR with a potential fix: https://github.com/kubevirt/kubevirt/pull/9269. However, this still needs to be checked for the NBD case.