kubevirt: VMs cannot reboot when emulation is used
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
With useEmulation: true, VMs are unable to soft reboot.
What you expected to happen:
Rebooting may be slow under emulation, but the VMs should eventually come back up.
How to reproduce it (as minimally and precisely as possible):
Deploy KubeVirt with:
---
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  certificateRotateStrategy: {}
  configuration:
    developerConfiguration:
      featureGates: []
      useEmulation: true
  customizeComponents: {}
  imagePullPolicy: IfNotPresent
  workloadUpdateStrategy: {}
Create a CirrOS VM (the problem also reproduces with Fedora):
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: vm-cirros
  name: vm-cirros
spec:
  running: true
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-cirros
    spec:
      domain:
        devices:
          interfaces:
          - masquerade: {}
            name: default
          disks:
          - disk:
              bus: virtio
            name: containerdisk
          - disk:
              bus: virtio
            name: cloudinitdisk
        resources:
          requests:
            memory: 2Gi
      terminationGracePeriodSeconds: 0
      networks:
      - name: default
        pod: {}
      volumes:
      - containerDisk:
          image: quay.io/kubevirt/cirros-container-disk-demo:latest
        name: containerdisk
      - cloudInitNoCloud:
          userData: |
            #!/bin/sh
            echo 'printed from cloud-init userdata'
        name: cloudinitdisk
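For anyone repeating these steps, applying the two manifests could look roughly like the sketch below. The file names kubevirt-cr.yaml and vm-cirros.yaml and the function name deploy_repro are hypothetical and not part of the report. The sketch assumes the KubeVirt operator is already installed and that the VM goes into the current namespace.

```shell
# Sketch only: kubevirt-cr.yaml and vm-cirros.yaml are hypothetical file
# names for the two manifests shown above.
deploy_repro() {
  # Apply the KubeVirt CR that enables useEmulation.
  kubectl apply -f kubevirt-cr.yaml &&
  # Wait until KubeVirt reports itself Available before creating VMs.
  kubectl -n kubevirt wait kv kubevirt --for condition=Available --timeout=300s &&
  # Create the CirrOS VM; running: true starts it right away.
  kubectl apply -f vm-cirros.yaml
}
```

After calling deploy_repro, `kubectl get vmi vm-cirros` should eventually show the instance as Running, at which point the console is reachable.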
Connect using console and reboot:
virtctl console vm-cirros
reboot
The VM shuts down but never comes back up:
The system is going down NOW!
Sent SIGTERM to all processes
Sent SIGKILL to all processes
Requesting system reboot
[ 215.325881] reboot: Restarting system
[ 215.326396] reboot: machine restart
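A rough way to tell a hung reboot from a slow one is to check whether the console prints anything after the last "reboot: machine restart" line. The helper below is only a heuristic sketch. The function name guest_came_back is hypothetical, and it assumes the console output was saved to a file or is piped in on stdin.

```shell
# Heuristic sketch: succeed (exit 0) only if some non-empty console line
# follows the last "reboot: machine restart" message.
guest_came_back() {
  awk '
    /reboot: machine restart/ { seen = 1; after = 0; next }  # reset the count at each restart
    seen && NF > 0            { after++ }                    # output after the restart
    END { exit (seen && after > 0) ? 0 : 1 }
  ' "$@"
}
```

With the output shown above saved to a hypothetical console.log, `guest_came_back console.log` returns a non-zero status, because nothing follows the restart message.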
Environment:
- KubeVirt version (use virtctl version):
  [kubevirt] virtctl version
  Client Version: version.Info{GitVersion:"v0.43.0", GitCommit:"7c7a2f4ace9ce3a88b164d4d282db55f08b6dc5e", GitTreeState:"clean", BuildDate:"2021-07-09T15:54:26Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
  Server Version: version.Info{GitVersion:"v0.41.4-dirty", GitCommit:"0b3c97aed8051d2985b3ee2dee944ffa656822bc", GitTreeState:"dirty", BuildDate:"2021-10-20T07:29:44Z", GoVersion:"go1.13.14", Compiler:"gc", Platform:"linux/amd64"}
- Kubernetes version (use kubectl version):
  Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v0.21.0-beta.1", GitCommit:"96e95cef877ba04872b88e4e2597eabb0174d182", GitTreeState:"clean", BuildDate:"2021-09-10T13:09:35Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}
  Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.17", GitCommit:"68b4e26caf6ede7af577db4af62fb405b4dd47e6", GitTreeState:"clean", BuildDate:"2021-03-18T00:54:02Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
  WARNING: version difference between client (0.21) and server (1.18) exceeds the supported minor version skew of +/-1
- VM or VMI specifications: In description above
- Others: using kubevirtci on KubeVirt’s release-0.41
About this issue
- State: closed
- Created 3 years ago
- Reactions: 2
- Comments: 21 (18 by maintainers)
I did some testing and it looks like rebooting emulated VMs last worked in KUBEVIRT_VERSION=v0.35.0, which used QEMU version 4.2.0 (qemu-kvm-4.2.0-27.fc31), and stopped working in v0.36.0, which updated QEMU to 5.1.0 (qemu-kvm-5.1.0-16.fc32). Even when it was working in v0.35, the VM still hung for about 35 seconds before finally continuing.
After upgrading to v0.36 and using virtctl to stop/start the VMI (to recreate the pod), the reboot command stops working and results in the qemu-kvm process pegging a CPU core on the node. Even trying to use virsh commands inside the compute container causes the same result.
When checking QEMU issues I did see one that looked somewhat similar (reboot loop for emulated q35 machines), so I tried to reproduce the issue outside of KubeVirt. The resulting VM exhibits the same behavior when trying to reboot (and shows a similar increase in QEMU CPU usage). If you omit the --virt-type=qemu argument, it reboots with no delay (using KVM instead of emulation), and if you omit just the --machine q35 arguments, it creates a QEMU process with -machine pc-i440fx-6.1 instead of -machine pc-q35-6.1 and also works as expected (although with a 65 second delay instead of 35 seconds).
So this looks like it could be a long-lived regression in QEMU emulation. Using a different machine type would be a workaround, but it looks like KubeVirt only allows using q35.
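To confirm which machine type a given QEMU process was started with (for example pc-q35-6.1 versus pc-i440fx-6.1), the -machine option can be pulled out of its command line. The helper below is a small sketch. The function name qemu_machine_type is hypothetical, and it assumes the command line is passed as a single space-separated string with no spaces inside individual arguments.

```shell
# Sketch: print the value of the first -machine (or -M) option, without its
# comma-separated sub-options such as accel=tcg.
qemu_machine_type() {
  printf '%s\n' "$1" | tr ' ' '\n' | awk '
    prev == "-machine" || prev == "-M" { sub(/,.*/, ""); print; exit }
    { prev = $0 }   # remember the previous argument
  '
}

# Example with a made-up command line:
qemu_machine_type "qemu-kvm -name guest -machine pc-q35-6.1,accel=tcg -m 2048"
```

On a node where procps is available, something like `qemu_machine_type "$(ps -o args= -C qemu-kvm)"` could feed it the real command line. That invocation is an assumption about the environment and works best when only one qemu-kvm process is running.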