kubevirt: Cirros VM hangs very long on dhcp discover
When I boot my cirros VM, it takes a while (up to 30 seconds) to get past the DHCP configuration.
Console output:
Starting acpid: OK
mcb [info=/dev/vdb dev=/dev/vdb target=tmp unmount=true callback=mcu_drop_dev_arg]: mount '/dev/vdb' '-o,ro' '/tmp/nocloud.mp.ODGQgF'
mcudda: fn=cp dev=/dev/vdb mp=/tmp/nocloud.mp.ODGQgF : -a /tmp/cirros-ds.INNIox/nocloud/raw
Starting network...
udhcpc (v1.23.2) started
Sending discover...
It hangs at the last line for a long time before continuing. Let’s check whether this is a general issue or specific to cirros. It always works in the end, but nothing appears on the console for a very long time.
About this issue
- State: closed
- Created 6 years ago
- Comments: 35 (35 by maintainers)
Commits related to this issue
- Switch default libvirt interface model type to e1000e This is an Express version of e1000 NIC type. The new type is default for q35 x86 machines starting qemu 2.12. (We currently use 2.10, but the fa... — committed to booxter/kubevirt by booxter 6 years ago
- Switch default network model type to virtio Closes #936 The ``virtio`` type doesn't have issues with slow guest link setup that makes cirros waste 60 seconds during boot when acquiring a dhcp lease.... — committed to booxter/kubevirt by booxter 6 years ago
- Fix race in cmd.Exec where sometimes stdout/err was closed before being read. (#936) Signed-off-by: Alexander Wels <awels@redhat.com> — committed to kubevirt-bot/kubevirt by awels 5 years ago
An alternative to customizing the interface model type via the API would be labeling or tagging images with metadata that specifies hardware requirements. kubevirt would then consume the metadata to decide which model type to choose. (This is how OpenStack Nova does it: Glance images carry hw_vif_model tags that Nova Compute then consumes.)
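A minimal sketch of the metadata-driven selection described above. The function name, the allowlist, and the fallback default are hypothetical and not part of any kubevirt or Nova API; only the hw_vif_model key comes from the comment itself.

```python
# Hypothetical allowlist of NIC models the platform is willing to expose.
SUPPORTED_MODELS = {"e1000", "e1000e", "virtio"}


def pick_interface_model(image_metadata, default="e1000"):
    """Return the NIC model requested by the image metadata, else the default.

    image_metadata is assumed to be a plain dict of image tags,
    e.g. {"hw_vif_model": "virtio"}.
    """
    requested = image_metadata.get("hw_vif_model")
    if requested in SUPPORTED_MODELS:
        return requested
    # Unknown or missing tag: fall back to the conservative default.
    return default
```

With this shape, a cirros image tagged with hw_vif_model set to virtio would get a virtio NIC, while an untagged Windows image would keep the e1000 default.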
I have the root cause.
When cirros boots, dmesg shows that it takes 2 seconds for the guest to get the link ready on eth0 after the interface is brought up. (Later down / up calls complete immediately.) While the kernel is still setting up the link asynchronously, udhcpc is already invoked. It sends its first request frame, which goes nowhere because the link is not ready. After 60 seconds it retries and succeeds because the link is now up.
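The timing can be illustrated with a simplified model. It assumes discover frames go out at fixed intervals starting at t = 0 and that any frame sent before the link is ready is lost; the function name is made up for illustration, and real udhcpc retry behaviour may differ in detail.

```python
import math


def time_to_first_delivered_discover(link_ready_at, retry_interval):
    """Seconds until the first discover that is sent at or after link-ready.

    Simplified model: discovers are sent at t = 0, retry_interval,
    2 * retry_interval, ... and frames sent before link_ready_at are dropped.
    """
    if link_ready_at <= 0:
        return 0.0  # link already up, the very first frame gets through
    attempts_needed = math.ceil(link_ready_at / retry_interval)
    return attempts_needed * retry_interval


# Numbers from the analysis above: link ready after 2 s, retry after 60 s.
slow = time_to_first_delivered_discover(2, 60)
# Same link delay, but with a hypothetical 3 s retry interval.
fast = time_to_first_delivered_discover(2, 3)
```

Under these assumptions the 2-second link delay costs a full 60-second retry period, whereas a shorter retry interval would bound the wait to a few seconds.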
I switched the VIF model type from e1000 to virtio, and the issue is gone.
I was told that we use e1000 for Windows machines that don’t carry virtio drivers. If so, we can’t unconditionally change the default. From what I understand, the current kubevirt API doesn’t provide a way to choose a different model type for VIFs. @vladikr mentioned that the new API should support it; then we can switch to virtio for the cirros template.
Ideally, cirros should wait for link up before starting udhcpc and / or issue DHCP requests more aggressively than once a minute. My experience tells me it won’t be easy to change anything in cirros, and even then we would still need to wait for a new image release. Regardless, we may want a way to deal with older images, so there must be a kubevirt-side fix.
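For illustration, the guest-side "wait for link up" idea could look roughly like the sketch below. It polls the Linux sysfs carrier file for the interface; the function name, timeout, and poll interval are assumptions, and this is not how cirros currently behaves.

```python
import os
import time


def wait_for_carrier(iface, timeout=10.0, poll=0.1, sysfs_root="/sys/class/net"):
    """Poll <sysfs_root>/<iface>/carrier until it reads "1" or timeout expires.

    Returns True once the link reports carrier, False on timeout.
    sysfs_root is a parameter only so the helper can be exercised
    against a temporary directory.
    """
    path = os.path.join(sysfs_root, iface, "carrier")
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(path) as f:
                if f.read().strip() == "1":
                    return True
        except OSError:
            # Interface missing or administratively down: keep waiting.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

A boot script could call wait_for_carrier("eth0") and only then start the DHCP client, so the first discover is not sent into a link that is still coming up.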