kairos: Extremely slow boot with "Build Kairos from scratch" on bare metal
Issue
Following the instructions from https://kairos.io/docs/reference/build-from-scratch/ + https://kairos.io/docs/getting-started/#booting on bare metal leads to a successful installation which shows a BIOS boot entry for the specified USB device.
However, after booting into KairOS and selecting "Kairos (interactive install)" in GRUB2, it takes an extremely long time to load before it freezes.
This message is shown for about 1 minute 20 seconds:
Loading kernel… Loading initrd…
Then after another minute the boot process freezes at:
[drm] amdgpu kernel modesetting enabled.
input: HDA ATI HDMI HDMI/DP,pcm=3 as /device/pci0000:00/0000:00:03.1/000:09:00.0/0000:0a:00.0/0000:0b:00.1/sound/card0/input16
amdgpu: Ignoring ACPI CRAFT on non-APU system
Reproduce
These instructions are copied and pasted from the documentation:
Dockerfile
FROM fedora:36
# Install any package wanted here
# Note we need to install _at least_ the minimum required packages for Kairos to work:
# - An init system (systemd)
# - Grub
# - kernel/initramfs
RUN echo "install_weak_deps=False" >> /etc/dnf/dnf.conf
RUN dnf install -y \
    audit \
    coreutils \
    curl \
    device-mapper \
    dosfstools \
    dracut \
    dracut-live \
    dracut-network \
    dracut-squash \
    e2fsprogs \
    efibootmgr \
    gawk \
    gdisk \
    grub2 \
    grub2-efi-x64 \
    grub2-efi-x64-modules \
    grub2-pc \
    haveged \
    kernel \
    kernel-modules \
    kernel-modules-extra \
    livecd-tools \
    lvm2 \
    nano \
    NetworkManager \
    openssh-server \
    parted \
    polkit \
    rsync \
    shim-x64 \
    squashfs-tools \
    sudo \
    systemd \
    systemd-networkd \
    systemd-resolved \
    tar \
    which \
    && dnf clean all
RUN mkdir -p /run/lock
RUN touch /usr/libexec/.keep
# Copy the Kairos framework files. We use master builds here for fedora. See https://quay.io/repository/kairos/framework?tab=tags for a list
COPY --from=quay.io/kairos/framework:master_fedora / /
# Activate Kairos services
RUN systemctl enable cos-setup-reconcile.timer && \
    systemctl enable cos-setup-fs.service && \
    systemctl enable cos-setup-boot.service && \
    systemctl enable cos-setup-network.service
## Generate initrd
RUN kernel=$(ls /boot/vmlinuz-* | head -n1) && \
    ln -sf "${kernel#/boot/}" /boot/vmlinuz
RUN kernel=$(ls /lib/modules | head -n1) && \
    dracut -v -N -f "/boot/initrd-${kernel}" "${kernel}" && \
    ln -sf "initrd-${kernel}" /boot/initrd && depmod -a "${kernel}"
RUN rm -rf /boot/initramfs-*
Commands:
$ docker build -t test-byoi .
$ docker run -v "$PWD"/build:/tmp/auroraboot \
    -v /var/run/docker.sock:/var/run/docker.sock \
    --rm -ti quay.io/kairos/auroraboot:v0.2.4 \
    --set container_image=docker://test-byoi \
    --set "disable_http_server=true" \
    --set "disable_netboot=true" \
    --set "state_dir=/tmp/auroraboot"
# Flash ISO to USB
$ fdisk /dev/sdc
g
w
$ sudo dd if=build/iso/kairos.iso of=/dev/sdc bs=4MB
More info
Kairos version:
- quay.io/kairos/framework:master_fedora
- quay.io/kairos/auroraboot:v0.2.4
CPU architecture, OS, and Version: x86_64, Fedora 36
Expected behavior: Booting Fedora Workstation & Server from the official ISO takes merely seconds and doesn't freeze.
Additional context: CPU: AMD Ryzen 9 5950X, GPU: AMD Radeon RX 6800 XT
About this issue
- State: closed
- Created a year ago
- Comments: 18 (13 by maintainers)
A patch to run the datasources in parallel has landed in yip (https://github.com/mudler/yip/pull/99). This will be in the v2.3.0 Kairos release and should reduce the time cos-setup-boot takes to about 2-3 seconds at most.
We have merged all the fixes, and 2.3.0 is about to be released (#1066), so I am closing this issue for now. Please re-open it if the problem is still present in 2.3.0.
They are facilities to fully configure the system and modify it via config files, cloud-config style. The base system ships several of them: they enable services based on the system boot selection, store things needed to make immutability work, and even generate the bind/ephemeral mounts during boot.
They run at different times during boot and behave differently. For example, in the initramfs stage the configs run in the initramfs with the system already mounted and chrooted into it, so you can modify things before moving into userspace as if you were on the system. Another stage is network, which is triggered once the network is available. Another is rootfs, which is the first one to run: it runs during the initramfs when the sysroot is mounted, but not inside the system, so you can modify things in the initramfs itself.
For example, when you use the interactive installer, a config file is generated that creates your user on each boot during the initramfs stage (you can check your own file on any installed system under /oem/90_custom.yaml).
You can see all of the base system configs under: https://github.com/kairos-io/kairos/tree/master/overlay/files/system/oem And read more about them and the stages at: https://kairos.io/docs/architecture/cloud-init/
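As a rough illustration of what such a stage-based config can look like, the sketch below writes a minimal cloud-config with a single initramfs stage to a scratch directory. The user name, password, and step name are made up for the example, and the file generated by the installer on a real system will differ.

```shell
#!/bin/sh
set -eu

# Scratch directory so nothing under /oem is touched (hypothetical location).
workdir=$(mktemp -d)
config="$workdir/90_custom.yaml"

# Minimal stage-based config: one step in the initramfs stage that
# ensures a user exists. Names and values are placeholders.
cat > "$config" <<'EOF'
#cloud-config
stages:
  initramfs:
    - name: "Create the user on every boot"
      users:
        kairos:
          passwd: "kairos"
EOF

cat "$config"
```

The same layout applies to the other stages mentioned above: the steps would sit under a network or rootfs key instead of initramfs.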
With the patch it looks much better:
It still takes 20 seconds to run, which makes sense: there are 10 providers with a maximum timeout of 2 seconds each, and they are all hitting that timeout (we can ignore the cdrom provider, as that one doesn't have a timeout).
I think we could still improve this by shipping only the cdrom provider out of the box and letting users set the datasources themselves in their cloud-config. @mudler @mauromorales @jimmykarily thoughts? Is there any reason we need to ship all 10 providers?
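The 20-second figure is simply the provider count multiplied by the per-provider timeout when the providers are queried one after another. A small sketch of that arithmetic follows; the parallel estimate assumes that all providers start at the same time and that nothing else adds overhead.

```shell
#!/bin/sh
# Numbers taken from the comment above: 10 providers, 2 second timeout each.
providers=10
timeout_s=2

# Sequential: every provider waits out its own timeout in turn.
sequential_worst=$((providers * timeout_s))

# Parallel (assumption): all providers wait at once, so the worst case
# is bounded by the longest single timeout.
parallel_worst=$timeout_s

echo "sequential worst case: ${sequential_worst}s"
echo "parallel worst case: ${parallel_worst}s"
```

Under the same assumption, shipping fewer providers only shortens the sequential case; in the parallel case the bound stays at one timeout.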
Unfortunately that is a manual step. When presented with the GRUB menu selection to boot from USB, you can press e and it will drop you into the entry config. There you can remove console=ttyS0 and press Ctrl+X to boot from the edited entry.
Pretty sure the delay is due to the metadata providers, specifically the Packet provider, which doesn't have a default timeout of 2 seconds like the rest, and its address is NOT an internal one! So it always resolves but never answers… I guess it gets resolved internally if you run the query inside an Equinix machine. https://deploy.equinix.com/developers/docs/metal/server-metadata/metadata/
The rest of the providers have a 2-second timeout, which should be more than enough. I already solved this issue in the upstream library, but the fix never trickled down to Kairos.
So the fix to upstream is: https://github.com/rancher-sandbox/linuxkit/commit/432a87ba3e0931dc4819dd50bdf4536062b39768
Fix to yip: https://github.com/mudler/yip/commit/dcdc988cf703aeca003e868b1071c95e2a30ece5
Fix to kairos-agent: https://github.com/kairos-io/kairos-agent/pull/65
There is also a workaround for now: remove the "packet" provider from the 00_datasource.yaml file, which comes from the framework and should be under /system/oem/, by editing it in your Dockerfile after copying the framework files.
Thanks for the replies guys!
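The exact command for the 00_datasource.yaml workaround is not preserved in this thread. One way it could be done is a sed substitution, sketched here against a scratch copy. The file layout below is an assumption for illustration (an inline, quoted list of providers), so check the real file before reusing the pattern.

```shell
#!/bin/sh
set -eu

# Scratch stand-in for /system/oem/00_datasource.yaml (assumed layout).
workdir=$(mktemp -d)
file="$workdir/00_datasource.yaml"
cat > "$file" <<'EOF'
stages:
  rootfs.before:
    - name: "Pull data from provider"
      datasource:
        providers: ["cdrom", "aws", "packet", "gcp"]
        path: "/oem"
EOF

# Drop the "packet" entry together with its trailing comma and spaces.
# This pattern would not match if "packet" were the last list item.
sed -i 's/"packet", *//' "$file"

cat "$file"
```

In a Dockerfile the same substitution would go in a RUN step placed after the COPY of the framework files, pointing at the real path instead of the scratch copy.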
Yes, @Itxaka. Here's a recap:
Docker Fedora 36 (^ my first comment): We can safely ignore the freeze, as it doesn't happen with Fedora 38; new AMD GPUs are known to have all sorts of issues with older kernels. I think it might be worth considering whether the example at https://kairos.io/docs/reference/build-from-scratch/#build-a-container-image should reference an up-to-date image from https://github.com/kairos-io/kairos/tree/master/images to avoid confusion.
KairOS quay.io/kairos/core-opensuse-tumbleweed:latest (^ my update 1)
- Boot time: -/-
- Freezes: Yes
- Journalctl: kairos_tumbleweed_journalctl.log
- Dmesg: kairos_tumbleweed_dmesg.log
- Rdsosreport: kairos_tumbleweed_rdsosreport.txt
KairOS quay.io/kairos/core-fedora:latest (^ my update 2 - for reference)
- Boot: 1m14s initramfs + 1m21s systemd (total 2m35s)
- Freezes: No
- Journalctl: kairos_fedora_journalctl.log
- Dmesg: kairos_fedora_dmesg.log
OpenSuSE Tumbleweed official installation (for reference)
- Boot time: 2s initramfs + 10s systemd (total 12s)
- Freezes: No
- Journalctl: official_tumbleweed_journalctl.log
- Dmesg: official_tumbleweed_dmesg.log
Hope this helps!
Just tested this on my bare-metal machine and can reproduce it (with the image from the first comment).
You should edit the GRUB cmdline and remove console=ttyS0; that fixed it for me. It seems that, as normal PCs don't (usually) have serial ports, the initrd gets kind of stuck there, maybe trying to connect or something?
Removing that part from the cmdline causes the initrd to load in a couple of seconds, and then it fails to load something nvidia related.
Testing with the update 1 way.
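To make the console=ttyS0 edit concrete, here is a small sketch of the same change applied to a kernel command line held in a string. The sample command line is hypothetical; the real GRUB entry contains other arguments.

```shell
#!/bin/sh
# Hypothetical kernel command line, for illustration only.
cmdline='quiet console=tty1 console=ttyS0 splash'

# Remove every console=ttyS0 argument (including forms such as
# console=ttyS0,115200) along with the space before it.
edited=$(printf '%s\n' "$cmdline" | sed -E 's/(^| )console=ttyS0[^ ]*//g')

echo "$edited"
```

After booting, the arguments the kernel actually received can be read from /proc/cmdline to confirm that the serial console entry is gone.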
I'd check if it's due to compression of the initramfs. Try disabling it by creating a file at /etc/dracut.conf.d/disable_compression.conf, before the dracut calls, containing:
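The contents of that file are not preserved in this thread. As an assumption, one setting that dracut.conf documents for this purpose is compress="cat", which passes the image through cat instead of a compressor. The sketch below writes it to a scratch directory standing in for /etc/dracut.conf.d.

```shell
#!/bin/sh
set -eu

# Scratch directory standing in for /etc/dracut.conf.d (hypothetical).
confdir=$(mktemp -d)
conf="$confdir/disable_compression.conf"

# compress="cat" tells dracut to use cat as the "compressor",
# which leaves the initramfs uncompressed.
printf 'compress="cat"\n' > "$conf"

cat "$conf"
```

In the Dockerfile above, the equivalent step would be a RUN line that writes the file to /etc/dracut.conf.d/ and sits before the RUN that invokes dracut.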