kairos: 🐛 Extremely slow boot with "Build Kairos from scratch" on bare metal

Issue

Following the instructions from https://kairos.io/docs/reference/build-from-scratch/ + https://kairos.io/docs/getting-started/#booting on bare metal leads to a successful installation which shows a BIOS boot entry for the specified USB device.

However, after booting into KairOS and selecting Kairos (interactive install) in GRUB2, it takes an extremely long time to load before it freezes.

This message is shown for about 1 minute 20 seconds:

Loading kernel… Loading initrd…

Then after another minute the boot process freezes at:

[drm] amdgpu kernel modesetting enabled.
input: HDA ATI HDMI HDMI/DP,pcm=3 as /device/pci0000:00/0000:00:03.1/000:09:00.0/0000:0a:00.0/0000:0b:00.1/sound/card0/input16
amdgpu: Ignoring ACPI CRAFT on non-APU system

Reproduce

These instructions are copied and pasted from the documentation:

Dockerfile

FROM fedora:36

# Install any package wanted here
# Note we need to install _at least_ the minimum required packages for Kairos to work:
# - An init system (systemd)
# - Grub
# - kernel/initramfs 
RUN echo "install_weak_deps=False" >> /etc/dnf/dnf.conf

RUN dnf install -y \
    audit \
    coreutils \
    curl \
    device-mapper \
    dosfstools \
    dracut \
    dracut-live \
    dracut-network \
    dracut-squash \
    e2fsprogs \
    efibootmgr \
    gawk \
    gdisk \
    grub2 \
    grub2-efi-x64 \
    grub2-efi-x64-modules \
    grub2-pc \
    haveged \
    kernel \
    kernel-modules \
    kernel-modules-extra \
    livecd-tools \
    lvm2 \
    nano \
    NetworkManager \
    openssh-server \
    parted \
    polkit \
    rsync \
    shim-x64 \
    squashfs-tools \
    sudo \
    systemd \
    systemd-networkd \
    systemd-resolved \
    tar \
    which \
    && dnf clean all

RUN mkdir -p /run/lock
RUN touch /usr/libexec/.keep

# Copy the Kairos framework files. We use master builds here for fedora. See https://quay.io/repository/kairos/framework?tab=tags for a list
COPY --from=quay.io/kairos/framework:master_fedora / /

# Activate Kairos services
RUN systemctl enable cos-setup-reconcile.timer && \
          systemctl enable cos-setup-fs.service && \
          systemctl enable cos-setup-boot.service && \
          systemctl enable cos-setup-network.service

## Generate initrd
RUN kernel=$(ls /boot/vmlinuz-* | head -n1) && \
            ln -sf "${kernel#/boot/}" /boot/vmlinuz
RUN kernel=$(ls /lib/modules | head -n1) && \
            dracut -v -N -f "/boot/initrd-${kernel}" "${kernel}" && \
            ln -sf "initrd-${kernel}" /boot/initrd && depmod -a "${kernel}"
RUN rm -rf /boot/initramfs-*

Commands:

$ docker build -t test-byoi .

$ docker run -v "$PWD"/build:/tmp/auroraboot \
    -v /var/run/docker.sock:/var/run/docker.sock \
    --rm -ti quay.io/kairos/auroraboot:v0.2.4 \
    --set container_image=docker://test-byoi \
    --set "disable_http_server=true" \
    --set "disable_netboot=true" \
    --set "state_dir=/tmp/auroraboot"

# Flash ISO to USB
$ fdisk /dev/sdc
g
w

$ sudo dd if=build/iso/kairos.iso of=/dev/sdc bs=4MB

More info

Kairos version:

  • quay.io/kairos/framework:master_fedora
  • quay.io/kairos/auroraboot:v0.2.4

CPU architecture, OS, and Version: x86_64, Fedora 36

Expected behavior: Booting Fedora Workstation & Server from the official ISO takes merely seconds and doesn’t freeze.

Additional context: CPU: AMD Ryzen 9 5950X, GPU: AMD Radeon RX 6800 XT

About this issue

  • State: closed
  • Created a year ago
  • Comments: 18 (13 by maintainers)

Most upvoted comments

A patch has landed in yip to run the datasources in parallel: https://github.com/mudler/yip/pull/99. This will be in the v2.3.0 Kairos release and should reduce the time cos-setup-boot takes to about 2 or 3 seconds at most.
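To illustrate why probing in parallel helps, here is a minimal shell sketch. It is not how yip implements it (yip is written in Go); the `probe` function and the provider names are hypothetical stand-ins, and each probe simply sleeps to simulate a provider that waits out its timeout.

```shell
#!/bin/sh
# Hypothetical stand-in for a datasource probe: sleep for the given number of
# seconds (a provider waiting out its timeout), then record that it finished.
probe() {
    name="$1"
    delay="$2"
    out_dir="$3"
    sleep "$delay"
    : > "$out_dir/$name"
}

results="$(mktemp -d)"

# Start every probe as a background job so they wait concurrently,
# then block until all of them have finished.
for provider in provider-a provider-b provider-c; do
    probe "$provider" 1 "$results" &
done
wait

finished="$(ls "$results" | wc -l | tr -d ' ')"
echo "probes finished: $finished"
```

Because the three probes sleep at the same time, the script takes about as long as the slowest probe rather than the sum of all of them.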

We have merged all the fixes, and 2.3.0 is about to be released (#1066), so I'm closing this issue for now. Please re-open if it’s still present in 2.3.0.

This is boot with only the cdrom datasource enabled:

Thanks for taking the time to debug this @Itxaka

I’m using KairOS to create my own minimal immutable desktop distro and I’m confused about the cos* services. Do I really need them at boot? What purpose do they have outside of k3s/cloud?

They are facilities to fully configure the system and modify it via config files, cloud-config style. The base system ships several of these configs: they enable services based on the system boot selection, store the state needed to make immutability work, and even generate the bind/ephemeral mounts during boot.

They run at different times during boot, and each stage differs. For example, in the initramfs stage the configs run from the initramfs with the system already mounted and chrooted into, so you can modify things before moving into userspace as if you were on the system. The network stage is triggered once the network is available. The rootfs stage is the first one to run: it happens during initramfs once the sysroot is mounted, but it doesn't run inside the system, so you can modify things in the initramfs itself.

For example, when you use the interactive installer, a config file is generated that on each boot creates your user during the initramfs stage (you can check your own file in any installed system under /oem/90_custom.yaml):

#cloud-config
install:
    device: /dev/vda
name: Config generated by the installer
stages:
    initramfs:
        - users:
            kairos:
                groups:
                    - admin
                name: kairos
                passwd: kairos

You can see all of the base system configs at https://github.com/kairos-io/kairos/tree/master/overlay/files/system/oem and read more about them and the stages at https://kairos.io/docs/architecture/cloud-init/
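As a rough sketch of how the stages described above could look side by side, the snippet below writes an illustrative cloud-config to a scratch file. The stage names (rootfs, initramfs, network) come from this thread; the `name` and `commands` keys are an assumption based on the cloud-init documentation linked above, and the echo commands are placeholders only.

```shell
#!/bin/sh
# Write an illustrative cloud-config to a scratch file. On an installed
# system, configs like this live next to /oem/90_custom.yaml.
config_file="$(mktemp)"

cat > "$config_file" <<'EOF'
#cloud-config
name: Example stages (illustrative)
stages:
    rootfs:
        - name: Runs first, from the initramfs, outside the system
          commands:
              - echo "rootfs stage"
    initramfs:
        - name: Runs in the initramfs, chrooted into the system
          commands:
              - echo "initramfs stage"
    network:
        - name: Runs once the network is available
          commands:
              - echo "network stage"
EOF

echo "wrote $config_file"
```

The order of the keys in the file does not decide when they run; each block is picked up when its stage is reached during boot.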

With the patch it looks much better:

image

It still takes 20 seconds to run, which makes sense: there are 10 providers with a maximum timeout of 2 seconds each, and they are all hitting that timeout (we can ignore the cdrom provider, as that one doesn’t have a timeout).
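The arithmetic behind that figure, as a small sketch. It assumes the providers are probed one after another and that each one waits out its full timeout; the parallel figure assumes all probes start at the same moment.

```shell
#!/bin/sh
providers=10        # providers with a timeout (cdrom excluded)
timeout_seconds=2   # per-provider timeout

# Probed one after another, the timeouts add up.
sequential=$((providers * timeout_seconds))

# Probed concurrently, the wait is bounded by the slowest single probe.
parallel=$timeout_seconds

echo "sequential worst case: ${sequential}s"
echo "parallel worst case:   ${parallel}s"
```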

I think we could still improve this by shipping only the cdrom provider out of the box and letting users set the datasources themselves in their cloud-config. @mudler @mauromorales @jimmykarily thoughts? Any reason we need to ship all 10 providers?


![boot](https://user-images.githubusercontent.com/45666572/247134539-94607f7f-a91d-4418-a36c-2d4af805b05d.png)

> You should edit the grub cmdline and remove the console=ttyS0

I can't seem to find how to actually change boot parameters in the documentation, @Itxaka. There are instructions on how to append more options to GRUB though? [kairos.io/docs/reference/configuration](https://kairos.io/docs/reference/configuration/)

Unfortunately that is a manual step. When presented with the GRUB menu selection to boot from USB, you can press e and it will drop you into the entry config. There you can remove the console=ttyS0 and press Ctrl+X to boot from the edited entry:

image
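For reference, the edit amounts to deleting one argument from the kernel line. The sketch below applies it to a made-up command line; the real entry on the ISO will contain different arguments.

```shell
#!/bin/sh
# Hypothetical kernel command line, for illustration only.
cmdline='linux /boot/vmlinuz console=tty1 console=ttyS0 quiet'

# Drop the serial console argument (including any ",115200"-style suffix)
# and squeeze the doubled space it leaves behind.
edited="$(printf '%s\n' "$cmdline" | sed -e 's/console=ttyS0[^ ]*//' -e 's/  */ /g')"

echo "$edited"
```

After booting, `cat /proc/cmdline` shows the arguments the kernel actually received, which is a quick way to confirm the serial console argument is gone.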

Pretty sure the delay is due to the metadata providers, specifically the Packet provider, which doesn't have a default timeout of 2 seconds like the rest, and its address is NOT an internal one! So it always resolves but never answers… I guess it gets resolved internally if you run the query inside an Equinix machine.

https://deploy.equinix.com/developers/docs/metal/server-metadata/metadata/

$ time curl https://metadata.platformequinix.com  -v
*   Trying 192.80.8.124:443...
* connect to 192.80.8.124 port 443 failed: Connection timed out
* Failed to connect to metadata.platformequinix.com port 443 after 130692 ms: Couldn't connect to server
* Closing connection 0
curl: (28) Failed to connect to metadata.platformequinix.com port 443 after 130692 ms: Couldn't connect to server
curl https://metadata.platformequinix.com -v  
0,00s user 0,00s system 0% cpu 2:10,69 total

The rest of the providers have a 2 second timeout which should be more than enough. I already solved this issue in the upstream library but it never trickled down to kairos 🤦
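A minimal sketch of what a deadline buys you, using coreutils `timeout` as a stand-in for the provider's internal timeout (the actual fix lives in the Go libraries). Here `sleep 30` plays the role of an endpoint that resolves but never answers, and the 1 second deadline is chosen only to keep the example quick.

```shell
#!/bin/sh
deadline=1   # seconds; the providers discussed here use 2

# Run the stand-in probe under a deadline. `timeout` exits with status 124
# when it had to stop the command because the deadline expired.
status=0
timeout "$deadline" sleep 30 || status=$?

if [ "$status" -eq 124 ]; then
    echo "probe gave up after ${deadline}s"
else
    echo "probe exited with status $status"
fi
```

For a manual check like the curl call above, curl's own `--max-time` option bounds the whole request in a similar way.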

So the fixes are:

  • Upstream: https://github.com/rancher-sandbox/linuxkit/commit/432a87ba3e0931dc4819dd50bdf4536062b39768
  • yip: https://github.com/mudler/yip/commit/dcdc988cf703aeca003e868b1071c95e2a30ece5
  • kairos-agent: https://github.com/kairos-io/kairos-agent/pull/65

There is also a workaround for now: remove the "packet" provider from the 00_datasource.yaml file, which comes from the framework and should be under /system/oem/, by running the following in your Dockerfile after copying the framework files:

RUN sed -i -e 's/"packet", //g' /system/oem/00_datasource.yaml
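To see what the substitution does before baking it into an image, it can be tried on a sample line. The providers list below is a made-up example; the real 00_datasource.yaml may be laid out differently.

```shell
#!/bin/sh
# Hypothetical sample file, for illustration only.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
providers: ["provider-a", "packet", "cdrom"]
EOF

# Same substitution as the workaround, printed to stdout instead of edited in place.
result="$(sed -e 's/"packet", //g' "$sample")"

echo "$result"
```

Two things are worth keeping in mind: without `-i`, sed only prints the result and leaves the file untouched, and the pattern only matches when "packet" is followed by a comma and a space, so it would not match if "packet" were the last entry in the list.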

Thanks for the replies guys!

Are you able to get serial logs of the machine?

Yes, @Itxaka Here’s a recap:

Hope this helps!

just tested this on my baremetal and can reproduce it (with the first comment image)

You should edit the grub cmdline and remove the console=ttyS0; that fixed it for me. It seems that, as normal PCs (usually) don't have serial ports, the initrd gets kind of stuck there, maybe trying to connect or something?

Removing that part from the cmdline causes the initrd to load in a couple of seconds, and then it fails to load something nvidia related 😄

Testing with the update 1 way.

I’d check if it’s due to compression of the initramfs. Try disabling it by creating a file at /etc/dracut.conf.d/disable_compression.conf, before the dracut calls, containing:

compress="cat"
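A small sketch of writing that drop-in. The `write_dracut_conf` helper is hypothetical and takes the target directory as a parameter so it can be tried against a scratch directory; in the image the directory would be /etc/dracut.conf.d.

```shell
#!/bin/sh
# Hypothetical helper: write the dracut drop-in that disables initramfs
# compression into the given directory.
write_dracut_conf() {
    conf_dir="$1"
    mkdir -p "$conf_dir"
    printf 'compress="cat"\n' > "$conf_dir/disable_compression.conf"
}

# Try it against a scratch directory and show the result.
scratch="$(mktemp -d)"
write_dracut_conf "$scratch"
cat "$scratch/disable_compression.conf"
```

In the Dockerfile above, the equivalent would be a step such as `RUN echo 'compress="cat"' > /etc/dracut.conf.d/disable_compression.conf` placed before the `RUN` line that calls dracut.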