kairos: k3s fails to start on Raspberry Pi

Kairos version:

NAME="openSUSE Leap"
VERSION="15.5"
ID="opensuse-leap"
ID_LIKE="suse opensuse"
VERSION_ID="15.5"
PRETTY_NAME="openSUSE Leap 15.5"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:leap:15.5"
BUG_REPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://www.opensuse.org/"
DOCUMENTATION_URL="https://en.opensuse.org/Portal:Leap"
LOGO="distributor-logo-Leap"
KAIROS_NAME="kairos-opensuse-leap-arm-rpi"
KAIROS_VERSION="v2.3.0-k3sv1.27.3+k3s1"
KAIROS_ID="kairos"
KAIROS_ID_LIKE="kairos-opensuse-leap-arm-rpi"
KAIROS_VERSION_ID="v2.3.0-k3sv1.27.3+k3s1"
KAIROS_PRETTY_NAME="kairos-opensuse-leap-arm-rpi v2.3.0-k3sv1.27.3+k3s1"
KAIROS_BUG_REPORT_URL="https://github.com/kairos-io/kairos/issues/new/choose"
KAIROS_HOME_URL="https://github.com/kairos-io/provider-kairos"
KAIROS_IMAGE_REPO="quay.io/kairos/kairos-opensuse-leap-arm-rpi"
KAIROS_IMAGE_LABEL="latest"
KAIROS_GITHUB_REPO="kairos-io/provider-kairos"
KAIROS_VARIANT="kairos"

CPU architecture, OS, and Version: Linux yak-001 5.14.21-150500.53-default #1 SMP PREEMPT_DYNAMIC Wed May 10 07:56:26 UTC 2023 (b630043) aarch64 aarch64 aarch64 GNU/Linux

Describe the bug

K3s fails to start on Raspberry Pi (I have tried both Ubuntu-based and openSUSE-based images) due to an error writing to /var/lib/rancher/k3s/data.

To Reproduce

  1. Install an extracted or custom-created Raspberry Pi image onto an SD card
  2. Enable k3s in a cloud-init file
  3. Power on the Raspberry Pi

Expected behavior

K3s should start successfully.

Logs

sudo systemctl status k3s shows the service is stuck in the activating state, continually in an auto-restart loop:

● k3s.service - Lightweight Kubernetes
     Loaded: loaded (/etc/systemd/system/k3s.service; enabled; vendor preset: disabled)
    Drop-In: /etc/systemd/system/k3s.service.d
             └─override.conf
     Active: activating (auto-restart) (Result: exit-code) since Tue 2023-07-11 02:25:08 UTC; 776ms ago
       Docs: https://k3s.io
    Process: 2096 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service (code=exited, status=0/SUCCESS)
    Process: 2098 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
    Process: 2099 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
    Process: 2100 ExecStart=/usr/bin/k3s server (code=exited, status=1/FAILURE)
   Main PID: 2100 (code=exited, status=1/FAILURE)

sudo journalctl -u k3s shows a failure loop caused by k3s being unable to extract data into /var/lib/rancher/k3s/data:

systemd[1]: Starting Lightweight Kubernetes...
sh[3096]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
sh[3097]: Failed to get unit file state for nm-cloud-setup.service: No such file or directory
k3s[3100]: time="2023-07-11T02:34:23Z" level=info msg="Acquiring lock file /var/lib/rancher/k3s/data/.lock"
k3s[3100]: time="2023-07-11T02:34:23Z" level=info msg="Preparing data dir /var/lib/rancher/k3s/data/8147fdcc81517672a3573345f56cc1fc8eb>
k3s[3100]: time="2023-07-11T02:34:23Z" level=info msg="error extracting tarball into /var/lib/rancher/k3s/data/8147fdcc81517672a3573345>
k3s[3100]: time="2023-07-11T02:34:24Z" level=fatal msg="extracting data: error writing to /var/lib/rancher/k3s/data/8147fdcc81517672a35>
systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: k3s.service: Failed with result 'exit-code'.
systemd[1]: Failed to start Lightweight Kubernetes.
systemd[1]: k3s.service: Scheduled restart job, restart counter is at 139.
systemd[1]: Stopped Lightweight Kubernetes.
systemd[1]: Starting Lightweight Kubernetes...

Note: I tried this previously in Kairos 2.2.1 images for both openSUSE and Ubuntu. I wanted to try it on the latest release today before filing to make sure it wasn’t already addressed.

About this issue

  • State: closed
  • Created a year ago
  • Comments: 41 (29 by maintainers)

Most upvoted comments

By the way, when I run kairos --version, it returns version: 2.4.0, compiled with: go1.20.8. I was expecting it to say 2.4.1?

This is expected as you are checking the version of kairos-agent.

For the Kairos version, you need to check /etc/os-release.

Kairos-agent and Kairos have different release cadences. Their version numbers currently match only by coincidence, and they can diverge entirely: for example, a Kairos 2.1 image could report agent version 2.2.12 😃
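
As a minimal sketch of that check, the helper below prints only the KAIROS_VERSION field from an os-release style file. The function name kairos_version is hypothetical (it is not part of Kairos or kairos-agent), and it assumes the field is written as KAIROS_VERSION="..." on its own line, as in the report above.

```shell
#!/bin/sh
# Hypothetical helper: print the KAIROS_VERSION value from an os-release file.
# Defaults to /etc/os-release; pass another path to inspect a copy of the file.
kairos_version() {
    file="${1:-/etc/os-release}"
    # Match the exact key (KAIROS_VERSION_ID does not match) and strip the
    # optional surrounding double quotes.
    sed -n 's/^KAIROS_VERSION="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' "$file"
}
```

Running kairos_version with no arguments on the node reads /etc/os-release. The value it prints is the image version, which can differ from what kairos --version reports for the agent.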

Could be that Alpine is missing a binary in the initramfs that is needed to expand the partition.

@mauromorales this should now be fixed on master with model rpi4, if the cause was insufficient space on the persistent partition.

Until this is fixed, you can run the command manually (annoying, but it only needs to be done once):

  1. In GRUB, add rd.break=initqueue to the kernel command line.
  2. When you drop into the rescue shell, run: sgdisk -g -d=4 -n=4:21635072:+0 -c=4:Linux\ filesystem -t=4:8300 /dev/mmcblk0
     • Keep in mind that the start sector (21635072 here) depends on the image, so validate it before running the command.
     • You might want to run the command with -P first as a dry run, then run it again without -P.
  3. Restart.
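
One way to validate the start sector before touching the disk is to read it from sysfs and build the dry-run form of the sgdisk command from the steps above. This is only a sketch: the function names part_start and dry_run_cmd and the SYSFS_ROOT override are hypothetical, it assumes /sys is mounted in the rescue shell, and it assumes a device with 512-byte logical sectors (sysfs always reports the start in 512-byte units). The script only prints the command; it does not run sgdisk.

```shell
#!/bin/sh
# Print the start sector of a partition (e.g. mmcblk0p4) as recorded in sysfs.
# SYSFS_ROOT is a hypothetical override that exists only to make this testable.
part_start() {
    cat "${SYSFS_ROOT:-/sys}/class/block/$1/start"
}

# Print (do not execute) the sgdisk command with the detected start sector
# filled in and -P added, so sgdisk would only pretend to write the changes.
# Arguments: disk device, partition name in sysfs, partition number.
dry_run_cmd() {
    disk="$1"; part="$2"; num="$3"
    start="$(part_start "$part")" || return 1
    printf 'sgdisk -P -g -d=%s -n=%s:%s:+0 -c=%s:"Linux filesystem" -t=%s:8300 %s\n' \
        "$num" "$num" "$start" "$num" "$num" "$disk"
}
```

For the layout in this issue the call would be dry_run_cmd /dev/mmcblk0 mmcblk0p4 4. Compare the printed start sector with the one you intended to use, run the printed command, and only then repeat it without -P.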

On the next boot you should see something like:

lsblk -o NAME,SIZE,LABEL
NAME                   SIZE LABEL
loop0                    2G COS_ACTIVE
mmcblk0               29.2G 
├─mmcblk0p1             96M COS_GRUB
├─mmcblk0p2            6.1G COS_STATE
├─mmcblk0p3            4.2G 
│ ├─KairosVG-oem        64M COS_OEM
│ └─KairosVG-recovery  4.1G COS_RECOVERY
└─mmcblk0p4           18.9G COS_PERSISTENT

I think the issue might be related to the size of the persistent partition:

lsblk -o NAME,SIZE,LABEL
NAME                   SIZE LABEL
loop0                    2G COS_ACTIVE
mmcblk0               29.2G 
├─mmcblk0p1             96M COS_GRUB
├─mmcblk0p2            6.1G COS_STATE
├─mmcblk0p3            4.2G 
│ ├─KairosVG-oem        64M COS_OEM
│ └─KairosVG-recovery  4.1G COS_RECOVERY
└─mmcblk0p4             64M COS_PERSISTENT
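
To spot this condition without reading the table by eye, a small check along the following lines could be used. It is a sketch rather than anything shipped with Kairos: check_persistent is a hypothetical name, and the 1 GiB default threshold is an arbitrary assumption chosen only to separate the 64M case from the expanded 18.9G case.

```shell
#!/bin/sh
# Read "LABEL SIZE_IN_BYTES" lines on stdin and report whether the
# COS_PERSISTENT partition is smaller than a threshold in bytes.
# The default of 1 GiB is an arbitrary value picked for this sketch.
# Exit status: 0 if large enough, 1 if too small, 2 if the label is missing.
check_persistent() {
    min_bytes="${1:-1073741824}"
    awk -v min="$min_bytes" '
        $1 == "COS_PERSISTENT" {
            found = 1
            if ($2 + 0 < min + 0) {
                print "too small: " $2 " bytes"
                bad = 1
            } else {
                print "ok: " $2 " bytes"
            }
        }
        END {
            if (!found) {
                print "COS_PERSISTENT not found"
                exit 2
            }
            exit bad
        }
    '
}
```

On the node, the input can come from lsblk -b -n -r -o LABEL,SIZE | check_persistent. With the 64M partition shown above it reports the partition as too small.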