Flatcar: arm64: unable to PXE boot, hang after AHCI errors

Description

On Gigabyte arm64 servers, the Flatcar PXE images hang during boot, which makes them unusable, while Fedora CoreOS images boot fine. We think we have narrowed it down to CMA (the Contiguous Memory Allocator) not being enabled in the kernel config: enabling it, both generally and for DMA (CONFIG_CMA and CONFIG_DMA_CMA), gets the boot process further along.

Impact

Flatcar images cannot be used at all on these arm64 servers.

Environment and steps to reproduce

We are booting the Flatcar PXE vmlinuz and initrd images on a bare-metal arm64 server, a Gigabyte R152-P31.

This seems to be happening with multiple (likely all) published Flatcar images for arm64. I have tested 3033.3.5, 3227.2.2, and 3374.0.0. We have yet to find an image that works.

We use the following grub config to boot these images (adapting the versions appropriately):

function load_video {
        insmod efi_gop
        insmod efi_uga
        insmod video_bochs
        insmod video_cirrus
        insmod all_video
}

load_video
set gfxpayload=keep
insmod gzio

set timeout=30

menuentry 'Install Flatcar 3033.3.5 (aarch64)' --class gnu-linux --class gnu --class os --id flatcar3033.3.5-aarch64 {
    linux flatcar3033.3.5/aarch64/vmlinuz ip=dhcp inst.ks=http://10.90.21.50/cgi-bin/kickstart?os=flatcar3033.3.5-aarch64 coreos.autologin ipv6.disable=1 cloud-config-url="http://10.90.21.50/cgi-bin/bootstrap.sh?name={name}&arch=aarch64&domain={domain}{bootstrap_params}" flatcar.first_boot=1 flatcar.autologin console=tty0 module_blacklist=nvme,nvme_core
    initrd flatcar3033.3.5/aarch64/initrd.img
}

The kernel seems to start correctly; however, it invariably ends up printing these messages and hanging:

[   18.562856] ata1.00: qc timeout (cmd 0xec)
[   19.352940] ahci 000c:01:00.0: AHCI controller unavailable!
[   19.352969] pcieport 000c:00:01.0: AER: Uncorrected (Non-Fatal) error received: 000c:00:00.0
[   20.144487] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[   20.144490] pcieport 000c:00:01.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[   20.174143] pcieport 000c:00:01.0:   device [1def:e101] error status/mask=00004000/00400000
[   20.186763] pcieport 000c:00:01.0:    [14] CmpltTO                (First)
[   20.545764] ahci 000c:01:00.0: AHCI controller unavailable!
[   20.545779] ahci 000c:01:00.0: AER: can't recover (no error_detected callback)
[   20.567037] pcieport 000c:00:01.0: AER: device recovery failed
[   20.577034] pcieport 000c:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: 000c:00:00.0
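The AER report in the log above can be decoded by hand. As a minimal sketch, the following maps bits of the PCIe Uncorrectable Error Status/Mask registers to their names. The bit positions are taken from the PCI_ERR_UNC_* constants in Linux's include/uapi/linux/pci_regs.h (only the common bits are listed), and the helper names are hypothetical.

```python
import re

# Bit positions in the PCIe AER Uncorrectable Error Status/Mask registers,
# following the PCI_ERR_UNC_* constants in Linux's pci_regs.h (subset only).
UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
    21: "ACS Violation",
    22: "Uncorrectable Internal Error",
}

# Matches the "error status/mask=XXXXXXXX/YYYYYYYY" part of a kernel AER line.
STATUS_RE = re.compile(r"error status/mask=([0-9a-fA-F]+)/([0-9a-fA-F]+)")


def set_bits(value):
    """Return the names of all known bits that are set in a register value."""
    return [name for bit, name in sorted(UNCORRECTABLE_BITS.items())
            if value & (1 << bit)]


def decode_aer_line(line):
    """Decode the status/mask pair from a kernel AER log line, or return None."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    status, mask = (int(group, 16) for group in match.groups())
    return {"status": set_bits(status), "masked": set_bits(mask)}


if __name__ == "__main__":
    line = ("pcieport 000c:00:01.0:   device [1def:e101] "
            "error status/mask=00004000/00400000")
    print(decode_aer_line(line))
```

For the line in the log, this reports a Completion Timeout in the status register (the kernel's "[14] CmpltTO") with Uncorrectable Internal Error masked: a request timed out without a completion, which is consistent with the AHCI controller at 000c:01:00.0 having stopped responding.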

…with stack dumps printed periodically because the whole boot is blocked. Full logs for a 3227.2.2 boot: flatcar-3227.2.2-boot.log

Booting the kernel with module_blacklist=ahci does get us a shell, but then no disks are visible.

One thing we have noticed is that Fedora CoreOS 36.20220918.3.0 has no issue getting past this point. Here is a diff of the boot logs for both: flatcar-vs-fcos-boot.diff

One obvious difference is the kernel version (Flatcar uses 5.15 while FCOS uses 5.19), but we think we have narrowed it down to the following differences ("-" lines are from Flatcar, "+" lines from FCOS):

- Memory: 394529984K/402392064K available (9920K kernel code, 2314K rwdata, 7220K rodata, 39808K init, 780K bss, 7862080K reserved, 0K cma-reserved)
+ Memory: 394648504K/402392064K available (16576K kernel code, 4122K rwdata, 13564K rodata, 7488K init, 10738K bss, 7678024K reserved, 65536K cma-reserved)
...
- DMA: preallocated 4096 KiB GFP_KERNEL pool for atomic allocations
- DMA: preallocated 4096 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
- DMA: preallocated 4096 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
+ DMA: preallocated 16384 KiB GFP_KERNEL pool for atomic allocations
+ DMA: preallocated 16384 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
+ DMA: preallocated 16384 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
...
- pci 000c:01:00.0: [1b21:1164] typefailed to assign [io  size 0x1000]
- pci 0004:00:05.0: BAR 13: no space for [io  size 0x1000]
- pci 0004:00:05.0: BAR 13: failed to assign [io  size 0x1000]
- pci 0004:00:01.0: BAR 13: no space for [io  size 0x1000]
- pci 0004:00:01.0: BAR 13: failed to assign [io  size 0x1000]
- pci 0004:00:05.0: BAR 13: no space for [io  size 0x1000]
- pci 0004:00:05.0: BAR 13: failed to assign [io  size 0x1000]
- pci 0004:00:03.0: BAR 13: no spacTCP: Hash tables configured (established 524288 bind 65536)
+ pci 000c:01:00.0: [1b21:1164] type ata36: SATA max UDMA/133 abar m8192@0x40282000 port 0x40282680 irq 109

In particular, the DMA coherent (atomic) pools are rather small on Flatcar (4096 KiB each vs 16384 KiB on FCOS), and it's been accepted that the coherent pool can only work correctly with CMA enabled. The failure to assign space for the offending PCI device during boot seems to corroborate this. On Flatcar no memory is reserved for CMA, while FCOS reserves 64 MiB; and specifying cma=64M on the kernel command line prints a message saying the argument is unrecognized, hinting that CMA is disabled in the kernel config.
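To compare these values across boots without eyeballing a diff, a small parser is enough. This is a minimal sketch that assumes the message formats shown in the diff above (the "Memory:" summary and the "DMA: preallocated ..." lines); the function name is hypothetical, and it is only as reliable as those formats.

```python
import re

# "... 0K cma-reserved)" at the end of the "Memory:" summary line.
CMA_RE = re.compile(r"Memory: .*?(\d+)K cma-reserved")
# "DMA: preallocated 4096 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations"
POOL_RE = re.compile(r"DMA: preallocated (\d+) KiB (\S+) pool for atomic allocations")


def summarize_boot_log(text):
    """Extract the CMA reservation and DMA atomic pool sizes (KiB) from dmesg text."""
    cma = CMA_RE.search(text)
    pools = {flags: int(size) for size, flags in POOL_RE.findall(text)}
    return {
        "cma_reserved_kib": int(cma.group(1)) if cma else None,
        "atomic_pools_kib": pools,
    }


if __name__ == "__main__":
    # Lines copied from the Flatcar ("-") side of the diff above.
    flatcar = "\n".join([
        "Memory: 394529984K/402392064K available (9920K kernel code, 2314K rwdata, "
        "7220K rodata, 39808K init, 780K bss, 7862080K reserved, 0K cma-reserved)",
        "DMA: preallocated 4096 KiB GFP_KERNEL pool for atomic allocations",
        "DMA: preallocated 4096 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations",
        "DMA: preallocated 4096 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations",
    ])
    print(summarize_boot_log(flatcar))
```

Saved dmesg output can be passed in as-is, since the patterns are searched for anywhere in the text and the timestamp prefixes do not get in the way.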

I built the kernel for Flatcar 3227.2.3 using the SDK, with the following config options enabled:

CONFIG_CMA=y
CONFIG_DMA_CMA=y
CONFIG_CMA_AREAS=19
CONFIG_CMA_SIZE_MBYTES=64
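Kconfig dependency resolution can silently drop options whose dependencies are not met (for example, CONFIG_DMA_CMA depends on CONFIG_CMA), so it may be worth confirming that the generated config really contains these values before booting it. A minimal sketch, assuming you have the generated .config, or a gzip-compressed copy such as /proc/config.gz on kernels built with CONFIG_IKCONFIG_PROC; the helper names are hypothetical.

```python
import gzip
from pathlib import Path

# The options enabled in the test build above.
REQUIRED = {
    "CONFIG_CMA": "y",
    "CONFIG_DMA_CMA": "y",
    "CONFIG_CMA_AREAS": "19",
    "CONFIG_CMA_SIZE_MBYTES": "64",
}


def parse_kconfig(text):
    """Parse .config text into {option: value}; '# ... is not set' becomes 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            options[line[2:-len(" is not set")]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            key, _, value = line.partition("=")
            options[key] = value
    return options


def missing_options(text, required=REQUIRED):
    """Return {option: actual value} for every required option that differs."""
    options = parse_kconfig(text)
    return {key: options.get(key, "n") for key, want in required.items()
            if options.get(key, "n") != want}


def read_config(path):
    """Read a kernel config, transparently decompressing *.gz files."""
    p = Path(path)
    data = p.read_bytes()
    if p.suffix == ".gz":
        data = gzip.decompress(data)
    return data.decode()
```

Usage would be something like missing_options(read_config("/proc/config.gz")); an empty result means all four options made it into the build.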

This got the boot process further along, although dracut then ran into issues using iSCSI; I suspect that has more to do with my build than anything else. Boot logs for this run: flatcar-with-cma.log. Currently investigating.

Expected behavior

We would have expected the boot to complete.

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 16 (7 by maintainers)

Most upvoted comments

From serial through IPMI: I've added forward_to_console=1, and I think it might be a config issue with something being pulled over the network. I'm still troubleshooting, but it seems that the IOMMU configs were what really unwedged me. I'll report back once I get the config issues sorted out, and I'll make a PR with the kconfig changes if nothing else breaks.

Would you be able to test-build a Flatcar image with the following value set in the kernel config:

CONFIG_FORCE_MAX_ZONEORDER=13

That would explain the 16M vs 4M atomic pool difference. To be honest, though, I have no idea why CMA would make a difference here, especially for assigning I/O space.
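The zone-order explanation can be checked with some arithmetic. As far as I can tell from kernel/dma/pool.c in 5.x kernels, when no coherent_pool= argument is given, each atomic pool defaults to 128 KiB per GiB of RAM, clamped between 128 KiB and MAX_ORDER_NR_PAGES pages. The sketch below reproduces that formula under stated assumptions (4 KiB pages; MAX_ORDER equal to CONFIG_FORCE_MAX_ZONEORDER, whose arm64 default for 4 KiB pages is 11; MAX_ORDER_NR_PAGES = 1 << (MAX_ORDER - 1) as in pre-6.4 kernels). The function name is hypothetical.

```python
SZ_128K = 128 * 1024
SZ_1G = 1024 * 1024 * 1024


def default_atomic_pool_bytes(total_ram_bytes, max_zone_order=11, page_size=4096):
    """Estimate the default size of each DMA atomic pool, in bytes.

    Assumed to mirror the 5.x logic in kernel/dma/pool.c: 128 KiB per GiB
    of RAM, capped at MAX_ORDER_NR_PAGES pages and floored at 128 KiB.
    """
    total_pages = total_ram_bytes // page_size
    # 128 KiB per GiB is one pool page for every (1 GiB / 128 KiB) = 8192 RAM pages.
    pages = total_pages // (SZ_1G // SZ_128K)
    # Cap: MAX_ORDER_NR_PAGES = 1 << (MAX_ORDER - 1), MAX_ORDER = FORCE_MAX_ZONEORDER.
    pages = min(pages, 1 << (max_zone_order - 1))
    return max(pages * page_size, SZ_128K)


if __name__ == "__main__":
    # "Available" figure from the Flatcar Memory: line in the diff, in bytes.
    ram = 394529984 * 1024
    for order in (11, 13):
        # Prints "11 4096 KiB" and then "13 16384 KiB".
        print(order, default_atomic_pool_bytes(ram, order) // 1024, "KiB")
```

With about 384 GiB of RAM, the per-GiB term (about 47 MiB) is far above the cap, so the zone order alone decides the pool size: 4096 KiB at order 11 and 16384 KiB at order 13, which matches the Flatcar and FCOS lines in the diff.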