operating-system: Fail to boot after Upgrade to 10.0 CM4 NVMe

Describe the issue you are experiencing

I didn’t see the note that it’s not bootable with an NVMe.

I use a CM4 that boots from NVMe. I upgraded from 9.5 to 10 in the UI and now it doesn’t boot anymore.

Are there any options to downgrade, e.g. via a file swap?

It is really dangerous to provide updates in the UI that totally break some setups.

What operating system image do you use?

rpi4-64 (Raspberry Pi 4/400 64-bit OS)

What version of Home Assistant Operating System is installed?

10

Did you upgrade the Operating System?

Yes

Steps to reproduce the issue

  1. Go to the UI and hit the OS 10 update button
  2. Wait for the install
  3. Reboot; the device no longer boots

Anything in the Supervisor logs that might be useful for us?

Nothing

Anything in the Host logs that might be useful for us?

Nothing

System information

CM4 with NVMe

Additional information

No response

About this issue

  • State: closed
  • Created a year ago
  • Comments: 33 (13 by maintainers)

Most upvoted comments

FWIW, HAOS 10 is off the stable channel for RPi 4 devices now: https://github.com/home-assistant/version/pull/288.

I think you don’t have to go that far, however there should be a hardware compatibility list to which the core team and users can contribute. Is this being considered?

For most boards things are quite straightforward since folks usually just boot from an SD card. I do have every board we support. But for more advanced/special setups, such a list would indeed be nice to have! Along with a list of users who are willing to test pre-releases on the beta channel. Maybe a GitHub wiki could do the job? 🤔

In any case, I’ve found the problem, PR https://github.com/home-assistant/operating-system/pull/2493 fixes it. The problem will be fixed in HAOS 10.1.

The problem is that there is always a setup which breaks in some weird way.

Of the more than 150k installations that opt in to stats, 34k have already upgraded successfully according to analytics.home-assistant.io (otherwise their stats would not get updated). If I pulled the update for every failing installation I see, we would never be able to publish a new OS release.

In general: USB SSD boot on Raspberry Pi has always been fragile and has been discouraged for a long time. That CM4 NVMe boot is unstable on certain base boards has also been known for a while, and it has often required EEPROM updates etc.

Ideally, you boot HAOS from an SD card or from the eMMC; this is known to be much more reliable. The data disk feature then still allows you to use an NVMe or USB-attached SSD.

Honestly, I wonder why this release is still available and has not been pulled back, unless there is an immediate fix available. This will brick more and more HA devices (especially when people have time on the upcoming weekend to click the “update” button), causing those who boot from an NVMe a lot of trouble, potential data loss and maybe some frustration.

This issue is resolved by https://github.com/home-assistant/operating-system/pull/2493, and the fix is part of HAOS 10.1.

The problem is that there is always a setup which breaks in some weird way.

Well, I think in this case there is no weird way. HA 10 has a 99% chance of not booting when a CM4 is used together with an NVMe as boot drive. So this should be called out as a “breaking change” while the release is being fixed. Booting from NVMe isn’t something weird nowadays; it has become the norm.

Leaving the state like this and watching the number of complaints is just killing brand reputation. Why would one risk that?

Just my two cents 😃 I don’t understand this way of product management and prefer quality over quantity.

@agners

Hm, so at this point you have a HAOS installation on the internal eMMC, that means you also boot from the eMMC. This use case should already work with HAOS 10.0. U-Boot is only used at boot time, and the bug in OS 10.0 is that U-Boot can’t continue booting when booting from the external NVMe SSD.

Correct (in theory), but previously not possible due to https://github.com/home-assistant/operating-system/issues/1887

Nit: Technically, there is no such thing as an “M.2 eMMC”. That is just an M.2 SSD (solid-state disk). eMMC stands for embedded multimedia card, which is the name of the protocol the CM4 on-board flash storage uses. The M.2 SSDs on Waveshare/Yellow use NVMe as the protocol (over PCIe), so “M.2 NVMe SSD” would be more precise.

I’m using Steam Deck marketing lingo here (and they refer to it as eMMC) - first line is the module I have:

  • 64 GB eMMC (PCIe Gen 2 x1)
  • 256 GB NVMe SSD (PCIe Gen 3 x4 or PCIe Gen 3 x2*)
  • 512 GB high-speed NVMe SSD (PCIe Gen 3 x4 or PCIe Gen 3 x2*)

All models use socketed 2230 m.2 modules (not intended for end-user replacement)

Does it mean that booting from NVMe is also broken for Yellow?

No, the configuration was present for Yellow. It just wasn’t applied to the rpi4/rpi4-64 images.

In the meantime, will replacing u-boot.bin from HAOS 9.5 work?

Yes that should work.
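For anyone attempting that workaround, a minimal sketch of the file swap might look like the following. It assumes the boot partition of the NVMe SSD has been mounted on another machine and that a `u-boot.bin` has been extracted from a HAOS 9.5 image; the function name `swap_uboot`, the backup suffix and all paths are hypothetical illustrations, not something confirmed in this thread.

```python
import shutil
from pathlib import Path


def swap_uboot(boot_dir: Path, old_uboot: Path, backup_suffix: str = ".10.0.bak") -> Path:
    """Replace boot_dir/u-boot.bin with old_uboot and return the backup path.

    boot_dir  -- mount point of the boot partition (assumed, e.g. mounted on a PC)
    old_uboot -- u-boot.bin taken from a HAOS 9.5 image (assumed to be extracted already)
    """
    target = boot_dir / "u-boot.bin"
    if not old_uboot.is_file():
        raise FileNotFoundError(f"replacement not found: {old_uboot}")
    if not target.is_file():
        raise FileNotFoundError(f"no u-boot.bin in {boot_dir}")

    # Keep the current (10.0) binary so the swap can be undone.
    backup = target.with_name(target.name + backup_suffix)
    shutil.copy2(target, backup)

    # Overwrite the current binary with the one from 9.5.
    shutil.copy2(old_uboot, target)
    return backup
```

A call could look like `swap_uboot(Path("/mnt/hassos-boot"), Path("haos-9.5/u-boot.bin"))`, where both paths are placeholders. The backup copy is kept next to the original so the change can be reverted once 10.1 is installed.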

How long is the wait for the first 10.1 beta?

I’ll trigger a dev build tonight, it will be available from https://os-builds.home-assistant.io/ tomorrow.

@agners

  1. Add it to the generic Raspberry Pi U-Boot configuration so Yellow as well as other CM4-based systems can boot from NVMe SSD again.

Does it mean that booting from NVMe is also broken for Yellow?

  1. The problem will be fixed in HAOS 10.1.

In the meantime, will replacing u-boot.bin from HAOS 9.5 work?

  1. How long is the wait for the first 10.1 beta?
  2. I’ve been able to reproduce, and to me it seems an EEPROM issue

I’ve updated to the latest EEPROM on CM4 - no changes.

  1. Ideally, you boot HAOS from an SD card or from the eMMC; this is known to be much more reliable. The data disk feature then still allows you to use an NVMe or USB-attached SSD.

That’s the thing, it doesn’t work on Waveshare boards (see #1887).

Bold move, appreciate that!

I am happy to help with testing on my config if that helps and is desired, since I think the TOFU board I am using is a great piece of hardware but less common.

To your point:

If we’d really want to increase quality, we should just prevent boot on any board we don’t test.

I think you don’t have to go that far, however there should be a hardware compatibility list to which the core team and users can contribute. Is this being considered?