operating-system: Stability problems since updating to 10 and 10.1 Pi4 8GB NVMe SSD via USB adapter

Describe the issue you are experiencing

Hi there,

I just want to add to this. My setup: Raspberry Pi 4B 8GB, USB boot from an ORICO portable external 128GB mini M.2 NVMe SSD.

I updated from HA OS 9.5 to 10.0 the day it was released, and it has been a nightmare since. I read that some people with a similar NVMe SSD Pi 4 hardware configuration were not even able to boot after updating. https://github.com/home-assistant/operating-system/issues/2479

Luckily mine did boot; it just kept crashing every 5 hours or so. I connected a monitor over HDMI and saw that the cause was SQUASHFS becoming read-only, along with journald errors.

compare: https://community.home-assistant.io/t/squashfs-error-ext4-fs-error/293167

I have since changed the power supply from a 20 W, 4 A unit to a MacBook USB-C charger and updated to HA OS 10.1, which brought some stability improvement. But it still crashed, now about every other day.

Today I rolled back to HA OS 9.5:

ha os update --version 9.5
ha core update --version=2023.1.7

and it's currently migrating my DB back

Database is about to upgrade from schema version: 41 to: 30

so it’s still very busy. It remains to be seen whether I get back my old regular uptime of a month or more without crashes. I really hope so.

This is not OKAY!

I suspect it has something to do with the following ‘features’, from release notes:

  • zswap instead of swap in zram is used. This should allow to use Home Assistant OS on systems with lower amounts of RAM with the trade-off of slightly higher storage wear.

What operating system image do you use?

rpi4-64 (Raspberry Pi 4/400 64-bit OS)

What version of Home Assistant Operating System is installed?

10.1

Did you upgrade the Operating System.

Yes

Steps to reproduce the issue

1. Upgrade from 9.5 to 10.0
2. Upgrade from 10.0 to 10.1

Anything in the Supervisor logs that might be useful for us?

can't read relevant logs since I downgraded

Anything in the Host logs that might be useful for us?

can't read relevant logs since I downgraded

System information


version core-2023.1.7
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.10.7
os_name Linux
os_version 5.15.84-v8
arch aarch64
timezone Europe/Berlin
config_dir /config
Home Assistant Community Store
GitHub API ok
GitHub Content ok
GitHub Web ok
GitHub API Calls Remaining 5000
Installed Version 1.32.1
Stage running
Available Repositories 1280
Downloaded Repositories 18
Home Assistant Cloud
logged_in false
can_reach_cert_server ok
can_reach_cloud_auth ok
can_reach_cloud ok
Home Assistant Supervisor
host_os Home Assistant OS 9.5
update_channel stable
supervisor_version supervisor-2023.04.1
agent_version 1.4.1
docker_version 20.10.22
disk_total 116.7 GB
disk_used 33.4 GB
healthy true
supported true
board rpi4-64
supervisor_api ok
version_api ok
installed_addons Samba share (10.0.1), SSH & Web Terminal (13.1.0), Duck DNS (1.15.0), File editor (5.6.0), Mosquitto broker (6.2.1), ESPHome (2023.4.4), SQLite Web (3.7.1)
Dashboards
dashboards 1
resources 11
views 6
mode storage
Recorder
oldest_recorder_run 7 November 2022 at 21:22
current_recorder_run 5 May 2023 at 00:06
estimated_db_size 8282.91 MiB
database_engine sqlite
database_version 3.38.5

Additional information

I downgraded to 9.5 today

I also posted this on the forum: https://community.home-assistant.io/t/home-assistant-os-10-update-has-broken-my-pi-4b-4gb/561918/24

I hope you are aware that many Pi4b users have a very unstable system at the moment.

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 7
  • Comments: 60 (4 by maintainers)

Most upvoted comments

HA-OS 10.x uses Linux kernel 6.1, HA-OS 9.5 uses Linux kernel 5.15. Maybe there are issues in the 6.1 kernel with your RTL9210 adapter.

As far as I know, most problems come from the UAS mode of the USB driver. Therefore HA-OS has a “blacklist” containing adapters that were reported as problematic in UAS mode.

The function is simple: if the adapter is identified via VendorID:ProductID and is blacklisted, the USB driver uses usb-storage instead of uas.

Lower “performance” of course, but better than crashing.

What I would do: check which mode the adapter uses. dmesg | grep usb should show…

[  1.926422] usb X-X: New USB device strings: Mfr=2, Product=3, SerialNumber=1
[  1.926448] usb X-X: Product: Hersteller Storage Device
[  1.926472] usb X-X: Manufacturer: Hersteller
[  1.926494] usb X-X: SerialNumber: XXXXXXXXXXX
[  1.929365] usb X-X: UAS is ignored for this device, using usb-storage instead
[  1.929495] usb X-X: UAS is ignored for this device, using usb-storage instead

… if your adapter is blacklisted.

If that is the case, then I have no more ideas.

Otherwise (not blacklisted) you can blacklist your adapter yourself by editing /mnt/boot/cmdline.txt and adding your adapter’s VendorID:ProductID at the end of the line (see cmdline.txt). Then save and reboot your system.

If that works (I hope so), then the adapter could/should be blacklisted by the HA-OS team too.
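The check described above can be sketched as a small shell helper. This is only an illustration: the function name uas_mode_from_dmesg is made up here, and it assumes the kernel prints the exact "UAS is ignored" message shown in the log excerpt above. If that message is missing, the adapter may be running in UAS mode, or it may simply not be listed in the captured log.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption, not part of HA OS).
# Reads dmesg-style text on stdin and prints:
#   "usb-storage"     if the kernel reported that UAS was ignored for a device
#   "uas-or-unknown"  otherwise (message absent from the text that was read)
uas_mode_from_dmesg() {
    if grep -q 'UAS is ignored for this device, using usb-storage instead'; then
        echo "usb-storage"
    else
        echo "uas-or-unknown"
    fi
}

# Typical use on the affected host (needs permission to read the kernel log):
#   dmesg | uas_mode_from_dmesg
```

The helper only looks at the text it is given, so it can also be run against a saved copy of the boot log.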

I am running

  • Core 2023.11.1
  • Supervisor 2023.10.1
  • Operating System 11.1
  • Frontend 20231030.1
  • Samsung 1TB T7 USB SSD (UASP mode, USB3 socket)
    • I only use T5 & T7 SSDs on various projects on RPi and Odroid N2 as I’ve found them to be extremely reliable & performant
    • Manually enabled FSTRIM
  • Raspberry Pi 8GB + original RPi plugpack PSU
  • Wired LAN
  • ESP8266/NodeMCU running ESPHome also connected via USB 2 (for power & fallback access as this is a remote site)

Survived several core & OS upgrades, working perfectly so far for around 3 months (touch wood 🤞 - hope I haven’t cursed myself by mentioning this…).

This is still a problem with the latest 10.3 release. I’ve upgraded back to the latest version, after rolling back for a few weeks, since I want to use some of the new features. I’m trying a workaround by restarting the host twice a day via an automation.

I know this is not very helpful for solving the problem, but downgrading to 9.5 really helps, just in case you need HA to be running stably again.

ha os update --version 9.5

[    2.016595] usb 2-2: New USB device found, idVendor=0bda, idProduct=9210, bcdDevice=30.00
[    2.016638] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[    2.016670] usb 2-2: Product: GV100- 128G
[    2.016697] usb 2-2: Manufacturer: ORICO

Does someone know if it’s still necessary to blacklist the Orico SSD with Home Assistant OS 11.1?

This is most likely related to Raspberry Pi’s Linux kernel and/or firmware. There hasn’t been an update to them for a while, so this is kind of expected.
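For anyone who needs the VendorID:ProductID pair for a blacklist entry, it can be read from the "New USB device found" line in the dmesg output above. A minimal sketch, assuming the log line has the idVendor=…, idProduct=… layout shown there (the function name usb_ids_from_dmesg is invented for this example):

```shell
#!/bin/sh
# Hypothetical helper: print "vendor:product" for every
# "idVendor=XXXX, idProduct=YYYY" pair found in dmesg-style text on stdin.
usb_ids_from_dmesg() {
    sed -n 's/.*idVendor=\([0-9a-fA-F]\{4\}\), idProduct=\([0-9a-fA-F]\{4\}\).*/\1:\2/p'
}

# Typical use on the affected host:
#   dmesg | usb_ids_from_dmesg
```

For the ORICO log line quoted above this yields 0bda:9210, which matches the last entry in the cmdline.txt example further down the thread.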

Are you using USB boot? Can you try to use SD-card boot along with the data disk feature to see if that works better?

@agners ^This is really sad, Raspberry Pi USB SSD support is still broken. My system is a sinking ship; are you guys planning to fix this in the foreseeable future? Meanwhile I’m starting to migrate my installation back to a Docker container on my QNAP. Thanks in advance for the effort.

I’m having similar issues. Starting with Home Assistant OS 10.0, it behaves like a system whose hard drive was removed while running. Things keep running, but the longer they do, the less works.

The status page on port 4357 isn’t available; the dashboard loads, but everything on it fails to display; the settings pages show errors while loading, …

“Error while loading page hardware”

The app fails to connect locally, automations stop running, add-ons crash; you get the picture.

And since there are no logs kept after a restart, it’s impossible to get an idea why the system went into this state.

I’m running the system on a Raspberry Pi 4B+, with an external SSD from the supported RPi SSD-Adapter list and I’ve even reflashed the OS 2 times already, before restoring my backup - it still keeps happening every 48 hours.

I’ve attached a capture card, to be able to see what the system prints out, when it crashes again.

Have exactly the same issue and obviously lots of people do. Since 10.0 my HA crashes at least every second day. Before the Update it ran seamlessly on a Raspberry Pi 4b with an external SSD.

There hasn’t been any activity on this issue recently. To keep our backlog manageable we have to clean old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant OS version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

For the sake of reliability, stability and my sanity, is there an easy migration path to the other supported SSD solution, with HAOS installed on the SD card and the HA data on the SSD? Would it be a case of creating a backup of my current all-on-SSD configuration, installing a new instance directly on an SD card, restoring the backup to the SD card and then migrating the data disk back to the SSD? Or would it be better to migrate the data disk first and then restore the backup? The backup is universally supported as long as it’s a supervised instance of HA, right, regardless of what hardware and configuration I put it on?

@Baxxy13, I think you got me wrong. I said, “if”

I’m going to stay optimistic here. If I come home from work tomorrow evening and all the lights won’t work again, because my SSD became unresponsive again…

All is peachy so far.

because my SSD became unresponsive again…

Sad to hear that. I really hoped blacklisting the UAS mode would work. But it seems that the active UAS mode is not the problem.

Searching for…

[    2.020055] usb 2-2: Enable of device-initiated U1 failed.
[    2.020983] usb 2-2: Enable of device-initiated U2 failed.

leads to articles about problematic U1/U2 implementations, which relate to LPM (Link Power Management) of USB 3 devices, e.g. here

Disabling LPM for your USB device might be worth a try. From what I have read, this could also be done via quirks in cmdline.txt (see here in the answer). But I have never used or tested this, and I don’t know if it’s supported by HA-OS.
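As a rough illustration of what such a quirk could look like: the Linux kernel documents a usbcore.quirks= boot parameter that takes VendorID:ProductID:Flags entries, where the flag letter k stands for USB_QUIRK_NO_LPM. The sketch below only builds that parameter string. The helper name no_lpm_param is an assumption made for this example, and whether HA-OS honours the parameter, or whether it helps with this adapter at all, is untested.

```shell
#!/bin/sh
# Hypothetical helper: build a usbcore.quirks= kernel parameter that sets the
# "k" flag (USB_QUIRK_NO_LPM) for each "vendor:product" ID given as an argument.
# It only prints the parameter; it does not modify cmdline.txt.
no_lpm_param() (
    list=
    for id in "$@"; do
        # Join entries with commas, as the kernel expects for quirk lists.
        list="${list:+$list,}${id}:k"
    done
    printf 'usbcore.quirks=%s\n' "$list"
)

# Example: no_lpm_param 0bda:9210
```

The resulting token would go on the single line in cmdline.txt, separated from the other parameters by a space.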

tobi@introvision2:~$ sudo lsblk
[sudo] password for tobi:
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
...
(blah)
...
sda           8:0    0 119,2G  0 disk
├─sda1        8:1    0    32M  0 part
├─sda2        8:2    0    24M  0 part /media/tobi/disk
├─sda3        8:3    0   256M  0 part /media/tobi/disk3
├─sda4        8:4    0    24M  0 part /media/tobi/disk1
├─sda5        8:5    0   256M  0 part /media/tobi/disk2
├─sda6        8:6    0     8M  0 part
├─sda7        8:7    0    96M  0 part /media/tobi/hassos-overlay
└─sda8        8:8    0 118,6G  0 part /media/tobi/hassos-data
nvme0n1     259:0    0 931,5G  0 disk
├─nvme0n1p1 259:1    0   100M  0 part /boot/efi
├─nvme0n1p2 259:2    0    16M  0 part
├─nvme0n1p3 259:3    0   465G  0 part /media/win10
├─nvme0n1p4 259:4    0   614M  0 part
└─nvme0n1p5 259:5    0 465,8G  0 part /

tobi@introvision2:~$ sudo mount -t vfat /dev/sda1 /mnt

tobi@introvision2:~$ ls -l /mnt/
total 6652
-rwxr-xr-x 1 root root   52656 Aug  1 19:29 bcm2711-rpi-400.dtb
-rwxr-xr-x 1 root root   52524 Aug  1 19:29 bcm2711-rpi-4-b.dtb
-rwxr-xr-x 1 root root   53265 Aug  1 19:29 bcm2711-rpi-cm4.dtb
-rwxr-xr-x 1 root root    2411 Aug  1 19:29 boot.scr
-rwxr-xr-x 1 root root     137 Aug  1 19:29 cmdline.txt
-rwxr-xr-x 1 root root    2160 Aug  1 19:29 config.txt
-rwxr-xr-x 1 root root    3170 Aug  1 19:29 fixup4cd.dat
-rwxr-xr-x 1 root root    5398 Aug  1 19:29 fixup4.dat
-rwxr-xr-x 1 root root    8386 Aug  1 19:29 fixup4x.dat
drwxr-xr-x 2 root root   24576 Aug  1 19:29 overlays
-rwxr-xr-x 1 root root  805436 Aug  1 19:29 start4cd.elf
-rwxr-xr-x 1 root root 2250848 Aug  1 19:29 start4.elf
-rwxr-xr-x 1 root root 2998344 Aug  1 19:29 start4x.elf
-rwxr-xr-x 1 root root  533432 Aug  1 19:29 u-boot.bin

tobi@introvision2:~$ sudo vim /mnt/cmdline.txt
dwc_otg.lpm_enable=0 console=tty1 usb-storage.quirks=174c:55aa:u,2109:0715:u,152d:0578:u,152d:0579:u,152d:1561:u,174c:0829:u,14b0:0206:u,0bda:9210:u
:x

tobi@introvision2:~$ sudo umount /dev/sda1
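
The manual vim edit in the transcript above can be sketched as a small helper. This is only a sketch under assumptions: the function name add_uas_quirk is made up for illustration, and it assumes cmdline.txt is a single line that uses the usb-storage.quirks=VendorID:ProductID:u format shown above.

```shell
#!/bin/sh
# Hypothetical helper: add "VID:PID:u" to the usb-storage.quirks= list of a
# kernel command line.
#   $1 = the existing single-line command line
#   $2 = "vendor:product" in hex, e.g. 0bda:9210
# Prints the updated line; prints it unchanged if the entry is already there.
add_uas_quirk() (
    line=$1
    id=$2
    case "$line" in
        *"${id}:u"*)
            # Entry already present, nothing to do.
            printf '%s\n' "$line" ;;
        *usb-storage.quirks=*)
            # Append to the existing comma-separated list.
            printf '%s\n' "$line" | sed "s/\(usb-storage\.quirks=[^ ]*\)/\1,${id}:u/" ;;
        *)
            # No quirks list yet, so start one at the end of the line.
            printf '%s\n' "$line usb-storage.quirks=${id}:u" ;;
    esac
)

# Example, with the boot partition mounted at /mnt as in the transcript:
#   add_uas_quirk "$(cat /mnt/cmdline.txt)" 0bda:9210
```

The helper only prints the new line. Writing it back to cmdline.txt (ideally after keeping a copy of the original file) is a separate, deliberate step.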

Okay, I went back to 9.5 a week ago and it has been running stably ever since. Do you see a problem with staying there for a longer period of time? It doesn’t look like there will be a fix in the near future.

Same issue here when using 10.1. I have a USB adapter to a 2.5" SSD (not an NVMe). I went back to 9.5 a month or so ago and it’s been solid. I came to check the bug reports to see if others had the same issue, and it seems so. This reminds me of an issue back in late 2020 or early 2021, where it came down to a problem with the Pi firmware after months of people debugging differences.

I can’t SSH into my pi anymore after it hangs. Sending logs to a remote host with Rsyslog would really be helpful.

Me neither but after a reboot you can still see the old logs using journalctl.

Unfortunately that does not help much when the disk holding those logs going offline is the issue at hand.

I can’t SSH into my pi anymore after it hangs. Sending logs to a remote host with Rsyslog would really be helpful.

Me neither but after a reboot you can still see the old logs using journalctl.

@danir-de

I’ve attached a capture card, to be able to see what the system prints out, when it crashes again.

Would be nice to have rsyslog as an addon maybe. sudo apt install rsyslog

I have experienced the same thing with my Yellow/NVMe combo and symptoms persisted after an in-place downgrade to 9.5 until I did a full reinstall of HAOS from scratch (i.e. wiping the NVMe and reinstalling using USB mass storage mode, as well as uploading a known stable firmware).

I have some serial logs showing that journald can’t access its log directory, squashfs errors etc., but they don’t show the initial stages of the problem. Once the connection to the drive is lost, there appears to be no way to get a root shell or access kernel logs even with a serial connection. A reboot fixes the issue for some time (a few hours to days), but all logs from the beginning of the fault are wiped.

Since this persisted after the in-place downgrade, I think that the latest firmware may be to blame, but I don’t have anything approaching “proof” for this hypothesis.

Hardware:

  • Non-POE Yellow
  • CM4 8 GB (full)
  • Samsung 980 1 TB
  • eMMC is empty (i.e. direct NVMe boot)

Hi,

I just wanted to give an update on this issue. I have had no crashes since I downgraded to HA OS 9.5. The system has been up since 9 May 2023 at 18:17, when I did a normal host reboot. I also updated to core 2023.5.2 again, and that does not seem to cause any problems.