harvester: [BUG] v1.2.0 Interactive ISO Fails to Install On Some Bare-Metal Devices
Describe the bug
The interactive ISO fails to install on some bare-metal devices.
Devices reported so far:
- AMD Ryzen 9 7940HS “nuc-like” machine
- AMD Ryzen 5900 w/ 64GB
- Intel 770T i7 CPU (i7 vPro 7th Gen) ThinkCentre M910q w/ 32GB
Two paths seem to take place:
Path “A”:
- Once the GRUB boot entry starts, the dmesg/journalctl output begins but hangs at, or right after:
squashfs: version 4.0 (2009/01/31) Phillip Lougher
Path “B”:
- Hit “e” at the GRUB boot menu to edit the boot menu item, remove
console=ttyS0, then hit Ctrl+X to boot. The boot continues.
- The boot then waits out the full 50-minute limit on
a start job is running for /dev/mapper/live-rw (50min), which results in:
timed out waiting for device /dev/mapper/live-rw
dependency failed for /sysroot
dependency failed for cOS system initramfs setup before switch root
dependency failed for initrd default target
dependency failed for migrate config to new version
Related to:
[ 3002.862138] localhost systemd[1]: systemd-ask-password-console.path: Deactivated successfully.
[ 3002.876793] localhost systemd[1]: Stopped Dispatch Password Requests to Console Directory Watch.
[ 3002.877445] localhost systemd[1]: Stopped target Basic System.
[ 3002.878015] localhost systemd[1]: Stopped target System Initialization.
[ 3002.878491] localhost systemd[1]: dracut-pre-mount.service: Deactivated successfully.
[ 3002.878601] localhost systemd[1]: Stopped dracut pre-mount hook.
[ 3002.879091] localhost systemd[1]: dracut-initqueue.service: Deactivated successfully.
[ 3002.879145] localhost systemd[1]: Stopped dracut initqueue hook.
[ 3002.879628] localhost systemd[1]: dracut-pre-trigger.service: Deactivated successfully.
[ 3002.879678] localhost systemd[1]: Stopped dracut pre-trigger hook.
[ 3002.880124] localhost systemd[1]: dracut-pre-udev.service: Deactivated successfully.
[ 3002.880160] localhost systemd[1]: dracut-pre-udev.service: Unit process 706 (rpcbind) remains running after unit stopped.
[ 3002.880176] localhost systemd[1]: dracut-pre-udev.service: Unit process 710 (rpc.statd) remains running after unit stopped.
[ 3002.880190] localhost systemd[1]: dracut-pre-udev.service: Unit process 715 (rpc.idmapd) remains running after unit stopped.
[ 3002.880254] localhost systemd[1]: Stopped dracut pre-udev hook.
[ 3002.880742] localhost systemd[1]: dracut-cmdline.service: Deactivated successfully.
[ 3002.880805] localhost systemd[1]: Stopped dracut cmdline hook.
[ 3002.881866] localhost systemd[1]: Started Emergency Shell.
[ 3002.882341] localhost systemd[1]: Reached target Emergency Mode.
[ 3002.882790] localhost systemd[1]: Reached target Initrd Root File System.
[ 3002.883747] localhost systemd[1]: Starting cOS system early rootfs setup...
[ 3002.910231] localhost elemental[1113]: INFO[2023-09-08T20:07:42Z] Starting elemental version 0.3.1
[ 3002.910231] localhost elemental[1113]: INFO[2023-09-08T20:07:42Z] reading configuration form '/etc/elemental'
[ 3002.910455] localhost elemental[1113]: INFO[2023-09-08T20:07:42Z] Running stage: rootfs.before
[ 3002.910455] localhost elemental[1113]: INFO[2023-09-08T20:07:42Z] Done executing stage 'rootfs.before'
[ 3002.910455] localhost elemental[1113]: INFO[2023-09-08T20:07:42Z] Running stage: rootfs
[ 3002.910455] localhost elemental[1113]: INFO[2023-09-08T20:07:42Z] Done executing stage 'rootfs'
[ 3002.910455] localhost elemental[1113]: INFO[2023-09-08T20:07:42Z] Running stage: rootfs.after
[ 3002.910455] localhost elemental[1113]: INFO[2023-09-08T20:07:42Z] Done executing stage 'rootfs.after'
[ 3002.910520] localhost elemental[1113]: INFO[2023-09-08T20:07:42Z] Running stage: rootfs.before
[ 3002.910584] localhost elemental[1113]: INFO[2023-09-08T20:07:42Z] Done executing stage 'rootfs.before'
[ 3002.910584] localhost elemental[1113]: INFO[2023-09-08T20:07:42Z] Running stage: rootfs
[ 3002.910662] localhost elemental[1113]: INFO[2023-09-08T20:07:42Z] Done executing stage 'rootfs'
[ 3002.910662] localhost elemental[1113]: INFO[2023-09-08T20:07:42Z] Running stage: rootfs.after
[ 3002.910730] localhost elemental[1113]: INFO[2023-09-08T20:07:42Z] Done executing stage 'rootfs.after'
[ 3002.911691] localhost systemd[1]: Finished cOS system early rootfs setup.
[ 3002.912704] localhost systemd[1]: Starting cOS system immutable rootfs mounts...
[ 3002.916317] localhost systemctl[1122]: Failed to stop oem.mount: Unit oem.mount not loaded.
[ 3002.923074] localhost systemd[1]: Finished cOS system immutable rootfs mounts.
[ 3002.923497] localhost systemd[1]: Reached target Initrd File Systems.
[ 3002.923872] localhost systemd[1]: Startup finished in 23.595s (firmware) + 3min 16.756s (loader) + 2.664s (kernel) + 0 (initrd) + 50min 259ms (userspace) = 53min 43.276s.
Resulting in:
Generating "/run/initramfs/rdsosreport.txt"
Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
after mounting them and attach it to a bug report.
Press Enter for maintenance
(or press control-d to continue)
To Reproduce
Pre-Reqs:
- Have a machine similar to the devices listed above
- The AMD 5900 and Intel i7 vPro 7th Gen machines were reproduced/tested in UEFI boot mode, not legacy BIOS
Steps to reproduce the behavior:
- Have a bootable USB stick with the interactive ISO flashed to it
- Attempt to boot (either exercise Path B or let Path A run its course)
Expected behavior
The installer should not hit:
[ 3002.911691] localhost systemd[1]: Finished cOS system early rootfs setup.
[ 3002.912704] localhost systemd[1]: Starting cOS system immutable rootfs mounts...
[ 3002.916317] localhost systemctl[1122]: Failed to stop oem.mount: Unit oem.mount not loaded.
[ 3002.923074] localhost systemd[1]: Finished cOS system immutable rootfs mounts.
The installer should also avoid the other failures shown above and let the user reach the first page of the interactive ISO install.
Environment
NOTE: This is not reproducible with v1.2.0-rc5, but it is reproducible with v1.2.0-rc6.
- Harvester ISO version: v1.2.0 & v1.2.0-rc6
- Underlying Infrastructure: Bare-metal only. Unsuccessful in reproducing on an HP ProLiant server, and also unsuccessful in reproducing in qemu/kvm
Additional context
Attaching some logs:
dmesg-logs.log etc-initrd-release.log etc-os-release.log journalctllogs.log rdsosreport.txt
And additional pictures:
Update
- This can be reproduced in a hybrid virtualized setup: flash a USB stick with rc6 or v1.2.0, pass the USB device through to the VM, then set the VM's boot order so it boots from the USB
Other Update
- This is also reproducible with legacy BIOS, not just UEFI
About this issue
- State: closed
- Created 10 months ago
- Reactions: 1
- Comments: 20 (3 by maintainers)
I think I found the cause!! It seems we changed from kernel.xz to just kernel. I have no idea why or what that means 😃
I noticed the search at the top of grub.cfg tho is still looking for kernel.xz!!
search --no-floppy --file --set=root /boot/kernel.xz
It should be (I think?)
search --no-floppy --file --set=root /boot/kernel
At the same time as discovering this, I tried booting with all paths specified manually: (hd0,msdos1) for kernel and initrd, as well as the location for COS_LIVE.
I thought this made sense, and immediately wrote the above. But I’ve been retesting, and the only change I really had to make was root=live:/dev/sda1, which makes me wonder if the search line is actually not the problem at all 😃 I would have thought both kernel and initrd would have failed to load, given the search was looking for kernel.xz?
My ultimate workaround, however, was just specifying the partition in the root=live: argument, changing
$linux ($root)/boot/kernel cdroot root=live:CDLABEL=COS_LIVE rd.live.dir=/ rd.live.squashimg=rootfs.squashfs console=tty1 console=ttyS0 rd.cos.disable net.ifnames=1
to
$linux ($root)/boot/kernel cdroot root=live:/dev/sdX1 rd.live.dir=/ rd.live.squashimg=rootfs.squashfs console=tty1 console=ttyS0 rd.cos.disable net.ifnames=1
Warning: I am changing from a label-based lookup to a hard-coded device path. I know in my case the USB is /dev/sda, but if the system had a SATA disk it could have been /dev/sdb, etc. Do not do this unless you know which device your USB stick is, as this is a less reliable method of loading the image.
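For anyone who wants to script this edit rather than typing it at the GRUB prompt, here is a minimal Python sketch of the same substitution on the kernel argument string. The helper name set_live_root is made up for illustration and is not part of Harvester or GRUB; the argument string is the one quoted above, and /dev/sda1 is only correct if that really is your USB partition.

```python
def set_live_root(cmdline: str, device: str) -> str:
    """Return cmdline with its root=live:... argument pointing at device.

    Hypothetical helper: it only rewrites the text of the kernel
    arguments and does not touch any disk or GRUB file.
    """
    args = cmdline.split()
    replaced = False
    for i, arg in enumerate(args):
        if arg.startswith("root=live:"):
            args[i] = f"root=live:{device}"
            replaced = True
    if not replaced:
        raise ValueError("no root=live: argument found")
    return " ".join(args)


# Kernel arguments as quoted in this thread (label-based lookup).
original = (
    "cdroot root=live:CDLABEL=COS_LIVE rd.live.dir=/ "
    "rd.live.squashimg=rootfs.squashfs console=tty1 console=ttyS0 "
    "rd.cos.disable net.ifnames=1"
)

# Assumption: the USB stick's partition is /dev/sda1 on this machine.
patched = set_live_root(original, "/dev/sda1")
print(patched)
```

Only the root=live: argument changes; every other argument is passed through untouched, which matches what was needed in testing here.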
Hi @Roguito, the docs have been updated in https://github.com/harvester/docs/pull/435, which adds content to the USB Installation section with a link to where a user can download the patched ISO 😄
@Vicente-Cheng since the documented workaround works both in kvm/qemu and on (consumer-grade) bare metal, as mentioned above, I feel comfortable closing this out. Thanks again for the doc update on the workaround 😄 !
cc: @bk201
An update on the current status.
We found that the USB stick ends up with two partitions labeled COS_LIVE. During boot, a checking script hangs because it picks the wrong partition.
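If you want to check your own USB stick for this, a rough Python sketch like the one below can flag labels that appear more than once. It assumes you feed it the text produced by lsblk -P -o NAME,LABEL; the sample text and the device names sdb1/sdb2 are invented for illustration and are not taken from a real affected stick.

```python
import shlex
from collections import defaultdict


def duplicate_labels(lsblk_pairs_output: str) -> dict:
    """Map each filesystem label seen more than once to its device names.

    Expects KEY="value" lines, the shape printed by `lsblk -P -o NAME,LABEL`.
    """
    by_label = defaultdict(list)
    for line in lsblk_pairs_output.splitlines():
        # shlex strips the quotes, leaving tokens such as NAME=sdb1
        fields = dict(tok.split("=", 1) for tok in shlex.split(line))
        label = fields.get("LABEL", "")
        if label:
            by_label[label].append(fields.get("NAME", "?"))
    return {label: names for label, names in by_label.items() if len(names) > 1}


# Hypothetical sample; real output depends on the machine and the stick.
sample = '''NAME="sdb" LABEL=""
NAME="sdb1" LABEL="COS_LIVE"
NAME="sdb2" LABEL="COS_LIVE"
NAME="nvme0n1" LABEL=""'''

dupes = duplicate_labels(sample)
print(dupes)
```

A non-empty result for COS_LIVE would be consistent with the situation described above, where a lookup by label can land on the wrong partition.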
The default timeout is 3000 seconds, so the boot has to wait 50 minutes before it gives up. You can refer to this for the timeout setup: https://github.com/haraldh/dracut/blob/master/modules.d/90dmsquash-live/dmsquash-generator.sh#L75-L80.
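As a quick sanity check on those numbers, the small Python sketch below converts the 3000-second value to minutes and compares it with the first journal timestamp from the issue description (3002.862138 seconds since boot). Both numbers come from this thread; the helper name as_minutes is just for illustration.

```python
from datetime import timedelta

# Default device timeout quoted in this thread, in seconds.
DEFAULT_TIMEOUT_S = 3000


def as_minutes(seconds: float) -> float:
    """Convert a duration in seconds to minutes."""
    return timedelta(seconds=seconds).total_seconds() / 60


timeout_min = as_minutes(DEFAULT_TIMEOUT_S)

# First journal timestamp in the log excerpt from the issue description.
first_stop_ts = 3002.862138

# The units only start being stopped once the timeout has elapsed.
stopped_after_timeout = first_stop_ts > DEFAULT_TIMEOUT_S
print(timeout_min, stopped_after_timeout)
```

This lines up with the "(50min)" shown on the start job and with the journal entries only appearing a few seconds past the 3000-second mark.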
So that might be the root cause of this situation. Also, I tried with the original ISO (which means no repack), and it works well.
NOTE: We repack the ISO because we need to support legacy BIOS bootup
BTW, we also found that some ISO editors (I tried rufus) will resolve this problem because they change the original partition layout.
@slackspace-io Thanks. We can reproduce it. Your workaround works quite well! Specifying the real partition name or UUID path works; we are just not sure why it breaks with the label.
So where are we at? I just flashed 1.2 from the current releases with this bug. Is 1.2.0-patch1 available?
Pre Ready-For-Testing Checklist
If labeled: require/HEP Has the Harvester Enhancement Proposal PR submitted? The HEP PR is at:
Where is the reproduce steps/test steps documented? The reproduce steps/test steps are at:
Perform installation with the following methods:
Is there a workaround for the issue? If so, where is it documented? The workaround is at:
Has the backend code been merged (harvester, harvester-installer, etc) (including backport-needed/*)? The PR is at:
Does the PR include the explanation for the fix or the feature?
Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart? The PR for the YAML change is at: The PR for the chart change is at:
If labeled: area/ui Has the UI issue filed or ready to be merged? The UI issue/PR is at:
If labeled: require/doc, require/knowledge-base Has the necessary document PR submitted or merged? The documentation/KB PR is at:
If NOT labeled: not-require/test-plan Has the e2e test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue?
If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility? The compatibility issue is filed at:
@Vicente-Cheng I updated the initial description, as this seems to be reproducible with both UEFI & BIOS
Yes, the only change that was required was the root=live portion. It was not needed to set hd0,msdos1 on the ($root) portions. However setting these did not break anything.
When I press ‘e’ to edit the config, I do not get access to the ‘search’ line, so I was not able to change the search line itself and test.
Would the incorrect ‘search’ line cause the label-based root=live: to not be found?
I was able to successfully install on both a NUC gen 8 and an OptiPlex 3080 by setting root=live:/dev/sda1 instead of the label-based identification.
Another related (duplicate) issue: https://github.com/harvester/harvester/issues/4472
I have the exact same issue on an Intel NUC12 i5-1240P.
Same issue;
Dell Optiplex 3080 i5-12500
I’ve reset the BIOS, upgraded the BIOS, disabled SATA/M.2, and tried every USB port. I also tried the few BIOS options I’ve seen suggested for any issue; all give the exact same behaviour as this.
I compared a v1.1.2 ISO to the v1.2 ISO, and I noticed bootx64.efi is not executable in the v1.2.0 ISO but was in the v1.1.2 ISO. I have no idea if this can be a problem, but it is what I noticed in my uneducated attempts to compare the differences, as well as a change from using kernel.xz and rootfs.xz to initrd. But I really have no idea if any of this matters.