lima: VM disk corruption with Apple Silicon

[!TIP]

EDIT by @AkihiroSuda

For --vm-type=vz, this issue seems to have been solved in Lima v0.19 (https://github.com/lima-vm/lima/pull/2026)


Description

Lima version: 0.18.0
macOS: 14.0 (23A344)
VM: AlmaLinux 9

I was trying to do a big compile, using a VM with the attached configuration (vz)

NAME           STATUS     SSH                VMTYPE    ARCH       CPUS    MEMORY    DISK      DIR
myalma9        Running    127.0.0.1:49434    vz        aarch64    4       16GiB     100GiB    ~/.lima/myalma9

The build aborted with:

from /Volumes/Lima/build/build/AthenaExternals/src/Geant4/source/processes/hadronic/models/lend/src/xDataTOM_LegendreSeries.cc:7:
/usr/include/bits/types.h:142:10: fatal error: /usr/include/bits/time64.h: Input/output error

And afterwards, even in a different terminal, I see:

[emoyse@lima-myalma9 emoyse]$ ls
bash: /usr/bin/ls: Input/output error
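
For anyone hitting the same symptom, a minimal sketch of how to confirm from inside the guest that XFS has shut the filesystem down, assuming an XFS root on the default virtio disk (the device name is a guess, not from this report):

```bash
# Look for XFS shutdown / I/O error messages in the guest kernel log.
sudo dmesg | grep -iE 'xfs|i/o error' | tail -n 20

# xfs_repair refuses to run on a mounted filesystem, so repairing the root
# disk generally means booting a rescue image first; -n only reports problems.
# sudo xfs_repair -n /dev/vda2
```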

I was also logged into a display, and there I saw e.g.

(Screenshot of the VM display, 2023-10-26 at 17:44:45, attached)

If I try to log in again with:

limactl shell myalma9

each time I see something like the following appear in the display window:

[56247.642703] Core dump to |/usr/lib/systemd/systemd-coredump pipe failed

Edit: there has been a lot of discussion below; the corruption can happen with both vz and qemu, and on disks both external (to the VM) and internal. Some permutations seem more likely to provoke a corruption than others. I have summarised my experiments in the table in a comment below.

About this issue

  • State: open
  • Created 8 months ago
  • Reactions: 1
  • Comments: 50 (24 by maintainers)

Most upvoted comments

My apologies for the delay in replying, but I have been looking into this. The workflow is the same: compile https://gitlab.cern.ch/atlas/atlasexternals using the attached template with various configurations of host, qemu/vz, cores and memory.

TL;DR: updating to 6.5.10-1 was more stable on M2 (even on the ‘shared’ volume /tmp/lima), but apparently worse on M1 Pro (though the M1 Pro has more cores and we pushed it a lot harder). Updating to 6.6.1 was better on M1 Pro (have not tested M2 yet), but I still got xfs corruption at the very end.

With 6.6.1 I also disabled sleeping on guest:

sudo systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target

(from hint here)
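
Not part of the original hint, but a quick sanity check that the mask actually took effect before kicking off a long build:

```bash
# Each target should report "masked"; a masked unit cannot be started,
# so the guest will not suspend itself mid-compile.
systemctl is-enabled sleep.target suspend.target hibernate.target hybrid-sleep.target
```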

| VM Type | Kernel   | Cores | RAM (GB) | Where              | Attempt 1     | Attempt 2   | Attempt 3   | Host Processor |
|---------|----------|-------|----------|--------------------|---------------|-------------|-------------|----------------|
| qemu    | 5.14     | 6     | 24       | /tmp               | Crash + xfs   | Crash + xfs | Crash + xfs | M1 Pro         |
| vz      | 5.14     | 6     | 24       | /Volumes/Lima      | Crash + xfs   |             |             | M1 Pro         |
| vz      | 5.14     | 6     | 24       | /tmp               | OK            |             |             | M1 Pro         |
| qemu    | 6.5.10-1 | 6     | 24       | /tmp               | OK (but slow) |             |             | M1 Pro         |
| vz      | 6.5.10-1 | 6     | 24       | /Volumes/Lima      | Crash + xfs   |             |             | M1 Pro         |
| vz      | 6.5.10-1 | 6     | 24       | /tmp               | Crash a       | Crash b     |             | M1 Pro         |
| vz      | 6.6.1    | 6     | 24       | /tmp               | xfs           |             |             | M1 Pro         |
| vz      | 6.6.2-1  | 4     | 12       | /home/emoyse.linux | xfs           |             |             | M1 Pro         |

Notes:

  • xfs means xfs corruption was reported.
  • Once xfs corruption has occurred, I trash the VM and start again (see the limactl commands sketched just after these notes).
  • A crash is often preceded in dmesg by e.g. “hrtimer: interrupt took 32332585ns”.
  • For crash a, in /var/log/messages I see:
[  978.306216] BUG: Bad rss-counter state mm:0000000076c5940f type:MM_FILEPAGES val:402
[  978.306776] BUG: Bad rss-counter state mm:0000000076c5940f type:MM_ANONPAGES val:206
[  978.307142] BUG: non-zero pgtables_bytes on freeing mm: 69632
[   +0.011695] BUG: Bad rss-counter state mm:0000000076c5940f type:MM_FILEPAGES val:402
  • For crash b, I see:
Nov 7 16:44:19 lima-myalma92 kernel: BUG: workqueue lockup - pool cpus=5 node=0 flags=0x0 nice=0 stuck for 2164s!
Nov 7 16:44:19 lima-myalma92 kernel: Showing busy workqueues and worker pools:
Nov 7 16:44:19 lima-myalma92 kernel: workqueue events: flags=0x0
Nov 7 16:44:19 lima-myalma92 kernel: pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
Nov 7 16:44:19 lima-myalma92 kernel: pending: drm_fb_helper_damage_work [drm_kms_helper]
Nov 7 16:44:19 lima-myalma92 kernel: workqueue mm_percpu_wq: flags=0x8
Nov 7 16:44:19 lima-myalma92 kernel: pwq 10: cpus=5 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
Nov 7 16:44:19 lima-myalma92 kernel: pending: vmstat_update
  • for the last run with 6.6.1 it all completed fine and looked great, but then I got:
[emoyse@lima-alma9661c6 tmp]$ ls
bash: /usr/bin/ls: Input/output error

And in the display I see: (screenshot attached)
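
For reference, the “trash the VM and restart” step mentioned in the notes above is, under Lima, roughly the following (instance and template names are placeholders):

```bash
# Throw away the corrupted instance and create a fresh one from the template.
limactl stop myalma9
limactl delete myalma9
limactl start --name=myalma9 ./myalma9.yaml
```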

So there seem to be a lot of references to issues involving external disks and non-APFS filesystems. I am using the internal disk on my M2 mini with the default APFS filesystem, and I’ve experienced disk corruption once. I haven’t been able to force it to reproduce (to be honest, I haven’t tried very hard), but I did want to point out that external disks and other filesystems may not be the specific cause; they may just make the corruption easier to trigger than internal APFS does.

I run Debian Bookworm, and after repairing the filesystem with fsck I also upgraded my kernel from linux-image-cloud-arm64 6.1.55-1 to 6.5.3-1~bpo12+1 from backports.
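
In case it helps anyone on the same setup, the upgrade described above should roughly correspond to pulling the newer kernel from bookworm-backports (the exact invocation is my assumption):

```bash
# Enable backports if not already configured, then install the newer cloud kernel.
echo 'deb http://deb.debian.org/debian bookworm-backports main' | sudo tee /etc/apt/sources.list.d/backports.list
sudo apt update
sudo apt install -t bookworm-backports linux-image-cloud-arm64
sudo reboot
```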

I can trigger filesystem corruption if my external disk is formatted with ExFAT

Oh, so that might be why it is mostly affecting external disks? Did people forget to (re-)format them before using them?

EDIT: no, not so simple

“I create a separate APFS (Case-sensitive) Volume,”
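
If it helps narrow this down, one way (my assumption, not from the thread) to check on the macOS host what a given volume is actually formatted as before pointing Lima at it:

```bash
# Report the filesystem of the volume backing the VM data (e.g. APFS vs ExFAT).
diskutil info /Volumes/Lima | grep -i 'file system'
```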

And for me, I’m not using disks external to the VM any more: if you look at the table I posted here, you will see in the Where column that I’m mostly working in /tmp, i.e. completely inside the VM. Using an external disk might provoke the corruption earlier, but it’s certainly not the only route to it (though later kernels seem quite a bit more stable).

I was just curious about the comment that moving the build to /tmp seems to have “cured” the corruption…

Hey @afbjorklund I’ve been running some more tests, and I just had corruption from /tmp so it doesn’t cure it (but perhaps it is slightly less likely to happen). Updating the original post.

ARM64 atomics were broken until last year, when I found the issue and got it fixed (it was breaking workqueues, which was causing problems with TTYs for me, but who knows what else). 5.14 (released 2021) is definitely broken unless it’s a branch with all the required backports.

Try 6.4, that should work. 6.5.0 was a very recent regression. I would not put much faith in older kernels, especially anything older than 5.18 which is where we started. All bets are off if you’re running kernels that old on bleeding edge hardware like this. Lots of bugfixes don’t get properly backported into stable branches either. Apple CPUs are excellent at triggering all kinds of nasty memory ordering bugs that no other CPUs do, because they speculate/reorder across ridiculous numbers of instructions and even things like IRQs (yes really).

I’ll admit I’m not familiar with Lima. When you say “make it mountable from within the VM”, what does that mean?

  • You have a virtual hard disk file that lives on that separate APFS volume, and your VM is configured to have that as a second disk drive?
  • You boot the VM, and somehow from Linux user/kernel land mount your /Volumes/Lima directory? (How?)
  • Something else?

Perhaps Lima does this all for you under the hood, but I suppose that I’d need to know exactly what it’s doing to have any hope of understanding what’s going on.
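
For what it’s worth, my understanding (a sketch, not authoritative) is that Lima mounts host directories listed under mounts: in the template into the guest; with --vm-type=vz that is typically virtiofs, while with qemu it is reverse-sshfs or 9p depending on the template’s mountType. One way to see what the guest actually got (instance name is a placeholder):

```bash
# Run from the macOS host: list the guest's mounts and their filesystem types.
limactl shell myalma9 -- mount | grep -iE 'virtiofs|sshfs|9p'
```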

Is this relevant?

(UTM uses vz too)

Looks like people began to hit this issue in September, so I wonder if Apple introduced a regression around that time?

I still can’t repro the issue locally though. (macOS 14.1 on Intel MacBookPro 2020, macOS 13.5.2 on EC2 mac2-m2pro)