crc: [BUG] CRC Unstable on Apple Silicon
CRC will run on my MacBook Air M2, but it is not stable.
It often fails to start a fresh instance, and it never seems able to restart a stopped instance.
Looking at the logs, it appears that something triggers a kernel panic in the guest, which causes the VM to reboot.
Even when OpenShift is running, the VM eventually stops accepting SSH connections, which in turn breaks Podman. For example:
export SSH_KEY=${HOME}/.crc/machines/crc/id_ecdsa
ssh core@127.0.0.1 -p 2222 -i ${SSH_KEY}
The output is:
kex_exchange_identification: read: Connection reset by peer
Connection reset by 127.0.0.1 port 2222
Take a look at the log info below, and let me know what other info I can capture.
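In the meantime, here is roughly what I can capture from the host today (a sketch, assuming the default ~/.crc layout; the vfkit.log path is the same one quoted below):

# Host-side snapshot of CRC state and the guest console log
crc version
crc status
tail -n 200 ~/.crc/machines/crc/vfkit.log   # guest kernel panics show up here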
General information
- OS: macOS
- Hypervisor: vfkit
- Apple MacBook Air M2
- 24 GB RAM
CRC version
CRC version: 2.8.0+ff5e73ea
OpenShift version: 4.11.1
Podman version: 4.1.1
CRC config
- consent-telemetry : no
- memory : 16384
Host Operating System
ProductName: macOS
ProductVersion: 12.6
BuildVersion: 21G115
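For reference, the values above were gathered with commands along these lines (sw_vers is the standard macOS tool; the crc subcommands print the client version and the active configuration):

crc version
crc config view
sw_vers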
Logs Captured from .crc/machines/crc/vfkit.log
[ 290.579026] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 290.579148] Mem abort info:
[ 290.579181] ESR = 0x96000047
[ 290.579211] EC = 0x25: DABT (current EL), IL = 32 bits
[ 290.579311] SET = 0, FnV = 0
[ 290.579353] EA = 0, S1PTW = 0
[ 290.579384] FSC = 0x07: level 3 translation fault
[ 290.579415] Data abort info:
[ 290.579434] ISV = 0, ISS = 0x00000047
[ 290.579460] CM = 0, WnR = 1
[ 290.579494] user pgtable: 4k pages, 48-bit VAs, pgdp=00000001196d9000
[ 290.579545] [0000000000000008] pgd=0800000116a9d003, p4d=0800000116a9d003, pud=0800000116a9e003, pmd=080000011854c003, pte=0000000000000000
[ 290.579620] Internal error: Oops: 96000047 [#1] SMP
[ 290.579671] Modules linked in: nf_conntrack_netlink veth xt_nat xt_addrtype ipt_REJECT nf_reject_ipv4 xt_CT xt_MASQUERADE xt_conntrack xt_comment nft_counter xt_mark nft_compat nft_chain_nat nf_tables tun overlay dummy rfkill vxlan ip6_udp_tunnel udp_tunnel nfnetlink_cttimeout nfnetlink openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ext4 mbcache jbd2 vmw_vsock_virtio_transport vmw_vsock_virtio_transport_common vsock ip_tables xfs libcrc32c crct10dif_ce ghash_ce sha2_ce virtiofs sha256_arm64 virtio_blk virtio_console sha1_ce dm_multipath dm_mirror dm_region_hash dm_log dm_mod be2iscsi cxgb4i cxgb4 tls libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi ipmi_devintf ipmi_msghandler fuse scsi_transport_iscsi
[ 290.580018] CPU: 1 PID: 58 Comm: kworker/1:2 Not tainted 5.14.0-70.22.1.el9_0.aarch64 #1
[ 290.580064] Workqueue: cgroup_destroy css_killed_work_fn
[ 290.580125] pstate: a04000c5 (NzCv daIF +PAN -UAO -TCO BTYPE=--)
[ 290.580156] pc : memcg_drain_all_list_lrus+0x114/0x2f0
[ 290.580195] lr : memcg_drain_all_list_lrus+0x1d0/0x2f0
[ 290.580222] sp : ffff80001321bc60
[ 290.580243] x29: ffff80001321bc60 x28: 0000000000000001 x27: ffff0000cec87000
[ 290.580291] x26: ffff0000c0c234a0 x25: 0000000000000000 x24: ffff000220dc4000
[ 290.580335] x23: ffff800011920b20 x22: ffff800011d36cb8 x21: 0000000000000000
[ 290.580384] x20: ffff0000cb689b80 x19: ffff0000c0d18a00 x18: 0000000000000000
[ 290.580430] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 290.580480] x14: 0000000000000000 x13: 0000000000000030 x12: 0101010101010101
[ 290.580526] x11: 7f7f7f7f7f7f7f7f x10: feff786e71737264 x9 : ffff8000102cd9d0
[ 290.580570] x8 : fefefefefefefeff x7 : 000000000000000f x6 : ffff0000c2906474
[ 290.580614] x5 : 00000000000020f4 x4 : 0000000000000000 x3 : 0000000000000000
[ 290.580662] x2 : ffff0000cebdbe00 x1 : 0000000000000000 x0 : ffff0000cebdbe00
[ 290.580702] Call trace:
[ 290.580711] memcg_drain_all_list_lrus+0x114/0x2f0
[ 290.580736] memcg_offline_kmem.part.0+0x150/0x180
[ 290.580776] mem_cgroup_css_offline+0xe0/0x130
[ 290.580798] css_killed_work_fn+0x5c/0x160
[ 290.580815] process_one_work+0x1ec/0x4b0
[ 290.580843] worker_thread+0x180/0x540
[ 290.580872] kthread+0x134/0x140
[ 290.580894] ret_from_fork+0x10/0x18
[ 290.580921] Code: eb00029f 54000120 a9400684 f9400040 (f9000482)
[ 290.580964] ---[ end trace f219181f41465674 ]---
[ 290.580994] Kernel panic - not syncing: Oops: Fatal exception
[ 290.581029] SMP: stopping secondary CPUs
[ 290.581086] Kernel Offset: disabled
[ 290.581177] CPU features: 0x00000141,4b307ec0
[ 290.581214] Memory Limit: none
[ 290.581238] Rebooting in 10 seconds..
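A rough way to catch this as it happens (a sketch, assuming the same vfkit.log path) is to follow the guest console log from the host and flag panic/reboot lines:

tail -f ~/.crc/machines/crc/vfkit.log | grep -iE 'panic|oops|rebooting'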
Logs Captured from journalctl -f in SSH Session
Sep 17 17:06:28 crc-nz7ls-master-0 systemd-udevd[19338]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Sep 17 17:06:28 crc-nz7ls-master-0 systemd-udevd[19338]: Could not generate persistent MAC address for vethc305cf9e: No such file or directory
Sep 17 17:06:28 crc-nz7ls-master-0 NetworkManager[1279]: <info> [1663434388.0299] device (vethc305cf9e): carrier: link connected
Sep 17 17:06:28 crc-nz7ls-master-0 NetworkManager[1279]: <info> [1663434388.0301] manager: (vethc305cf9e): new Veth device (/org/freedesktop/NetworkManager/Devices/178)
Sep 17 17:06:28 crc-nz7ls-master-0 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethc305cf9e: link becomes ready
Sep 17 17:06:28 crc-nz7ls-master-0 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Sep 17 17:06:28 crc-nz7ls-master-0 NetworkManager[1279]: <info> [1663434388.0421] manager: (vethc305cf9e): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/179)
Sep 17 17:06:28 crc-nz7ls-master-0 ovs-vswitchd[1099]: ovs|00797|bridge|INFO|bridge br0: added interface vethc305cf9e on port 60
Sep 17 17:06:28 crc-nz7ls-master-0 kernel: device vethc305cf9e entered promiscuous mode
Sep 17 17:06:28 crc-nz7ls-master-0 ovs-vswitchd[1099]: ovs|00798|connmgr|INFO|br0<->unix#584: 5 flow_mods in the last 0 s (5 adds)
Sep 17 17:06:28 crc-nz7ls-master-0 ovs-vswitchd[1099]: ovs|00799|connmgr|INFO|br0<->unix#587: 2 flow_mods in the last 0 s (2 deletes)
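The journal above was captured interactively; it can also be streamed from the host over the same SSH path used earlier (a sketch, and it only works while the VM still accepts connections):

export SSH_KEY=${HOME}/.crc/machines/crc/id_ecdsa
ssh core@127.0.0.1 -p 2222 -i ${SSH_KEY} 'sudo journalctl -f'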
About this issue
- State: closed
- Created 2 years ago
- Comments: 69 (67 by maintainers)
Unfortunately no, https://github.com/okd-project/okd/issues/1165
@cgruver I am still interested in seeing MicroShift bundle testing if you can do it.
My IDE is ssh+vim 😕 but I can retest while running some builds in the background.
crc was running by itself, I had some (failing) loop communicating with the cluster (odo dev running with a sample nodejs application which failed to deploy, the deployment was retried every 30s because I was doing touch app.js). I used crc config memory ...
@praveenkumar testing with MicroShift is a good idea.
I’ll try the crc Podman bundle too. I’ve not used it since I have Podman already on my MacBook.
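Switching what crc runs should just be a config change before the next start; a sketch assuming the podman preset is available in this build (a microshift preset may only exist in newer releases):

crc delete                      # discard the current instance before changing presets
crc config set preset podman    # or: crc config set preset microshift, where available
crc setup
crc start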
The thread starting at https://github.com/Code-Hex/vz/pull/63#issuecomment-1275811980 and issue https://github.com/Code-Hex/vz/issues/13 hint at some potential memory corruption coming from Code-Hex/vz, which may be related to what you are seeing.
Perhaps we can set up a call. At the moment I can only suggest monitoring. I'll have to find someone else who can reproduce this on an M2, or otherwise obtain one.
That should not matter (for crc, unless you do a full rebuild). The VM runs with a driver based on vfkit + vz. This is independent of the crc management tool. If a panic happens, this is in the Virtualization driver maintained by another codebase.
I built the crc executable from the v2.9.0 tag and am testing with the 4.11.3 bundle. So far, so good…
If things continue to look good, I’ll close this tomorrow as resolved in release v2.9.0.
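Roughly what that test looked like (a sketch: the repository URL, make invocation, output path, and bundle filename are from memory and illustrative, not exact):

git clone https://github.com/crc-org/crc && cd crc
git checkout v2.9.0
make                                  # builds the crc binary (output path may differ per platform)
./out/macos-arm64/crc start --bundle ~/Downloads/crc_vfkit_4.11.3_arm64.crcbundle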
Yeah, I’m worried it’s something M2-specific 😕 A 4.11.3 bundle with a newer kernel is about to be released; it would be worth trying it to see if it’s better.