firecracker: Occasional KVM_EXIT_SHUTDOWN and bad syscall (14) during VM shutdown
Firecracker-containerd recently updated a test case that previously ran 5 VMs to now run 100 and started seeing the following errors occur in ~0-3 VMs per test case (issue).
As part of terminating VMs, we tell systemd in the guest to reboot the system. Most of the time this works fine, we see reboot: machine restart in the systemd logs and the VMM exits with 0 as expected right after.
Occasionally though, the guest reaches that reboot: machine restart line and then we see a few logs lines from Firecracker like:
2019-12-06T20:29:30.007995741 [anonymous-instance:INFO] Received KVM_EXIT_SHUTDOWN signal
2019-12-06T20:29:30.008087608 [anonymous-instance:ERROR] Shutting down VM after intercepting a bad syscall (14).
Followed by the VMM exiting with code 148. The KVM_EXIT_SHUTDOWN case and bad syscall (which appears to be rt_sigprocmask) ~seem to only ever occur together (never just one or the other)~ and only right after the reboot call should be occurring.
- EDIT: I’ve now very rarely seen
KVM_EXIT_SHUTDOWNoccur without an accompanyingrt_sigprocmaskcall that results in the seccomp violation. WhenKVM_EXIT_SHUTDOWNoccurs by itself, the VMM exits with 0 with no apparent issues.
Additionally, I don’t see this for every VM that exits like that, but am occasionally getting a line like this in dmesg after a run of the test case where failures occur:
[10548769.845897] traps: fc_vcpu0[52462] trap invalid opcode ip:527d36 sp:7f7ee67f0590 error:0 in firecracker[400000+1ab000]
No idea if it’s related or, if it is, why it wouldn’t be written every time a VM exits with KVM_EXIT_SHUTDOWN+the bad syscall, but figured it was worth mentioning.
This same behavior occurs in both 0.19 (w/ rust1.35+musl) and a 0.20 binary I built from here (w/ rust1.38+musl).
Let us know what, if anything, would be helpful for further reproducing or debugging as it seems to be a pretty rare issue. We can run our 100 VM test case with debugging binaries and/or see if it’s possible to make docker container reproducer for you to use or something like that.
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 16 (12 by maintainers)
I’m not sure if it’s helpful, but I ran the test case with strace attached to all firecracker processes, the occurrence of the error w/ full rt_sigprocmask args looks like:
What’s kind of interesting is that I can also see some strace logs for different VMs showing
fc_vcpu0threads that apparently receiveKVM_EXIT_SHUTDOWNbut then don’t callrt_sigprocmask, allowing them to exit with 0:so I think my previous statement that
KVM_EXIT_SHUTDOWNand the seccomp violation always occur together actually isn’t true, very rarelyKVM_EXIT_SHUTDOWNwill happen without the bad syscall.Background
On reboot each vcpu will:
x86_64:CMD_RESET_CPUto thei8042which will generate avmm-exit-event.x86_64andaarch64, on more recent host kernels (I need to crawl the linux changelog to see where exactly this was introduced):KVM_EXIT_SHUTDOWNwhich also generates avcpu-exit-eventwhich is interpreted as avmm-exit-event.On a
vmm-exit-eventthe vmm will cleanup some things then exit firecracker withexit code 0.Bug root cause
The problem described in this issue is that firecracker vcpu threads finish their execution thread shortly after issuing the
vcpu-exit-event. As part of cleaning up a finished threadmuslinvokessigprocmask. Usually the vcpu thread cleanup doesn’t get that far because the vmm thread quickly captures thevmm-exit-eventand exits the process cleanly, but on a system with high load, the vmm thread might be scheduled out for a while and this bug in the firecracker code is exposed.Proposed solution
Vcpumain loop should not end after issuingvcpu-exit-event, instead it should move to thePausedstate and wait for further commands fromVmm. This will allow theVmmthread full control on what should happen next, and will also fix the incorrect behavior reported in this issue.KVM_EXIT_SHUTDOWNis not something we need to whitelist. It’s just a type ofVcpuExit, and we handle it in the vcpu’s emulation loop.The weird thing is this:
rt_sigprocmaskis called by libc, originating inpthread_exit. See https://github.com/firecracker-microvm/firecracker/issues/1088#issuecomment-539974241So, when the vcpu threads exit, they should be calling
sigprocmaskand causing aSIGSYS.@sipsma this looks like a bug on our side. We should gracefully handle
KVM_EXIT_SHUTDOWN, thanks for the deep-dive and help on finding the root cause 👍We decided to go with just having the guest init system do a reboot because ctrl-alt-del doesn’t work on all kernels and isn’t supported on aarch64 (which we’d like to support someday), so avoiding it keeps our dependencies to a minimum.