blog_os: General Protection Fault when running on hardware
I am getting exception number 0xD when running on hardware.
This also happens when enabling KVM with `-enable-kvm` in QEMU.
What could be the cause of this?
About this issue
- State: closed
- Created 8 years ago
- Comments: 15 (8 by maintainers)
Inserting `unsafe { asm!("push rax" :::: "intel"); }` at the beginning of the page fault handler fixes it. But I don’t understand why 😄.

Ok, here is what I found so far: It’s caused by a `movaps` instruction with an unaligned stack pointer (must be 16-byte aligned but is only 8-byte aligned). It occurs inside the `write_fmt` call in the `print_error` function, when the page fault handler is called the first time.

However, the stack pointer is correctly aligned when the page fault handler is called (it seems like the CPU does this automatically before pushing anything).
I don’t think that the red zone causes this. The red zone is a software convention that allows programs to use some bytes below the stack pointer. It does not change any stack alignment.
IntermezzOS doesn’t have this GP fault because the custom target disables SSE. Thus, the `movaps` instruction, which requires the stack alignment, is no longer available and the compiler generates different code. However, the compiler still assumes that the handler function is called with a correctly aligned stack pointer. It just has no way to exploit this requirement.

So disabling SSE might be a workaround, but it’s not a clean solution. A better solution is to write custom assembly stubs for exceptions with and without error code and align the stack correctly in them.
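To see why stubs for exceptions with and without an error code might need different alignment handling, here is a minimal sketch of the stack arithmetic. It is a possible explanation, not something confirmed in this thread. It assumes that the CPU rounds `rsp` down to a 16-byte boundary before pushing the exception frame in 64-bit mode, and that the handler is compiled like a normal function, which under the System V x86-64 calling convention expects `rsp + 8` to be 16-byte aligned at entry. All function names are hypothetical and exist only for this illustration.

```rust
/// Size in bytes of one stack slot in 64-bit mode.
const SLOT: u64 = 8;

/// Models `rsp` at the first instruction of an exception handler.
/// Assumption: the CPU rounds `rsp` down to a 16-byte boundary, then pushes
/// SS, RSP, RFLAGS, CS and RIP (five slots), plus an error code for some
/// exceptions such as page faults (a sixth slot).
fn rsp_at_handler_entry(rsp_before: u64, has_error_code: bool) -> u64 {
    let aligned = rsp_before & !0xf;
    let slots = if has_error_code { 6 } else { 5 };
    aligned - slots * SLOT
}

/// A normally called function is entered right after `call` pushed an
/// 8-byte return address, so the compiler expects `rsp + 8` to be a
/// multiple of 16 at entry.
fn matches_call_convention(rsp_at_entry: u64) -> bool {
    (rsp_at_entry + 8) % 16 == 0
}

/// Bytes a hypothetical assembly stub would subtract from `rsp` before
/// calling the Rust handler so that the compiler's assumption holds.
fn padding_needed(rsp_at_entry: u64) -> u64 {
    if matches_call_convention(rsp_at_entry) { 0 } else { 8 }
}
```

Under these assumptions, an exception without an error code enters the handler with the alignment the compiler expects, while one with an error code (such as a page fault) is off by 8 bytes. That would also be consistent with the `push rax` workaround, which moves `rsp` by exactly 8 bytes.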
I first inserted random `loop {}` statements to narrow it down. Thus I found out that the page fault handler is called correctly. It’s just the `print_error` call that causes the general protection fault. QEMU interrupt debugging doesn’t seem to work with `-enable-kvm`, and normal GDB breakpoints do not work either. But you can use GDB with hardware breakpoints, e.g. `hb vga_buffer.rs:41`.

So I added a breakpoint on `print_error` and stepped through it. It fails on `writer.write_fmt(fmt);`. Next I tried to `disassemble` the code in GDB and stepped through it with `si`, which steps a single assembly instruction. The fault occurred on a `movaps` instruction, so I checked the docs. It throws a general protection fault “if a memory operand is not aligned on a 16-byte boundary”. So I checked `rsp` just before the `movaps` through `info registers rsp` and indeed, it was not correctly aligned.

But I still don’t know what causes the wrong alignment.
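The check done by hand in GDB can be written down as a small Rust sketch. It runs as an ordinary hosted program, not inside the kernel, and the names `Aligned16`, `ok_for_movaps` and `address_of_local` are made up for this illustration. It shows the rule quoted from the docs, and that the compiler places 16-byte-aligned values at 16-byte-aligned addresses as long as the stack pointer was aligned the way it expects at function entry.

```rust
/// A 16-byte value with 16-byte alignment, comparable to the data that
/// `movaps` moves between memory and an SSE register.
#[repr(C, align(16))]
struct Aligned16([u8; 16]);

/// The rule quoted above: a `movaps` memory operand must lie on a
/// 16-byte boundary, otherwise the CPU raises a general protection fault.
fn ok_for_movaps(addr: usize) -> bool {
    addr % 16 == 0
}

/// Returns the address of a stack-allocated `Aligned16`. The compiler
/// computes this slot relative to `rsp`, relying on the alignment that
/// the calling convention guarantees at function entry.
fn address_of_local() -> usize {
    let value = Aligned16([0; 16]);
    &value as *const Aligned16 as usize
}
```

In a normally called function, `ok_for_movaps(address_of_local())` holds. If the function were entered with `rsp` shifted by 8 bytes, the same slot computation would land on an address such as `0x6fd8`, for which `ok_for_movaps` returns `false`.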