blog_os: General Protection Fault when running on hardware

I am getting exception number 0xD (general protection fault) when running on hardware. It also happens in QEMU when KVM is enabled with -enable-kvm. What could be the cause?

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

Inserting unsafe {asm!("push rax" :::: "intel");} at the beginning of the page fault handler fixes it. But I don’t understand why 😄.

Ok, here is what I found so far: It’s caused by a movaps instruction that runs while the stack pointer is misaligned (it must be 16-byte aligned but is only 8-byte aligned). It occurs inside the write_fmt call in the print_error function, the first time the page fault handler is called.
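To make the alignment requirement concrete, here is a minimal sketch of the check that movaps effectively performs on its memory operand. The function name and the example addresses are made up for illustration; they are not from blog_os.

```rust
/// Alignment that `movaps` requires for a memory operand, in bytes.
const MOVAPS_ALIGN: u64 = 16;

/// Returns true if `addr` is a multiple of 16, i.e. usable as a `movaps`
/// memory operand without raising a general protection fault.
/// (Hypothetical helper, for illustration only.)
fn is_movaps_aligned(addr: u64) -> bool {
    addr % MOVAPS_ALIGN == 0
}
```

With these definitions, is_movaps_aligned(0x7fff_f000) is true, while is_movaps_aligned(0x7fff_f008) is false: the second address is 8-byte aligned but not 16-byte aligned, which is the situation described above.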

However, the stack pointer is correctly aligned when the page fault handler is called (it seems like the CPU does this automatically before pushing anything).

I don’t think that the red zone causes this. The red zone is a software convention that allows programs to use some bytes below the stack pointer. It does not change any stack alignment.

IntermezzOS doesn’t have this GP fault because the custom target disables SSE. Thus, the movaps instruction, which requires the stack alignment, is no longer available and the compiler generates different code. However, the compiler still assumes that the handler function is called with a correctly aligned stack pointer. It just has no way to exploit this requirement.

So disabling SSE might be a workaround, but it’s not a clean solution. A better solution is to write custom assembly stubs for exceptions with and without error code and align the stack correctly in them.
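The alignment step that such a stub would perform can be sketched as plain arithmetic. This is only a model of the idea in Rust, not the stub itself; a real stub would be written in assembly, and align_down is a hypothetical helper name.

```rust
/// Rounds `rsp` down to the nearest multiple of `align`.
/// `align` must be a power of two. Rounding down is the safe direction
/// because the x86 stack grows toward lower addresses.
/// (Hypothetical helper, for illustration only.)
fn align_down(rsp: u64, align: u64) -> u64 {
    debug_assert!(align.is_power_of_two());
    rsp & !(align - 1)
}
```

For example, align_down(0x1008, 16) gives 0x1000, and an already aligned value such as 0x1000 is returned unchanged. Note that a stub which realigns the stack this way would also have to remember the original stack pointer so it can restore it before returning from the exception.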

I first inserted loop{} statements at various points to narrow it down. That way I found out that the page fault handler is called correctly. It’s just the print_error call that causes the general protection fault. QEMU interrupt debugging doesn’t seem to work with -enable-kvm, and normal GDB breakpoints do not work either. But you can use GDB with hardware breakpoints, e.g. hb vga_buffer.rs:41.

So I added a breakpoint on print_error and stepped through it. It fails on writer.write_fmt(fmt);. Next I disassembled the code in GDB and stepped through it with si, which executes a single assembly instruction. The fault occurred on a movaps instruction, so I checked the docs. It throws a general protection fault “if a memory operand is not aligned on a 16-byte boundary”. So I checked rsp just before the movaps with info registers rsp and indeed, it was not correctly aligned.

But I still don’t know what causes the wrong alignment.
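One explanation that would be consistent with the observations above is sketched below. It is an assumption on my part, not something confirmed in this thread. The model assumes that in 64-bit mode the CPU rounds rsp down to a 16-byte boundary and then pushes five 8-byte values (SS, RSP, RFLAGS, CS, RIP), plus an 8-byte error code for exceptions such as the page fault. It also assumes the compiler treats the handler like a normally called function, where the System V x86-64 convention expects rsp + 8 to be a multiple of 16 at entry because a call would have pushed a return address. All function names are hypothetical.

```rust
/// Size of each value the CPU pushes for an exception in 64-bit mode.
const SLOT: u64 = 8;

/// Models the stack pointer at handler entry.
/// Assumption: the CPU first rounds `rsp` down to a 16-byte boundary, then
/// pushes SS, RSP, RFLAGS, CS, RIP and, for some exceptions, an error code.
fn rsp_at_handler_entry(rsp_before: u64, has_error_code: bool) -> u64 {
    let aligned = rsp_before & !0xF;
    let slots = if has_error_code { 6 } else { 5 };
    aligned - slots * SLOT
}

/// Assumption: the calling convention expects `rsp + 8` to be a multiple
/// of 16 at function entry, as if a `call` had just pushed a return address.
fn matches_call_abi(rsp: u64) -> bool {
    (rsp + 8) % 16 == 0
}
```

Under this model, an exception with an error code enters the handler with rsp on a 16-byte boundary, which looks aligned in the debugger but is 8 bytes off from what compiled code expects. One additional 8-byte push, such as the push rax mentioned earlier, would restore the expected offset, and exceptions without an error code would not be affected.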