symbolic: Project Metabug: Kill Breakpad

High-level goal: improve rust-minidump (and related libraries) to the point that it can replace all the uses of breakpad in Sentry and Firefox.

For now this should be restricted to the scope of:

  • x86, x64, ARM, ARM64
  • Windows, Android, MacOS, Linux, (iOS?)

Subtasks

NOTE: Larger tasks are checked off even if they have incomplete subtasks to indicate that they are complete for the purposes of the current milestone.

  • Ensure rust-minidump can parse and expose all the minidump details we rely on

  • Replace the derelict breakpad-symbols subcrate with a new symbolication implementation

  • Complete dump_syms support of the various unwinding info formats (via symbolic):

  • Implement an (offline) unwinder

    • Potentially make a new independent crate for this, instead of being in minidump-processor
    • ARM stack walker
      • Scanning
      • Frame-pointer-based
      • CFI-based
    • x86/x64 stack walker
    • Can handle native unwinding tables (minidump-analyzer)
      • PDB
      • DWARF CFI
      • Compact Unwind Info
    • Can run online (moz-stackwalk)
      • Can be a backtracer (moz-stackwalk)
      • Can implement the panic! usecase (handling personality/lsdas to run dtors, catch_panic) (no use, just cool)
      • no_std compatible (don’t allocate!) (no use, just cool)
  • Implement client-side minidump generation (Bugzilla#1588530) (moz-breakpad-client, sentry-breakpad-client)

    • Can invoke native windows minidump APIs
    • Can generate fake minidump on Linux
    • Can generate fake minidump on MacOS
    • Can generate fake minidump on Android
    • Can generate fake minidump on iOS? (only Sentry would need this?)

The Context

Minidumps are a Microsoft-designed format for more compact dumps of a process’s state when it crashes, notably including full memory dumps of every thread’s stacks/registers and mapped code modules (libraries that are linked in and what addresses they were mapped to).
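To make the format a bit more concrete, here is a minimal sketch of reading the fixed 32-byte header that every minidump starts with. The field layout follows Microsoft's `MINIDUMP_HEADER`; the names `MinidumpHeader`, `read_u32` and `parse_header` are made up for this illustration and are not rust-minidump's API. The sketch also assumes a little-endian dump, which a real parser should not take for granted.

```rust
/// The ASCII bytes "MDMP" read as a little-endian u32.
const MINIDUMP_SIGNATURE: u32 = 0x504d_444d;

/// Fields of MINIDUMP_HEADER that follow the 4-byte signature.
/// (Hypothetical type for illustration only.)
#[derive(Debug, PartialEq)]
struct MinidumpHeader {
    version: u32,
    stream_count: u32,
    stream_directory_rva: u32, // file offset of the stream directory
    checksum: u32,
    time_date_stamp: u32,
    flags: u64,
}

/// Reads a little-endian u32 at `offset`, or None if out of bounds.
fn read_u32(bytes: &[u8], offset: usize) -> Option<u32> {
    let chunk = bytes.get(offset..offset.checked_add(4)?)?;
    let mut buf = [0u8; 4];
    buf.copy_from_slice(chunk);
    Some(u32::from_le_bytes(buf))
}

/// Parses the header, returning None on a short buffer or a bad signature.
fn parse_header(bytes: &[u8]) -> Option<MinidumpHeader> {
    if read_u32(bytes, 0)? != MINIDUMP_SIGNATURE {
        return None;
    }
    let mut flags = [0u8; 8];
    flags.copy_from_slice(bytes.get(24..32)?);
    Some(MinidumpHeader {
        version: read_u32(bytes, 4)?,
        stream_count: read_u32(bytes, 8)?,
        stream_directory_rva: read_u32(bytes, 12)?,
        checksum: read_u32(bytes, 16)?,
        time_date_stamp: read_u32(bytes, 20)?,
        flags: u64::from_le_bytes(flags),
    })
}
```

Everything else in the file (thread lists, module lists, stack memory) is reached through the stream directory that `stream_directory_rva` points at.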

Windows has native APIs for generating minidumps, but this is a feature that’s desirable on other platforms, so google-breakpad was created to generate “fake” minidumps on other platforms and process them all uniformly. The most important output of this process is backtraces for every thread, but additional context stored in minidumps may be useful for debugging weird stuff like “the user’s antivirus DLL-injected itself into our process and messed everything up” or “oh look the last syscall failed right before we crashed”.

Both Firefox and Sentry rely on breakpad for minidump generation and handling. Unfortunately, breakpad is written in dangerous C++ and basically abandoned by Google. Mozilla doesn’t bother upstreaming its patches anymore, and it’s too much work to maintain it.

Usecases

Here are the places where we use breakpad now that should work with a replacement. Each has a codename so that tasks/milestones can reference them.

Mozilla Usecases

  • minidump-stackwalk: On the server-side, Mozilla uses breakpad in minidump-stackwalk to process minidump-based crash reports for socorro.

  • moz-breakpad-client: On the client-side, Mozilla uses breakpad in our crash-reporter to generate minidumps. For content-process (~tab) crashes, the main process does this work out-of-crashing-process. For main-process (full browser) crashes, the main-process does this work in-crashing-process. Ideally we would have a separate crash-reporting process on the side that monitors the others so that all our handling can be out-of-crashing-process.

  • minidump-analyzer: On the client-side, Mozilla uses breakpad in our minidump-analyzer to try to analyze the contents of the minidump using the client machine’s knowledge of its own system libraries and any local debuginfo we ship with Firefox. This allows us to get more accurate symbolication/unwinding. (This also includes some of our own ad hoc symbolication/unwinding code, which is buggy and ideally would be replaced.)

  • moz-stackwalk: As a stretch-goal, this work would ideally also replace the need for moz-stackwalk (our own runtime backtracer for debug build backtraces and profiler probing) and fix-stacks (cleans up moz-stackwalk’s output using native symbols).

Sentry Usecases

  • symbolicator: On the server-side, Sentry uses breakpad inside of symbolicator to process minidumps and extract a meaningful stack trace.

  • sentry-breakpad-client: On the client-side, sentry-native uses crashpad or alternatively breakpad to create minidumps of the crashing process to send over for server-side post-processing.

Microsoft Usecases?

TBD!

Current Roadmap

Milestone 1 - minidump-stackwalk

Metabug: https://github.com/luser/rust-minidump/issues/153

Mozilla would like to first get the minidump-stackwalk usecase working, as it’s the simplest. It is also very high traffic (performance matters) and processes user-provided data on our servers (security matters).

minidump-stackwalk only needs to handle pre-processed symbols from our symbol servers (i.e. the breakpad text format), and is operating completely offline from where the minidump was generated.
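For readers who have not seen the breakpad text format, here is a rough sketch of parsing three of its record types (`MODULE`, `FUNC` and `PUBLIC`). The record shapes follow breakpad's symbol file documentation; `SymLine`, `hex` and `parse_line` are hypothetical names for this example, and a real parser also has to handle `FILE`, line records, `STACK CFI`, `STACK WIN` and so on.

```rust
/// A few of the record types found in a breakpad .sym file.
/// (Hypothetical type for illustration only.)
#[derive(Debug, PartialEq)]
enum SymLine {
    Module { os: String, arch: String, id: String, name: String },
    Func { address: u64, size: u64, param_size: u64, name: String },
    Public { address: u64, param_size: u64, name: String },
}

/// Parses a hexadecimal field (breakpad writes numbers without a 0x prefix).
fn hex(field: &str) -> Option<u64> {
    u64::from_str_radix(field, 16).ok()
}

/// Parses one line, returning None for record types this sketch ignores.
fn parse_line(line: &str) -> Option<SymLine> {
    let (kind, rest) = line.trim_end().split_once(' ')?;
    match kind {
        "MODULE" => {
            // MODULE <os> <arch> <id> <name>
            let mut parts = rest.splitn(4, ' ');
            Some(SymLine::Module {
                os: parts.next()?.to_string(),
                arch: parts.next()?.to_string(),
                id: parts.next()?.to_string(),
                name: parts.next()?.to_string(),
            })
        }
        "FUNC" => {
            // FUNC [m] <address> <size> <param_size> <name>
            let rest = rest.strip_prefix("m ").unwrap_or(rest);
            // splitn keeps spaces inside the name (e.g. C++ signatures).
            let mut parts = rest.splitn(4, ' ');
            Some(SymLine::Func {
                address: hex(parts.next()?)?,
                size: hex(parts.next()?)?,
                param_size: hex(parts.next()?)?,
                name: parts.next()?.to_string(),
            })
        }
        "PUBLIC" => {
            // PUBLIC [m] <address> <param_size> <name>
            let rest = rest.strip_prefix("m ").unwrap_or(rest);
            let mut parts = rest.splitn(3, ' ');
            Some(SymLine::Public {
                address: hex(parts.next()?)?,
                param_size: hex(parts.next()?)?,
                name: parts.next()?.to_string(),
            })
        }
        _ => None,
    }
}
```

Symbolication is then mostly a matter of finding the `FUNC` or `PUBLIC` record whose address range covers a given module-relative instruction address.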

The hardest part will be generating backtraces for all the threads, which requires a complete offline unwinder.
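As a rough illustration of what “offline” means here, the sketch below does the simplest of the planned strategies, a frame-pointer walk, using only a captured copy of the thread’s stack rather than live process memory. It assumes x64, little-endian data, and code compiled with frame pointers; `StackMemory` and `walk_frame_pointers` are hypothetical names, not rust-minidump’s API. A real unwinder would try CFI first and fall back to this, and then to scanning.

```rust
/// A captured copy of one thread's stack, as stored in a minidump.
/// (Hypothetical type for illustration only.)
struct StackMemory<'a> {
    base: u64, // address that bytes[0] had in the crashed process
    bytes: &'a [u8],
}

impl StackMemory<'_> {
    /// Reads a little-endian u64 at `addr`, or None if it is not captured.
    fn read_u64(&self, addr: u64) -> Option<u64> {
        let offset = addr.checked_sub(self.base)?;
        if offset.checked_add(8)? > self.bytes.len() as u64 {
            return None;
        }
        let start = offset as usize; // fits: it is below bytes.len()
        let mut buf = [0u8; 8];
        buf.copy_from_slice(&self.bytes[start..start + 8]);
        Some(u64::from_le_bytes(buf))
    }
}

/// Returns instruction addresses, innermost frame first.
/// Layout assumed at each frame pointer: [fp] = caller's fp, [fp + 8] = return address.
fn walk_frame_pointers(stack: &StackMemory, rip: u64, rbp: u64, max_frames: usize) -> Vec<u64> {
    let mut frames = vec![rip];
    let mut fp = rbp;
    while frames.len() < max_frames {
        let saved_fp = match stack.read_u64(fp) {
            Some(value) => value,
            None => break,
        };
        let return_addr = match fp.checked_add(8).and_then(|a| stack.read_u64(a)) {
            Some(value) => value,
            None => break,
        };
        if return_addr == 0 {
            break;
        }
        frames.push(return_addr);
        // The stack grows down, so a caller's frame must sit at a higher
        // address. Anything else is corrupt data (and could loop forever).
        if saved_fp <= fp {
            break;
        }
        fp = saved_fp;
    }
    frames
}
```

Because the input is untrusted, every read is bounds-checked and the walk stops on the first value that does not look like a valid frame, instead of trusting the saved pointers.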

Milestone 2 - minidump-analyzer

TBD, may choose different goal based on how Milestone 1 goes

Milestone 3 - symbolicator

TBD, may choose different goal based on how Milestone 2 goes

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 5
  • Comments: 20 (18 by maintainers)

Most upvoted comments

We removed breakpad from our own symbolicator service 🎉, and the code in this repo is now behind a feature flag and will be removed completely in the next major release.