go: runtime/debug: add SetCrashOutput(file *os.File)

Currently there are difficulties with nicely capturing crash output from Go programs. Crashes print to stderr, however, usually both stdout and stderr are used by programs for other purposes. While it’s not wrong to output it to stderr, it mixes two outputs together making later separation more difficult.

Capturing crashes is useful for post-mortem debugging and sending reports. Of course, due to unknown crashed program state using a custom callback is understandably dangerous.

A simpler alternative would be to allow changing file descriptor where to print crashes. In the simplest form it could look like:

package main
import "runtime/debug"

func init(){
	debug.SetCrashOutputFD(3)
}

This would allow passing in a fd for separate file, pipe to another “crash monitor” process or connect to an external server. Of course there’s a slight possibility of write taking a long time, when passing in a pipe/connection slowing down the crashing of the program.

With regards to naming it could also be SetCrashFD, SetTracebackFD or SetTracebackOutputFD.

About this issue

Original URL
State: open
Created 4 years ago
Reactions: 25
Comments: 124 (117 by maintainers)

Commits related to this issue

ringlogger: hook into global panic writer This is a grotesque hack, and hopefully upstream Go will provide a nicer way of doing this, but already it seems quite adept at catching panics. See https://... — committed to WireGuard/wireguard-windows by zx2c4 4 years ago
ringlogger: hook into global panic writer This is a grotesque hack, and hopefully upstream Go will provide a nicer way of doing this, but already it seems quite adept at catching panics. See https://... — committed to WireGuard/wireguard-windows by zx2c4 4 years ago
ringlogger: hook into global panic writer This is a grotesque hack, and hopefully upstream Go will provide a nicer way of doing this, but already it seems quite adept at catching panics. See https://... — committed to WireGuard/wireguard-windows by zx2c4 4 years ago
ringlogger: hook into global panic writer This is a grotesque hack, and hopefully upstream Go will provide a nicer way of doing this, but already it seems quite adept at catching panics. See https://... — committed to WireGuard/wireguard-windows by zx2c4 4 years ago
ringlogger: hook into global panic writer This is a grotesque hack, and hopefully upstream Go will provide a nicer way of doing this, but already it seems quite adept at catching panics. See https://... — committed to WireGuard/wireguard-windows by zx2c4 4 years ago
logpolicy: only log panics when running under systemd Given that https://github.com/golang/go/issues/42888 is coming, this catches most practical panics without interfering in our development environ... — committed to tailscale/tailscale by crawshaw 3 years ago
logpolicy: only log panics when running under systemd Given that https://github.com/golang/go/issues/42888 is coming, this catches most practical panics without interfering in our development environ... — committed to tailscale/tailscale by crawshaw 3 years ago
logpolicy: only log panics when running under systemd Given that https://github.com/golang/go/issues/42888 is coming, this catches most practical panics without interfering in our development environ... — committed to tailscale/tailscale by crawshaw 3 years ago
runtime: allow builtin write function to be redirected with function pointer The x/sys/windows package currently uses go:linkname for other facilities inside of runtime that are not suitable to be ex... — committed to golang/go by zx2c4 4 years ago
gopls/internal/lsp/protocol: report panics in RPC goroutines This change adds recover() + bug.Reportf to the generated dispatch routines for the server and client, so that we learn through telemetry ... — committed to golang/tools by adonovan 7 months ago

Most upvoted comments

On further reflection, one important benefit of *os.File would be that it would let the runtime/debug package hold on to the underlying file object and keep it from being garbage collected, which in turn would keep the fd from being closed. Changing to *os.File.

rsc on Apr 28, 2021

OK, so it sounds like this is where we are.

We separate out crash output from user-generated and other runtime tracing prints (GODEBUG=gctrace=1). Crash output is eligible to be copied into a separate system log, but it continues to be printed to standard error as well (otherwise a crashing program would exit silently).

On all systems, debug.SetCrashFD(fd uintptr) sets the current redirection file descriptor / handle.

Additionally, x/sys/windows has a linkname backdoor to implement

// SetCrashEvent makes Go runtime output during crashes and panics
// to be delivered to the event log l using the specified event id eid.
func (l *Log) SetCrashEvent(eid uint32) {

We discussed having GODEBUG=crashfd=3, but on reflection that seems like a dangerous thing to have child processes inherit (especially if the parent doesn’t know to preserve fd 3 when starting the child, fd 3 will be some unrelated file at crash time). So let’s not do that part.

Any objections to this plan?

rsc on Mar 31, 2021

Unexported hooks are really problematic from an API standpoint, especially if widely used programs adopt them. That can’t possibly be the right approach here.

And hooks generally have the problem that the runtime may not be in a good enough state to run the code in the hook. It is basically impossible to do anything interesting in a signal handler these days on Unix for the same reasons. Lots of signal handlers seem to work but actually wedge the program in rare cases. I don’t want to repeat that mistake here.

It seems OK for x/sys/windows to do setup, but we really need the critical “can’t do much safely” implementation to be in package runtime, where it can be adjusted and updated as the runtime changes. What is the API that x/sys/windows would need for that?

rsc on Feb 17, 2021

The issue I see here is just how much we let Go programs attempt to do during a crash, in which case the runtime may be in a terrible state and really not capable of much. In particular it may not be capable of running arbitrary Go code.

Writing to a different fd is easy. Calling a reporting DLL call on Windows with a user-supplied dwEventId also seems easy. In both cases it’s just one system call. I’m not sure we need to (or can reliably) support arbitrary reporting callbacks such as code to write to an mmap’ed ring-buffer file. (Especially if you can already write to a non-ring-buffer file with the fd redirect.)

@alexbrainman says that it’s complicated to explain what the dwEventId is, but we don’t need to do that in the runtime. We just need to accept a dwEventId and pass it along to the Windows API call.

rsc on Feb 3, 2021

@alexbrainman, so roughly what I’m thinking is the following. However, I might also forget some internal details that could make the following difficult with Windows.

The most basic thing is to send the output to a separate file:

package main

import (
	"os"
	"runtime/debug"
)

func main() {
	f, err := os.Open("service_log.txt")
	if err != nil {
		os.Exit(1)
		return
	}

	debug.SetCrashFD(f.Fd())

	// rest of the service logic
}

The next step would be to create a watchdog sub-process:

package main

import (
	"os"
	"os/exec"
	"runtime/debug"
)

func main() {
	r, w, err := os.Pipe()
	if err != nil {
		os.Exit(1)
		return
	}

	// this could also self-start, but in a different mode
	// but we'll use a separate binary for now
	cmd := exec.Command("watchdog.exe") 
	cmd.Stdin = r
	// also set other flags here to ensure that the watchdog
	// doesn't immediately close together with the parent.
	cmd.Start()

	debug.SetCrashFD(w.Fd())

	// rest of the service logic
}

You could also create governors using ExtraFiles:

package main

import (
	"os"
	"os/exec"
	"runtime/debug"
)

func main() {
	r, w, err := os.Pipe()
	if err != nil {
		os.Exit(1)
		return
	}

	cmd := exec.Command("watchdog.exe") 
	cmd.ExtraFiles = []*os.File{w}
	cmd.Start()

	// monitor `r`
}

// The other process has `debug.SetCrashFD(3)` in init,
// with some check that `3` actually exists.

For windows, I think (named) pipes could also be used:

f, err := os.Open(`\\.\pipe\randomhexvalue`)
...
debug.SetCrashFD(f.Fd())

egonelbre on Dec 2, 2020

Like @zx2c4, I’d also be interested in getting the the runtime’s crash println statement, as they often provide critical context about a throw, which can’t do formatting.

In my view, there are three different types of println calls:

prints immediately prior to a throw. In my view, these are effectively part of the throw, providing context about the crash, which may be critical to understanding the crash. To be honest, I’m not sure why one wouldn’t want this, beyond technical complexity.
Runtime prints unrelated to crashing (e.g., gctrace, schedtrace). I personally don’t have a preference either way here.
User-generated prints. These probably shouldn’t be included, but I don’t think I’ve ever seen one in practice, so if they are sufficiently rare, I don’t think it would be awful to include them.

prattmic on Apr 21, 2021

Adding that it’s possible to write programs that work both as a service and as a regular command line tool.

egonelbre on Jan 27, 2021

I don’t think it’s a good idea to have the runtime call ReportEvent on its own. Most apps do not use the eventlog. Some services use the eventlog, but not all services use it. The Go api for the eventlog is not in the Go runtime; it’s in a helper auxilary module of x/sys/windows. Finally, my own services do not use the eventlog; in my case, I have a file-backed mmap’d ring buffer for this stuff where I simply copy the bytes.

Hence, I’m learning toward CL278792 as the best general solution, combined with https://github.com/golang/go/issues/42888#issuecomment-763638252 .

zx2c4 on Jan 27, 2021

Change https://go.dev/cl/548121 mentions this issue: runtime/debug: add Example use of SetCrashOutput for a watchdog

gopherbot on Dec 8, 2023

Change https://go.dev/cl/547978 mentions this issue: runtime/debug: SetCrashOutput sets the FD for fatal panics

gopherbot on Dec 7, 2023

What’s the status on this? Proposal has been accepted but no patch yet, correct? Is anybody working on that? We have a Windows service executable which once in a while crashes, we suspect concurrent map writes, but without the traces we are pretty much lost how and where this happens. So a way to redirect that to a file would be greatly appreciated!

martinrode on Oct 14, 2023

No change in consensus, so accepted. 🎉 This issue now tracks the work of implementing the proposal. — rsc for the proposal review group

rsc on May 12, 2021

Instead of debug.SetCrashFD(fd uintptr) it would be debug.SetCrashFile(f *os.File), because of the concerns raised by @alexbrainman in https://github.com/golang/go/issues/42888#issuecomment-749352457.

Unless there’s a reason we don’t want to use *os.File there.

egonelbre on Apr 21, 2021

It is effectively a global defer, so I can’t think of any panics it wouldn’t work on.

Exactly. Indeed, it is limited, but to us there was a whole class of crashes we could now properly inspect.

Although I do think a better way to have a generic way to go about it would be to leverage a native crash reported, such as breakpad or crashpad, but there are some blocking issues for that on ARM64.

steeve on Feb 18, 2021

@steeve @zx2c4 I believe that https://github.com/znly/go/commit/6af49debb0b604c1bfe209d5801f985afb327c72 should work for all panics, but only panics. It is effectively a global defer, so I can’t think of any panics it wouldn’t work on.

However it would certainly not work for throws, which this issue would like to capture as well. If the runtime is in a corrupt state, I’m not sure it would even be able to do the interface conversion to pass an error to the crash function. The contents of onPanic allocating a buffer and calling complex functions like runtime.Stacks and fmt.Println are also examples of the kinds of things that the runtime may not be able to support while throwing.

prattmic on Feb 18, 2021

@zx2c4 SetPanicNotifier is an example of a callback that, as @rsc says, has the problem that the runtime may not be in a good enough state to run the code in the hook.

ianlancetaylor on Feb 18, 2021

For what is worth, we have implemented it at https://github.com/znly/go/commit/6af49debb0b604c1bfe209d5801f985afb327c72

It works well for us and it’s pretty simple:

    import (
        "fmt",
        "runtime"
    )

    func onPanic(err interface{}) {
    	   fmt.Println("panic reason:", err)
    	   dump := make([]byte, 1*1024*1024) // 1 MB buffer
    	   runtime.Stack(dump, true)
    	   fmt.Println(string(dump))
    }

    func main() {
    	   runtime.SetGlobalPanicHandler(onPanic)

    	   // provoke a panic from a SIGNAL
    	   var i *int
    	   *i = 42

    	   println("program exit normally")
    }

steeve on Feb 17, 2021

Change the API to accept *os.File instead of fd.

@egonelbre I prefer your option 3. It makes new API usage obvious. And I could use approach similar to https://github.com/golang/go/issues/42888#issuecomment-737616766 to write Windows Service crash into a file.

Thank you.

Alex

alexbrainman on Dec 22, 2020

I hate how complex that logic is for such a fundamental operation, but maybe it would make sense to do something similar for Windows services and have their writes go straight to the event log? IMO, in both cases this should be the OS’s job, but it’s not doing it.

Sticking that into the Go runtime itself is not a good idea. EventLog is complicated – more so than the Android logger functions – and as such will require configuration. This is the kind of thing best suited to go in x/sys/windows/svc or similar. The x/sys/windows package already uses //go:linkname for necessary faculties exposed by the runtime. Adding one for setting the logger callback function would be very sensible. I’ll send a patch for that.

zx2c4 on Dec 16, 2020

Does ExtraFiles work on Windows?

It does not.

I wonder why not. StartProcess can pass Files just fine. Do you know what’s different about ExtraFiles? Seems like that’s a bug we should fix.

zx2c4 on Dec 5, 2020

Oh, you know if all you’re trying to do is dup2 over stderr on Windows, you can already hack the panic and println output fd like this:

	f, _ := os.Create(`C:\helloworld.txt`)
	windows.SetStdHandle(^uint32(11), windows.Handle(f.Fd()))
	panic("oh nose!")

That works reasonably well, even inside of services:

Changing the fd isn’t particularly difficult, as you can see there, which is why I keep poking at, “what about running custom logger functions?”, which is the much more interesting goal (to me, at least).

zx2c4 on Dec 3, 2020

The first process is the monitor process and opens the FD, then starts the second process with the GODEBUG environment set. Does this not work for Windows services?

Yes, it could work. The main drawback is that windows services need to respond to control messages. https://pkg.go.dev/golang.org/x/sys/windows/svc#Handler

This would mean that the governor process would need to delegate all those messages to the subprocess, which would introduce an additional complication. Although, there might be a way to make subprocess handle the service messages directly, I haven’t seen it.

egonelbre on Dec 2, 2020

@zx2c4 I’m not fond of documenting a callback to use with go:linkname either. I tend to think that people who operate at that level can customize their Go distribution.

Yea, documenting it is probably the wrong thing to do. But having something in there akin to nanotime would be “very nice”…and would allow me to implement something for Go’s Windows service package that uses it to ship things to eventlog automatically.

Anyway, I realize pushing this too hard is an loosing battle, for good reason. But if somehow the implementation happened to be structured in a way that made it coincidentally possible to go:linkname it from elsewhere, I would be very happy.

zx2c4 on Dec 2, 2020

@zx2c4 I’m not fond of documenting a callback to use with go:linkname either. I tend to think that people who operate at that level can customize their Go distribution.

ianlancetaylor on Dec 2, 2020

Should we just have GODEBUG=debugfd=3?

/cc @aclements

rsc on Dec 2, 2020

@egonelbre how do you propose to use your API on Windows?

I agree. I wanted this feature to capture crash dumps of a Windows service.

/cc @zx2c4

Alex

alexbrainman on Dec 1, 2020