go: proposal: errors: let GODEBUG=errstacktrace request stack backtraces

Errors in Go serve two main purposes: reporting problems to the user of the program, and reporting problems to the developer of the program. The user essentially never wants to see stack traces. For a developer, they can be helpful. Thus it is reasonable for developers to use packages like https://pkg.go.dev/github.com/go-errors/errors to collect developer errors–errors that the user is not expected to see.

That said, while developing a program, there are times when it is helpful for a user error to include a stack trace. For example, this is useful when some program bug is causing an unexpected user error. Getting a stack trace may help speed up debugging. This can also be useful when a user reports an inexplicable user error that is repeatable with proprietary user data; here again a stack trace may help the developer debug the problem.

Therefore, I propose a new GODEBUG setting: GODEBUG=errstacktrace=1. When this GODEBUG entry is set in the environment, the errors.New and fmt.Errorf functions will change to incorporate a stack trace into the message. This stack trace will be part of the string returned by the Error method (which will now be a lengthy, multi-line, string).

This will permit developers debugging an obscure but repeatable problem to set an environment variable to request a stack trace for the error that they are seeing.

We could also consider providing a simple mechanism for other error generation packages to check whether this GODEBUG setting is turned on.

This was inspired by a comment by @fsaintjacques : https://github.com/golang/go/issues/60873#issuecomment-1745646649.

About this issue

Original URL
State: open
Created 9 months ago
Reactions: 26
Comments: 26 (13 by maintainers)

Most upvoted comments

As an author of the gitlab.com/tozd/go/errors stack trace attaching errors package, I think having an opt-in stack trace recording of standard errors would be great. But the current proposal has some issues:

I do not think it is reasonable or even workable to attach stack trace to Error string:

This makes it behave very differently when stack trace is attached or not. For example, how does fmt.Errorf("%w: extra", err) behaves, when err already contains a stack trace? It attaches extra to the end of the stack trace while without GODEBUG it does it to the end of the Error line.
Attaching a stack trace AND formatting it is expensive. Adding it blindly to all errors might introduce a lot of unnecessary work of string manipulation.

I think instead:

Standard errors should implement StackTrace() []uintptr interface which return nil without GODEBUG flag and what callers() provide when GODEBUG is set.
- Alternatively, standard errors could implement this interface only when GODEBUG is set.
This records a minimum of the stack trace and this is it. Then you can do with this stack trace information whatever you want. Also, the return type is simply []uintptr to not have another type defined.
Formatting of standard errors can then include a stack trace after the Error if formatting is done with %+v, which is kinda standard for this these days.

How exactly is recording of stack traces enabled, using GODEBUG or some other mechanism I will leave to others to discuss. But I feel strongly that my proposal above nicely combines with existing approaches while not breaking anything.

mitar on Oct 3, 2023

This proposal seems to have a gap where sentinel errors come into the picture. That is, things like:

var ErrFoo = errors.New("blah blah blah")

func Bar() error {
    return ErrFoo
}

For one, that call to errors.New probably cannot attach a stack trace, at least not in any meaningful sense. But also, I kind of want the return to implicitly attach a stack trace, which is a thorny matter in legacy cases where people are using == instead of errors.Is.

It seems like under this proposal, I’d have to instead do:

var ErrFoo = errors.New("blah blah blah")

func Bar() error {
    return fmt.Errorf("%w", ErrFoo)
}

which seems kind of awkward. (Yes I probably want to do that anyway to force the use of errors.Is but still.)

rittneje on Oct 3, 2023

I like @mateusz834 suggested approach with zig Error return traces.

From a developer’s perspective, I’d prefer to have the option to switch to errors with a full stack trace, even if it comes with a slight performance trade-off, as it enhances code readability. Let’s consider the example of the CopyFile function. In doing so, we can observe that adding a stack trace to just one function would require developers to use fmt.Errorf five times. If I have to implement a full stack trace in my entire application, it would necessitate modifications in thousands of places where I’d need to wrap err with fmt.Errorf.

func Copy(dst, src string) error {
        if dst == src {
                return fmt.Errorf("%w", ErrInvalid)
        }

        srcF, err := Open(src)
        if err != nil {
                return fmt.Errorf("%w", err)
        }
        defer srcF.Close()

        info, err := srcF.Stat()
        if err != nil {
                 return fmt.Errorf("%w", err)
        }

        dstF, err := OpenFile(dst, O_RDWR|O_CREATE|O_TRUNC, info.Mode())
        if err != nil {
                 return fmt.Errorf("%w", err)
        }
        defer dstF.Close()

        if _, err := io.Copy(dstF, srcF); err != nil {
                 return fmt.Errorf("%w", err)
        }
        return nil
}

krhubert on Nov 2, 2023

The advantage of adding the stack trace to the Error string is that it works in a program that has not already been designed to work with error stack traces. If you have to change the code to print or otherwise handle the stack trace, then it seems to me that you may as well use an existing package that manages stack traces for you.

That said, I agree that it makes sense to add a way to retrieve just the stack trace from the error, if there is one.

I also agree that we should arrange that using %v or %w in a fmt.Errorf doesn’t blindly copy the stack trace string. We should fetch the error string without the stack trace, format it, and then add in the stack trace. Right now I think it should just preserve the original stack trace, and not add a new one. Perhaps some more clever option is available.

ianlancetaylor on Oct 3, 2023

Re: Zig Error Return Traces, which tracks the code path that an error took to get to the user.

Stacktraces are useful in figuring out problems with how an unexpected error happened, but Error Return Traces are useful in figuring out problems in how an expected error was handled (flow control) and surfaced to the end user.

Stack traces have a large initial cost, while error return traces scale with each frame through which an error is returned.

There’s a library for Error Return Traces in Go: https://github.com/bracesdev/errtrace

It seems to me that an ideal hybrid solution would have non-sentinel New() errors attach a StackTrace, but any subsequent error wrapping would add a single frame reference for a Return Error Trace.

Similarly, initial wrapping of a sentinel error would attach a StackTrace, but subsequent wrapping would add a single frame reference for a Return Error Trace.

StevenACoffman on Nov 30, 2023

I want to echo @carlmjohnson’s point about the risk with changing the return of Error() even on an opt-in basis. Yes, matching on Error() is bad but it exists in codebases—usually in tests, but sometimes in business logic. There tends to be mix of strings.Contains(err.Error(), ...) which won’t be affected by this change and err.Error() == ... which will break.

A search across GitHub for instances of exact error string matches outside test code:

https://github.com/search?q=path%3A.go+"err.Error()+%3D%3D"+-path%3A*_test.go&type=code

Some popular Go projects from that search: chaosmonkey, GitLab, nomad, istio. (My intent with those examples is not to call those projects out. I’m trying to highlight that this is a thing that happens even in well-established Go projects.)

abhinav on Nov 1, 2023

The above comment from @fsaintjacques makes me consider a different design, which also happens to be a solution to my earlier feedback. 😄

Building on the design approach for errors.Is and errors.As with special optional methods that error implementations could implement, add errors.StackTrace(err error) *Stacktrace which:

Tests whether err has a method ErrorStackTrace() *Stacktrace. If not, immediately returns nil.
If so, return err.ErrorStackTrace()

The default situation is that no errors have backtraces.

When GODEBUG=errstacktrace is enabled, the errors.New and fmt.Errorf functions include a stack trace in the generated error value. The types returned by both implement the ErrorStackTrace method as returning the saved stack trace, if present. Whether they do that by always returning a type containing a *Stacktrace field that will often be nil or by returning a different type depending on whether the stack trace functionality is enabled is an implementation detail up to the implementer, as far as I can tell.

The Error() function for each of these types does not change at all: it just returns the specified message, as before.

If nothing in the program ever passes an error to errors.StackTrace then the stack trace collection is just some wasted time collecting a stack trace nobody will refer to. But a program that does want to capture a stack trace, for example to send it to Sentry as described above, can try to pass the error to errors.StackTrace and make use of the result whenever it’s non-nil.

In the above I’ve assumed there’s a material cost to capturing a stack trace and thus it would be too expensive to enable by default, but I’ve not actually measured that. If capturing a stack trace is relatively cheap then it would be nicer to not require the extra environment variable at runtime, but I don’t know exactly how the Go runtime tracks stack traces internally.

apparentlymart on Oct 3, 2023

The general idea of this resonates with me, but it feels strange to me that this would be a runtime-changeable setting rather than a compile-time-changeable setting. In my application it would mean that setting this environment variable would expose the detail of which end-user-facing error messages are being derived from error.Error() results and which are hand-written specifically for our application.

I tend to bristle a bit each time the Go runtime gains another way to globally change some runtime/library behavior out from under my application, since that adds yet another dimension of variation to consider when someone reports some unexpected behavior, but I suppose I am more sensitive to that than most Go application developers given that I use Go to develop CLI tools for which it should be irrelevant to the user what language they are written in. Probably it’s more common to use Go to write server processes that are designed to run only in a particular execution environment and where everybody knows well that the program is written in Go.

This is honestly a more broad piece of feedback than just on this proposal, and so I should probably record it somewhere else 🤔 but I would enjoy some way to compile an application such that it does not respond to GODEBUG feature-switching at all, but have a way to explicitly turn on some of these features programmatically based on mechanisms that make more sense for my application, such as the application’s configuration file. Then I would expose only the ones I know about and can see a reason for a user to turn on, and exclude any that have questionable value to my application.

Since I’ve already contributed a large number of words to this issue across two comments I’m going to extend this comment to include a response to Ian’s later comment about it being an advantage to extend the error string, even though that makes this response appear out of order, just because it’s an extension of my existing point above.

I consider it to be a disadvantage that it would affect the behavior of existing programs that have not been modified to request or expect stack traces, since some error messages are written for users and turning this on would significantly change those error messages.

Even though it’s opt-in for a user, there are some applications (such as my own) where the user is not the person that ought to be making this decision, or if it is the user making the decision then it would be better to make it in a way that is more natural for the application.

apparentlymart on Oct 4, 2023

Changing the return value of Error() will break anything doing manual string matching. It’s bad and you shouldn’t do that, but some programs do it anyway, especially when you’re introspecting a third party error you don’t control. Adding a new method seems much less likely to create unexpected breakages.

carlmjohnson on Oct 4, 2023

I want to emphasize that the goal of this proposal is to permit developers to easily enable a way for errors to include stack traces. In particular, developers can ask users to do this when running on their own systems with their own data. Ideas that rely on rebuilding the program should be a different proposal. Ideas that rely on using a different set of calls, or a different set of packages, to get a stack trace should be a different proposal. Thanks.

ianlancetaylor on Oct 4, 2023

Errors are rarely errors, mostly values, i mean nothing exceptional. It would be overkill to add a trace to all errors, like io.EOF, sql.ErrNoRows, custom errors/values… It’s why I like the idea of %w where we decide at each step in the code itself if we need to keep something or not and not globally.

flibustenet on Oct 4, 2023

@mitar Let me restate your suggestion. We keep the Error method as is, but change the fmt package (other than fmt.Errorf) so that in any case where we call the Error method, we also check for and call a Stacktrace method, and, if that method is not empty we call it and also print the stack trace. We change fmt.Errorf so that if it wraps an error it does something intelligent with any stack trace. Does that sound right?

@neild This proposal suggests an optional feature that we expect to only be enabled while debugging. As such, I am thinking it should capture a complete stack trace.

ianlancetaylor on Oct 3, 2023

@flibustenet @StevenACoffman

I agree, the full stack trace is really useful only with the first error in the stack. And one may not always want to include a frame at all.

On another note: I do not think the frame should ever be in the error message…(except for ugly patches 😃)

what if for example I want to share the error with an external service, to obfuscate the frame I would need to manipulate the string (not very nice)
also, good compatibility with previous versions is a big strength of go, therefore the error string should remain unchanged to avoid a lot of refactoring in a lot of projects, and other reasons I guess…(this was already mentioned somewhere in this chat I think).
if anything, changing the error message could be an optional thing, to avoid said refactoring. I don’t see much value in this. I think It should be the logging library that outputs the informations I want/need (i.e.: frame,…).

Personally my favourite decisions would be:

err.Error() output should remain unchanged
the frame should be a [xx]unitptr for first error in the stack and [1]uintptr for other errors
as for the internal logic I guess there are already libraries with good implemnteations that could be refactored and/or integrated with std library…or not if one thinks it’s better to start from scratch (from the std library, not really from scratch)

My favourite questions would be:

do we want this traces always enabled?
do we want this traces enabled by an environment variable?
do we want this traces enabled by an in memory variable (like log/slog level)?
- this would be real nice from a devops perspective, for example:
  - error rate is low -> no traces -> save some overheaf and save some money with less log streaming
  - error rate is high -> traces are automatically enabled without redeployment -> by the time I react to an alert I already can read some traces
- on this note I developed this library just to experiment it’s usability from a coder perspective and for helping with these kind of decisions (you can try it as a drop in replacement to the std errors (probably with some bugs)). this is not going to be maintained and it’s purpose is not to be integrated in the std library if successful, it is a “what if the std library were to behave like this” kind of project.

inifares23lab on Dec 1, 2023

To sum up what I understood the objectives are from the conversations:

standard/unbiased stacktraces as []uintptr
stacktrace string in error string
configurable stacktrace (on/off) (string/standard)
fine grained control over the configuration (for example we want to avoid meaningless stacktraces)
issue when using errors.Join or similar feature, do we climb the error tree or not?

hopefully I got it more or less right 😅

My opinions:

I agree []uintptr should be the standard for the reasons stated above

I think it has already been said but providing a machine readable format(like json) from []uintptr would be much more beneficial and could allow better automation for observability, incident response…

I agree this should be a formatting feature

I think err.Error() should return only the string representation of the error, nothing more

I don’t dislike the environment variable option but:

it makes it difficult to have fine-grained configuration
- i.e.: 70% of code is beautiful, bugless and errors are nicely decorated(i want only plain error strings), the other 30% is a weird mess (i want the stacktraces)
It’s not easily configurable at runtime, why would we want to have to redeploy a server? the issue might not happen at a later time, the error decorations may not be optimal and we may have to spend lot of time just to guess where the error happened; runtime configuration can mitigate this.

I agree
no Idea what would be best but it also could be configurable

Possible solutions:

As I experimented here https://github.com/golang-exp/errors-poc, configuration can happen at runtime with an ErrorLevel akin to a LogLevel using methods similar to log/slog.

let this be the standard following what @mitar suggested satisfying marshaling and unmarshaling interfaces
maybe we can add a configurable function for formatting that defaults to err.Error() i.e. var String = func(err error) string and this can be picked up by fmt, logger,… when handling an error (??)
@ianlancetaylor the runtime configuration like log/slog would allow a simple line or two in the main function to have the same outcome of the GODEBUG=errorstacktrace=1 environment variable but with this configuration being dynamic (see here)
@rittneje also addressed here, it is a bit verbose but it’s entirely optional, the code can become:

`var ErrFoo = errors.New(“blah blah blah”)

func Bar() error { return fmt.NewErrorFormatter(LEVEL_NO_STACK).Errorf(“%w”, ErrFoo) }`

this could also be configurable dynamically using the same techniques

Opaque I think I have seen the Opaque function being implemented in the future errors (or maybe I have seen it somewhere else); that’s nice.

Pop It would be nice to have a Pop function extracting the last error struct{} whatever that may be(string, programCounter,…)

ErrorLevelVar with offsets: moreover different errorer or errorformatter instances could be initialized with an offset, this is to allow a single ErrorLevelVar to configure the defaults as well as other instances. example:

package X is very nice and clean and uses default errorer with default error level (i.e LEVEL_NO_STACK = 0),thus behaving as current errors
package Y is slightly messy the errorer instance here has an offset +1 from the default, thus it’s generating stacktraces when `ErrorLevelVar is 0 or more
package Z is extremely messy the errorer instance here has an offset +2 from the default, thus it’s generating stacktraces when ErrorLevelVar is -1 or more

PS:

this last part is not experimented in the package I wrote.
also I accounted for 4 levels (NO_STACK, PROGRAM_COUNTER, STACK_ON FIRST, FULL_STACK) the last two levels retrieve an array of callers and I actually find them kind of useless; on one hand they may be useful to somebody (maybe ??), on the other hand they would make it difficult to implement the offset solution.

inifares23lab on Oct 4, 2023

This proposal is to add stack traces to errors.

The Go 1.13 errors proposal was to capture a single stack frame, and allow error wrapping to identify the trace of locations producing the error. This approach reduces the size of error values and can represent a trace that passes across goroutine boundaries, but produces less information in the simplest case.

I think any proposal to add stack information to errors needs to consider full stacks vs. frames, and present an argument for which approach is used.

neild on Oct 3, 2023