go: proposal: errors: let GODEBUG=errstacktrace request stack backtraces
Errors in Go serve two main purposes: reporting problems to the user of the program, and reporting problems to the developer of the program. The user essentially never wants to see stack traces. For a developer, they can be helpful. Thus it is reasonable for developers to use packages like https://pkg.go.dev/github.com/go-errors/errors to collect developer errors–errors that the user is not expected to see.
That said, while developing a program, there are times when it is helpful for a user error to include a stack trace. For example, this is useful when some program bug is causing an unexpected user error. Getting a stack trace may help speed up debugging. This can also be useful when a user reports an inexplicable user error that is repeatable with proprietary user data; here again a stack trace may help the developer debug the problem.
Therefore, I propose a new GODEBUG setting: GODEBUG=errstacktrace=1. When this GODEBUG entry is set in the environment, the errors.New and fmt.Errorf functions will change to incorporate a stack trace into the message. This stack trace will be part of the string returned by the Error method (which will now be a lengthy, multi-line, string).
This will permit developers debugging an obscure but repeatable problem to set an environment variable to request a stack trace for the error that they are seeing.
We could also consider providing a simple mechanism for other error generation packages to check whether this GODEBUG setting is turned on.
This was inspired by a comment by @fsaintjacques : https://github.com/golang/go/issues/60873#issuecomment-1745646649.
About this issue
- Original URL
- State: open
- Created 9 months ago
- Reactions: 26
- Comments: 26 (13 by maintainers)
As an author of the gitlab.com/tozd/go/errors stack trace attaching errors package, I think having an opt-in stack trace recording of standard errors would be great. But the current proposal has some issues:
I do not think it is reasonable or even workable to attach stack trace to Error string:
fmt.Errorf("%w: extra", err)behaves, whenerralready contains a stack trace? It attachesextrato the end of the stack trace while withoutGODEBUGit does it to the end of the Error line.I think instead:
StackTrace() []uintptrinterface which returnnilwithoutGODEBUGflag and whatcallers()provide whenGODEBUGis set.GODEBUGis set.[]uintptrto not have another type defined.%+v, which is kinda standard for this these days.How exactly is recording of stack traces enabled, using
GODEBUGor some other mechanism I will leave to others to discuss. But I feel strongly that my proposal above nicely combines with existing approaches while not breaking anything.This proposal seems to have a gap where sentinel errors come into the picture. That is, things like:
For one, that call to
errors.Newprobably cannot attach a stack trace, at least not in any meaningful sense. But also, I kind of want thereturnto implicitly attach a stack trace, which is a thorny matter in legacy cases where people are using==instead oferrors.Is.It seems like under this proposal, I’d have to instead do:
which seems kind of awkward. (Yes I probably want to do that anyway to force the use of
errors.Isbut still.)I like @mateusz834 suggested approach with zig Error return traces.
From a developer’s perspective, I’d prefer to have the option to switch to errors with a full stack trace, even if it comes with a slight performance trade-off, as it enhances code readability. Let’s consider the example of the CopyFile function. In doing so, we can observe that adding a stack trace to just one function would require developers to use
fmt.Errorffive times. If I have to implement a full stack trace in my entire application, it would necessitate modifications in thousands of places where I’d need to wrap err withfmt.Errorf.The advantage of adding the stack trace to the
Errorstring is that it works in a program that has not already been designed to work with error stack traces. If you have to change the code to print or otherwise handle the stack trace, then it seems to me that you may as well use an existing package that manages stack traces for you.That said, I agree that it makes sense to add a way to retrieve just the stack trace from the error, if there is one.
I also agree that we should arrange that using
%vor%win afmt.Errorfdoesn’t blindly copy the stack trace string. We should fetch the error string without the stack trace, format it, and then add in the stack trace. Right now I think it should just preserve the original stack trace, and not add a new one. Perhaps some more clever option is available.Re: Zig Error Return Traces, which tracks the code path that an error took to get to the user.
Stacktraces are useful in figuring out problems with how an unexpected error happened, but Error Return Traces are useful in figuring out problems in how an expected error was handled (flow control) and surfaced to the end user.
Stack traces have a large initial cost, while error return traces scale with each frame through which an error is returned.
There’s a library for Error Return Traces in Go: https://github.com/bracesdev/errtrace
It seems to me that an ideal hybrid solution would have non-sentinel
New()errors attach a StackTrace, but any subsequent error wrapping would add a single frame reference for a Return Error Trace.Similarly, initial wrapping of a sentinel error would attach a StackTrace, but subsequent wrapping would add a single frame reference for a Return Error Trace.
I want to echo @carlmjohnson’s point about the risk with changing the return of
Error()even on an opt-in basis. Yes, matching onError()is bad but it exists in codebases—usually in tests, but sometimes in business logic. There tends to be mix ofstrings.Contains(err.Error(), ...)which won’t be affected by this change anderr.Error() == ...which will break.A search across GitHub for instances of exact error string matches outside test code:
https://github.com/search?q=path%3A.go+"err.Error()+%3D%3D"+-path%3A*_test.go&type=code
Some popular Go projects from that search: chaosmonkey, GitLab, nomad, istio. (My intent with those examples is not to call those projects out. I’m trying to highlight that this is a thing that happens even in well-established Go projects.)
The above comment from @fsaintjacques makes me consider a different design, which also happens to be a solution to my earlier feedback. 😄
Building on the design approach for
errors.Isanderrors.Aswith special optional methods thaterrorimplementations could implement, adderrors.StackTrace(err error) *Stacktracewhich:errhas a methodErrorStackTrace() *Stacktrace. If not, immediately returns nil.return err.ErrorStackTrace()The default situation is that no errors have backtraces.
When
GODEBUG=errstacktraceis enabled, theerrors.Newandfmt.Errorffunctions include a stack trace in the generated error value. The types returned by both implement theErrorStackTracemethod as returning the saved stack trace, if present. Whether they do that by always returning a type containing a*Stacktracefield that will often benilor by returning a different type depending on whether the stack trace functionality is enabled is an implementation detail up to the implementer, as far as I can tell.The
Error()function for each of these types does not change at all: it just returns the specified message, as before.If nothing in the program ever passes an error to
errors.StackTracethen the stack trace collection is just some wasted time collecting a stack trace nobody will refer to. But a program that does want to capture a stack trace, for example to send it to Sentry as described above, can try to pass the error toerrors.StackTraceand make use of the result whenever it’s non-nil.In the above I’ve assumed there’s a material cost to capturing a stack trace and thus it would be too expensive to enable by default, but I’ve not actually measured that. If capturing a stack trace is relatively cheap then it would be nicer to not require the extra environment variable at runtime, but I don’t know exactly how the Go runtime tracks stack traces internally.
The general idea of this resonates with me, but it feels strange to me that this would be a runtime-changeable setting rather than a compile-time-changeable setting. In my application it would mean that setting this environment variable would expose the detail of which end-user-facing error messages are being derived from
error.Error()results and which are hand-written specifically for our application.I tend to bristle a bit each time the Go runtime gains another way to globally change some runtime/library behavior out from under my application, since that adds yet another dimension of variation to consider when someone reports some unexpected behavior, but I suppose I am more sensitive to that than most Go application developers given that I use Go to develop CLI tools for which it should be irrelevant to the user what language they are written in. Probably it’s more common to use Go to write server processes that are designed to run only in a particular execution environment and where everybody knows well that the program is written in Go.
This is honestly a more broad piece of feedback than just on this proposal, and so I should probably record it somewhere else 🤔 but I would enjoy some way to compile an application such that it does not respond to
GODEBUGfeature-switching at all, but have a way to explicitly turn on some of these features programmatically based on mechanisms that make more sense for my application, such as the application’s configuration file. Then I would expose only the ones I know about and can see a reason for a user to turn on, and exclude any that have questionable value to my application.Since I’ve already contributed a large number of words to this issue across two comments I’m going to extend this comment to include a response to Ian’s later comment about it being an advantage to extend the error string, even though that makes this response appear out of order, just because it’s an extension of my existing point above.
I consider it to be a disadvantage that it would affect the behavior of existing programs that have not been modified to request or expect stack traces, since some error messages are written for users and turning this on would significantly change those error messages.
Even though it’s opt-in for a user, there are some applications (such as my own) where the user is not the person that ought to be making this decision, or if it is the user making the decision then it would be better to make it in a way that is more natural for the application.
Changing the return value of
Error()will break anything doing manual string matching. It’s bad and you shouldn’t do that, but some programs do it anyway, especially when you’re introspecting a third party error you don’t control. Adding a new method seems much less likely to create unexpected breakages.I want to emphasize that the goal of this proposal is to permit developers to easily enable a way for errors to include stack traces. In particular, developers can ask users to do this when running on their own systems with their own data. Ideas that rely on rebuilding the program should be a different proposal. Ideas that rely on using a different set of calls, or a different set of packages, to get a stack trace should be a different proposal. Thanks.
Errors are rarely errors, mostly values, i mean nothing exceptional. It would be overkill to add a trace to all errors, like io.EOF, sql.ErrNoRows, custom errors/values… It’s why I like the idea of
%wwhere we decide at each step in the code itself if we need to keep something or not and not globally.@mitar Let me restate your suggestion. We keep the
Errormethod as is, but change the fmt package (other thanfmt.Errorf) so that in any case where we call theErrormethod, we also check for and call aStacktracemethod, and, if that method is not empty we call it and also print the stack trace. We changefmt.Errorfso that if it wraps an error it does something intelligent with any stack trace. Does that sound right?@neild This proposal suggests an optional feature that we expect to only be enabled while debugging. As such, I am thinking it should capture a complete stack trace.
@flibustenet @StevenACoffman
I agree, the full stack trace is really useful only with the first error in the stack. And one may not always want to include a frame at all.
On another note: I do not think the frame should ever be in the error message…(except for ugly patches 😃)
what if for example I want to share the error with an external service, to obfuscate the frame I would need to manipulate the string (not very nice)
also, good compatibility with previous versions is a big strength of go, therefore the error string should remain unchanged to avoid a lot of refactoring in a lot of projects, and other reasons I guess…(this was already mentioned somewhere in this chat I think).
if anything, changing the error message could be an optional thing, to avoid said refactoring. I don’t see much value in this. I think It should be the logging library that outputs the informations I want/need (i.e.: frame,…).
Personally my favourite decisions would be:
My favourite questions would be:
hi
To sum up what I understood the objectives are from the conversations:
hopefully I got it more or less right 😅
My opinions:
Possible solutions:
As I experimented here https://github.com/golang-exp/errors-poc, configuration can happen at runtime with an
ErrorLevelakin to aLogLevelusing methods similar tolog/slog.var String = func(err error) stringand this can be picked up by fmt, logger,… when handling an error (??)log/slogwould allow a simple line or two in the main function to have the same outcome of theGODEBUG=errorstacktrace=1environment variable but with this configuration being dynamic (see here)`var ErrFoo = errors.New(“blah blah blah”)
func Bar() error { return fmt.NewErrorFormatter(LEVEL_NO_STACK).Errorf(“%w”, ErrFoo) }`
Opaque I think I have seen the Opaque function being implemented in the future
errors(or maybe I have seen it somewhere else); that’s nice.Pop It would be nice to have a
Popfunction extracting the last error struct{} whatever that may be(string, programCounter,…)ErrorLevelVar with offsets: moreover different
errorerorerrorformatterinstances could be initialized with an offset, this is to allow a singleErrorLevelVarto configure the defaults as well as other instances. example:errorerwith default error level (i.e LEVEL_NO_STACK = 0),thus behaving as current errorserrorerinstance here has an offset +1 from the default, thus it’s generating stacktraces when `ErrorLevelVar is 0 or moreerrorerinstance here has an offset +2 from the default, thus it’s generating stacktraces whenErrorLevelVaris -1 or morePS:
This proposal is to add stack traces to errors.
The Go 1.13 errors proposal was to capture a single stack frame, and allow error wrapping to identify the trace of locations producing the error. This approach reduces the size of error values and can represent a trace that passes across goroutine boundaries, but produces less information in the simplest case.
I think any proposal to add stack information to errors needs to consider full stacks vs. frames, and present an argument for which approach is used.