kubernetes: Propogated errors missing context
Problem
Our code is riddled with snippets that look like this:
if err != nil {
return nil, err
}
These lines bubble errors up the call stack until someone decides to log them and move on:
if err != nil {
glog.Errorf("Something failed: %v", err)
}
Which can yield an unhelpful error message like (fake, but not totally unreasonable):
E0426 01:02:59.506727 2740 manager.go:110] Something failed: unexpected EOF
And now we’re tasked with trying to crawl through the code to figure out where that error actually came from.
We discourage logging an error that’s also returned (duplicate log statements interwoven throughout the code can also be confusing), but if you (the person debugging) is lucky someone has attached a unique context message that can help you find the source (e.g. Operation foo called on pod bar (441234) failed: %v
).
Assumptions
- Errors are relatively uncommon - i.e. this doesn’t apply to “not found” errors that should probably be
bool
s, or errors which can be generated in a tight loop - Errors are ephemeral - that is, they are handled (e.g. logged) and then discarded (not stored anywhere)
- When an error is produced, it is not always possible to know how it will be handled, or even how severe it is
Proposal
From the above assumptions, I get 2 guiding principles:
- If returning an error, it is your job to provide the maximum amount of debugging information and context around that error.
- If handling an error, it is your job to use the appropriate context for the specific case (e.g. a critical failure should have more detail than something that is expected to occassionally fail and will be retried).
To this end, I’d like to create a kubernetes error package that we promote everywhere. The constructor could look something like (rough sketch, not final API):
func New(msg string, context ...Context) Error { ... } // Context TBD
func Wrap(msg string, err error, context ...Context) Error { ... }
In addition to the provided message, error, and context the constructors could automatically attach additional metadata such as: a full stack trace, timestamp, etc.
We would also provide formatting functions that could flatten the error objects and format them to various levels of detail for logging. Finally, to get the maximum benefit, we would want to discourage the pattern of returning a raw error without first wrapping it with extra context.
For implementation, there are already several projects which provide similar functionality:
X-ref: https://github.com/kubernetes/kubernetes/issues/6461, https://github.com/kubernetes/kubernetes/issues/23338 - This proposal is independent of which log library we use, we just need to provide functions to log the appropriate context in whatever format.
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Comments: 23 (16 by maintainers)
I think we would get a lot of mileage from
Adding key-value lists and other stuff is secondary, and might make more sense if/when we have a (related but different) logging overhaul).
I do think that many hands make light work, and if we converted one substantial package (e.g. kubelet) to use the new style and then declared a bounty or a fixit or something, we could get a good fraction of the codebase converted.
On Fri, May 13, 2016 at 10:43 AM, Tim St. Clair notifications@github.com wrote: