kubernetes: Propogated errors missing context

Problem

Our code is riddled with snippets that look like this:

if err != nil {
  return nil, err
}

These lines bubble errors up the call stack until someone decides to log them and move on:

if err != nil {
  glog.Errorf("Something failed: %v", err)
}

Which can yield an unhelpful error message like (fake, but not totally unreasonable):

E0426 01:02:59.506727    2740 manager.go:110] Something failed: unexpected EOF

And now we’re tasked with trying to crawl through the code to figure out where that error actually came from.

We discourage logging an error that’s also returned (duplicate log statements interwoven throughout the code can also be confusing), but if you (the person debugging) is lucky someone has attached a unique context message that can help you find the source (e.g. Operation foo called on pod bar (441234) failed: %v).

Assumptions

  • Errors are relatively uncommon - i.e. this doesn’t apply to “not found” errors that should probably be bools, or errors which can be generated in a tight loop
  • Errors are ephemeral - that is, they are handled (e.g. logged) and then discarded (not stored anywhere)
  • When an error is produced, it is not always possible to know how it will be handled, or even how severe it is

Proposal

From the above assumptions, I get 2 guiding principles:

  1. If returning an error, it is your job to provide the maximum amount of debugging information and context around that error.
  2. If handling an error, it is your job to use the appropriate context for the specific case (e.g. a critical failure should have more detail than something that is expected to occassionally fail and will be retried).

To this end, I’d like to create a kubernetes error package that we promote everywhere. The constructor could look something like (rough sketch, not final API):

func New(msg string, context ...Context) Error { ... } // Context TBD
func Wrap(msg string, err error, context ...Context) Error { ... }

In addition to the provided message, error, and context the constructors could automatically attach additional metadata such as: a full stack trace, timestamp, etc.

We would also provide formatting functions that could flatten the error objects and format them to various levels of detail for logging. Finally, to get the maximum benefit, we would want to discourage the pattern of returning a raw error without first wrapping it with extra context.

For implementation, there are already several projects which provide similar functionality:

X-ref: https://github.com/kubernetes/kubernetes/issues/6461, https://github.com/kubernetes/kubernetes/issues/23338 - This proposal is independent of which log library we use, we just need to provide functions to log the appropriate context in whatever format.

/cc @thockin @timothysc @pwittrock @bgrant0607

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Comments: 23 (16 by maintainers)

Most upvoted comments

I think we would get a lot of mileage from

  1. declaring a standard error-utils package
  2. writing down “this is how to handle errors, when to return directly, when to wrap-and-return, and when to log and discard”
  3. converting large swath of code to the new technique

Adding key-value lists and other stuff is secondary, and might make more sense if/when we have a (related but different) logging overhaul).

I do think that many hands make light work, and if we converted one substantial package (e.g. kubelet) to use the new style and then declared a bounty or a fixit or something, we could get a good fraction of the codebase converted.

On Fri, May 13, 2016 at 10:43 AM, Tim St. Clair notifications@github.com wrote:

I agree that errors need to be programmatically inspectable (which is why it’s important to distinguish errors and logs), but I think that could be accomplished if we enforced a standard error package across the entire project. As the slide deck linked above mentions, most of the uses for inspecting errors I’ve seen could have been better served using a different signal (e.g. return bool instead of NotFound error). Note that this logic doesn’t apply to library code intended to be reused outside of k8s (like the clients), since we cannot expect other projects to agree to our non-standard errors.

pkg/errors looks nice, but as @bgrant0607 https://github.com/bgrant0607 mentioned above adding a bit of context in addition to the message could go along way. What about creating a wrapper library around that package which adds the ability to attach arbitrary key:value (string: interface{}) pairs? That could also help with the error-inspection (e.g. if errors.GetString(err, “foo”) == “bar” …)

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/kubernetes/kubernetes/issues/25521#issuecomment-219111248