kubernetes: Apiserver should map etcd errors to proper response status

What happened?

The etcd server exposes a myriad of error values, see - https://github.com/kubernetes/kubernetes/blob/87721fb71075582a1ee1e89b6a3391813e411a1c/vendor/go.etcd.io/etcd/server/v3/etcdserver/errors.go#L23-L47

Some of these are non-retryable errors from the k8s client's point of view, such as:

ErrRequestTooLarge               = errors.New("etcdserver: request is too large")
ErrNoSpace                       = errors.New("etcdserver: no space")
ErrKeyNotFound                   = errors.New("etcdserver: key not found")
ErrCorrupt                       = errors.New("etcdserver: corrupt cluster")

But many of them are retryable, if I understand correctly (a rough classification sketch follows the list):

ErrTimeoutDueToLeaderFail        = errors.New("etcdserver: request timed out, possibly due to previous leader failure")
ErrTimeoutDueToConnectionLost    = errors.New("etcdserver: request timed out, possibly due to connection lost")
ErrTimeoutLeaderTransfer         = errors.New("etcdserver: request timed out, leader transfer took too long")
ErrTimeoutWaitAppliedIndex       = errors.New("etcdserver: request timed out, waiting for the applied index took too long")
ErrTooManyRequests               = errors.New("etcdserver: too many requests")
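
For illustration only, a rough classification could look like this (a hypothetical helper, not apiserver code; matching on the "etcdserver: ..." message text is an assumption, since the apiserver sees these errors surfaced through the etcd client):

package sketch

import "strings"

// isRetryableEtcdError is a hypothetical helper that guesses whether an etcd
// error is worth retrying, purely from the "etcdserver: ..." message text.
// String matching is an illustration-only assumption, not how the apiserver
// classifies storage errors today.
func isRetryableEtcdError(err error) bool {
	if err == nil {
		return false
	}
	msg := err.Error()
	for _, retryable := range []string{
		"etcdserver: request timed out", // covers the leader-fail / connection-lost / leader-transfer variants
		"etcdserver: too many requests",
	} {
		if strings.Contains(msg, retryable) {
			return true
		}
	}
	return false
}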

However, the k8s apiserver appears to map every etcd error to http.StatusInternalServerError (code 500) and never sets any retry details such as retryAfterSeconds (a simplified sketch of this fallback follows the links below):

https://github.com/kubernetes/kubernetes/blob/41cef06f66dd4d7e87ff852dc228ab69587be9de/staging/src/k8s.io/apiserver/pkg/endpoints/handlers/get.go#L72-L76

https://github.com/kubernetes/kubernetes/blob/bdebc62d49293a0fbbd7e0d95bfd94b1ce21015c/staging/src/k8s.io/apiserver/pkg/endpoints/handlers/rest.go#L113-L115

https://github.com/kubernetes/kubernetes/blob/67d75db8905f16bb0d9d0a14b13a8736cb614533/staging/src/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters/status.go#L60-L81
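
A simplified sketch of what that fallback amounts to (not the actual ErrorToAPIStatus code, just the shape of the behavior shown in the links above):

package sketch

import (
	"net/http"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// errorToStatusSketch mirrors the behavior described above: anything that is
// not already a recognized API status error falls through to a 500, with the
// underlying (etcd) message passed to the client verbatim and no retry hint.
func errorToStatusSketch(err error) *metav1.Status {
	if statusErr, ok := err.(apierrors.APIStatus); ok {
		s := statusErr.Status()
		return &s
	}
	return &metav1.Status{
		Status:  metav1.StatusFailure,
		Code:    http.StatusInternalServerError,
		Reason:  metav1.StatusReasonInternalError,
		Message: err.Error(), // e.g. "etcdserver: too many requests" reaches the client as-is
	}
}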

What did you expect to happen?

The apiserver should look at the error returned by the etcd server and, in turn, compute (a rough sketch follows this list):

  • the response code it should return to the k8s client (e.g. 429 in the case of etcdserver: too many requests)
  • retryAfter details to tell the client that certain errors are retryable (e.g. etcdserver: request timed out, possibly due to connection lost)
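
Something along these lines; this is only a sketch, and both the string matching and the specific apierrors helpers chosen here are my assumptions rather than a worked-out design:

package sketch

import (
	"strings"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
)

// translateEtcdError is a hypothetical mapping from etcd errors to API errors
// that carry an appropriate HTTP code and, where it makes sense, a Retry-After.
func translateEtcdError(err error) error {
	msg := err.Error()
	switch {
	case strings.Contains(msg, "etcdserver: too many requests"):
		// 429 plus a Retry-After hint so well-behaved clients back off.
		return apierrors.NewTooManyRequests("etcd is overloaded, please retry", 1)
	case strings.Contains(msg, "etcdserver: request timed out"):
		// Timeout that clients may retry after the suggested delay.
		return apierrors.NewTimeoutError("etcd request timed out", 2)
	case strings.Contains(msg, "etcdserver: request is too large"):
		// 413, consistent with the size limits enforced elsewhere in the apiserver.
		return apierrors.NewRequestEntityTooLargeError(msg)
	default:
		// Everything else keeps today's behavior: a generic 500.
		return apierrors.NewInternalError(err)
	}
}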

How can we reproduce it (as minimally and precisely as possible)?

Any error returned by etcdserver shows up as HTTP 500 in the apiserver audit logs, and the error string from etcd is returned as-is to the k8s client.

/cc @lavalamp @deads2k @liggitt
/sig api-machinery

About this issue

  • Original URL
  • State: open
  • Created 2 years ago
  • Reactions: 1
  • Comments: 37 (35 by maintainers)

Most upvoted comments

Catching up on that - just some quick thoughts:

  1. I agree with Daniel that if we treat this issue as a generic one (as it was filed), we don't have a good enough understanding of the urgency (I'm not aware of any significant complaints about it from the GKE side, as an example), and we haven't explored the problem well enough to agree on something larger.

  2. That said, for some individual errors, I can definitely see the confusion. The last example is a good one. If the problem is that the object was too large to be committed, I agree it's somewhat aesthetic which exact error code we return, but I actually believe that we should return the same thing independently of the code path (i.e. where exactly we caught the error).

I can definitely imagine people building monitoring/alerting on specific errors, and if they build that on 413 RequestEntityTooLarge and then find they can hit the same client-side problem (too large an object) and get a different response, that's actually super confusing. So from that angle, I think we should try unifying this, and I would be happy to help with reviewing improvements to it.

[Other individual examples, like the 429s, are much less obvious to me, although I agree that defrags are problematic. But I'm not sure that properly translating to 429s would make a difference here; we need to analyze it a bit more.]

  3. I actually like Daniel's proposal of obscuring etcd messages: https://github.com/kubernetes/kubernetes/issues/112152#issuecomment-1234735630. I agree those are sometimes confusing to people, and exposing fewer internal details to end clients can reduce confusion. A rough sketch of that idea follows.
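
A rough sketch of the obscuring idea, assuming hypothetical helper names and message wording:

package sketch

import (
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/klog/v2"
)

// obscuredStorageError keeps the raw etcd error for operators (server-side
// logs) and hands the client a stable, generic message instead of leaking
// "etcdserver: ..." internals. The helper name and message are hypothetical.
func obscuredStorageError(err error) error {
	klog.ErrorS(err, "storage backend error")
	return apierrors.NewInternalError(
		fmt.Errorf("the server was unable to complete the request due to a storage backend error"))
}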