envoy: Add retry policies for gRPC error codes

According to the documentation on retries, the x-envoy-retry-on header can be configured to retry on HTTP status codes such as 5XX and 4XX. This works well for HTTP services. With gRPC, however, a properly functioning server returns HTTP 200 OK for every call, including failed ones. The actual error code is carried inside the gRPC response itself, in the grpc-status trailer.

Would it be feasible to add a retry policy, configured through an x-envoy-retry-grpc-on header, that accepts a list of gRPC error codes? A few codes could be considered retriable (CANCELLED, DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED), but I am hesitant to make assumptions about a service's implementation details by grouping them together, which is why I think a list may work best. I'm open to ideas here.

Sample Header

x-envoy-retry-grpc-on: CANCELLED, DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED
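A minimal Go sketch of how a proxy might turn that header value into a set of retriable codes. The header name is the one proposed above; `parseRetryGrpcOn` and `grpcCodes` are hypothetical names, and the choice to reject unknown names (rather than ignore them) is an assumption. The name-to-number mapping follows the standard gRPC status codes.

```go
package main

import (
	"fmt"
	"strings"
)

// grpcCodes maps canonical gRPC status names to their numeric codes.
// OK (0) is omitted on purpose: retrying a successful call makes no sense.
var grpcCodes = map[string]int{
	"CANCELLED": 1, "UNKNOWN": 2, "INVALID_ARGUMENT": 3,
	"DEADLINE_EXCEEDED": 4, "NOT_FOUND": 5, "ALREADY_EXISTS": 6,
	"PERMISSION_DENIED": 7, "RESOURCE_EXHAUSTED": 8,
	"FAILED_PRECONDITION": 9, "ABORTED": 10, "OUT_OF_RANGE": 11,
	"UNIMPLEMENTED": 12, "INTERNAL": 13, "UNAVAILABLE": 14,
	"DATA_LOSS": 15, "UNAUTHENTICATED": 16,
}

// parseRetryGrpcOn (hypothetical) turns a value of the proposed
// x-envoy-retry-grpc-on header into a set of retriable gRPC codes.
// Names are trimmed and upper-cased; empty entries are skipped;
// unknown names are reported as an error instead of being ignored.
func parseRetryGrpcOn(value string) (map[int]bool, error) {
	set := make(map[int]bool)
	for _, part := range strings.Split(value, ",") {
		name := strings.ToUpper(strings.TrimSpace(part))
		if name == "" {
			continue
		}
		code, ok := grpcCodes[name]
		if !ok {
			return nil, fmt.Errorf("unknown or non-retriable gRPC status %q", name)
		}
		set[code] = true
	}
	return set, nil
}

func main() {
	set, err := parseRetryGrpcOn("CANCELLED, DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED")
	fmt.Println(set, err) // map[1:true 4:true 8:true] <nil>
}
```

Keeping the policy as an explicit list, as suggested, means the proxy never has to guess which codes are safe to retry for a given service.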

The existing retry header would then serve as a fallback for cases where the service is unreachable and the HTTP status codes are meaningful.
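One way to read this fallback, sketched in Go under stated assumptions: if the response carries a grpc-status, the gRPC list decides; if it does not (for example the upstream was unreachable and the proxy produced an HTTP error itself), the existing x-envoy-retry-on policy applies. `shouldRetry` is a hypothetical helper, only the `5xx` policy of x-envoy-retry-on is modelled, and `-1` is an arbitrary sentinel for "no grpc-status".

```go
package main

import (
	"fmt"
	"strings"
)

// shouldRetry (hypothetical) combines the two headers:
//   - retryOn is the existing x-envoy-retry-on value (only "5xx" is modelled);
//   - grpcRetry is the parsed set from the proposed x-envoy-retry-grpc-on;
//   - grpcCode is -1 when the response carried no grpc-status.
func shouldRetry(retryOn string, grpcRetry map[int]bool, httpStatus, grpcCode int) bool {
	if grpcCode >= 0 {
		// A gRPC status is present, so it is treated as the real outcome.
		return grpcRetry[grpcCode]
	}
	// No gRPC status: fall back to the HTTP-level policy.
	for _, p := range strings.Split(retryOn, ",") {
		if strings.TrimSpace(p) == "5xx" && httpStatus >= 500 && httpStatus <= 599 {
			return true
		}
	}
	return false
}

func main() {
	// CANCELLED, DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED
	grpcRetry := map[int]bool{1: true, 4: true, 8: true}
	fmt.Println(shouldRetry("5xx", grpcRetry, 200, 4))  // true: DEADLINE_EXCEEDED is listed
	fmt.Println(shouldRetry("5xx", grpcRetry, 200, 5))  // false: NOT_FOUND is not listed
	fmt.Println(shouldRetry("5xx", grpcRetry, 503, -1)) // true: no gRPC status, 503 matches 5xx
}
```

Letting a present grpc-status take precedence over the HTTP status is a design choice made for this sketch, not something the proposal specifies.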

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 22 (18 by maintainers)

Most upvoted comments

Probably worth thinking through how this interacts with https://github.com/grpc/proposal/blob/master/A6-client-retries.md.

@markroth probably has some thoughts.

On Fri, Apr 7, 2017, 7:52 PM, Feng Li <notifications@github.com> wrote:

Makes sense to me. We also need to clarify whether these headers will be stripped by Envoy or propagated to the next hop when there are multiple layers of proxying. A per-URL config in Envoy would also be a good complement.
