envoy: Add retry policies for gRPC error codes
From the documentation on retries, the x-envoy-retry-on header can be configured to retry on HTTP status codes like 5XX and 4XX. This works great for HTTP services. With gRPC, however, the HTTP status is 200 OK whenever the server is running properly, even when the call itself fails. The actual error code is carried inside the gRPC response, in the grpc-status trailer.
Would it be feasible to create a retry policy, driven by an x-envoy-retry-grpc-on header, that respects a list of gRPC error codes? A few codes could be deemed retriable (CANCELLED, DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED), but I am hesitant to group them together, because that would make assumptions about implementation details within a service. That is why I think an explicit list may work best. I’m open to ideas here.
Sample Header
x-envoy-retry-grpc-on: CANCELLED, DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED
The existing retry header would then be configured as a fallback for when a service is unreachable and the HTTP status codes have more meaning.
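To make the proposal a bit more concrete, here is a rough sketch of the decision logic in Python. It is not Envoy code: the function names `parse_retry_grpc_on` and `should_retry` are hypothetical, and the sketch assumes the proxy can see the HTTP status and the `grpc-status` value of the response. The numeric values are the standard gRPC status codes.

```python
from enum import IntEnum
from typing import FrozenSet, Optional


class GrpcStatus(IntEnum):
    """Standard gRPC status codes and their numeric values."""
    OK = 0
    CANCELLED = 1
    UNKNOWN = 2
    INVALID_ARGUMENT = 3
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    ALREADY_EXISTS = 6
    PERMISSION_DENIED = 7
    RESOURCE_EXHAUSTED = 8
    FAILED_PRECONDITION = 9
    ABORTED = 10
    OUT_OF_RANGE = 11
    UNIMPLEMENTED = 12
    INTERNAL = 13
    UNAVAILABLE = 14
    DATA_LOSS = 15
    UNAUTHENTICATED = 16


def parse_retry_grpc_on(header_value: str) -> FrozenSet[GrpcStatus]:
    """Parse the proposed x-envoy-retry-grpc-on value (hypothetical helper).

    Unknown names and OK are ignored rather than treated as errors.
    """
    codes = set()
    for token in header_value.split(","):
        name = token.strip().upper()
        if name != "OK" and name in GrpcStatus.__members__:
            codes.add(GrpcStatus[name])
    return frozenset(codes)


def should_retry(http_status: int,
                 grpc_status: Optional[str],
                 retry_on: FrozenSet[GrpcStatus]) -> bool:
    """Decide whether the gRPC policy asks for a retry (hypothetical helper).

    Anything that is not a 200 response with a grpc-status value is left
    to the existing x-envoy-retry-on policy, so this returns False for it.
    """
    if http_status != 200 or grpc_status is None:
        return False
    try:
        code = GrpcStatus(int(grpc_status))
    except ValueError:
        # Non-numeric or out-of-range grpc-status: do not retry.
        return False
    return code in retry_on


retry_on = parse_retry_grpc_on("CANCELLED, DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED")
print(should_retry(200, "8", retry_on))  # RESOURCE_EXHAUSTED is in the list
print(should_retry(200, "5", retry_on))  # NOT_FOUND is not in the list
```

Keeping the list explicit means the proxy never has to guess which codes a given service treats as transient; a service that considers CANCELLED final simply leaves it out of the header.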
About this issue
- State: closed
- Created 7 years ago
- Comments: 22 (18 by maintainers)
Probably worth thinking through how this interacts with https://github.com/grpc/proposal/blob/master/A6-client-retries.md.
@markroth probably has some thoughts.
On Fri, Apr 7, 2017, 7:52 PM Feng Li notifications@github.com wrote: