envoy: Add retry policies for gRPC error codes
From the documentation on retries, the x-envoy-retry-on header can be configured to handle HTTP status codes such as 5XX and 4XX. This works well for HTTP services. With gRPC, however, a properly running server answers every call with HTTP status 200 OK, and the actual error code is carried inside the gRPC response (the grpc-status trailer).
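To make the problem concrete, here is a minimal Python sketch (not Envoy code) of how a proxy could read the real outcome of a gRPC call. It assumes the trailers arrive as a plain dict with lowercase keys. The function name `grpc_outcome` and the `HTTP_<status>` label for non-200 responses are hypothetical.

```python
from typing import Dict

# Canonical gRPC status codes; the list index is the numeric code.
GRPC_STATUS_NAMES = [
    "OK", "CANCELLED", "UNKNOWN", "INVALID_ARGUMENT", "DEADLINE_EXCEEDED",
    "NOT_FOUND", "ALREADY_EXISTS", "PERMISSION_DENIED", "RESOURCE_EXHAUSTED",
    "FAILED_PRECONDITION", "ABORTED", "OUT_OF_RANGE", "UNIMPLEMENTED",
    "INTERNAL", "UNAVAILABLE", "DATA_LOSS", "UNAUTHENTICATED",
]


def grpc_outcome(http_status: int, trailers: Dict[str, str]) -> str:
    """Return the effective result of a gRPC call as a status name.

    A healthy gRPC server answers with HTTP 200 and reports the real result
    in the grpc-status trailer, so the HTTP status alone hides errors.
    """
    if http_status != 200:
        # Not a normal gRPC response; only the HTTP status carries meaning.
        return f"HTTP_{http_status}"
    raw = trailers.get("grpc-status")
    if raw is None:
        return "UNKNOWN"  # assumption: a missing trailer is treated as UNKNOWN
    try:
        code = int(raw)
    except ValueError:
        return "UNKNOWN"  # assumption: a malformed value is treated as UNKNOWN
    if 0 <= code < len(GRPC_STATUS_NAMES):
        return GRPC_STATUS_NAMES[code]
    return "UNKNOWN"
```

For example, `grpc_outcome(200, {"grpc-status": "4"})` returns `"DEADLINE_EXCEEDED"` even though the HTTP status is 200.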
Would it be feasible to create a retry policy, configured through an x-envoy-retry-grpc-on header, that respects a list of gRPC error codes? A few codes could be deemed retriable (CANCELLED, DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED), but I am hesitant to make assumptions about implementation details within a service by grouping them together. That is why I think a list may work best. I’m open to ideas here.
Sample Header
x-envoy-retry-grpc-on: CANCELLED, DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED
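A rough sketch of how the proposed header value might be parsed, assuming comma-separated canonical code names written exactly as in the sample header. `parse_retry_grpc_on` is a hypothetical helper, not an existing Envoy API. Rejecting unknown names instead of ignoring them is a design assumption, chosen so that a typo in the header does not silently disable a retry.

```python
from typing import FrozenSet

# Canonical gRPC status code names accepted in the header.
VALID_GRPC_CODES = frozenset({
    "OK", "CANCELLED", "UNKNOWN", "INVALID_ARGUMENT", "DEADLINE_EXCEEDED",
    "NOT_FOUND", "ALREADY_EXISTS", "PERMISSION_DENIED", "RESOURCE_EXHAUSTED",
    "FAILED_PRECONDITION", "ABORTED", "OUT_OF_RANGE", "UNIMPLEMENTED",
    "INTERNAL", "UNAVAILABLE", "DATA_LOSS", "UNAUTHENTICATED",
})


def parse_retry_grpc_on(header_value: str) -> FrozenSet[str]:
    """Parse a hypothetical x-envoy-retry-grpc-on value into code names.

    Surrounding whitespace is stripped and matching is case-sensitive;
    an unknown name raises ValueError (design assumption).
    """
    codes = set()
    for token in header_value.split(","):
        name = token.strip()
        if not name:
            continue  # tolerate empty entries such as a trailing comma
        if name not in VALID_GRPC_CODES:
            raise ValueError(f"unknown gRPC status code: {name!r}")
        codes.add(name)
    return frozenset(codes)
```

With the sample header, `parse_retry_grpc_on("CANCELLED, DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED")` yields those three names as a frozenset, which keeps the per-service choice explicit rather than grouping codes behind a single keyword.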
The existing retry header would then be configured as a fallback for when a service is unreachable and the HTTP status codes have more meaning.
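The fallback behavior could look roughly like the following Python sketch. It is a simplified, hypothetical model: `should_retry` is not an Envoy function, `None` stands for "no response at all", and only the `5xx` and `connect-failure` policies of x-envoy-retry-on are modelled (both are treated as covering the no-response case).

```python
from typing import Dict, Optional, Set

# Canonical gRPC status codes; the list index is the numeric code.
GRPC_STATUS_NAMES = [
    "OK", "CANCELLED", "UNKNOWN", "INVALID_ARGUMENT", "DEADLINE_EXCEEDED",
    "NOT_FOUND", "ALREADY_EXISTS", "PERMISSION_DENIED", "RESOURCE_EXHAUSTED",
    "FAILED_PRECONDITION", "ABORTED", "OUT_OF_RANGE", "UNIMPLEMENTED",
    "INTERNAL", "UNAVAILABLE", "DATA_LOSS", "UNAUTHENTICATED",
]


def _grpc_status_name(trailers: Dict[str, str]) -> Optional[str]:
    """Map the grpc-status trailer to a name, or None if absent/invalid."""
    raw = trailers.get("grpc-status")
    if raw is None:
        return None
    try:
        code = int(raw)
    except ValueError:
        return None
    if 0 <= code < len(GRPC_STATUS_NAMES):
        return GRPC_STATUS_NAMES[code]
    return None


def should_retry(http_status: Optional[int], trailers: Dict[str, str],
                 retry_on: Set[str], retry_grpc_on: Set[str]) -> bool:
    """Decide whether to retry, using gRPC codes first and HTTP as fallback."""
    if http_status is None:
        # No response at all (e.g. the service is unreachable): use the
        # existing HTTP-level policy from x-envoy-retry-on.
        return "connect-failure" in retry_on or "5xx" in retry_on
    if http_status != 200:
        # A non-200 status is meaningful, so fall back to x-envoy-retry-on.
        return "5xx" in retry_on and 500 <= http_status <= 599
    # HTTP 200: the real outcome is the gRPC status in the trailers.
    name = _grpc_status_name(trailers)
    return name is not None and name in retry_grpc_on
```

Keeping the two lists separate means a 200 response is only ever judged by the gRPC list, and a transport failure or non-200 status only by the HTTP policy.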
About this issue
- State: closed
- Created 7 years ago
- Comments: 22 (18 by maintainers)
Probably worth thinking through how this interacts with https://github.com/grpc/proposal/blob/master/A6-client-retries.md.
@markroth probably has some thoughts.
On Fri, Apr 7, 2017, 7:52 PM Feng Li notifications@github.com wrote: