linkerd2: Server Side Streaming gRPC Proper Status Code Not Making it to Client
Bug Report
What is the issue?
Hello! I am working with a server-side streaming gRPC application. In line with gRPC's best practices for long-lived RPCs, I have given the server a max connection age. Once this age is reached, the server closes the connection to the client, and the client's onError is called with the status code UNAVAILABLE. gRPC defines this status code as:
* The service is currently unavailable. This is a most likely a
* transient condition and may be corrected by retrying with
* a backoff. Note that it is not always safe to retry
* non-idempotent operations.
Once the client receives this status code, it reconnects to the server and continues. This works as expected with Linkerd disabled. However, when Linkerd is enabled, the client receives a different status code: INTERNAL instead of UNAVAILABLE. (INTERNAL means an internal error, a.k.a. not good.)
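To make the expected client behavior concrete, here is a minimal sketch in Python of reconnecting based on the status code. It is not the code from the reproduction repository and not the grpc-java API. `StatusCode`, `StreamError`, `consume_stream`, and the backoff parameters are hypothetical names chosen for illustration; only the numeric status code values come from the gRPC specification. It shows why the substituted code matters: a client that treats only UNAVAILABLE as transient reconnects after a max-connection-age close, but gives up on INTERNAL.

```python
import enum
from typing import Callable, Iterator, List


class StatusCode(enum.IntEnum):
    # Numeric values follow the gRPC status code specification;
    # only the codes relevant to this issue are listed.
    OK = 0
    INTERNAL = 13
    UNAVAILABLE = 14


class StreamError(Exception):
    """Hypothetical stand-in for the error delivered to a client's onError."""

    def __init__(self, code: StatusCode) -> None:
        super().__init__(code.name)
        self.code = code


# Assumption: only UNAVAILABLE is treated as transient; INTERNAL is not.
RETRYABLE = frozenset({StatusCode.UNAVAILABLE})


def backoff_delays(base: float = 0.1, cap: float = 5.0, attempts: int = 5) -> List[float]:
    """Exponential backoff schedule: base * 2**n seconds, capped at `cap`."""
    return [min(cap, base * 2 ** n) for n in range(attempts)]


def consume_stream(open_stream: Callable[[], Iterator[str]],
                   sleep: Callable[[float], None],
                   max_attempts: int = 5) -> List[str]:
    """Read a server-streaming call, reopening it after an UNAVAILABLE close.

    `open_stream` starts a new call and yields messages; `sleep` is injected
    so the backoff can be observed in tests instead of actually waiting.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    received: List[str] = []
    delays = backoff_delays(attempts=max_attempts)
    for attempt in range(max_attempts):
        try:
            for message in open_stream():
                received.append(message)
            return received  # the stream completed normally
        except StreamError as err:
            # Surface non-transient codes (e.g. INTERNAL) and the final failure.
            if err.code not in RETRYABLE or attempt == max_attempts - 1:
                raise
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```

With a stub `open_stream` that ends the stream with `StreamError(StatusCode.UNAVAILABLE)` twice and then completes, `consume_stream` reconnects twice (sleeping 0.1 s, then 0.2 s) and returns every message. If the stub raises INTERNAL instead, the error propagates on the first failure, which is the symptom seen here with Linkerd enabled.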
How can it be reproduced?
Here is a simple reproduction I threw together: https://github.com/byblakeorriver/SimpleGrpcLinkerd
Logs, error output, etc
Here are the logs from the linkerd-proxy; these are the complete logs. About 80 seconds in is where I get INTERNAL instead of UNAVAILABLE. https://gist.github.com/byblakeorriver/e07d45b7d2dc705bfe96fe6df97b994d
linkerd check output
I don’t have permission to run this right now. If I ask nicely, I can probably get it.
Environment
- Kubernetes Version: 1.11.8
- Cluster Environment: unknown
- Host OS: Linux
- Linkerd version: edge-20.3.4
Possible solution
Additional context
About this issue
- State: closed
- Created 4 years ago
- Comments: 18 (14 by maintainers)
@zaharidichev I probably won’t be able to try today, but as soon as I can I will. Thanks again!
@byblakeorriver Huge thanks for the repro! This should give us enough to track this down.