linkerd2: Server Side Streaming gRPC Proper Status Code Not Making it to Client

Bug Report

What is the issue?

Hello! I am working with a server-side streaming gRPC application and, in accordance with gRPC's best practices for long-lived RPCs, I have given the server a max connection age. Once this age is reached, the server closes the connection, and the client's onError is called with the status code UNAVAILABLE. gRPC defines this status code as:

* The service is currently unavailable. This is most likely a
* transient condition and may be corrected by retrying with
* a backoff. Note that it is not always safe to retry
* non-idempotent operations.

Once the client receives this status code, it reconnects to the server and continues. This works as expected with Linkerd disabled. However, with Linkerd enabled, the client receives a different status code: INTERNAL instead of UNAVAILABLE. (INTERNAL ERROR, a.k.a. not good.)
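To make the impact concrete, here is a rough sketch of the kind of client-side logic that depends on the status code. It is not the code from the repro: `Code`, `runWithReconnect`, and the backoff values are hypothetical stand-ins, and a real client would inspect `io.grpc.Status.Code` inside `StreamObserver.onError` and actually sleep between attempts.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

public class ReconnectSketch {
    // Hypothetical stand-in for io.grpc.Status.Code; only the codes discussed here.
    public enum Code { OK, UNAVAILABLE, INTERNAL }

    // Assumption: only UNAVAILABLE is treated as transient and worth a reconnect.
    public static boolean isRetryable(Code code) {
        return code == Code.UNAVAILABLE;
    }

    // Exponential backoff: base * 2^attempt, capped at maxMillis.
    public static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = baseMillis << Math.min(attempt, 20);
        return Math.min(delay, maxMillis);
    }

    // Calls openStream until it ends with a non-retryable code or attempts run out.
    // Each call stands for one streaming RPC and returns the code it terminated with.
    // The delays a real client would sleep for are recorded in delaysOut.
    public static Code runWithReconnect(Supplier<Code> openStream, int maxAttempts,
                                        List<Long> delaysOut) {
        Code last = Code.OK;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            last = openStream.get();
            if (!isRetryable(last)) {
                return last; // e.g. INTERNAL: give up instead of reconnecting
            }
            delaysOut.add(backoffMillis(attempt, 100, 5000));
        }
        return last;
    }

    public static void main(String[] args) {
        // Simulated sequence: two max-connection-age disconnects, then the
        // unexpected INTERNAL status.
        Iterator<Code> outcomes =
            List.of(Code.UNAVAILABLE, Code.UNAVAILABLE, Code.INTERNAL).iterator();
        List<Long> delays = new ArrayList<>();
        Code terminal = runWithReconnect(outcomes::next, 5, delays);
        System.out.println("terminal code: " + terminal + ", backoff delays (ms): " + delays);
    }
}
```

With logic like this, a stream that ends with UNAVAILABLE is quietly re-established, while one that ends with INTERNAL is surfaced as a hard failure, which is why the changed status code matters.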

How can it be reproduced?

Here is a simple reproduction I threw together: https://github.com/byblakeorriver/SimpleGrpcLinkerd

Logs, error output, etc

Here are the full logs from the linkerd-proxy. About 80 seconds in is where I get INTERNAL instead of UNAVAILABLE. https://gist.github.com/byblakeorriver/e07d45b7d2dc705bfe96fe6df97b994d

linkerd check output

I don’t have permissions right now to do this. If I ask nicely I can probably get it.

Environment

  • Kubernetes Version: 1.11.8
  • Cluster Environment: (GKE, AKS, kops, …) ???
  • Host OS: Linux
  • Linkerd version: edge-20.3.4

Possible solution

Additional context

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 18 (14 by maintainers)

Most upvoted comments

@zaharidichev I probably won’t be able to try today, but as soon as I can I will. Thanks again!

@byblakeorriver Huge thanks for the repro! This should give us enough to track this down.