linkerd2: Received http2 header with status: 502

Bug Report

What is the issue?

Our application shows a couple of errors that seem to be related to linkerd-proxy (without linkerd injection these errors don’t happen):

      "code": 4,
      "metadata": {
        "_internal_repr": {}
      },
      "details": "Deadline Exceeded"
    }

followed by

err: {
      "code": 1,
      "metadata": {
        "_internal_repr": {
          ":status": [
            "502"
          ],
          "content-length": [
            "0"
          ],
          "date": [
            "Tue, 11 Jun 2019 13:01:30 GMT"
          ]
        }
      },
      "details": "Received http2 header with status: 502"
    }

On linkerd-proxy pods we can see the following errors:

DBUG [   388.710739s] proxy={client=in dst=10.24.26.128:9090 proto=Http2} linkerd2_proxy::proxy::http::h2 http2 conn error: http2 error: protocol error: unspecific protocol error detected
ERR! [   388.710816s] proxy={server=in listen=0.0.0.0:4143 remote=10.24.23.122:35662} linkerd2_proxy::app::errors unexpected error: http2 error: protocol error: unspecific protocol error detected

Based on the logs it seems that service A sends a request to service B via linkerd-proxy, and service B processes the request successfully (both service B’s logs and its linkerd-proxy show no errors, and the data is written further downstream to the database). However, this is where the issue is reported: instead of reporting the request as successful, the linkerd-proxy of service A shows the "linkerd2_proxy::app::errors unexpected error: http2 error: protocol error: unspecific protocol error detected" error, and the service A logs show the application errors above.
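
For context, the error objects above are what the Node grpc client surfaces when a unary call fails: code 4 is DEADLINE_EXCEEDED, code 1 is CANCELLED, and the error metadata carries the raw HTTP/2 headers the client received (the _internal_repr field comes from that client's Metadata type). The sketch below shows roughly what the calling side in service A looks like; the address, method path, payload, and serializers are hypothetical placeholders, not our real code.

// Minimal sketch, assuming the Node "grpc" package; all names are illustrative.
import * as grpc from 'grpc';

// Placeholder serializers; real code would use protobuf-generated stubs.
const serialize = (obj: object): Buffer => Buffer.from(JSON.stringify(obj));
const deserialize = (buf: Buffer): object => JSON.parse(buf.toString());

const client = new grpc.Client('service-b:9090', grpc.credentials.createInsecure());

// Per-call deadline; when it elapses the client reports code 4 (DEADLINE_EXCEEDED).
const deadline = new Date(Date.now() + 5000);

client.makeUnaryRequest(
  '/example.ServiceB/DoWork',   // hypothetical fully-qualified method name
  serialize,
  deserialize,
  { id: 1 },
  new grpc.Metadata(),
  { deadline },
  (err, response) => {
    if (err) {
      // code 4 = DEADLINE_EXCEEDED, code 1 = CANCELLED (standard gRPC codes);
      // err.metadata carries the raw HTTP/2 headers, e.g. ":status: 502".
      console.error(err.code, err.details, err.metadata && err.metadata.getMap());
      return;
    }
    console.log('ok', response);
  },
);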

How can it be reproduced?

Not 100% sure. We were seeing this issue when we started using linkerd, but after we applied the fix described in https://github.com/linkerd/linkerd2/issues/2813#issuecomment-496641996 we stopped seeing it until today. I’ve also attempted upgrading linkerd with --disable-h2-upgrade and I can still see the issue.

Logs, error output, etc

linkerd check output

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

linkerd-existence
-----------------
√ control plane namespace exists
√ controller pod is running
√ can initialize the client
√ can query the control plane API

linkerd-api
-----------
√ control plane pods are ready
√ control plane self-check
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus
√ no invalid service profiles

linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date

control-plane-version
---------------------
√ control plane is up-to-date
√ control plane and cli versions match

Status check results are √

Environment

  • Kubernetes Version: v1.14.1
  • Cluster Environment: GKE v1.12.8-gke.6
  • Host OS:
  • Linkerd version: tested with edge-19.6.1, stable-2.3.2, and --proxy-version=fix-2863-0 (which contained the memory leak fix)

Possible solution

Not a solution, but internally we think that linkerd is adding the status field wrapped in ":" and gRPC doesn’t like that.
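
For reference, ":status" is the standard HTTP/2 response pseudo-header, and a normal gRPC response arrives with ":status: 200" plus a "grpc-status" trailer; in the failing calls the client instead sees the bare 502 shown in the metadata above. A small, hypothetical helper (same Node grpc assumptions as the sketch further up) that could be used to log these HTTP-level responses:

// Hypothetical helper, assuming the Node "grpc" package as in the sketch above.
import * as grpc from 'grpc';

function logHttp2Status(err: grpc.ServiceError): void {
  // getMap() exposes the raw headers attached to the failed call,
  // including the ":status" pseudo-header seen in the errors above.
  const status = err.metadata ? err.metadata.getMap()[':status'] : undefined;
  if (status && String(status) !== '200') {
    console.error(`received HTTP/2 status ${status} instead of a gRPC response`);
  }
}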

Additional context

There isn’t a pattern or much consistency with this; every time we run the job, the error is returned at a different point in the flow.

This seems to be a very similar issue to what is described in https://github.com/linkerd/linkerd2/issues/2801

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 21 (12 by maintainers)

Most upvoted comments

@siggy @seanmonstar @olix0r I am super pleased to confirm that the changes you guys have rolled out have fixed the 502 issues. I’ve solved our internal issue and have managed to test multiple times with edge-19.7.1 and once with edge-19.7.2. Thank you for your assistance 🙏