linkerd2: Intermittent 502 status code

Bug Report

What is the issue?

We have a RESTful application where the client intermittently receives 502 status codes, although the application itself logs a 201. If we disable linkerd2, we are unable to reproduce the issue.

The basic traffic flow is as follows (all supposedly HTTP/1.1): Client -> Ambassador (Envoy) -(via linkerd2)-> App (see additional context for a diagram).
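For reference, the Ambassador-to-linkerd2 wiring in a setup like this typically looks roughly as follows. This is a minimal sketch, not our exact configuration: the resource names, route prefix, and app service/port are placeholders, and the apiVersion may differ by Ambassador version.

# Hypothetical Module config: have Envoy add the l5d-dst-override header
# so the outbound linkerd2 proxy can route requests by service name.
apiVersion: getambassador.io/v1
kind: Module
metadata:
  name: ambassador
spec:
  config:
    add_linkerd_headers: true
---
# Hypothetical Mapping for the affected route.
apiVersion: getambassador.io/v1
kind: Mapping
metadata:
  name: app-route
spec:
  prefix: /app/
  service: app.default:8080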

So far this only happens for one particular route, which also calls an HTTP service external to the cluster; no other routes are affected.

How can it be reproduced?

We tried to reproduce this with an artificial setup: Ambassador as the ingress and httpbin as the application, both meshed with linkerd2. However, this was unsuccessful; we were unable to reproduce the issue outside our production deployments or with other routes.
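For context, the artificial backend in that attempt looked roughly like the sketch below; the names, labels, and port are illustrative rather than our exact manifests, and it was exposed through an Ambassador Mapping analogous to the production route.

# Hypothetical httpbin Deployment, meshed via the linkerd2 proxy injector.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpbin
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpbin
  template:
    metadata:
      labels:
        app: httpbin
      annotations:
        linkerd.io/inject: enabled
    spec:
      containers:
      - name: httpbin
        image: kennethreitz/httpbin
        ports:
        - containerPort: 80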

Logs, error output, etc

In the linkerd proxy sidecar injected into the Ambassador pod, the following error appears whenever the route fails:

[figo-ambassador-586c797dc-p9pt8 linkerd-proxy] WARN [  1861.009733s] proxy={server=out listen=127.0.0.1:4140 remote=10.7.73.113:44428} linkerd2_proxy::proxy::http::orig_proto unknown l5d-orig-proto header value: "-"
[figo-ambassador-586c797dc-p9pt8 linkerd-proxy] WARN [  1861.009760s] proxy={server=out listen=127.0.0.1:4140 remote=10.7.73.113:44428} hyper::proto::h1::role response with HTTP2 version coerced to HTTP/1.1
[figo-ambassador-586c797dc-p9pt8 linkerd-proxy] ERR! [  1864.515657s] proxy={server=out listen=127.0.0.1:4140 remote=10.7.73.113:44428} linkerd2_proxy::app::errors unexpected error: http2 general error: protocol error: unspecific protocol error detected
[figo-ambassador-586c797dc-7s9x6 linkerd-proxy] ERR! [  1833.975088s] proxy={server=out listen=127.0.0.1:4140 remote=10.7.69.131:57912} linkerd2_proxy::app::errors unexpected error: http2 general error: protocol error: unspecific protocol error detected

(The warnings were caused by a previous successful call)

We increased the proxy log level via the config.linkerd.io/proxy-log-level: trace annotation: https://gist.github.com/trevex/ca0791aad3402137ed551b251970d329
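For completeness, the annotation was applied in the pod template of the affected workload, roughly as in this sketch (surrounding Deployment fields omitted):

# Hypothetical excerpt from the Deployment's pod template; the linkerd2
# proxy injector reads these annotations when the pods are (re)created.
spec:
  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled
        config.linkerd.io/proxy-log-level: trace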

linkerd check output

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

linkerd-existence
-----------------
√ control plane namespace exists
√ controller pod is running
√ can initialize the client
√ can query the control plane API

linkerd-api
-----------
√ control plane pods are ready
√ control plane self-check
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus
√ no invalid service profiles

linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date

control-plane-version
---------------------
√ control plane is up-to-date
√ control plane and cli versions match

Status check results are √

Environment

  • Kubernetes Version: 1.14.1
  • Cluster Environment: bare-metal
  • Host OS: ContainerLinux
  • Linkerd version: stable-2.3.0
  • CNI: Cilium 1.4.4
  • DNS: CoreDNS 1.5.0

Possible solution

Additional context

Diagram from Slack: https://files.slack.com/files-pri/T0JV2DX9R-FJA61H9CH/ambassador-linkerd2.png

Please let me know if I can provide more information 😃

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 42 (31 by maintainers)

Most upvoted comments

@olix0r we’re seeing the same issue - I’ve opened a ticket as I wasn’t sure how they are triaged. We’ve just tested with 2.3.2 and are still seeing the same issue.

Yeah, didn’t mean to imply the problem was on the Ambassador side of that proxy — I just meant the proxy injected into the Ambassador pod (which is the only one we have logs for thus far)