linkerd2: Intermittent 502 status code
Bug Report
What is the issue?
We have a RESTful application where the client intermittently receives a 502 status code even though the application itself logs a 201. If we disable linkerd2, we are unable to reproduce the issue.
The basic traffic flow is as follows (all of it supposedly HTTP/1.1): Client -> Ambassador (Envoy) -(via linkerd2)-> App (see additional context for a diagram).
So far this only happens for one particular route, which also calls an HTTP service external to the cluster; no other route is affected.
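For context, "disabling linkerd2" above means taking the proxy out of the data path for the app. A minimal sketch of how that comparison can be done, assuming automatic proxy injection and placeholder names:

```sh
# Sketch only: with automatic proxy injection, setting the inject annotation to
# "disabled" on the pod template and letting the pods roll removes linkerd2 from
# the data path. If the sidecar was injected manually with `linkerd inject`,
# re-apply the original, un-injected manifest instead.
kubectl -n my-namespace patch deploy my-app --type merge \
  -p '{"spec":{"template":{"metadata":{"annotations":{"linkerd.io/inject":"disabled"}}}}}'
kubectl -n my-namespace rollout status deploy my-app
```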
How can it be reproduced?
We tried to reproduce this with an artificial setup: Ambassador as ingress and httpbin as the application, both meshed with linkerd2. This was unsuccessful; we have not been able to reproduce the issue outside our production deployments or with other routes.
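The rough shape of that artificial setup was as follows (a sketch with placeholder names and route prefix, not our exact manifests):

```sh
# Rough sketch of the attempted repro setup.
kubectl create namespace repro
kubectl -n repro create deployment httpbin --image=kennethreitz/httpbin
kubectl -n repro expose deployment httpbin --port=80 --target-port=80
# Mesh the workload by piping its manifest through the linkerd CLI.
kubectl -n repro get deploy httpbin -o yaml | linkerd inject - | kubectl apply -f -
# Ambassador is injected the same way, with a Mapping routing e.g. /httpbin/
# to the httpbin service.
```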
Logs, error output, etc
In the linkerd sidecar attached to Ambassador, the following error appears whenever the route fails:
[figo-ambassador-586c797dc-p9pt8 linkerd-proxy] WARN [ 1861.009733s] proxy={server=out listen=127.0.0.1:4140 remote=10.7.73.113:44428} linkerd2_proxy::proxy::http::orig_proto unknown l5d-orig-proto header value: "-"
[figo-ambassador-586c797dc-p9pt8 linkerd-proxy] WARN [ 1861.009760s] proxy={server=out listen=127.0.0.1:4140 remote=10.7.73.113:44428} hyper::proto::h1::role response with HTTP2 version coerced to HTTP/1.1
[figo-ambassador-586c797dc-p9pt8 linkerd-proxy] ERR! [ 1864.515657s] proxy={server=out listen=127.0.0.1:4140 remote=10.7.73.113:44428} linkerd2_proxy::app::errors unexpected error: http2 general error: protocol error: unspecific protocol error detected
[figo-ambassador-586c797dc-7s9x6 linkerd-proxy] ERR! [ 1833.975088s] proxy={server=out listen=127.0.0.1:4140 remote=10.7.69.131:57912} linkerd2_proxy::app::errors unexpected error: http2 general error: protocol error: unspecific protocol error detected
(The warnings were caused by a previous successful call)
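For reference, the lines above were taken from the linkerd-proxy sidecar container in the Ambassador pods, roughly like this (pod name from our cluster; namespace flag omitted):

```sh
# The injected sidecar container is named "linkerd-proxy"; add -n <namespace> as
# needed. The bracketed "[pod container]" prefixes above come from tailing
# several replicas at once with a tool such as stern.
kubectl logs figo-ambassador-586c797dc-p9pt8 -c linkerd-proxy --timestamps
stern figo-ambassador -c linkerd-proxy
```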
We increased the log level via config.linkerd.io/proxy-log-level: trace; the resulting trace output is in this gist:
https://gist.github.com/trevex/ca0791aad3402137ed551b251970d329
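In case it matters, one way to apply that annotation is to patch it onto the pod template of the Ambassador deployment (a sketch; the deployment name is an example, and the annotation can also be set per namespace):

```sh
# Sketch: the annotation goes on the pod template (not the deployment's own
# metadata); patching it triggers a rollout and the new proxies log at trace.
kubectl patch deploy figo-ambassador --type merge \
  -p '{"spec":{"template":{"metadata":{"annotations":{"config.linkerd.io/proxy-log-level":"trace"}}}}}'
```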
linkerd check output:
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API
kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version
linkerd-existence
-----------------
√ control plane namespace exists
√ controller pod is running
√ can initialize the client
√ can query the control plane API
linkerd-api
-----------
√ control plane pods are ready
√ control plane self-check
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus
√ no invalid service profiles
linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date
control-plane-version
---------------------
√ control plane is up-to-date
√ control plane and cli versions match
Status check results are √
Environment
- Kubernetes Version: 1.14.1
- Cluster Environment: bare-metal
- Host OS: ContainerLinux
- Linkerd version: stable-2.3.0
- CNI: Cilium 1.4.4
- DNS: CoreDNS 1.5.0
Possible solution
Additional context
Diagram from Slack: https://files.slack.com/files-pri/T0JV2DX9R-FJA61H9CH/ambassador-linkerd2.png
Please let me know if I can provide more information 😃
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 42 (31 by maintainers)
@olix0r we’re seeing the same issue - I’ve opened a ticket as I wasn’t sure how they are triaged. We’ve just tested with 2.3.2 and are still seeing the same behaviour.
Yeah, didn’t mean to imply the problem was on the Ambassador side of that proxy — I just meant the proxy injected into the Ambassador pod (which is the only one we have logs for thus far)