linkerd2: invalid HTTP method parsed
Bug Report
What is the issue?
After upgrading Linkerd to stable-2.10.1, some proxies (not all) are logging the following lines:
{"timestamp":"[ 517.977900s]","level":"INFO","fields":{"message":"Connection closed","error":"invalid HTTP method parsed"},"target":"linkerd_app_core::serve","spans":[{"name":"daemon"},{"listen.addr":"0.0.0.0:4191","name":"admin"},{"client.addr":"10.0.5.162:51408","name":"accept"}],"threadId":"ThreadId(2)"}
About 2 messages per second.
How can it be reproduced?
Not sure how to reproduce; in my case at least 2 clusters were affected by this. I haven't upgraded the prod cluster.
Logs, error output, etc
I enabled debug logs as well; the output is as follows:
{"timestamp":"[ 517.977546s]","level":"DEBUG","fields":{"message":"Peeked bytes from TCP stream","sz":289},"target":"linkerd_tls::server","spans":[{"name":"daemon"},{"listen.addr":"0.0.0.0:4191","name":"admin"},{"client.addr":"10.0.5.162:51408","name":"accept"}],"threadId":"ThreadId(2)"}
{"timestamp":"[ 517.977605s]","level":"DEBUG","fields":{"message":"Identified foreign SNI","sni":"..."},"target":"linkerd_tls::server","spans":[{"name":"daemon"},{"listen.addr":"0.0.0.0:4191","name":"admin"},{"client.addr":"10.0.5.162:51408","name":"accept"}],"threadId":"ThreadId(2)"}
{"timestamp":"[ 517.977643s]","level":"DEBUG","fields":{"message":"DetectResult","protocol":"None","elapsed":"7.9µs"},"target":"linkerd_detect","spans":[{"name":"daemon"},{"listen.addr":"0.0.0.0:4191","name":"admin"},{"client.addr":"10.0.5.162:51408","name":"accept"}],"threadId":"ThreadId(2)"}
{"timestamp":"[ 517.977663s]","level":"DEBUG","fields":{"message":"Failed to parse HTTP request; handling as HTTP/1"},"target":"linkerd_app::admin","spans":[{"name":"daemon"},{"listen.addr":"0.0.0.0:4191","name":"admin"},{"client.addr":"10.0.5.162:51408","name":"accept"}],"threadId":"ThreadId(2)"}
{"timestamp":"[ 517.977687s]","level":"DEBUG","fields":{"message":"Creating HTTP service","version":"Http1"},"target":"linkerd_proxy_http::server","spans":[{"name":"daemon"},{"listen.addr":"0.0.0.0:4191","name":"admin"},{"client.addr":"10.0.5.162:51408","name":"accept"}],"threadId":"ThreadId(2)"}
{"timestamp":"[ 517.977717s]","level":"DEBUG","fields":{"message":"Handling as HTTP","version":"Http1"},"target":"linkerd_proxy_http::server","spans":[{"name":"daemon"},{"listen.addr":"0.0.0.0:4191","name":"admin"},{"client.addr":"10.0.5.162:51408","name":"accept"}],"threadId":"ThreadId(2)"}
{"timestamp":"[ 517.977831s]","level":"DEBUG","fields":{"message":"The client is shutting down the connection","res":"Err(hyper::Error(Parse(Method)))"},"target":"linkerd_proxy_http::server","spans":[{"name":"daemon"},{"listen.addr":"0.0.0.0:4191","name":"admin"},{"client.addr":"10.0.5.162:51408","name":"accept"}],"threadId":"ThreadId(2)"}
{"timestamp":"[ 517.977900s]","level":"INFO","fields":{"message":"Connection closed","error":"invalid HTTP method parsed"},"target":"linkerd_app_core::serve","spans":[{"name":"daemon"},{"listen.addr":"0.0.0.0:4191","name":"admin"},{"client.addr":"10.0.5.162:51408","name":"accept"}],"threadId":"ThreadId(2)"}
IP 10.0.5.162 belongs to linkerd-viz.prometheus.
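That makes sense as the client: the viz extension's Prometheus scrapes every meshed pod's proxy admin port (4191, which serves /metrics), so it is constantly opening connections to the same admin server that logs these errors. Roughly, the relevant scrape job looks like this (a simplified sketch, not the exact config shipped with linkerd-viz):
scrape_configs:
  - job_name: linkerd-proxy
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only the proxy container's admin port; the container/port names here are
      # assumed for illustration and may not match the shipped config exactly.
      - source_labels:
          - __meta_kubernetes_pod_container_name
          - __meta_kubernetes_pod_container_port_name
        action: keep
        regex: ^linkerd-proxy;linkerd-admin$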
linkerd check output:
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API
kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version
linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ control plane PodSecurityPolicies exist
linkerd-identity
----------------
√ certificate config is valid
√ trust anchors are using supported crypto algorithm
√ trust anchors are within their validity period
√ trust anchors are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
√ issuer cert is valid for at least 60 days
√ issuer cert is issued by the trust anchor
linkerd-webhooks-and-apisvc-tls
-------------------------------
√ proxy-injector webhook has valid cert
√ proxy-injector cert is valid for at least 60 days
√ sp-validator webhook has valid cert
√ sp-validator cert is valid for at least 60 days
linkerd-api
-----------
√ control plane pods are ready
√ can initialize the client
√ can query the control plane API
linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date
control-plane-version
---------------------
√ control plane is up-to-date
√ control plane and cli versions match
Status check results are √
Linkerd extensions checks
=========================
linkerd-viz
-----------
√ linkerd-viz Namespace exists
√ linkerd-viz ClusterRoles exist
√ linkerd-viz ClusterRoleBindings exist
√ tap API server has valid cert
√ tap API server cert is valid for at least 60 days
√ tap API service is running
√ linkerd-viz pods are injected
√ viz extension pods are running
√ prometheus is installed and configured correctly
√ can initialize the client
√ viz extension self-check
Status check results are √
Environment
- Kubernetes Version: v1.18.9-eks-d1db3c
- Cluster Environment: AWS EKS
- Host OS: Ubuntu 20.04.1 LTS kernel: 5.4.0-1029-aws
- Linkerd version: 2.10.1
Possible solution
Additional context
I understand this is harmless when looking at the debug log, but it will create a lot of additional log volume and affect the logging collectors/storage, etc.
When I disable the linkerd-proxy for the linkerd-viz.prometheus pods, these messages stop; the same happens when I add the following annotation:
config.linkerd.io/skip-outbound-ports: "4191"
It looks like the proxy is somehow affecting outbound connections, which might explain this issue as well.
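For anyone applying the same workaround, this is roughly where the annotation goes; a minimal sketch assuming the Prometheus pod template is annotated directly (the names, labels, and image below are illustrative, not the exact manifest shipped by linkerd-viz):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus                 # illustrative; linkerd-viz manages the real Deployment
  namespace: linkerd-viz
spec:
  replicas: 1
  selector:
    matchLabels:
      component: prometheus
  template:
    metadata:
      labels:
        component: prometheus
      annotations:
        linkerd.io/inject: enabled
        # Workaround from above: outbound connections to port 4191 bypass this pod's proxy.
        config.linkerd.io/skip-outbound-ports: "4191"
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:v2.30.3   # illustrative image/tag
Note that with this in place the scrapes still reach the target pods' admin ports, they just no longer go through this pod's proxy, so that traffic is not mTLS'd and is not counted in the proxy's own metrics.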
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 20 (14 by maintainers)
@lupusllc Thanks for adding in here. This issue should be fixed in the edge that was released yesterday, edge-21.4.5. This will also be available in stable-2.10.2 at some point soon. I'll wait to close this issue until that is released.
For those who have been involved in this issue, this was fixed in tower-rs/tower#581, which is a proxy dependency. Reconnects (which are the background tasks in this context) were not being aborted properly, so they would get stuck in an infinite reconnect loop.
This most often happened with Viz's Prometheus and resulted in the errors a lot of you have seen. What would happen is a pod 10.0.0.1 would be scraped by Prometheus at 10.0.0.1:4191 with some SNI foo. At some point that pod is terminated, but Prometheus's proxy keeps trying to reconnect to 10.0.0.1:4191. Eventually, a new pod takes that IP, and all of a sudden Prometheus's proxy starts sending requests to the new pod at 10.0.0.1:4191 with the foo SNI. But since this is a new pod, it may have the bar identity. This results in the error.
This issue is not specific to Prometheus's proxy, so we're hoping this also cleans up a lot of the fail-fast errors in general.
@olix0r I deployed edge-21.4.5 in our dev env this morning (previously running stable-2.10.1), and the log messages have disappeared. Went from ~10K every 15m to under 200.
Hi, we have the same issue after upgrading to Linkerd 2.10.1 in our testing environments. Logs have many "Connection closed error=invalid HTTP method parsed" errors. Pods seem to be randomly affected, but in general the issue is reproducible when I restart some deployment's pods a couple of times. I enabled tracing configuration for one of the deployments and this is what I got from the logs:
I hope it helps to troubleshoot, thanks.