linkerd2: Linkerd 2.5.0: linkerd2_proxy::app::errors unexpected error: error trying to connect: No route to host (os error 113) (address: 10.10.3.181:8080)
Bug Report
What is the issue?
We have an injected pod that has been running for days and connects to a partially (1/3) injected deployment; eventually it starts throwing the error mentioned above.
How can it be reproduced?
Run an injected pod for days and let it talk to a deployment that is regularly restarted, so that new pods with new IP addresses keep replacing the old ones.
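As a rough sketch of that churn (the namespace, deployment name, and interval below are placeholders, not taken from our setup):

# Keep a long-lived injected client running elsewhere, then repeatedly roll the
# target deployment so its pods are replaced and receive new IP addresses.
while true; do
  kubectl -n demo rollout restart deployment/backend
  kubectl -n demo rollout status deployment/backend
  sleep 600   # let client traffic run for a while between restarts
done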
Logs, error output, etc
linkerd-proxy ERR! [589538.085822s] linkerd2_proxy::app::errors unexpected error: error trying to connect: No route to host (os error 113) (address: 10.10.3.181:8080)
linkerd check output
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API
kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version
linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ control plane PodSecurityPolicies exist
linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API
linkerd-api
-----------
√ control plane pods are ready
√ control plane self-check
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus
√ no invalid service profiles
linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date
control-plane-version
---------------------
√ control plane is up-to-date
√ control plane and cli versions match
Status check results are √
Environment
- Kubernetes Version: 1.15.2
- Cluster Environment: custom
- Host OS: CoreOS 2191.5.0
- Linkerd version: 2.5.0
Possible solution
Additional context
To me, not knowing all the details, it looks like the proxy is not “refreshing” the endpoints for the service and eventually just runs out of valid IP addresses. For us it would be fine if the proxy simply exited and let the pod get restarted.
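One way to check that hypothesis (the service name, namespace, and port below are placeholders, and the exact authority format may differ between CLI versions) is to compare what Kubernetes currently lists as endpoints with what the proxy-side destination lookup resolves:

# Endpoints as Kubernetes sees them right now:
kubectl -n demo get endpoints backend -o wide
# Endpoints as resolved through Linkerd's destination service:
linkerd endpoints backend.demo.svc.cluster.local:8080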
Also: Linkerd is pretty awesome, thanks for all the effort you put into it!
About this issue
- State: closed
- Created 5 years ago
- Comments: 57 (32 by maintainers)
Newer versions of Linkerd (e.g., edge-20.3.4) have been updated to handle service discovery differently. If you’re still experiencing these issues, I recommend annotating your workload with config.linkerd.io/proxy-version: edge-20.3.4. If you test this, please report back! These changes will be released soon in stable-2.7.1.
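For reference, one way to apply that annotation (namespace and deployment name below are placeholders) is to patch the pod template, since that is where the proxy injector reads it on the next rollout of the workload:

kubectl -n demo patch deployment backend --patch \
  '{"spec":{"template":{"metadata":{"annotations":{"config.linkerd.io/proxy-version":"edge-20.3.4"}}}}}'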
We are experiencing a similar issue, but we don’t have any No route to host errors in our logs. We are running an nginx-ingress (that is not meshed) which forwards to a Kong 1.2.0 API gateway (that is meshed). I have noticed getting 503 errors from some of the api-gateway pods when curling our other APIs from such a pod (simple REST APIs running Scala applications). The other APIs are reached successfully from each other (and from other api-gateway pods). When I get the 503 errors, the proxy of the api-gateway pod logs the following:
WARN [1030275.972407s] linkerd2_proxy::app::errors request aborted because it reached the configured dispatch deadline
Might have been a fluke, but when I added Host: example.com to the curl request that was originally getting 503s, the request went through and I got 200s (and the request-aborted line was not logged). I haven’t been able to test this any further, as the issue hasn’t happened since.
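For comparison, the two requests looked roughly like this (hostname, path, and Host value are placeholders, not the actual ones we used):

# Returned 503 and triggered the dispatch deadline warning in the proxy log:
curl -sv http://my-api.my-namespace.svc.cluster.local:8080/some/path
# Went through with 200 once an explicit Host header was added:
curl -sv -H 'Host: example.com' http://my-api.my-namespace.svc.cluster.local:8080/some/path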
Some info about the environment: we are running on AWS EKS with Kubernetes version 1.14. The workers run v1.14.7-eks-1861c5 and are launched with custom kubelet args.
Linkerd 2.5.0 was installed with linkerd install | kubectl apply -f - and upgraded to 2.6.0 with linkerd upgrade --ha | kubectl apply --prune -l linkerd.io/control-plane-ns=linkerd -f -. But if my memory serves me correctly, our test environment was simply installed with 2.6.0 (linkerd install --ha | kubectl apply -f -) and it happened there as well. It doesn’t feel like it is related to traffic, since our test environment has relatively low traffic.
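Since the proxy version is fixed at injection time, a quick way to see which proxy image the workload pods are actually running after an upgrade (namespace and label below are placeholders) is:

kubectl -n demo get pods -l app=backend \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[?(@.name=="linkerd-proxy")].image}{"\n"}{end}'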
linkerd endpoints seems to align with kubectl get endpoints for the services. I ran the script @cpretzer linked while the issue was happening, and it reported “Everything looks okay!”