linkerd2: linkerd-proxy throwing load balancer discovery error

Bug Report

What is the issue?

linkerd-proxy started throwing the error below and could no longer connect to other meshed services.

How can it be reproduced?

Unknown

Logs, error output, etc

ERR! [249010.471746s] outbound:accept{peer.addr=100.96.11.27:34552}:source{target.addr=100.69.16.83:80}: linkerd2_app_core::errors unexpected error: buffered service failed: load balancer discovery error: discovery task failed

linkerd check output

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
× is running the minimum Kubernetes API version
    Kubernetes is on version [1.11.8], but version [1.13.0] or more recent is required
    see https://linkerd.io/checks/#k8s-version for hints
√ is running the minimum kubectl version

linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ control plane PodSecurityPolicies exist

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API

linkerd-identity
----------------
√ certificate config is valid
√ trust roots are using supported crypto algorithm
√ trust roots are within their validity period
√ trust roots are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
√ issuer cert is valid for at least 60 days
√ issuer cert is issued by the trust root

linkerd-api
-----------
√ control plane pods are ready
√ control plane self-check
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus

linkerd-version
---------------
√ can determine the latest version
‼ cli is up-to-date
    is running version 20.1.2 but the latest edge version is 20.1.3
    see https://linkerd.io/checks/#l5d-version-cli for hints

control-plane-version
---------------------
‼ control plane is up-to-date
    is running version 20.1.2 but the latest edge version is 20.1.3
    see https://linkerd.io/checks/#l5d-version-control for hints
√ control plane and cli versions match

Environment

  • Kubernetes Version:
  • Cluster Environment: (GKE, AKS, kops, …)
  • Host OS:
  • Linkerd version:

Possible solution

Additional context

Verified that the linkerd-proxy sidecar version was also 20.1.2. Similar to https://github.com/linkerd/linkerd2/issues/3935

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 3
  • Comments: 30 (14 by maintainers)

Most upvoted comments

I have a proxy version which I believe fixes these issues. It should be released in tomorrow’s edge release, but it can be tested manually by adding an annotation to your workload:

config.linkerd.io/proxy-version: ver-backpressure-2020-03-04.0

Please let us know if you observe any similar issues with the newer version!
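
For anyone unsure where the annotation goes: as far as I know, Linkerd reads `config.linkerd.io/*` annotations from the pod template, not from the workload's top-level metadata. A minimal sketch, assuming a hypothetical Deployment named `my-app` that is already meshed (the name, labels, and image are placeholders; only the annotation value comes from this thread):

```yaml
# Hypothetical Deployment; adjust names, labels, and image to your workload.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                      # placeholder name
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        # Keeps the workload meshed (assumed to be set already).
        linkerd.io/inject: enabled
        # Proxy version override suggested in the comment above.
        config.linkerd.io/proxy-version: ver-backpressure-2020-03-04.0
    spec:
      containers:
        - name: my-app
          image: example/my-app:latest   # placeholder image
```

Changing the pod template triggers a rollout, so the replacement pods should be injected with the overridden proxy version. The `running version ...` line at the top of the linkerd-proxy container logs is one way to confirm which version a pod picked up.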

So I’ve just witnessed the same thing in one of the pods in our staging environment:

~ ❯ kubectl logs daedalus-updater-69fbb949fc-6mc6s linkerd-proxy -f
time="2020-02-17T15:49:54Z" level=info msg="running version stable-2.7.0"
[     0.10338367s]  INFO linkerd2_proxy: Admin interface on 0.0.0.0:4191
[     0.10365568s]  INFO linkerd2_proxy: Inbound interface on 0.0.0.0:4143
[     0.10371468s]  INFO linkerd2_proxy: Outbound interface on 127.0.0.1:4140
[     0.10393368s]  INFO linkerd2_proxy: Tap interface on 0.0.0.0:4190
[     0.10491068s]  INFO linkerd2_proxy: Local identity is default.one.serviceaccount.identity.linkerd.cluster.local
[     0.10513669s]  INFO linkerd2_proxy: Identity verified via linkerd-identity.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)
[     0.10519369s]  INFO linkerd2_proxy: Destinations resolved via linkerd-dst.linkerd.svc.cluster.local:8086 (linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local)
[     0.12083579s]  INFO inbound: linkerd2_app_inbound: serving listen.addr=0.0.0.0:4143
[     0.32760114s]  INFO daemon:identity: linkerd2_app: Certified identity: default.one.serviceaccount.identity.linkerd.cluster.local
[154791.991986182s]  WARN outbound:accept{peer.addr=10.1.0.31:46516}:source{target.addr=172.16.12.128:80}:addr{addr=hermes-api:80}:logical{dst.logical=hermes-api.one.svc.cluster.local:80}:concrete{dst.concrete=hermes-api.one.svc.cluster.local:80}: linkerd2_proxy_discover::buffer: Dropping resolution due to watchdog timeout=60s
[155203.91360616s]  WARN outbound:accept{peer.addr=10.1.0.31:35284}:source{target.addr=172.16.12.128:80}: linkerd2_app_core::errors: Failed to proxy request: buffered service failed: load balancer discovery error: discovery task failed
[155224.476833031s]  WARN outbound:accept{peer.addr=10.1.0.31:35780}:source{target.addr=172.16.12.128:80}: linkerd2_app_core::errors: Failed to proxy request: buffered service failed: load balancer discovery error: discovery task failed
[155329.657839425s]  WARN outbound:accept{peer.addr=10.1.0.31:37844}:source{target.addr=172.16.12.128:80}: linkerd2_app_core::errors: Failed to proxy request: buffered service failed: load balancer discovery error: discovery task failed
[156882.508075123s]  WARN outbound:accept{peer.addr=10.1.0.31:40094}:source{target.addr=172.16.12.128:80}: linkerd2_app_core::errors: Failed to proxy request: buffered service failed: load balancer discovery error: discovery task failed
[157461.359183618s]  WARN outbound:accept{peer.addr=10.1.0.31:51468}:source{target.addr=172.16.12.128:80}: linkerd2_app_core::errors: Failed to proxy request: buffered service failed: load balancer discovery error: discovery task failed
[157507.388896661s]  WARN outbound:accept{peer.addr=10.1.0.31:52394}:source{target.addr=172.16.12.128:80}: linkerd2_app_core::errors: Failed to proxy request: buffered service failed: load balancer discovery error: discovery task failed

Output of kubectl version

~ ❯ kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.7", GitCommit:"6c143d35bb11d74970e7bc0b6c45b6bfdffc0bd4", GitTreeState:"clean", BuildDate:"2019-12-11T12:42:56Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.9", GitCommit:"2e808b7cb054ee242b68e62455323aa783991f03", GitTreeState:"clean", BuildDate:"2020-01-18T23:24:23Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}

Output of linkerd version

~ ❯ linkerd version
Client version: stable-2.7.0
Server version: stable-2.7.0

Output of linkerd check

~ took 6s ❯ linkerd check
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API

linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ control plane PodSecurityPolicies exist

linkerd-identity
----------------
√ certificate config is valid
√ trust roots are using supported crypto algorithm
√ trust roots are within their validity period
√ trust roots are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
√ issuer cert is valid for at least 60 days
√ issuer cert is issued by the trust root

linkerd-api
-----------
√ control plane pods are ready
√ control plane self-check
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus
√ tap api service is running

linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date

control-plane-version
---------------------
√ control plane is up-to-date
√ control plane and cli versions match

linkerd-ha-checks
-----------------
√ pod injection disabled on kube-system

Status check results are √

Deleting the pod resolved the issue.

We’re starting to prepare stable-2.7.1, which will include this fix; we hope to release it later this week.

In the meantime, I suggest setting the annotation to:

config.linkerd.io/proxy-version: edge-20.3.4

Please report back if you experience this issue again! Thanks

I can’t test on Kubernetes >=1.13 at the moment. Also, I have lots of meshed services, but I only saw this error on one service (3 pods), and all 3 pods exhibited the same error. I also verified that linkerd-proxy is running v2.84.0.