linkerd2: Proxy gets 503 response from K8s service
Bug Report
While debugging occasional failures in production, I observed connection errors between the outbound proxy and a pod that does not exist. This resembles issue 6184.
What is the issue?
We use Nginx ingress with Linkerd running in default mode (normal mode). One of the ingress pods tries to connect to the IP (10.47.255.72:8080) of a nonexistent pod. After reviewing the cluster's state, I saw that none of the pods had restarted (neither source nor destination). The IP address is currently not assigned to any pod.
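One way to confirm that an address is unassigned is to search the JSON output of `kubectl get pods --all-namespaces -o json` for a matching `status.podIP`. The snippet below is only a minimal sketch of that check and is not from the original report: the helper name `find_pods_by_ip` and the sample pod list (namespace and pod name included) are hypothetical.

```python
import json


def find_pods_by_ip(pod_list, ip):
    """Return (namespace, name) for every pod whose status.podIP equals ip.

    pod_list is the parsed JSON of `kubectl get pods --all-namespaces -o json`.
    """
    matches = []
    for pod in pod_list.get("items", []):
        if pod.get("status", {}).get("podIP") == ip:
            meta = pod.get("metadata", {})
            matches.append((meta.get("namespace"), meta.get("name")))
    return matches


# Hypothetical sample shaped like kubectl's JSON output (not real cluster data).
sample = json.loads("""
{
  "items": [
    {
      "metadata": {"namespace": "ingress", "name": "nginx-ingress-example"},
      "status": {"podIP": "10.44.5.238"}
    }
  ]
}
""")

# An empty list means no pod currently owns the address.
unassigned = find_pods_by_ip(sample, "10.47.255.72")
owner = find_pods_by_ip(sample, "10.44.5.238")
```

In practice the input would come from something like `json.load(sys.stdin)` with the kubectl output piped in. An empty result only shows that no pod holds the IP at the moment of the query.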
How can it be reproduced?
I wasn't able to reproduce it (it might be caused by a network glitch in GKE).
Logs, error output, etc
[ 272.767311s] INFO ThreadId(01) outbound:accept{peer.addr=10.44.5.238:46900 target.addr=10.47.255.72:8080}: linkerd2_app_core::serve: Connection closed error=Service in fail-fast
https://gist.github.com/AlonGluz/82633391f432ef35158deb11c516fb8c
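Since the failure is intermittent, it can help to scan the proxy logs for fail-fast errors and collect the target addresses involved. The following is a rough sketch that assumes the log lines look like the single line quoted above; other Linkerd versions may format them differently. The names `LOG_RE` and `parse_fail_fast` are hypothetical.

```python
import re

# Captures the peer and target addresses from the outbound:accept span,
# plus the trailing error message. The pattern is based only on the one
# log line quoted in this report.
LOG_RE = re.compile(
    r"outbound:accept\{peer\.addr=(?P<peer>\S+) target\.addr=(?P<target>[^}\s]+)\}"
    r".*error=(?P<error>.+)$"
)


def parse_fail_fast(line):
    """Return peer/target addresses for a fail-fast log line, else None."""
    m = LOG_RE.search(line)
    if m is None or "fail-fast" not in m.group("error"):
        return None
    return {"peer": m.group("peer"), "target": m.group("target")}


line = (
    "[ 272.767311s] INFO ThreadId(01) "
    "outbound:accept{peer.addr=10.44.5.238:46900 target.addr=10.47.255.72:8080}: "
    "linkerd2_app_core::serve: Connection closed error=Service in fail-fast"
)
result = parse_fail_fast(line)
```

Running this over the full proxy log and counting the distinct `target` values would show whether the errors always point at the same stale address.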
linkerd check output
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API
kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version
linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API
linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ control plane PodSecurityPolicies exist
linkerd-identity
----------------
√ certificate config is valid
√ trust anchors are using supported crypto algorithm
√ trust anchors are within their validity period
√ trust anchors are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
√ issuer cert is valid for at least 60 days
√ issuer cert is issued by the trust anchor
linkerd-webhooks-and-apisvc-tls
-------------------------------
√ tap API server has valid cert
√ tap API server cert is valid for at least 60 days
√ proxy-injector webhook has valid cert
√ proxy-injector cert is valid for at least 60 days
√ sp-validator webhook has valid cert
√ sp-validator cert is valid for at least 60 days
linkerd-api
-----------
√ control plane pods are ready
√ control plane self-check
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus
√ tap api service is running
linkerd-version
---------------
√ can determine the latest version
‼ cli is up-to-date
is running version 2.9.4 but the latest stable version is 2.10.2
see https://linkerd.io/checks/#l5d-version-cli for hints
control-plane-version
---------------------
‼ control plane is up-to-date
is running version 2.9.4 but the latest stable version is 2.10.2
see https://linkerd.io/checks/#l5d-version-control for hints
√ control plane and cli versions match
linkerd-ha-checks
-----------------
√ pod injection disabled on kube-system
√ multiple replicas of control plane pods
linkerd-multicluster
--------------------
√ Link CRD exists
Status check results are √
Environment
- Kubernetes Version: 1.18.17-gke.1200
- Cluster Environment: GKE
- Host OS: cloud.google.com/gke-os-distribution: cos
- Linkerd version: 2.9.4
Possible solution
Additional context
About this issue
- State: closed
- Created 3 years ago
- Comments: 19 (8 by maintainers)
Hey @mateiidavid, thanks for your response. It happens quite often; it's just not easy to reproduce. I'll add the config and forward the output.