linkerd2: Thanos Query cannot connect to Prometheus with Linkerd
Bug Report
What is the issue?
I have deployed Prometheus (https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) and Thanos in my cluster. Without Linkerd, everything works fine.
As soon as I enable Linkerd for both Prometheus and Thanos, the Query pod stops communicating with Prometheus. linkerd-proxy reports connection errors and stops collecting metrics.
How can it be reproduced?
- install linkerd
- install kube-prometheus-stack with Thanos sidecar and linkerd injection enabled at namespace level
- install Thanos. I’ve used the configuration reported here
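For the namespace-level injection mentioned in the steps above, a minimal sketch of the namespace manifest could look like the following. The namespace name `monitoring` is an assumption for illustration; the report does not say which namespace was used. The `linkerd.io/inject: enabled` annotation is the standard way to request proxy injection for every pod created in a namespace.

```yaml
# Hypothetical namespace name; the report does not state which one was used.
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
  annotations:
    # Ask the Linkerd proxy injector to add linkerd-proxy to new pods here.
    linkerd.io/inject: enabled
```

Injection happens when a pod is created, so the annotation would need to be in place before the Prometheus and Thanos charts are installed into that namespace, or the existing pods restarted afterwards.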
Logs, error output, etc
Thanos Query’s linkerd-proxy output: https://gist.github.com/irizzant/bfe93d82b293915fc4ffa7b5edb77773
Screenshot taken from linkerd-viz to show the path Thanos Query is trying to reach:
linkerd check output
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API
kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version
linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
W0903 14:08:03.083583 23673 warnings.go:67] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
√ control plane PodSecurityPolicies exist
linkerd-identity
----------------
√ certificate config is valid
√ trust anchors are using supported crypto algorithm
√ trust anchors are within their validity period
√ trust anchors are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
√ issuer cert is valid for at least 60 days
√ issuer cert is issued by the trust anchor
linkerd-webhooks-and-apisvc-tls
-------------------------------
√ proxy-injector webhook has valid cert
√ proxy-injector cert is valid for at least 60 days
√ sp-validator webhook has valid cert
√ sp-validator cert is valid for at least 60 days
linkerd-api
-----------
√ control plane pods are ready
√ can initialize the client
√ can query the control plane API
linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date
control-plane-version
---------------------
√ control plane is up-to-date
√ control plane and cli versions match
Status check results are √
Linkerd extensions checks
=========================
linkerd-viz
-----------
√ linkerd-viz Namespace exists
√ linkerd-viz ClusterRoles exist
√ linkerd-viz ClusterRoleBindings exist
√ tap API server has valid cert
√ tap API server cert is valid for at least 60 days
√ tap API service is running
√ linkerd-viz pods are injected
√ viz extension pods are running
√ prometheus is installed and configured correctly
√ can initialize the client
√ viz extension self-check
Status check results are √
Environment
- Kubernetes Version: k3d v1.21.1+k3s1
- Cluster Environment: k3d
- Host OS:
- Linkerd version:
Client version: stable-2.10.2
Server version: stable-2.10.2
Possible solution
Additional context
About this issue
- State: closed
- Created 3 years ago
- Comments: 15 (7 by maintainers)
Hey @Pothulapati you’re right, the error is because I foresaw Thanos chart version in the reproducer repo.
I updated the reproducer repo with the fix, so storegateway should now start fine.
@irizzant thanks so much! this should make it much easier to track it down. we’ll take a look