thanos: Querier cannot speak to external sidecar
Thanos, Prometheus and Golang version used: Thanos chart: charts.bitnami.com/bitnami 2.0.0; prometheus-operator with sidecar: charts.bitnami.com/bitnami 0.22.3; sidecar image: 0.14.0-scratch-r3
What happened: I have two EKS clusters, both running prometheus-operator. One cluster has Thanos installed (chart above) and its Querier needs to add the sidecar of the other cluster. Thanos registers its local sidecar and the storage gateway, but not the external sidecar. The request path is:
cluster 1 (Querier) -> AWS Route 53 (DNS) -> NLB load balancer -> nginx ingress -> service -> Prometheus pod -> sidecar port 10901
What you expected to happen: for the external sidecar to be added to the stores in Querier.
Helpful facts:
From my laptop I can run grpcurl list against the sidecar on port 443, which shows the route works. The response:
response: Failed to list services: server does not support the reflection API
I had to enable ALPN policy (HTTP2Optional) on the NLB to make this work.
This check also works from a busybox pod in the Thanos cluster, in the same namespace as the Querier.
These checks show up in the nginx logs, where I cannot find any entries coming from the Querier.
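For completeness: the reflection error above is expected, since the Thanos sidecar does not register the gRPC reflection service, so grpcurl's list fails even when the route works. A hedged sketch of two checks that separate routing problems from TLS/ALPN problems (the hostname is a placeholder, and the proto import path assumes a local checkout of the Thanos repo):

```shell
# Verify the NLB negotiates HTTP/2 via ALPN (required for gRPC);
# look for "ALPN protocol: h2" in the output.
openssl s_client -connect thanos-sidecar.DNSNAME:443 -alpn h2 </dev/null

# Probe the StoreAPI directly instead of relying on reflection;
# store/storepb/rpc.proto defines the thanos.Store service.
grpcurl -insecure \
  -import-path ./thanos/pkg \
  -proto store/storepb/rpc.proto \
  thanos-sidecar.DNSNAME:443 thanos.Store/Info
```

If the Info call returns the sidecar's external labels, the path through the NLB and nginx is fully working end to end, and the problem is on the Querier's side of the connection.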
How to reproduce it (as minimally and precisely as possible): have two clusters and try to add the sidecar from one to the Querier in the other, using nginx ingress.
Full logs of relevant components:
Querier:
level=warn ts=2020-08-04T10:33:05.274486641Z caller=storeset.go:487 component=storeset msg="update of store node failed" err="getting metadata: fetching store info from XXXXXXXXX:443: rpc error: code = DeadlineExceeded desc = context deadline exceeded" address=XXXXXXXXXX:443
Sidecar:
level=info ts=2020-08-04T07:48:09.710243084Z caller=main.go:151 msg="Tracing will be disabled"
level=info ts=2020-08-04T07:48:09.710624498Z caller=options.go:23 protocol=gRPC msg="disabled TLS, key and cert must be set to enable"
level=info ts=2020-08-04T07:48:09.710771643Z caller=grpc.go:92 service=gRPC/server component=sidecar msg="registering as gRPC StoreAPI and RulesAPI"
level=info ts=2020-08-04T07:48:09.71109473Z caller=factory.go:46 msg="loading bucket configuration"
level=info ts=2020-08-04T07:48:09.711510731Z caller=sidecar.go:301 msg="starting sidecar"
level=info ts=2020-08-04T07:48:09.711645564Z caller=reloader.go:198 component=reloader msg="started watching config file and non-recursively rule dirs for changes" cfg= out= dirs=
level=info ts=2020-08-04T07:48:09.712128709Z caller=intrumentation.go:48 msg="changing probe status" status=ready
level=info ts=2020-08-04T07:48:09.712603212Z caller=grpc.go:119 service=gRPC/server component=sidecar msg="listening for serving gRPC" address=0.0.0.0:10901
level=info ts=2020-08-04T07:48:09.712795694Z caller=intrumentation.go:60 msg="changing probe status" status=healthy
level=info ts=2020-08-04T07:48:09.712814177Z caller=http.go:56 service=http/server component=sidecar msg="listening for requests and metrics" address=0.0.0.0:10902
level=info ts=2020-08-04T07:48:14.724274654Z caller=sidecar.go:163 msg="successfully loaded prometheus external labels" external_labels="{cluster=\"tcloud-cloudworks\", prometheus=\"kube-system/prometheus-operator-prometheus\", prometheus_replica=\"prometheus-prometheus-operator-prometheus-0\"}"
level=info ts=2020-08-04T07:48:14.724382853Z caller=intrumentation.go:48 msg="changing probe status" status=ready
level=info ts=2020-08-04T09:00:16.785707212Z caller=shipper.go:361 msg="upload new block" id=01EEWB5VEM0YRJVFH72RF0TY34
level=info ts=2020-08-04T11:00:16.728795796Z caller=shipper.go:361 msg="upload new block" id=01EEWJ1JPKVYG482MRGGTFJAFV
Anything else we need to know:
Querier config:
- query
- --log.level=debug
- --grpc-address=0.0.0.0:10901
- --http-address=0.0.0.0:10902
- --query.replica-label=replica
- --store=dnssrv+_grpc._tcp.thanos-storegateway.kube-system.svc.cluster.local
- --store=dnssrv+_grpc._tcp.prometheus-operator-prometheus-thanos.kube-system.svc
- --store=dns+thanos-sidecar.DNSNAME:443
image: docker.io/bitnami/thanos:0.14.0-scratch-r1
Changing - --store=dns+thanos-sidecar.DNSNAME:443
to - --store=thanos-sidecar.DNSNAME:443
had no effect.
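Since TLS is terminated at the NLB but the Querier's gRPC client defaults to plaintext, a plaintext dial to port 443 would hang until DeadlineExceeded, which matches the storeset warning above. A hedged sketch of the Querier args the comments point toward (flag names as of Thanos 0.14; note these TLS client flags apply to all --store targets, which is presumably why a commenter says this "cannot co-exist with cluster internal DNS address" for plaintext in-cluster stores):

```
- query
- --store=dns+thanos-sidecar.DNSNAME:443
# Hypothetical addition: dial stores over TLS, and present the
# external hostname for SNI/verification at the NLB/nginx hop.
- --grpc-client-tls-secure
- --grpc-client-server-name=thanos-sidecar.DNSNAME
```

One way around the global-flag limitation would be to run a second Querier just for TLS-secured external stores and federate it from the main one, but that is a design sketch, not something the issue confirms.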
Sidecar config:
- sidecar
- --prometheus.url=http://localhost:9090
- --grpc-address=0.0.0.0:10901
- --http-address=0.0.0.0:10902
- --tsdb.path=/prometheus/
- --objstore.config=$(OBJSTORE_CONFIG)
- --log.level=info
Sidecar nginx/Ingress config (we are terminating TLS at the NLB):
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: GRPC
    nginx.ingress.kubernetes.io/proxy-connect-timeout: 360s
    nginx.ingress.kubernetes.io/proxy-read-timeout: 360s
    nginx.ingress.kubernetes.io/proxy-send-timeout: 360s
  creationTimestamp: "2020-07-30T08:58:56Z"
  generation: 7
  name: thanos-sidecar
  namespace: kube-system
  resourceVersion: "2043742"
  selfLink: /apis/extensions/v1beta1/namespaces/kube-system/ingresses/thanos-sidecar
  uid: db27bbe0-4e60-460c-b841-13000a63e8f3
spec:
  rules:
  - host: thanos-sidecar.DNSNAME
    http:
      paths:
      - backend:
          serviceName: thanos-sidecar
          servicePort: 10901
status:
  loadBalancer:
    ingress:
    - ip: XXXX
    - ip: XXXX
    - ip: XXXX
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 3
- Comments: 20
Comment highlights:
- I have the same problem; add --grpc-client-tls-secure. It cannot co-exist with a cluster-internal DNS address (@roysha1).
- envoy.yaml
- My ingress uses the nginx controller.
- I have the same issue…