thanos: Querier cannot speak to external sidecar

Thanos, Prometheus and Golang version used:
- thanos: charts.bitnami.com/bitnami 2.0.0
- prometheus-operator with sidecar: charts.bitnami.com/bitnami 0.22.3
- sidecar image: 0.14.0-scratch-r3

What happened: I have 2 EKS clusters, both running prometheus-operator. One cluster has Thanos installed (above chart), and its Querier needs to add the sidecar of the other cluster as a store. Thanos registers its local sidecar and the store gateway, but not the external sidecar. The path to the external sidecar is:

cluster 1 (querier) -> AWS Route 53 (DNS) -> NLB load balancer -> nginx ingress -> service -> Prometheus pod -> sidecar port 10901

What you expected to happen: For the external sidecar to be added to the stores in Querier.

Helpful facts: From my laptop I can run a grpcurl list against the sidecar on port 443, which shows the route works; the response is "Failed to list services: server does not support the reflection API". I had to enable the ALPN policy (HTTP2Optional) on the NLB to make this work. The same check also works from a busybox pod in the Thanos cluster, in the same namespace as the Querier. These requests show up in the nginx logs, but I cannot find any entries there coming from the Querier…
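
For completeness, the check from my laptop is roughly the following (thanos-sidecar.DNSNAME is the same placeholder used in the Querier config below; grpcurl negotiates TLS by default, add -insecure if the certificate does not verify):

grpcurl thanos-sidecar.DNSNAME:443 list
Failed to list services: server does not support the reflection API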

How to reproduce it (as minimally and precisely as possible): have 2 clusters and try to add the sidecar from one cluster to the Querier in the other, using an nginx ingress in between.

Full logs to relevant components: Querier: level=warn ts=2020-08-04T10:33:05.274486641Z caller=storeset.go:487 component=storeset msg="update of store node failed" err="getting metadata: fetching store info from XXXXXXXXX:443: rpc error: code = DeadlineExceeded desc = context deadline exceeded" address=XXXXXXXXXX:443

Sidecar:

level=info ts=2020-08-04T07:48:09.710243084Z caller=main.go:151 msg="Tracing will be disabled"
level=info ts=2020-08-04T07:48:09.710624498Z caller=options.go:23 protocol=gRPC msg="disabled TLS, key and cert must be set to enable"
level=info ts=2020-08-04T07:48:09.710771643Z caller=grpc.go:92 service=gRPC/server component=sidecar msg="registering as gRPC StoreAPI and RulesAPI"
level=info ts=2020-08-04T07:48:09.71109473Z caller=factory.go:46 msg="loading bucket configuration"
level=info ts=2020-08-04T07:48:09.711510731Z caller=sidecar.go:301 msg="starting sidecar"
level=info ts=2020-08-04T07:48:09.711645564Z caller=reloader.go:198 component=reloader msg="started watching config file and non-recursively rule dirs for changes" cfg= out= dirs=
level=info ts=2020-08-04T07:48:09.712128709Z caller=intrumentation.go:48 msg="changing probe status" status=ready
level=info ts=2020-08-04T07:48:09.712603212Z caller=grpc.go:119 service=gRPC/server component=sidecar msg="listening for serving gRPC" address=0.0.0.0:10901
level=info ts=2020-08-04T07:48:09.712795694Z caller=intrumentation.go:60 msg="changing probe status" status=healthy
level=info ts=2020-08-04T07:48:09.712814177Z caller=http.go:56 service=http/server component=sidecar msg="listening for requests and metrics" address=0.0.0.0:10902
level=info ts=2020-08-04T07:48:14.724274654Z caller=sidecar.go:163 msg="successfully loaded prometheus external labels" external_labels="{cluster=\"tcloud-cloudworks\", prometheus=\"kube-system/prometheus-operator-prometheus\", prometheus_replica=\"prometheus-prometheus-operator-prometheus-0\"}"
level=info ts=2020-08-04T07:48:14.724382853Z caller=intrumentation.go:48 msg="changing probe status" status=ready
level=info ts=2020-08-04T09:00:16.785707212Z caller=shipper.go:361 msg="upload new block" id=01EEWB5VEM0YRJVFH72RF0TY34
level=info ts=2020-08-04T11:00:16.728795796Z caller=shipper.go:361 msg="upload new block" id=01EEWJ1JPKVYG482MRGGTFJAFV

Anything else we need to know:

Querier config:

- query
        - --log.level=debug
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --query.replica-label=replica
        - --store=dnssrv+_grpc._tcp.thanos-storegateway.kube-system.svc.cluster.local
        - --store=dnssrv+_grpc._tcp.prometheus-operator-prometheus-thanos.kube-system.svc
        - --store=dns+thanos-sidecar.DNSNAME:443
        image: docker.io/bitnami/thanos:0.14.0-scratch-r1

Changing - --store=dns+thanos-sidecar.DNSNAME:443 to - --store=thanos-sidecar.DNSNAME:443 did not have any effect.

Sidecar config:

- sidecar
    - --prometheus.url=http://localhost:9090
    - --grpc-address=0.0.0.0:10901
    - --http-address=0.0.0.0:10902
    - --tsdb.path=/prometheus/
    - --objstore.config=$(OBJSTORE_CONFIG)
    - --log.level=info
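
Not something we run today, but since the sidecar log above notes "disabled TLS, key and cert must be set to enable": if we wanted the sidecar itself to terminate TLS instead of the NLB, my understanding is the extra sidecar flags would look roughly like this (the /certs/... paths are placeholders for a mounted secret, not our actual setup):

    - --grpc-server-tls-cert=/certs/tls.crt
    - --grpc-server-tls-key=/certs/tls.key
    - --grpc-server-tls-client-ca=/certs/ca.crt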

Sidecar nginx/Ingress config (we are terminating TLS at the NLB):

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: GRPC
    nginx.ingress.kubernetes.io/proxy-connect-timeout: 360s
    nginx.ingress.kubernetes.io/proxy-read-timeout: 360s
    nginx.ingress.kubernetes.io/proxy-send-timeout: 360s
  creationTimestamp: "2020-07-30T08:58:56Z"
  generation: 7
  name: thanos-sidecar
  namespace: kube-system
  resourceVersion: "2043742"
  selfLink: /apis/extensions/v1beta1/namespaces/kube-system/ingresses/thanos-sidecar
  uid: db27bbe0-4e60-460c-b841-13000a63e8f3
spec:
  rules:
  - host: thanos-sidecar.DNSNAME
    http:
      paths:
      - backend:
          serviceName: thanos-sidecar
          servicePort: 10901
status:
  loadBalancer:
    ingress:
    - ip: XXXX
    - ip: XXXX
    - ip: XXXX

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 3
  • Comments: 20

Most upvoted comments

I have the same problem. Adding --grpc-client-tls-secure makes the external sidecar work, but it cannot co-exist with cluster-internal DNS addresses.
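
For anyone else hitting this: as far as I can tell --grpc-client-tls-secure is a global Querier flag, so it forces TLS towards every --store target, including the plain in-cluster dnssrv+ ones. A rough sketch of the relevant args, reusing the placeholder name from this issue (--grpc-client-server-name is only needed if the certificate name differs from the store address):

- query
        - --grpc-client-tls-secure
        - --grpc-client-server-name=thanos-sidecar.DNSNAME
        - --store=dns+thanos-sidecar.DNSNAME:443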

@roysha1

envoy.yaml

admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address: { address: 0.0.0.0, port_value: 9901 }

static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 0.0.0.0, port_value: 10000 }
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          codec_type: auto
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: cluster_name, host_rewrite: my_external_sidecars.domain.net }
          http_filters:
          - name: envoy.router
  clusters:
  - name: cluster_name
    connect_timeout: 30s
    type: logical_dns
    http2_protocol_options: {}
    dns_lookup_family: V4_ONLY
    lb_policy: round_robin
    hosts: [{ socket_address: { address: my_external_sidecars.domain.net, port_value: 443 }}]
    tls_context:
      common_tls_context:
        alpn_protocols:
        - h2
        - http/1.1
      sni: my_external_sidecars.domain.net
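
With this setup the Querier talks plain gRPC to Envoy inside the cluster and Envoy handles TLS towards the remote sidecar, so the Querier's store entry would point at the Envoy listener instead of the external name; a rough sketch (service name and namespace are placeholders, 10000 matches the listener above):

    - --store=dns+envoy.kube-system.svc.cluster.local:10000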

My ingress, using the nginx controller:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/backend-protocol: GRPC
    nginx.ingress.kubernetes.io/force-ssl-redirect: "false"
    nginx.ingress.kubernetes.io/grpc-backend: "true"
    nginx.ingress.kubernetes.io/protocol: h2c
    nginx.ingress.kubernetes.io/proxy-read-timeout: "160"
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
  name: sicecard-instance-1
spec:
  rules:
  - host: my_external_sidecars.domain.net
    http:
      paths:
      - backend:
          serviceName: sicecard-instance-1-services
          servicePort: 10901
  tls:
  - hosts:
    - my_external_sidecars.domain.net
    secretName: my-tls-secret

I have the same issue…