thanos: thanos+ingress-nginx+grpc: impossible setup due to missing host header
Thanos, Prometheus and Golang version used quay.io/thanos/thanos:v0.7.0
What happened: I set up two Kubernetes clusters. Thanos query runs in one cluster (alongside a local Prometheus + sidecar) and needs to query the Thanos sidecar in the remote Kubernetes cluster. Everything runs in AWS (but not on EKS). I created an ingress-nginx with gRPC support using this config:
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: monitoring-ingress
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
  - host: prometheus-k8s-live-a.ops.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: prometheus-k8s-live-a
          servicePort: 9090
  - host: prometheus-k8s-live-b.ops.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: prometheus-k8s-live-b
          servicePort: 9090
  tls:
  - hosts:
    - prometheus-k8s-live-a.ops.example.com
    - prometheus-k8s-live-b.ops.example.com
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
  name: grpc-ingress
  namespace: monitoring
spec:
  rules:
  - host: sidecar-k8s-live-a.ops.example.com
    http:
      paths:
      - backend:
          serviceName: sidecar-k8s-live-a
          servicePort: 10911
  - host: sidecar-k8s-live-b.ops.example.com
    http:
      paths:
      - backend:
          serviceName: sidecar-k8s-live-b
          servicePort: 10911
  tls:
  - hosts:
    - sidecar-k8s-live-a.ops.example.com
    - sidecar-k8s-live-b.ops.example.com
thanos query is using
--store=sidecar-k8s-live-a.ops.example.com.:443
--store=sidecar-k8s-live-a.ops.example.com.:443
I can connect to the Prometheus URL, but the sidecar gRPC connection fails in thanos query.
Looking at the nginx logs, I can see the query arriving over HTTP/2 but returning 400. With curl I get a 503, but probably just because it is not really gRPC. After changing the ingress-nginx log format to show the host header, I can see that curl sends the correct host header, but for thanos query the logs show only _, so it is sending either an empty one or a literal _.
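One way to surface the host header in the ingress-nginx access log is the controller's log-format-upstream ConfigMap option. The fragment below is only a sketch of how that might look, not the reporter's actual configuration: the ConfigMap name and namespace are assumptions that depend on how the controller was installed, and the selection of fields is illustrative. $host, $http_host and $ssl_server_name are standard nginx variables.

```yaml
# Hypothetical ConfigMap; name and namespace must match the one the
# ingress-nginx controller was started with (its --configmap flag).
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration
  namespace: ingress-nginx
data:
  # Log the host seen by nginx and the TLS SNI name next to each request.
  log-format-upstream: '$remote_addr [$time_local] "$request" $status host="$host" http_host="$http_host" sni="$ssl_server_name"'
```

Comparing these fields for the curl request and the thanos query request shows which name, if any, each client presents to the ingress.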
What you expected to happen: I wanted to share the ingress so that it receives both the HTTPS requests for Prometheus and the gRPC requests, using the host to route each request to the correct service. Sadly, thanos query fails to send the host header, so nginx can't apply the virtual host lookup and serves the request from the default site.
Full logs to relevant components
172.27.119.135 - [172.27.119.135] - - [10/Sep/2019:15:02:40 +0000] "PRI * HTTP/2.0" 400 163 "-" "-" 0 0.001 [] [] - - - - 477873c7a336618ccf06cf9c03fe8d97
172.27.119.135 - [172.27.119.135] - - [10/Sep/2019:15:02:40 +0000] "PRI * HTTP/2.0" 400 163 "-" "-" 0 0.003 [] [] - - - - c32e68975e91159a64326b55d4b72934
2019/09/10 15:02:40 [error] 1137#1137: *7155 upstream rejected request with error 2 while reading response header from upstream, client: 172.26.81.74, server: sidecar-k8s-live-a.ops.example.com, request: "PRI / HTTP/1.1", upstream: "grpc://100.96.136.200:10911", host: "sidecar-k8s-live-a.ops.example.com"
172.26.81.74 - [172.26.81.74] - - [10/Sep/2019:15:02:40 +0000] "PRI / HTTP/1.1" 502 163 "-" "curl/7.58.0" 189 0.002 [monitoring-sidecar-k8s-live-a-10911] [] 100.96.136.200:10911 0 0.004 502 4e08c4e8c6d8df148c5bc3a68d61ccf9
Here we can see that the thanos query requests do not match the virtual host, but the curl request, which carries the host header, is routed to the thanos sidecar.
About this issue
- State: closed
- Created 5 years ago
- Reactions: 11
- Comments: 58 (8 by maintainers)
Commits related to this issue
- Add servername to grpc dial options for DNS stores (#1507) To avoid needing a query per remote cluster, get the name to add to the dial options from the dns provider when making the grpc connection. — committed to j3p0uk/thanos by j3p0uk 4 years ago
Another workaround with the NGINX Ingress Controller is to use the --grpc-client-server-name flag on your thanos-query. This uses Server Name Indication, allowing the ingress controller to route the request correctly.
I believe this limits each querier to one server name only. Therefore you will need multiple queriers if you have multiple clusters to communicate between.
Your thanos-query args would include:
And your ingress annotations would include:
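A minimal sketch of this workaround, assuming the sidecar hostname from the original report and a TLS-terminating ingress on port 443. The flag names and the annotations are the ones mentioned in this thread; the surrounding structure and the store address are illustrative assumptions rather than the commenter's exact configuration.

```yaml
# thanos-query container args (sketch; one querier per remote server name)
args:
  - query
  - --grpc-client-tls-secure
  # Sent as the TLS server name (SNI) so nginx can pick the right virtual host.
  - --grpc-client-server-name=sidecar-k8s-live-a.ops.example.com
  - --store=sidecar-k8s-live-a.ops.example.com:443
---
# Ingress metadata on the remote cluster (sketch)
metadata:
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
```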
Apparently this is still needed and valid.
After a while of pulling my hair out over this one, I managed to make it work. Just a note here: my ingress is on the Query instance, not the sidecar. I would assume it’d work the same way for the sidecar (I didn’t test that part).
My architecture is as follows:
I’m deploying the stack with helm, here is my config
Remote & Central Cluster Prometheus Operator
Remote Cluster Query Config
Central Cluster Query Config
Notice that the certificate used for the query ingress and for client TLS is the same certificate. I hope this helps someone.
Still valid and help wanted.
I have a similar issue. I have a multi-cluster setup where each cluster has one query and one sidecar, and there is an observability cluster with a query instance that points to all query instances in the other clusters. Query instances are exposed through an ingress that has a backend on the gRPC port.
The query and sidecar in the clusters are working, but I can’t create stores pointing to query.my-local-domain.local:443. I keep getting this error:
rpc error: code = DeadlineExceeded desc = latest balancer error: connection closed
Here are my ingress annotations:
The ingress controller is an Azure Internal Load Balancer. Am I missing anything?
P.S: tls is not enabled on the sidecar nor the query instances
I had the same problem
For anyone who’s bashing their heads against this, this single line fixed it; we have ingress enabled in both observer and remote.
We are using bitnami kube-prometheus and bitnami thanos on EKS 1.21.
Here are the values for thanos:
and for kube-prometheus:
I had the same problem. My solution was to use the bitnami charts.
Depends: https://github.com/bitnami/charts/pull/5345 https://github.com/bitnami/charts/pull/5344
My bitnami/kube-prometheus custom values:
My bitnami/thanos custom values:
Possible fix pushed that uses a flag to change behaviour, based around the workaround detailed by @cjf-fuller in https://github.com/thanos-io/thanos/issues/1507#issuecomment-580820712.
If the “grpc-client-dns-server-name” flag is specified, then use the DNS provider to return the name that was originally looked up and add the relevant dial options for gRPC at connection time. This allows a different SNI per store, based on the originally provided (dns+<name>:<port>) name.
A reasonable way to work around this with the NGINX Ingress Controller is to use the tcp-services-configmap feature to expose ports that route directly to sidecar-k8s-live-a:10911 (e.g. 11911) and sidecar-k8s-live-b:10911 (e.g. 12911) respectively.
Then your thanos-query options would look something like:
You still have to set up TLS on your own in both thanos-query and thanos-sidecar, but it helps avoid all the HTTP routing that the ingress controller tries to do for you.
Hello, for me it works when you add the extra flag --grpc-client-tls-secure, and on the observee cluster I have certman activated.
@j3p0uk I have tried this with grpcurl from the central cluster:
grpcurl -insecure query.my-local-domain.local:443 list
I’m getting this response:
I did a describe as well:
grpcurl -insecure query.my-local-domain.local:443 describe
and this is the output
Then
grpcurl -insecure query.my-local-domain.local:443 thanos.Store/Info
and this is the output
What if I don’t want to use TLS at all? Not for internal communication (which is the default), and not for external queries. Is it possible?
@Than0s-coder, great point, we have set up a “central” Querier to target a “leaf” Querier and not the sidecars directly. But it sounds like this risk of overwriting the initial query and loss of host_headers would still be present?
@martip07, I am still very much a beginner with Thanos so could be totally wrong here. But, as far as I can tell, the --grpc-client-server-name argument is a string that sets ServerName in tls.Config. I am not too sure how I would make this a list of server names.
I have seen that the TLS Extensions documentation talks of a ServerNameList struct. I cannot find many examples of this being used. I have tested this with a simple comma-separated list (--grpc-client-server-name=test-1.myorg.com,test-2.myorg.com), which fails at the SSL handshake because the list is not enumerated at any point. So it fails, as the wildcard certificate is valid for “*.myorg.com” and not “test-1.myorg.com,test-2.myorg.com”.
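Since the flag takes a single name, the alternative suggested earlier in the thread is one querier per server name. A rough sketch of what the args of two such queriers might look like follows; the store addresses and port 443 are assumptions for illustration, and each fragment belongs to its own Deployment.

```yaml
# Querier dedicated to the first remote cluster (sketch)
args:
  - query
  - --grpc-client-tls-secure
  - --grpc-client-server-name=test-1.myorg.com
  - --store=test-1.myorg.com:443   # assumed store address
---
# Querier dedicated to the second remote cluster (sketch)
args:
  - query
  - --grpc-client-tls-secure
  - --grpc-client-server-name=test-2.myorg.com
  - --store=test-2.myorg.com:443   # assumed store address
```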
Found a reference to this problem in an issue from several months ago (not directly related to this one), https://github.com/thanos-io/thanos/issues/977#issuecomment-483679010, and it basically confirms the problem.
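For the tcp-services-configmap workaround mentioned earlier in the thread, a sketch under stated assumptions might look like the following. The port numbers and service names come from that comment; the ConfigMap name, its namespace, and the ingress hostname ingress.ops.example.com are hypothetical and would need to match the actual installation.

```yaml
# Hypothetical ConfigMap referenced by the controller's --tcp-services-configmap flag.
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-services
  namespace: ingress-nginx
data:
  # "<exposed port>": "<namespace>/<service>:<service port>"
  "11911": "monitoring/sidecar-k8s-live-a:10911"
  "12911": "monitoring/sidecar-k8s-live-b:10911"
---
# thanos-query args pointing at the exposed TCP ports (hostname is an assumption)
args:
  - query
  - --store=ingress.ops.example.com:11911
  - --store=ingress.ops.example.com:12911
```

The exposed ports also have to be opened on the ingress controller's Service, and TLS between thanos-query and the sidecars still has to be configured separately, as that comment notes.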