prometheus-operator: cannot scrape etcd3 https with v0.22.2
(For background: I have a K8s cluster on AWS that uses http for etcd3, and running v0.19.0 of kube-prometheus I'm able — per the v0.19.0 documentation for monitoring etcd — to get the etcd nodes in the cluster to show up on https://<my-prometheus>/targets; I customized the ServiceMonitor to specify scheme: http.)
But now I'm stepping up to v0.22.2 in a cluster (named "sandbox-green") where etcd3 accepts requests via https. I've tried various tlsConfig settings (and I read through the comments on #898), but I can't get any etcd targets to show up, nor do I see any mention of "etcd" on https://<my-prometheus>/config. Any suggestions would be greatly appreciated; details follow.
Here are the nodes in my "sandbox-green" cluster:
"master": [
"192.168.213.199",
"192.168.213.168",
"192.168.213.132"
],
"agent": [
"192.168.213.152",
"192.168.213.182",
"192.168.213.178",
"192.168.213.229",
"192.168.213.219"
],
"etcd": [
"192.168.213.212",
"192.168.213.171",
"192.168.213.210"
]
When a master node starts up, here are some of the options I pass to the apiserver:
- --bind-address=0.0.0.0
- --etcd-servers=https://127.0.0.1:2379
- --etcd-cafile=/etc/kubernetes/ssl/ca.pem
- --etcd-certfile=/etc/kubernetes/ssl/etcd.pem
- --etcd-keyfile=/etc/kubernetes/ssl/etcd-key.pem
The kube-prometheus pods:
core@ip-192-168-213-199 ~ $ kubectl get pods -n monitoring -o wide
NAME READY STATUS RESTARTS AGE IP NODE
alertmanager-main-0 2/2 Running 0 1h 10.2.142.145 ip-192-168-213-178.ec2.internal
grafana-54d8db4bc4-h7zqk 1/1 Running 0 1h 10.2.142.143 ip-192-168-213-178.ec2.internal
kube-state-metrics-74f8fd79f7-wvn86 4/4 Running 0 1h 10.2.240.203 ip-192-168-213-229.ec2.internal
node-exporter-7s959 2/2 Running 0 1h 192.168.213.132 ip-192-168-213-132.ec2.internal
node-exporter-878l6 2/2 Running 0 1h 192.168.213.152 ip-192-168-213-152.ec2.internal
node-exporter-9h4hh 2/2 Running 0 1h 192.168.213.182 ip-192-168-213-182.ec2.internal
node-exporter-cbpw7 2/2 Running 0 1h 192.168.213.178 ip-192-168-213-178.ec2.internal
node-exporter-l45vz 2/2 Running 0 1h 192.168.213.168 ip-192-168-213-168.ec2.internal
node-exporter-qh278 2/2 Running 0 1h 192.168.213.219 ip-192-168-213-219.ec2.internal
node-exporter-tffcw 2/2 Running 0 1h 192.168.213.199 ip-192-168-213-199.ec2.internal
node-exporter-xk5rf 2/2 Running 0 1h 192.168.213.229 ip-192-168-213-229.ec2.internal
prometheus-k8s-0 3/3 Running 1 34m 10.2.142.149 ip-192-168-213-178.ec2.internal
prometheus-k8s-1 3/3 Running 1 34m 10.2.240.204 ip-192-168-213-229.ec2.internal
prometheus-operator-769f6f97cb-wvlm2 1/1 Running 0 1h 10.2.142.142 ip-192-168-213-178.ec2.internal
A snippet of the output from running openssl x509 -noout -text -in /etc/kubernetes/ssl/etcd.pem on agent node 192.168.213.178 (i.e., the agent that the prometheus-k8s-0 pod is running on). etcd3 doesn't allow/support DNS here, so there is no server name; instead, SANs (Subject Alternative Names) are used:
X509v3 Subject Alternative Name:
DNS:localhost, IP Address:127.0.0.1, IP Address:192.168.213.171, IP Address:192.168.213.210, IP Address:192.168.213.212
Note that PR #1732 proposes some revisions to the commentary in kube-prometheus about monitoring etcd (see README.md & etcd.jsonnet).
Here are the relevant objects. (Note: until the "TODO" comment mentioned here is resolved, in my own copy of vendor/kube-prometheus/kube-prometheus-static-etcd.libsonnet I've customized it to not specify a value for serverName, and to instead specify insecureSkipVerify: true.)
core@ip-192-168-213-199 ~ $ kubectl -n kube-system describe service etcd
Name: etcd
Namespace: kube-system
Labels: k8s-app=etcd
Annotations: <none>
Selector: <none>
Type: ClusterIP
IP: None
Port: metrics 2379/TCP
TargetPort: 2379/TCP
Endpoints: 192.168.213.171:2379,192.168.213.210:2379,192.168.213.212:2379
Session Affinity: None
Events: <none>
core@ip-192-168-213-199 ~ $ kubectl -n kube-system describe endpoints etcd
Name: etcd
Namespace: kube-system
Labels: k8s-app=etcd
Annotations: <none>
Subsets:
Addresses: 192.168.213.171,192.168.213.210,192.168.213.212
NotReadyAddresses: <none>
Ports:
Name Port Protocol
---- ---- --------
metrics 2379 TCP
Events: <none>
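For reference, the Service/Endpoints pair described above corresponds to manifests roughly like the following (a sketch reconstructed from the `kubectl describe` output, not copied verbatim from my cluster): a selector-less headless Service paired with a manually maintained Endpoints object of the same name.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: etcd
  namespace: kube-system
  labels:
    k8s-app: etcd
spec:
  type: ClusterIP
  clusterIP: None        # headless; no selector, so endpoints are managed manually
  ports:
  - name: metrics
    port: 2379
    targetPort: 2379
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: etcd            # must match the Service name for the pairing to work
  namespace: kube-system
  labels:
    k8s-app: etcd
subsets:
- addresses:
  - ip: 192.168.213.171
  - ip: 192.168.213.210
  - ip: 192.168.213.212
  ports:
  - name: metrics
    port: 2379
    protocol: TCP
```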
core@ip-192-168-213-199 ~ $ kubectl -n kube-system describe servicemonitor etcd
Name: etcd
Namespace: kube-system
Labels: k8s-app=etcd
Annotations: <none>
API Version: monitoring.coreos.com/v1
Kind: ServiceMonitor
Metadata:
Cluster Name:
Creation Timestamp: 2018-08-03T12:42:08Z
Generation: 1
Resource Version: 293799
Self Link: /apis/monitoring.coreos.com/v1/namespaces/kube-system/servicemonitors/etcd
UID: a2a0be20-971a-11e8-a8e9-0aaf6efae5ec
Spec:
Endpoints:
Interval: 30s
Port: metrics
Scheme: https
Tls Config:
Ca File: /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client-ca.crt
Cert File: /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.crt
Insecure Skip Verify: true
Key File: /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.key
Job Label: k8s-app
Selector:
Match Labels:
K 8 S - App: etcd
Events: <none>
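In plain YAML (rather than the `kubectl describe` rendering above, which mangles the label key `k8s-app` into "K 8 S - App"), that ServiceMonitor is roughly:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: etcd
  namespace: kube-system
  labels:
    k8s-app: etcd
spec:
  jobLabel: k8s-app
  endpoints:
  - port: metrics
    interval: 30s
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client-ca.crt
      certFile: /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.crt
      keyFile: /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.key
      insecureSkipVerify: true
  selector:
    matchLabels:
      k8s-app: etcd
```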
core@ip-192-168-213-199 ~ $ kubectl -n monitoring describe secret kube-etcd-client-certs
Name: kube-etcd-client-certs
Namespace: monitoring
Labels: <none>
Annotations: <none>
Type: Opaque
Data
====
etcd-client-ca.crt: 1069 bytes
etcd-client.crt: 1252 bytes
etcd-client.key: 1703 bytes
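The local customization of kube-prometheus-static-etcd.libsonnet mentioned earlier amounts to something like this in the tlsConfig the library emits (a sketch, not the actual library source; serverName is dropped in favor of insecureSkipVerify because the etcd certs carry only IP SANs):

```jsonnet
// Sketch of the tlsConfig portion of the generated etcd ServiceMonitor endpoint:
tlsConfig: {
  caFile: '/etc/prometheus/secrets/kube-etcd-client-certs/etcd-client-ca.crt',
  certFile: '/etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.crt',
  keyFile: '/etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.key',
  // serverName omitted: the etcd certs have IP SANs only, no usable DNS name
  insecureSkipVerify: true,
},
```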
In the *.jsonnet file that I pass to build.sh, I specified logLevel: "debug" in the prometheus spec. But in kubectl -n monitoring logs prometheus-k8s-0 I don't see any errors; the only lines relevant to etcd are ones such as the following:
level=debug ts=2018-08-03T12:42:40.098677772Z caller=kubernetes.go:385 component="discovery manager scrape" discovery=k8s role=endpoint msg="kubernetes discovery update" role=endpoints tg="&targetgroup.Group{Targets:[]model.LabelSet{model.LabelSet{\"__address__\":\"192.168.213.171:2379\", \"__meta_kubernetes_endpoint_port_name\":\"metrics\", \"__meta_kubernetes_endpoint_port_protocol\":\"TCP\", \"__meta_kubernetes_endpoint_ready\":\"true\"}, model.LabelSet{\"__address__\":\"192.168.213.210:2379\", \"__meta_kubernetes_endpoint_port_name\":\"metrics\", \"__meta_kubernetes_endpoint_port_protocol\":\"TCP\", \"__meta_kubernetes_endpoint_ready\":\"true\"}, model.LabelSet{\"__address__\":\"192.168.213.212:2379\", \"__meta_kubernetes_endpoint_port_name\":\"metrics\", \"__meta_kubernetes_endpoint_port_protocol\":\"TCP\", \"__meta_kubernetes_endpoint_ready\":\"true\"}}, Labels:model.LabelSet{\"__meta_kubernetes_namespace\":\"kube-system\", \"__meta_kubernetes_endpoints_name\":\"etcd\", \"__meta_kubernetes_service_name\":\"etcd\", \"__meta_kubernetes_service_label_k8s_app\":\"etcd\"}, Source:\"endpoints/kube-system/etcd\"}"
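For reference, the debug log level was enabled in the jsonnet along these lines (a sketch of the mixin I use; the exact nesting may vary between kube-prometheus versions):

```jsonnet
// Mixin merged into the kube-prometheus config to raise Prometheus log verbosity:
{
  prometheus+:: {
    prometheus+: {
      spec+: {
        logLevel: 'debug',
      },
    },
  },
}
```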
Is there something I’m doing wrong? Or is etcd monitoring somehow broken in v0.22.2?
About this issue
- State: closed
- Created 6 years ago
- Comments: 20 (20 by maintainers)
#1756 has also now been merged, so all https configurations should now be available. Let me know how it goes and how we should continue to get the documentation PR merged! 🙂 Once again, thanks a lot for your dedication!
@jolson490 the v0.23.0 prometheus-operator container itself is ready, but the kube-prometheus stack needs https://github.com/coreos/prometheus-operator/pull/1762 to land before the jsonnet is ready. It will be merged soon!