prometheus-operator: cannot scrape etcd3 https with v0.22.2

(I have a K8s cluster on AWS using http for etcd3. Running v0.19.0 of kube-prometheus, I was able (per the v0.19 documentation for monitoring etcd) to get the etcd nodes in the cluster to show up on https://<my-prometheus>/targets; I customized the ServiceMonitor to specify scheme: http.)
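For reference, that v0.19.0 customization amounted to an endpoint definition roughly like the following (a jsonnet-style sketch; the field names follow the ServiceMonitor spec, but the surrounding structure is illustrative, not my exact file):

```jsonnet
// Sketch of the endpoint override from the working v0.19.0 setup:
// plain http, so no tlsConfig is needed.
{
  endpoints: [{
    port: 'metrics',
    interval: '30s',
    scheme: 'http',
  }],
}
```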

But now I’m stepping up to v0.22.2, in a cluster (named “sandbox-green”) where etcd3 accepts requests via https. I’ve tried various tlsConfig settings (and I read through the comments on #898), but I can’t get any etcd targets to show up, nor do I see any mention of “etcd” on https://<my-prometheus>/config. Any suggestions would be greatly appreciated; details below.

Here are the nodes in my “sandbox-green” cluster:

  "master": [
      "192.168.213.199",
      "192.168.213.168",
      "192.168.213.132"
  ],
  "agent": [
      "192.168.213.152",
      "192.168.213.182",
      "192.168.213.178",
      "192.168.213.229",
      "192.168.213.219"
  ],
  "etcd": [
      "192.168.213.212",
      "192.168.213.171",
      "192.168.213.210"
  ]

When a master node is started up, here are some of the options I pass to the apiserver:

          - --bind-address=0.0.0.0
          - --etcd-servers=https://127.0.0.1:2379
          - --etcd-cafile=/etc/kubernetes/ssl/ca.pem
          - --etcd-certfile=/etc/kubernetes/ssl/etcd.pem
          - --etcd-keyfile=/etc/kubernetes/ssl/etcd-key.pem

The kube-prometheus pods:

  core@ip-192-168-213-199 ~ $ kubectl get pods -n monitoring -o wide
  NAME                                   READY     STATUS    RESTARTS   AGE       IP                NODE
  alertmanager-main-0                    2/2       Running   0          1h        10.2.142.145      ip-192-168-213-178.ec2.internal
  grafana-54d8db4bc4-h7zqk               1/1       Running   0          1h        10.2.142.143      ip-192-168-213-178.ec2.internal
  kube-state-metrics-74f8fd79f7-wvn86    4/4       Running   0          1h        10.2.240.203      ip-192-168-213-229.ec2.internal
  node-exporter-7s959                    2/2       Running   0          1h        192.168.213.132   ip-192-168-213-132.ec2.internal
  node-exporter-878l6                    2/2       Running   0          1h        192.168.213.152   ip-192-168-213-152.ec2.internal
  node-exporter-9h4hh                    2/2       Running   0          1h        192.168.213.182   ip-192-168-213-182.ec2.internal
  node-exporter-cbpw7                    2/2       Running   0          1h        192.168.213.178   ip-192-168-213-178.ec2.internal
  node-exporter-l45vz                    2/2       Running   0          1h        192.168.213.168   ip-192-168-213-168.ec2.internal
  node-exporter-qh278                    2/2       Running   0          1h        192.168.213.219   ip-192-168-213-219.ec2.internal
  node-exporter-tffcw                    2/2       Running   0          1h        192.168.213.199   ip-192-168-213-199.ec2.internal
  node-exporter-xk5rf                    2/2       Running   0          1h        192.168.213.229   ip-192-168-213-229.ec2.internal
  prometheus-k8s-0                       3/3       Running   1          34m       10.2.142.149      ip-192-168-213-178.ec2.internal
  prometheus-k8s-1                       3/3       Running   1          34m       10.2.240.204      ip-192-168-213-229.ec2.internal
  prometheus-operator-769f6f97cb-wvlm2   1/1       Running   0          1h        10.2.142.142      ip-192-168-213-178.ec2.internal

A snippet of the output from running openssl x509 -noout -text -in /etc/kubernetes/ssl/etcd.pem on agent node 192.168.213.178 (i.e. the agent that the prometheus-k8s-0 pod is running on). The etcd3 members here aren’t reachable by DNS name, so there is no server name to verify against; the certificate instead identifies them via SANs (Subject Alternative Names):

            X509v3 Subject Alternative Name:
                DNS:localhost, IP Address:127.0.0.1, IP Address:192.168.213.171, IP Address:192.168.213.210, IP Address:192.168.213.212

Note that PR #1732 proposes some revisions to the commentary in kube-prometheus about monitoring etcd (see README.md & etcd.jsonnet).

Here are the relevant objects. (Note that until the “TODO” comment mentioned there is resolved, my own copy of vendor/kube-prometheus/kube-prometheus-static-etcd.libsonnet is customized to not specify a value for serverName and to instead specify insecureSkipVerify: true.)
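Concretely, the endpoint portion of my customized copy of the libsonnet looks roughly like this (an abbreviated sketch: the tlsConfig field names and paths match the ServiceMonitor shown in the kubectl describe output below, while the surrounding structure is condensed):

```jsonnet
// Abbreviated sketch of the ServiceMonitor endpoint in my customized
// kube-prometheus-static-etcd.libsonnet; only tlsConfig is shown in full.
{
  endpoints: [{
    port: 'metrics',
    interval: '30s',
    scheme: 'https',
    tlsConfig: {
      caFile: '/etc/prometheus/secrets/kube-etcd-client-certs/etcd-client-ca.crt',
      certFile: '/etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.crt',
      keyFile: '/etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.key',
      // serverName is omitted because the etcd certs only carry IP SANs;
      // insecureSkipVerify is used instead (pending the TODO mentioned above).
      insecureSkipVerify: true,
    },
  }],
}
```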

core@ip-192-168-213-199 ~ $ kubectl -n kube-system describe service etcd
Name:              etcd
Namespace:         kube-system
Labels:            k8s-app=etcd
Annotations:       <none>
Selector:          <none>
Type:              ClusterIP
IP:                None
Port:              metrics  2379/TCP
TargetPort:        2379/TCP
Endpoints:         192.168.213.171:2379,192.168.213.210:2379,192.168.213.212:2379
Session Affinity:  None
Events:            <none>
core@ip-192-168-213-199 ~ $ kubectl -n kube-system describe endpoints etcd
Name:         etcd
Namespace:    kube-system
Labels:       k8s-app=etcd
Annotations:  <none>
Subsets:
  Addresses:          192.168.213.171,192.168.213.210,192.168.213.212
  NotReadyAddresses:  <none>
  Ports:
    Name     Port  Protocol
    ----     ----  --------
    metrics  2379  TCP

Events:  <none>
core@ip-192-168-213-199 ~ $ kubectl -n kube-system describe servicemonitor etcd
Name:         etcd
Namespace:    kube-system
Labels:       k8s-app=etcd
Annotations:  <none>
API Version:  monitoring.coreos.com/v1
Kind:         ServiceMonitor
Metadata:
  Cluster Name:
  Creation Timestamp:  2018-08-03T12:42:08Z
  Generation:          1
  Resource Version:    293799
  Self Link:           /apis/monitoring.coreos.com/v1/namespaces/kube-system/servicemonitors/etcd
  UID:                 a2a0be20-971a-11e8-a8e9-0aaf6efae5ec
Spec:
  Endpoints:
    Interval:  30s
    Port:      metrics
    Scheme:    https
    Tls Config:
      Ca File:               /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client-ca.crt
      Cert File:             /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.crt
      Insecure Skip Verify:  true
      Key File:              /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.key
  Job Label:                 k8s-app
  Selector:
    Match Labels:
      K 8 S - App:  etcd
Events:             <none>
core@ip-192-168-213-199 ~ $ kubectl -n monitoring describe secret kube-etcd-client-certs
Name:         kube-etcd-client-certs
Namespace:    monitoring
Labels:       <none>
Annotations:  <none>

Type:  Opaque

Data
====
etcd-client-ca.crt:  1069 bytes
etcd-client.crt:     1252 bytes
etcd-client.key:     1703 bytes

In the *.jsonnet file that I pass to build.sh, I specified logLevel: "debug", in the prometheus spec. But in kubectl -n monitoring logs prometheus-k8s-0 I don’t see any errors, and the only thing relevant to etcd is lines such as the following: level=debug ts=2018-08-03T12:42:40.098677772Z caller=kubernetes.go:385 component="discovery manager scrape" discovery=k8s role=endpoint msg="kubernetes discovery update" role=endpoints tg="&targetgroup.Group{Targets:[]model.LabelSet{model.LabelSet{\"__address__\":\"192.168.213.171:2379\", \"__meta_kubernetes_endpoint_port_name\":\"metrics\", \"__meta_kubernetes_endpoint_port_protocol\":\"TCP\", \"__meta_kubernetes_endpoint_ready\":\"true\"}, model.LabelSet{\"__address__\":\"192.168.213.210:2379\", \"__meta_kubernetes_endpoint_port_name\":\"metrics\", \"__meta_kubernetes_endpoint_port_protocol\":\"TCP\", \"__meta_kubernetes_endpoint_ready\":\"true\"}, model.LabelSet{\"__address__\":\"192.168.213.212:2379\", \"__meta_kubernetes_endpoint_port_name\":\"metrics\", \"__meta_kubernetes_endpoint_port_protocol\":\"TCP\", \"__meta_kubernetes_endpoint_ready\":\"true\"}}, Labels:model.LabelSet{\"__meta_kubernetes_namespace\":\"kube-system\", \"__meta_kubernetes_endpoints_name\":\"etcd\", \"__meta_kubernetes_service_name\":\"etcd\", \"__meta_kubernetes_service_label_k8s_app\":\"etcd\"}, Source:\"endpoints/kube-system/etcd\"}"
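The logLevel customization itself is just this (a sketch of the overlay I merge into the jsonnet passed to build.sh; the nesting follows the kube-prometheus mixin conventions of this era, and logLevel is a field of the Prometheus CRD spec):

```jsonnet
// Hypothetical sketch: turn on debug logging for the Prometheus pods.
{
  prometheus+:: {
    prometheus+: {
      spec+: {
        logLevel: 'debug',
      },
    },
  },
}
```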

Is there something I’m doing wrong? Or is etcd monitoring somehow broken in v0.22.2?

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 20 (20 by maintainers)

Most upvoted comments

#1756 has also now been merged, so all https configurations should now be available. Let me know how it goes and how we should continue to get the documentation PR merged! 🙂 Once again, thanks a lot for your dedication!

@jolson490 the v0.23.0 prometheus-operator container itself is ready, but the kube-prometheus stack needs https://github.com/coreos/prometheus-operator/pull/1762 to land before the jsonnet is ready. It will be merged soon!