prometheus-operator: Kube-prometheus etcd ServiceMonitor not working

What did you do? Deployed kube-prometheus with the kube-etcd exporter (Service and ServiceMonitor) enabled.

What did you expect to see? Green alerts in the Prometheus dashboard for kube-etcd related metrics.

What did you see instead? Under which circumstances? The InsufficientMembers alert is not passing, and the up{job="kube-etcd"} query in Prometheus returns 0 for all kube-etcd servers.

  • Prometheus Operator version:

    quay.io/coreos/prometheus-operator@sha256:88cd66e273db8f96cfcce2eec03c04b04f0821f3f8d440396af2b5510667472d

  • Kubernetes version information:

Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.5", GitCommit:"32ac1c9073b132b8ba18aa830f46b77dcceb0723", GitTreeState:"clean", BuildDate:"2018-06-21T11:46:00Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.6", GitCommit:"a21fdbd78dde8f5447f5f6c331f7eb6f80bd684e", GitTreeState:"clean", BuildDate:"2018-07-26T10:04:08Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind:

    Kops cluster, Version 1.10.0 (git-8b52ea6d1)

  • Manifests:

Service Monitor (created by kube-prometheus Chart)

Name:         kube-prometheus-exporter-kube-etcd
Namespace:    monitoring
Labels:       app=exporter-kube-etcd
              chart=exporter-kube-etcd-0.1.15
              component=kube-etcd
              heritage=Tiller
              prometheus=kube-prometheus
              release=kube-prometheus
Annotations:  <none>
API Version:  monitoring.coreos.com/v1
Kind:         ServiceMonitor
Metadata:
  Cluster Name:        
  Creation Timestamp:  2018-12-01T18:19:18Z
  Generation:          1
  Resource Version:    25504678
  Self Link:           /apis/monitoring.coreos.com/v1/namespaces/monitoring/servicemonitors/kube-prometheus-exporter-kube-etcd
  UID:                 9e4eaaed-f595-11e8-b63f-027656b86196
Spec:
  Endpoints:
    Bearer Token File:  /var/run/secrets/kubernetes.io/serviceaccount/token
    Interval:           15s
    Port:               http-metrics
  Job Label:            component
  Namespace Selector:
    Match Names:
      kube-system
  Selector:
    Match Labels:
      App:        exporter-kube-etcd
      Component:  kube-etcd
Events:           <none>

Service (created by kube-prometheus Chart)

Name:              kube-prometheus-exporter-kube-etcd
Namespace:         kube-system
Labels:            app=exporter-kube-etcd
                   chart=exporter-kube-etcd-0.1.15
                   component=kube-etcd
                   heritage=Tiller
                   release=kube-prometheus
Annotations:       <none>
Selector:          k8s-app=etcd-server
Type:              ClusterIP
IP:                None
Port:              http-metrics  4001/TCP
TargetPort:        4001/TCP
Endpoints:         <redacted>:4001,<redacted>:4001,<redacted>:4001
Session Affinity:  None
Events:            <none>

Etcd server pod labels (Created by Kops)

Labels:       k8s-app=etcd-server
  • Prometheus Operator Logs: Not sure if these are relevant. I restarted the Pod to get logs other than “sync alertmanager” and “sync prometheus”, which are the only entries after a few hours of runtime.
level=info ts=2018-12-09T06:41:48.028636972Z caller=operator.go:292 component=prometheusoperator msg="connection established" cluster-version=v1.10.6
level=info ts=2018-12-09T06:41:48.031017419Z caller=operator.go:172 component=alertmanageroperator msg="connection established" cluster-version=v1.10.6
level=info ts=2018-12-09T06:41:48.16309026Z caller=operator.go:560 component=alertmanageroperator msg="CRD updated" crd=Alertmanager
level=info ts=2018-12-09T06:41:48.165255387Z caller=operator.go:1132 component=prometheusoperator msg="CRD updated" crd=Prometheus
level=info ts=2018-12-09T06:41:48.174046849Z caller=operator.go:1132 component=prometheusoperator msg="CRD updated" crd=ServiceMonitor
level=info ts=2018-12-09T06:41:48.183440145Z caller=operator.go:1132 component=prometheusoperator msg="CRD updated" crd=PrometheusRule
level=info ts=2018-12-09T06:41:51.170636266Z caller=operator.go:186 component=alertmanageroperator msg="CRD API endpoints ready"
level=info ts=2018-12-09T06:41:51.173120069Z caller=operator.go:396 component=alertmanageroperator msg="sync alertmanager" key=monitoring/kube-prometheus
E1209 06:41:51.204133       1 operator.go:272] Sync "monitoring/kube-prometheus" failed: creating statefulset failed: statefulsets.apps "alertmanager-kube-prometheus" already exists
level=info ts=2018-12-09T06:41:51.204934663Z caller=operator.go:396 component=alertmanageroperator msg="sync alertmanager" key=monitoring/kube-prometheus
level=info ts=2018-12-09T06:41:51.220686111Z caller=operator.go:396 component=alertmanageroperator msg="sync alertmanager" key=monitoring/kube-prometheus
level=info ts=2018-12-09T06:41:57.208163155Z caller=operator.go:306 component=prometheusoperator msg="CRD API endpoints ready"
level=info ts=2018-12-09T06:41:57.216490069Z caller=operator.go:731 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-12-09T06:41:57.457037411Z caller=operator.go:731 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-12-09T06:41:57.682665364Z caller=operator.go:731 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-12-09T06:41:57.73801164Z caller=operator.go:731 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 3
  • Comments: 27 (2 by maintainers)

Most upvoted comments

OK, I managed to get etcd scraping working with etcd-manager and kops 1.12.

You can create the Secret needed for the operator to start properly like this:

podname=$(kubectl get pods -o=jsonpath='{.items[0].metadata.name}' -l k8s-app=kube-apiserver -n kube-system)

kubectl create secret generic etcd-certs -n monitoring \
  --from-literal=ca.crt="$(kubectl exec $podname -n kube-system -- cat /etc/kubernetes/pki/kube-apiserver/etcd-ca.crt)" \
  --from-literal=client.crt="$(kubectl exec $podname -n kube-system -- cat /etc/kubernetes/pki/kube-apiserver/etcd-client.crt)" \
  --from-literal=client.key="$(kubectl exec $podname -n kube-system -- cat /etc/kubernetes/pki/kube-apiserver/etcd-client.key)"
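Before wiring the Secret into Prometheus, it is worth sanity-checking that all three keys were populated and that the CA actually decodes to a PEM certificate. This is a cluster-dependent check using the key names from the command above:

```shell
# List the keys stored in the Secret (expect ca.crt, client.crt, client.key)
kubectl describe secret etcd-certs -n monitoring

# Decode the CA and confirm it is a PEM certificate
# (the first line should read: -----BEGIN CERTIFICATE-----)
kubectl get secret etcd-certs -n monitoring -o jsonpath='{.data.ca\.crt}' | base64 -d | head -1
```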

Once you have the Secret, you can install the operator with the following values.yaml snippet (only the etcd-relevant part included):

kubeEtcd:
  service:
    port: 4001
    targetPort: 4001
    selector:
      "k8s-app": "etcd-manager-main"
  serviceMonitor:
    scheme: https
    insecureSkipVerify: true
    caFile:   /etc/prometheus/secrets/etcd-certs/ca.crt
    certFile: /etc/prometheus/secrets/etcd-certs/client.crt
    keyFile:  /etc/prometheus/secrets/etcd-certs/client.key
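The values file can then be applied on install or upgrade. The release and chart names below are assumptions (adjust to your own setup), and the Secret must already exist in the monitoring namespace:

```shell
# Hypothetical release/chart names; substitute your own
helm upgrade --install kube-prometheus stable/prometheus-operator \
  -n monitoring -f values.yaml
```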

Last but not least, you have to update the prometheus-prometheus-oper-kube-etcd Service, which the operator creates to monitor etcd, removing the component: etcd selector so that its spec looks like this:

spec:
  clusterIP: None
  ports:
  - name: http-metrics
    port: 4001
    protocol: TCP
    targetPort: 4001
  selector:
    k8s-app: etcd-manager-main
  sessionAffinity: None
  type: ClusterIP
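Instead of editing the Service by hand, the offending selector key can be dropped with a JSON patch. The Service name below is the one this chart generates; adjust it if your release is named differently:

```shell
# Remove the 'component: etcd' entry from the Service selector,
# leaving only 'k8s-app: etcd-manager-main'
kubectl patch service prometheus-prometheus-oper-kube-etcd -n kube-system \
  --type=json -p='[{"op": "remove", "path": "/spec/selector/component"}]'
```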

With this, etcd scraping works for me.

Thank you @irizzant, this worked for me too on kops 1.12 with the prometheus-operator-5.12.4 Helm chart. I would like to extend it a bit: we also need to add the generated etcd-certs Secret to the prometheus-operator values.yaml, and we can remove the component: etcd selector from the generated Service up front by adding component: null:

prometheus:
  prometheusSpec:
    secrets:
    - etcd-certs

kubeEtcd:
  service:
    port: 4001
    targetPort: 4001
    selector:
      component: null
      k8s-app: "etcd-manager-main"
  serviceMonitor:
    scheme: https
    insecureSkipVerify: true
    caFile:   /etc/prometheus/secrets/etcd-certs/ca.crt
    certFile: /etc/prometheus/secrets/etcd-certs/client.crt
    keyFile:  /etc/prometheus/secrets/etcd-certs/client.key
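After upgrading the release with these values, you can confirm the targets came up by querying Prometheus directly. The prometheus-operated Service is the governing service the operator creates; if yours differs, check kubectl get svc -n monitoring:

```shell
# Port-forward Prometheus and query the etcd job;
# every target should report a value of "1"
kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090 &
sleep 2
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=up{job="kube-etcd"}'
```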

@stale Still interested in a better solution for this.

@mediaimprove @jesse-welch kops doesn’t allow nodes to access the masters on port 4001 (the etcd client port) for security reasons; check the EC2 security group for the masters.
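If Prometheus runs on worker nodes, those nodes need to reach the masters on the etcd client port before any of the above can work. One way is an extra security group rule; the group IDs below are placeholders, and note that you are deliberately relaxing a restriction kops applies on purpose:

```shell
# Allow the nodes' security group (sg-NODES) to reach the masters'
# security group (sg-MASTERS) on the etcd client port
aws ec2 authorize-security-group-ingress \
  --group-id sg-MASTERS \
  --protocol tcp --port 4001 \
  --source-group sg-NODES
```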