prometheus-operator: Kube-prometheus etcd ServiceMonitor not working

What did you do? Deployed kube-prometheus with the kube-etcd exporter (Service and ServiceMonitor) enabled.

What did you expect to see? Green alerts in the Prometheus dashboard for kube-etcd related metrics.

What did you see instead? Under which circumstances? The InsufficientMembers alert is not passing, and the up{job="kube-etcd"} query in Prometheus returns 0 for all kube-etcd servers.

  • Prometheus Operator version:

    quay.io/coreos/prometheus-operator@sha256:88cd66e273db8f96cfcce2eec03c04b04f0821f3f8d440396af2b5510667472d

  • Kubernetes version information:

Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.5", GitCommit:"32ac1c9073b132b8ba18aa830f46b77dcceb0723", GitTreeState:"clean", BuildDate:"2018-06-21T11:46:00Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.6", GitCommit:"a21fdbd78dde8f5447f5f6c331f7eb6f80bd684e", GitTreeState:"clean", BuildDate:"2018-07-26T10:04:08Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind:

    Kops cluster, Version 1.10.0 (git-8b52ea6d1)

  • Manifests:

Service Monitor (created by kube-prometheus Chart)

Name:         kube-prometheus-exporter-kube-etcd
Namespace:    monitoring
Labels:       app=exporter-kube-etcd
              chart=exporter-kube-etcd-0.1.15
              component=kube-etcd
              heritage=Tiller
              prometheus=kube-prometheus
              release=kube-prometheus
Annotations:  <none>
API Version:  monitoring.coreos.com/v1
Kind:         ServiceMonitor
Metadata:
  Cluster Name:        
  Creation Timestamp:  2018-12-01T18:19:18Z
  Generation:          1
  Resource Version:    25504678
  Self Link:           /apis/monitoring.coreos.com/v1/namespaces/monitoring/servicemonitors/kube-prometheus-exporter-kube-etcd
  UID:                 9e4eaaed-f595-11e8-b63f-027656b86196
Spec:
  Endpoints:
    Bearer Token File:  /var/run/secrets/kubernetes.io/serviceaccount/token
    Interval:           15s
    Port:               http-metrics
  Job Label:            component
  Namespace Selector:
    Match Names:
      kube-system
  Selector:
    Match Labels:
      App:        exporter-kube-etcd
      Component:  kube-etcd
Events:           <none>

Service (created by kube-prometheus Chart)

Name:              kube-prometheus-exporter-kube-etcd
Namespace:         kube-system
Labels:            app=exporter-kube-etcd
                   chart=exporter-kube-etcd-0.1.15
                   component=kube-etcd
                   heritage=Tiller
                   release=kube-prometheus
Annotations:       <none>
Selector:          k8s-app=etcd-server
Type:              ClusterIP
IP:                None
Port:              http-metrics  4001/TCP
TargetPort:        4001/TCP
Endpoints:         <redacted>:4001,<redacted>:4001,<redacted>:4001
Session Affinity:  None
Events:            <none>

Etcd server pod labels (Created by Kops)

Labels:       k8s-app=etcd-server
  • Prometheus Operator Logs: Not sure if these are relevant. I restarted the Pod to get logs other than “sync alertmanager” and “sync prometheus”, which are the only entries after a few hours of runtime.
level=info ts=2018-12-09T06:41:48.028636972Z caller=operator.go:292 component=prometheusoperator msg="connection established" cluster-version=v1.10.6
level=info ts=2018-12-09T06:41:48.031017419Z caller=operator.go:172 component=alertmanageroperator msg="connection established" cluster-version=v1.10.6
level=info ts=2018-12-09T06:41:48.16309026Z caller=operator.go:560 component=alertmanageroperator msg="CRD updated" crd=Alertmanager
level=info ts=2018-12-09T06:41:48.165255387Z caller=operator.go:1132 component=prometheusoperator msg="CRD updated" crd=Prometheus
level=info ts=2018-12-09T06:41:48.174046849Z caller=operator.go:1132 component=prometheusoperator msg="CRD updated" crd=ServiceMonitor
level=info ts=2018-12-09T06:41:48.183440145Z caller=operator.go:1132 component=prometheusoperator msg="CRD updated" crd=PrometheusRule
level=info ts=2018-12-09T06:41:51.170636266Z caller=operator.go:186 component=alertmanageroperator msg="CRD API endpoints ready"
level=info ts=2018-12-09T06:41:51.173120069Z caller=operator.go:396 component=alertmanageroperator msg="sync alertmanager" key=monitoring/kube-prometheus
E1209 06:41:51.204133       1 operator.go:272] Sync "monitoring/kube-prometheus" failed: creating statefulset failed: statefulsets.apps "alertmanager-kube-prometheus" already exists
level=info ts=2018-12-09T06:41:51.204934663Z caller=operator.go:396 component=alertmanageroperator msg="sync alertmanager" key=monitoring/kube-prometheus
level=info ts=2018-12-09T06:41:51.220686111Z caller=operator.go:396 component=alertmanageroperator msg="sync alertmanager" key=monitoring/kube-prometheus
level=info ts=2018-12-09T06:41:57.208163155Z caller=operator.go:306 component=prometheusoperator msg="CRD API endpoints ready"
level=info ts=2018-12-09T06:41:57.216490069Z caller=operator.go:731 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-12-09T06:41:57.457037411Z caller=operator.go:731 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-12-09T06:41:57.682665364Z caller=operator.go:731 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus
level=info ts=2018-12-09T06:41:57.73801164Z caller=operator.go:731 component=prometheusoperator msg="sync prometheus" key=monitoring/kube-prometheus

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 3
  • Comments: 27 (2 by maintainers)

Most upvoted comments

OK, I managed to get etcd scraping working with etcd-manager and kops 1.12.

You can create the Secret needed for the operator to start properly like this:

podname=$(kubectl get pods -o=jsonpath='{.items[0].metadata.name}' -l k8s-app=kube-apiserver -n kube-system)

kubectl create secret generic etcd-certs -n monitoring \
  --from-literal=ca.crt="$(kubectl exec $podname -n kube-system -- cat /etc/kubernetes/pki/kube-apiserver/etcd-ca.crt)" \
  --from-literal=client.crt="$(kubectl exec $podname -n kube-system -- cat /etc/kubernetes/pki/kube-apiserver/etcd-client.crt)" \
  --from-literal=client.key="$(kubectl exec $podname -n kube-system -- cat /etc/kubernetes/pki/kube-apiserver/etcd-client.key)"
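Before wiring the Secret into Prometheus, it is worth sanity-checking that all three keys were populated and that the CA actually decodes to a PEM certificate. This is a cluster-dependent check using the key names from the command above:

```shell
# List the keys stored in the Secret (expect ca.crt, client.crt, client.key)
kubectl describe secret etcd-certs -n monitoring

# Decode the CA and confirm it is a PEM certificate
# (the first line should read: -----BEGIN CERTIFICATE-----)
kubectl get secret etcd-certs -n monitoring -o jsonpath='{.data.ca\.crt}' | base64 -d | head -1
```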

Once you have the Secret, you can install the operator with the following values.yaml snippet (only the etcd-relevant part included):

kubeEtcd:
  service:
    port: 4001
    targetPort: 4001
    selector:
      "k8s-app": "etcd-manager-main"
  serviceMonitor:
    scheme: https
    insecureSkipVerify: true
    caFile:   /etc/prometheus/secrets/etcd-certs/ca.crt
    certFile: /etc/prometheus/secrets/etcd-certs/client.crt
    keyFile:  /etc/prometheus/secrets/etcd-certs/client.key
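The values file can then be applied on install or upgrade. The release and chart names below are assumptions (adjust to your own setup), and the Secret must already exist in the monitoring namespace:

```shell
# Hypothetical release/chart names; substitute your own
helm upgrade --install kube-prometheus stable/prometheus-operator \
  -n monitoring -f values.yaml
```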

Last but not least, you have to update the prometheus-prometheus-oper-kube-etcd Service, which the operator creates to monitor etcd, removing the component: etcd selector so that its spec looks like this:

spec:
  clusterIP: None
  ports:
  - name: http-metrics
    port: 4001
    protocol: TCP
    targetPort: 4001
  selector:
    k8s-app: etcd-manager-main
  sessionAffinity: None
  type: ClusterIP
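Instead of editing the Service by hand, the offending selector key can be dropped with a JSON patch. The Service name below is the one this chart generates; adjust it if your release is named differently:

```shell
# Remove the 'component: etcd' entry from the Service selector,
# leaving only 'k8s-app: etcd-manager-main'
kubectl patch service prometheus-prometheus-oper-kube-etcd -n kube-system \
  --type=json -p='[{"op": "remove", "path": "/spec/selector/component"}]'
```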

With this, etcd scraping works for me.

Thank you @irizzant, this worked for me too on kops 1.12 with the prometheus-operator-5.12.4 Helm chart. I would like to extend it a bit: we also need to add the generated etcd-certs Secret to the prometheus-operator values.yaml, and we can remove the component: etcd selector from the generated Service up front by adding component: null:

prometheus:
  prometheusSpec:
    secrets:
    - etcd-certs

kubeEtcd:
  service:
    port: 4001
    targetPort: 4001
    selector:
      component: null
      k8s-app: "etcd-manager-main"
  serviceMonitor:
    scheme: https
    insecureSkipVerify: true
    caFile:   /etc/prometheus/secrets/etcd-certs/ca.crt
    certFile: /etc/prometheus/secrets/etcd-certs/client.crt
    keyFile:  /etc/prometheus/secrets/etcd-certs/client.key
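After upgrading the release with these values, you can confirm the targets came up by querying Prometheus directly. The prometheus-operated Service is the governing service the operator creates; if yours differs, check kubectl get svc -n monitoring:

```shell
# Port-forward Prometheus and query the etcd job;
# every target should report a value of "1"
kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090 &
sleep 2
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=up{job="kube-etcd"}'
```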

@stale Still interested in a better solution for this.

@mediaimprove @jesse-welch kops doesn’t allow nodes to access the masters on port 4001 (the etcd client port) for security reasons; check the EC2 security group for the masters.
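If Prometheus runs on worker nodes, those nodes need to reach the masters on the etcd client port before any of the above can work. One way is an extra security group rule; the group IDs below are placeholders, and note that you are deliberately relaxing a restriction kops applies on purpose:

```shell
# Allow the nodes' security group (sg-NODES) to reach the masters'
# security group (sg-MASTERS) on the etcd client port
aws ec2 authorize-security-group-ingress \
  --group-id sg-MASTERS \
  --protocol tcp --port 4001 \
  --source-group sg-NODES
```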