prometheus: Old Kubernetes SD endpoints are still "discovered" and scraped despite no longer existing

What did you see instead? Under which circumstances?

We have an alert that fires when a target cannot be scraped. It began firing, and upon inspection the target did not actually exist: it was a Kubernetes Pod that had been replaced. The Pod no longer appeared in the Kubernetes apiserver, and its IP address was not in the Endpoints resources corresponding to the relevant Services.

We run redundant, identical Prometheuses, and this only happened in one of them.

To better understand this state, we manually deleted another Pod of the same Deployment. The affected Prometheus successfully removed the old Pod from its targets and added the new Pod. The original false Pod, however, was still in the target list.

Additionally, it’s worth noting that this group of Pods is discovered several times over. We accidentally had two Services pointing at these Pods’ metrics endpoint, and both were picked up by a single ServiceMonitor resource from the prometheus-operator. Only one of these Services shows the problem (four targets, including the false one); the other has the appropriate targets (three). We also probe these Pods using the blackbox-exporter, which shows exactly the same issue (seven targets: three pairs of accidental duplicates plus one false target).
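
For context on how a single ServiceMonitor ends up discovering the same Pods through two Services: the generated jobs below keep targets by the Service label k8s-app=kube-dns, so the ServiceMonitor selects by label rather than by Service name. A rough, hypothetical reconstruction of our setup (only the ServiceMonitor name, namespace, and port are taken from the job names and keep rules in the configuration below; everything else is illustrative):

# Hypothetical sketch: any Service in the kube-dns namespace carrying the
# k8s-app=kube-dns label (we accidentally had two) produces its own set of
# Endpoints targets for the same Pods.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: core-dns
  namespace: kube-dns
spec:
  selector:
    matchLabels:
      k8s-app: kube-dns
  endpoints:
  - port: metrics
    interval: 30s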

I have absolutely no idea how to reproduce this.

What did you do?

Nothing. The bug occurred while we were hands off.

What did you expect to see?

When the old Pod was removed and replaced, the Prometheus instance should have updated its targets accordingly by removing the old Pod.

Environment

  • System information:
Linux 5.4.155-flatcar x86_64
  • Prometheus version:
prometheus, version 2.32.1 (branch: HEAD, revision: 41f1a8125e664985dd30674e5bdf6b683eff5d32)
  build user:       root@54b6dbd48b97
  build date:       20211217-22:08:06
  go version:       go1.17.5
  platform:         linux/amd64
  • Prometheus configuration file:

Here is the full configuration for every job that targets or probes the relevant Pods/Services.

- job_name: serviceMonitor/kube-dns/core-dns/0
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  follow_redirects: true
  relabel_configs:
  - source_labels: [job]
    separator: ;
    regex: (.*)
    target_label: __tmp_prometheus_job_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_label_k8s_app]
    separator: ;
    regex: kube-dns
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: metrics
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Node;(.*)
    target_label: node
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Pod;(.*)
    target_label: pod
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_container_name]
    separator: ;
    regex: (.*)
    target_label: container
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_label_k8s_app]
    separator: ;
    regex: (.+)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: metrics
    action: replace
  - source_labels: [__address__]
    separator: ;
    regex: (.*)
    modulus: 1
    target_label: __tmp_hash
    replacement: $1
    action: hashmod
  - source_labels: [__tmp_hash]
    separator: ;
    regex: "0"
    replacement: $1
    action: keep
  kubernetes_sd_configs:
  - role: endpoints
    kubeconfig_file: ""
    follow_redirects: true
    namespaces:
      names:
      - kube-dns
- job_name: serviceMonitor/kube-dns/dns-server-health/0
  honor_timestamps: true
  params:
    module:
    - dns_kubernetes_svc
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /probe
  scheme: http
  follow_redirects: true
  relabel_configs:
  - source_labels: [job]
    separator: ;
    regex: (.*)
    target_label: __tmp_prometheus_job_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_label_k8s_app]
    separator: ;
    regex: kube-dns
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: dns
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Node;(.*)
    target_label: node
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Pod;(.*)
    target_label: pod
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_container_name]
    separator: ;
    regex: (.*)
    target_label: container
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: dns
    action: replace
  - source_labels: [__meta_kubernetes_pod_ip]
    separator: ;
    regex: (.*)
    target_label: __param_target
    replacement: $1
    action: replace
  - source_labels: [__param_target]
    separator: ;
    regex: (.*)
    target_label: instance
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: __address__
    replacement: blackbox-exporter.prometheus:9115
    action: replace
  - separator: ;
    regex: (.*)
    target_label: job
    replacement: dns-kubernetes-svc-blackbox
    action: replace
  - source_labels: [__address__]
    separator: ;
    regex: (.*)
    modulus: 1
    target_label: __tmp_hash
    replacement: $1
    action: hashmod
  - source_labels: [__tmp_hash]
    separator: ;
    regex: "0"
    replacement: $1
    action: keep
  kubernetes_sd_configs:
  - role: endpoints
    kubeconfig_file: ""
    follow_redirects: true
    namespaces:
      names:
      - kube-dns
- job_name: serviceMonitor/kube-dns/dns-server-health/1
  honor_timestamps: true
  params:
    module:
    - dns_route53_record
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /probe
  scheme: http
  follow_redirects: true
  relabel_configs:
  - source_labels: [job]
    separator: ;
    regex: (.*)
    target_label: __tmp_prometheus_job_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_label_k8s_app]
    separator: ;
    regex: kube-dns
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: dns
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Node;(.*)
    target_label: node
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Pod;(.*)
    target_label: pod
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_container_name]
    separator: ;
    regex: (.*)
    target_label: container
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: dns
    action: replace
  - source_labels: [__meta_kubernetes_pod_ip]
    separator: ;
    regex: (.*)
    target_label: __param_target
    replacement: $1
    action: replace
  - source_labels: [__param_target]
    separator: ;
    regex: (.*)
    target_label: instance
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: __address__
    replacement: blackbox-exporter.prometheus:9115
    action: replace
  - separator: ;
    regex: (.*)
    target_label: job
    replacement: dns-route53-record-blackbox
    action: replace
  - source_labels: [__address__]
    separator: ;
    regex: (.*)
    modulus: 1
    target_label: __tmp_hash
    replacement: $1
    action: hashmod
  - source_labels: [__tmp_hash]
    separator: ;
    regex: "0"
    replacement: $1
    action: keep
  kubernetes_sd_configs:
  - role: endpoints
    kubeconfig_file: ""
    follow_redirects: true
    namespaces:
      names:
      - kube-dns
  • Logs:

We did not find any interesting logs, but here are the logs surrounding the start of the bug:

ts=2022-02-03T09:00:09.464Z caller=compact.go:518 level=info component=tsdb msg="write block" mint=1643868000025 maxt=1643875200000 ulid=01FTZCZPTXY02H4J1RRGFC133J duration=9.242232921s
ts=2022-02-03T09:00:09.935Z caller=head.go:812 level=info component=tsdb msg="Head GC completed" duration=468.087779ms
ts=2022-02-03T09:00:09.999Z caller=checkpoint.go:98 level=info component=tsdb msg="Creating checkpoint" from_segment=747 to_segment=749 mint=1643875200000
ts=2022-02-03T09:00:17.901Z caller=head.go:981 level=info component=tsdb msg="WAL checkpoint complete" first=747 last=749 duration=7.902867284s
ts=2022-02-03T11:00:09.509Z caller=compact.go:518 level=info component=tsdb msg="write block" mint=1643875200145 maxt=1643882400000 ulid=01FTZKVE2Y697HQ7XP60J31F04 duration=9.286627633s
ts=2022-02-03T11:00:10.009Z caller=head.go:812 level=info component=tsdb msg="Head GC completed" duration=497.591114ms
ts=2022-02-03T11:00:10.062Z caller=checkpoint.go:98 level=info component=tsdb msg="Creating checkpoint" from_segment=750 to_segment=752 mint=1643882400000
ts=2022-02-03T11:00:18.421Z caller=head.go:981 level=info component=tsdb msg="WAL checkpoint complete" first=750 last=752 duration=8.359750705s
ts=2022-02-03T11:00:48.763Z caller=compact.go:459 level=info component=tsdb msg="compact blocks" count=3 mint=1643846400080 maxt=1643868000000 ulid=01FTZKVZVPMF7D3E9543S6MH6H sources="[01FTYRCH2Z443XHBNB2FF41DNN 01FTYZ88AYXA5K7VAPCMXBT7BB 01FTZ63ZJYAK50QS1GHT7DCXJE]" duration=30.340985514s
ts=2022-02-03T11:00:48.782Z caller=db.go:1279 level=info component=tsdb msg="Deleting obsolete block" block=01FTYRCH2Z443XHBNB2FF41DNN
ts=2022-02-03T11:00:48.797Z caller=db.go:1279 level=info component=tsdb msg="Deleting obsolete block" block=01FTYZ88AYXA5K7VAPCMXBT7BB
ts=2022-02-03T11:00:48.814Z caller=db.go:1279 level=info component=tsdb msg="Deleting obsolete block" block=01FTZ63ZJYAK50QS1GHT7DCXJE
ts=2022-02-03T13:00:14.248Z caller=compact.go:518 level=info component=tsdb msg="write block" mint=1643882400041 maxt=1643889600000 ulid=01FTZTQ5AYKX9TKQJ0STSR6VG1 duration=14.026654896s
ts=2022-02-03T13:00:14.737Z caller=head.go:812 level=info component=tsdb msg="Head GC completed" duration=485.328335ms
ts=2022-02-03T13:00:14.806Z caller=checkpoint.go:98 level=info component=tsdb msg="Creating checkpoint" from_segment=753 to_segment=755 mint=1643889600000
ts=2022-02-03T13:00:19.997Z caller=head.go:981 level=info component=tsdb msg="WAL checkpoint complete" first=753 last=755 duration=5.190818906s
ts=2022-02-03T15:00:15.284Z caller=compact.go:518 level=info component=tsdb msg="write block" mint=1643889600052 maxt=1643896800000 ulid=01FV01JWJYNX4MDBA0Q4RFZ5YR duration=15.061840545s
ts=2022-02-03T15:00:15.935Z caller=head.go:812 level=info component=tsdb msg="Head GC completed" duration=647.428299ms
ts=2022-02-03T15:00:15.998Z caller=checkpoint.go:98 level=info component=tsdb msg="Creating checkpoint" from_segment=756 to_segment=758 mint=1643896800000
ts=2022-02-03T15:00:21.498Z caller=head.go:981 level=info component=tsdb msg="WAL checkpoint complete" first=756 last=758 duration=5.499604249s

About this issue

  • State: open
  • Created 2 years ago
  • Reactions: 11
  • Comments: 24 (19 by maintainers)

Most upvoted comments

Experienced it too.

From my tests, to force “forgetting” outdated targets:

In the service discovery page I can see that __meta_kubernetes_endpoint_ready="false" is set correctly.

This happens for us as well, but in our case __meta_kubernetes_endpoint_ready="true" is set incorrectly (the Pod does not exist anymore).

This doesn’t seem to be configurable on GKE, so at least there we’ll probably have to filter out all unready endpoints 🤷‍♂️ https://issuetracker.google.com/issues/172663707?pli=1
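
For anyone wanting that workaround, a minimal relabel rule (a sketch only, not something we have deployed) that drops every discovered address Kubernetes reports as not ready would look roughly like:

# Drop targets whose Endpoints address is listed under notReadyAddresses.
- source_labels: [__meta_kubernetes_endpoint_ready]
  regex: "false"
  action: drop

With the prometheus-operator, the equivalent rule would go into the ServiceMonitor’s endpoints[].relabelings (spelled sourceLabels there).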

To me it looks like notReadyAddresses are treated the same way as regular addresses, with the exception that the __meta_kubernetes_endpoint_ready label is set to false: https://github.com/prometheus/prometheus/blob/e239e3ee8b13b51b0f791a199813a14f74600a7e/discovery/kubernetes/endpoints.go#L304-L306 I’m not sure whether this is intended, though.

A solution could be to exclude pods in the Succeeded phase from being discovered, but this changes the assumption that all pods backed by an endpoint are expected to be running at all times.
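
As a configuration-level approximation of that idea (again only a sketch; the __meta_kubernetes_pod_phase label is populated only when the endpoint address is backed by a Pod), completed Pods could be dropped with:

# Drop targets whose backing Pod has already terminated.
- source_labels: [__meta_kubernetes_pod_phase]
  regex: Succeeded|Failed
  action: drop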

We’ve now discovered this on our PolarSignals GKE cluster too. I’ll take a closer look at it next week.

In the meantime I’ve deleted the old Pods with:

kubectl delete pod -n observability --field-selector=status.phase==Succeeded
kubectl delete pod -n observability --field-selector=status.phase==Failed

We’re tracking similar symptoms in multiple cases as well (https://bugzilla.redhat.com/show_bug.cgi?id=1943860). I agree with @fpetkovski, the k8s SD and Informer side looks unsuspicious. The cases we have seen seem to coincide with either heavy load on the apiserver (i.e. request throttling is in effect) or a temporary outage/churn in the apiserver (due to, say, a node shutting down). Sometimes both of those factors coincide with the bug seen here. The bugzilla link also further links to other bugzillas and an open issue in kube-state-metrics: https://github.com/kubernetes/kube-state-metrics/issues/1569

What is the Kubernetes version involved here?

cc @simonpasquier @fpetkovski @PhilipGough