prometheus-operator: Kubelet endpoints not updated - still scraping the old servers
What did you do? For some reason, 3 days ago, our 3 masters were replaced by AWS. All good, everything still works on Kubernetes. We got alerts for "K8SKubeletDown" as expected.
I tried restarting the prometheus-operator and then deleting and recreating the kubelet ServiceMonitor, but nothing changed.
What did you expect to see? The new masters scraped and added automatically by the prometheus-operator kubelet ServiceMonitor. I wasn't sure what was going to happen to the old ones; I guess at a certain point they should be removed?
What did you see instead? Under which circumstances? I still see the old master nodes being scraped and reported as "down", so of course we keep getting the alerts. And I don't see the new masters.
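As a sanity check, the kubelet Endpoints object the operator manages can be inspected directly and compared against the current nodes. This is only a sketch; it assumes the default `--kubelet-service=kube-system/kubelet` setting, so adjust the namespace if yours differs:

```bash
# Node IPs the cluster currently knows about
kubectl get nodes -o wide

# Addresses recorded on the operator-managed kubelet Endpoints object
# (assumes the default --kubelet-service=kube-system/kubelet)
kubectl -n kube-system get endpoints kubelet -o yaml
```

If the old master IPs are still listed under subsets[].addresses, Prometheus will keep scraping them.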
Environment Kubernetes 1.7.2 running on AWS and set up using kops.
- Kubernetes version information:
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.1", GitCommit:"1dc5c66f5dd61da08412a74221ecc79208c2165b", GitTreeState:"clean", BuildDate:"2017-07-14T02:00:46Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.2", GitCommit:"922a86cfcd65915a9b2f69f3f193b8907d741d9c", GitTreeState:"clean", BuildDate:"2017-07-21T08:08:00Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
- Kubernetes cluster kind: kops
- Manifests:
Default setup of prometheus-operator 0.13.0. Everything was working properly before the masters were replaced.
- Prometheus Operator Logs: I think this is the error
ts=2017-09-26T11:15:20Z caller=operator.go:306 component=prometheusoperator msg="syncing nodes into Endpoints object failed" err="synchronizing kubelet endpoints object failed: updating kubelet endpoints object failed: Endpoints \"kubelet\" is invalid: [subsets[0].addresses[0].nodeName: Forbidden: Cannot change NodeName for 172.25.111.117 to ip-172-25-96-238.eu-west-1.compute.internal, subsets[0].addresses[1].nodeName: Forbidden: Cannot change NodeName for 172.25.119.81 to ip-172-25-96-238.eu-west-1.compute.internal, subsets[0].addresses[2].nodeName: Forbidden: Cannot change NodeName for 172.25.121.60 to ip-172-25-96-238.eu-west-1.compute.internal, subsets[0].addresses[4].nodeName: Forbidden: Cannot change NodeName for 172.25.55.18 to ip-172-25-96-238.eu-west-1.compute.internal, subsets[0].addresses[5].nodeName: Forbidden: Cannot change NodeName for 172.25.65.86 to ip-172-25-96-238.eu-west-1.compute.internal, subsets[0].addresses[6].nodeName: Forbidden: Cannot change NodeName for 172.25.76.1 to ip-172-25-96-238.eu-west-1.compute.internal]"
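The validation error suggests the existing endpoint addresses still carry the old masters' node names, and the Endpoints API refuses to change nodeName on an address in place. A quick way to compare what the cluster has versus what the Endpoints object still records (illustrative only; the namespace and jsonpath expressions assume a default setup):

```bash
# Node names the cluster currently has
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'

# Node names still recorded on the kubelet Endpoints object
kubectl -n kube-system get endpoints kubelet \
  -o jsonpath='{range .subsets[0].addresses[*]}{.ip}{" -> "}{.nodeName}{"\n"}{end}'
```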
About this issue
- State: closed
- Created 7 years ago
- Reactions: 2
- Comments: 32 (18 by maintainers)
Commits related to this issue
- Fixing #644 - Avoid resetting nodename - Fixed error in a log line — committed to faceit/prometheus-operator by emas80 7 years ago
- Merge pull request #651 from faceit/fixes-644 Fixing #644 — committed to prometheus-operator/prometheus-operator by brancz 7 years ago
- Fixing #644 - Avoid resetting nodename - Fixed error in a log line — committed to brancz/prometheus-operator by emas80 7 years ago
We were facing this issue. We noticed that we had multiple kubelet endpoints from previous installations.
We deleted the old ones and everything went fine.
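To check for the same leftover objects, something along these lines should surface them (a sketch; the namespace old-monitoring below is a hypothetical placeholder for wherever a previous installation lived):

```bash
# Find kubelet-related Endpoints objects across all namespaces
kubectl get endpoints --all-namespaces | grep -i kubelet

# Delete a stale one left behind by an old installation
# (old-monitoring is a hypothetical namespace; use what the listing shows)
kubectl -n old-monitoring delete endpoints kubelet
```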
@brancz This issue still persists; we are using 2.15.2. Let me know if there is anything we can do to enable this 3-minute polling.
We're actually just finishing up some work, and the next release will be out within the next few days.
Yes, a release should be coming up soon! 🙂
@brancz, I have used the latest prometheus-operator image quay.io/prometheus-operator/prometheus-operator:v0.44.0 and the issue still persists. I can see the old node getting scraped for kubelet and showing as a target down.
These are the logs of the prometheus-operator pod:
➜ ~ kubectl logs -f kube-prometheus-stack-operator-7f977f6b86-f7znc
level=info ts=2020-12-17T06:40:51.344872356Z caller=main.go:235 msg="Starting Prometheus Operator" version="(version=0.44.0, branch=refs/tags/pkg/apis/monitoring/v0.44.0, revision=35c9101c332b9371172e1d6cc5a57c065f14eddf)"
level=info ts=2020-12-17T06:40:51.344915254Z caller=main.go:236 build_context="(go=go1.14.12, user=paulfantom, date=20201202-15:42:46)"
level=warn ts=2020-12-17T06:40:51.344923645Z caller=main.go:239 msg="'--config-reloader-image' flag is ignored, only '--prometheus-config-reloader' is used" config-reloader-image=docker.io/jimmidyson/configmap-reload:v0.4.0 prometheus-config-reloader=quay.io/prometheus-operator/prometheus-config-reloader:v0.44.0
ts=2020-12-17T06:40:51.349291525Z caller=main.go:107 msg="Starting insecure server on [::]:8080"
level=info ts=2020-12-17T06:40:51.357157809Z caller=operator.go:436 component=alertmanageroperator msg="connection established" cluster-version=v1.18.8
level=info ts=2020-12-17T06:40:51.357189124Z caller=operator.go:445 component=alertmanageroperator msg="CRD API endpoints ready"
level=info ts=2020-12-17T06:40:51.357236946Z caller=operator.go:300 component=thanosoperator msg="connection established" cluster-version=v1.18.8
level=info ts=2020-12-17T06:40:51.357258584Z caller=operator.go:309 component=thanosoperator msg="CRD API endpoints ready"
level=info ts=2020-12-17T06:40:51.358168723Z caller=operator.go:420 component=prometheusoperator msg="connection established" cluster-version=v1.18.8
level=info ts=2020-12-17T06:40:51.35822798Z caller=operator.go:429 component=prometheusoperator msg="CRD API endpoints ready"
level=info ts=2020-12-17T06:40:52.5434552Z caller=operator.go:261 component=thanosoperator msg="successfully synced all caches"
level=info ts=2020-12-17T06:40:53.143358973Z caller=operator.go:277 component=alertmanageroperator msg="successfully synced all caches"
level=warn ts=2020-12-17T06:40:53.14344985Z caller=operator.go:1345 component=alertmanageroperator msg="alertmanager key=monitoring-stack/kube-prometheus-stack-alertmanager, field spec.baseImage is deprecated, 'spec.image' field should be used instead"
level=info ts=2020-12-17T06:40:53.143508338Z caller=operator.go:661 component=alertmanageroperator msg="sync alertmanager" key=monitoring-stack/kube-prometheus-stack-alertmanager
level=info ts=2020-12-17T06:40:53.259671523Z caller=operator.go:359 component=prometheusoperator msg="successfully synced all caches"
level=warn ts=2020-12-17T06:40:53.259754378Z caller=operator.go:1276 component=prometheusoperator msg="prometheus key=monitoring-stack/kube-prometheus-stack-prometheus, field spec.baseImage is deprecated, 'spec.image' field should be used instead"
level=info ts=2020-12-17T06:40:53.259813417Z caller=operator.go:1163 component=prometheusoperator msg="sync prometheus" key=monitoring-stack/kube-prometheus-stack-prometheus
level=info ts=2020-12-17T06:40:53.344039927Z caller=operator.go:661 component=alertmanageroperator msg="sync alertmanager" key=monitoring-stack/kube-prometheus-stack-alertmanager
level=info ts=2020-12-17T06:40:53.575422112Z caller=operator.go:661 component=alertmanageroperator msg="sync alertmanager" key=monitoring-stack/kube-prometheus-stack-alertmanager
I'm currently at OSS Summit EU and won't get to it this week; please open an issue so I'll remember to backport the fix.
I found the solution (not related to prometheus-operator):
kubectl get endpoints kubelet -o yaml > kubelet.yaml
Manually remove the dead nodes from kubelet.yaml, then apply the change:
kubectl replace -f kubelet.yaml
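Putting the steps together, the whole workaround looks roughly like this (a sketch; it assumes the Endpoints object lives in kube-system and that the operator re-adds the current nodes on its next sync):

```bash
# Dump the operator-managed kubelet Endpoints object
kubectl -n kube-system get endpoints kubelet -o yaml > kubelet.yaml

# Edit kubelet.yaml by hand, delete the address entries for the dead masters,
# then push the trimmed object back
kubectl replace -f kubelet.yaml

# Verify that only current nodes remain (the operator should re-sync new ones)
kubectl -n kube-system get endpoints kubelet -o wide
```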