prometheus-operator: Kubelet endpoints not updated - still scraping the old servers
What did you do? For some reason, 3 days ago, our 3 masters were replaced by AWS. All good, everything still works on Kubernetes. We got alerts for "K8SKubeletDown" as expected.
I tried restarting the prometheus-operator and then deleting and recreating the kubelet ServiceMonitor, but nothing changed.
What did you expect to see? The new masters scraped and added automatically by the prometheus-operator kubelet ServiceMonitor. I wasn't sure what was going to happen to the old ones; I guess at a certain point they should be removed?
What did you see instead? Under which circumstances? I still see the old master nodes being scraped and reported as "down", so of course we keep getting the alerts. And I don't see the new masters.
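As a sanity check, the kubelet Endpoints object the operator manages can be inspected directly and compared against the current nodes. This is only a sketch; it assumes the default `--kubelet-service=kube-system/kubelet` setting, so adjust the namespace if yours differs:

```bash
# Node IPs the cluster currently knows about
kubectl get nodes -o wide

# Addresses recorded on the operator-managed kubelet Endpoints object
# (assumes the default --kubelet-service=kube-system/kubelet)
kubectl -n kube-system get endpoints kubelet -o yaml
```

If the old master IPs are still listed under subsets[].addresses, Prometheus will keep scraping them.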
Environment Kubernetes 1.7.2 running on AWS and set up using kops.
- Kubernetes version information:
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.1", GitCommit:"1dc5c66f5dd61da08412a74221ecc79208c2165b", GitTreeState:"clean", BuildDate:"2017-07-14T02:00:46Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.2", GitCommit:"922a86cfcd65915a9b2f69f3f193b8907d741d9c", GitTreeState:"clean", BuildDate:"2017-07-21T08:08:00Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
- Kubernetes cluster kind: kops
- Manifests:
Default setup of prometheus-operator 0.13.0. Everything was working properly before the masters were replaced.
- Prometheus Operator Logs: I think this is the error
ts=2017-09-26T11:15:20Z caller=operator.go:306 component=prometheusoperator msg="syncing nodes into Endpoints object failed" err="synchronizing kubelet endpoints object failed: updating kubelet endpoints object failed: Endpoints \"kubelet\" is invalid: [subsets[0].addresses[0].nodeName: Forbidden: Cannot change NodeName for 172.25.111.117 to ip-172-25-96-238.eu-west-1.compute.internal, subsets[0].addresses[1].nodeName: Forbidden: Cannot change NodeName for 172.25.119.81 to ip-172-25-96-238.eu-west-1.compute.internal, subsets[0].addresses[2].nodeName: Forbidden: Cannot change NodeName for 172.25.121.60 to ip-172-25-96-238.eu-west-1.compute.internal, subsets[0].addresses[4].nodeName: Forbidden: Cannot change NodeName for 172.25.55.18 to ip-172-25-96-238.eu-west-1.compute.internal, subsets[0].addresses[5].nodeName: Forbidden: Cannot change NodeName for 172.25.65.86 to ip-172-25-96-238.eu-west-1.compute.internal, subsets[0].addresses[6].nodeName: Forbidden: Cannot change NodeName for 172.25.76.1 to ip-172-25-96-238.eu-west-1.compute.internal]"
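The validation error suggests the existing endpoint addresses still carry the old masters' node names, and the Endpoints API refuses to change nodeName on an address in place. A quick way to compare what the cluster has versus what the Endpoints object still records (illustrative only; the namespace and jsonpath expressions assume a default setup):

```bash
# Node names the cluster currently has
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'

# Node names still recorded on the kubelet Endpoints object
kubectl -n kube-system get endpoints kubelet \
  -o jsonpath='{range .subsets[0].addresses[*]}{.ip}{" -> "}{.nodeName}{"\n"}{end}'
```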
About this issue
- State: closed
- Created 7 years ago
- Reactions: 2
- Comments: 32 (18 by maintainers)
Commits related to this issue
- Fixing #644 - Avoid resetting nodename - Fixed error in a log line — committed to faceit/prometheus-operator by emas80 7 years ago
- Merge pull request #651 from faceit/fixes-644 Fixing #644 — committed to prometheus-operator/prometheus-operator by brancz 7 years ago
- Fixing #644 - Avoid resetting nodename - Fixed error in a log line — committed to brancz/prometheus-operator by emas80 7 years ago
We were facing this issue. We noticed that we had multiple kubelet endpoints from previous installations.
We deleted the old ones and everything went fine.
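To check for the same leftover objects, something along these lines should surface them (a sketch; the namespace old-monitoring below is a hypothetical placeholder for wherever a previous installation lived):

```bash
# Find kubelet-related Endpoints objects across all namespaces
kubectl get endpoints --all-namespaces | grep -i kubelet

# Delete a stale one left behind by an old installation
# (old-monitoring is a hypothetical namespace; use what the listing shows)
kubectl -n old-monitoring delete endpoints kubelet
```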
@brancz This issue still persists; we are using 2.15.2. Let me know if there is anything we can do to enable this 3-minute polling.
We're actually just finishing up some work, and the next release will be out within the next few days.
Yes, a release should be coming up soon! 🙂
@brancz, I have used the latest prometheus-operator image quay.io/prometheus-operator/prometheus-operator:v0.44.0 and the issue still persists. I can see the old node getting scraped for kubelet and showing as a target down.
These are the logs of the prometheus-operator pod:
➜ ~ kubectl logs -f kube-prometheus-stack-operator-7f977f6b86-f7znc
level=info ts=2020-12-17T06:40:51.344872356Z caller=main.go:235 msg="Starting Prometheus Operator" version="(version=0.44.0, branch=refs/tags/pkg/apis/monitoring/v0.44.0, revision=35c9101c332b9371172e1d6cc5a57c065f14eddf)"
level=info ts=2020-12-17T06:40:51.344915254Z caller=main.go:236 build_context="(go=go1.14.12, user=paulfantom, date=20201202-15:42:46)"
level=warn ts=2020-12-17T06:40:51.344923645Z caller=main.go:239 msg="'--config-reloader-image' flag is ignored, only '--prometheus-config-reloader' is used" config-reloader-image=docker.io/jimmidyson/configmap-reload:v0.4.0 prometheus-config-reloader=quay.io/prometheus-operator/prometheus-config-reloader:v0.44.0
ts=2020-12-17T06:40:51.349291525Z caller=main.go:107 msg="Starting insecure server on [::]:8080"
level=info ts=2020-12-17T06:40:51.357157809Z caller=operator.go:436 component=alertmanageroperator msg="connection established" cluster-version=v1.18.8
level=info ts=2020-12-17T06:40:51.357189124Z caller=operator.go:445 component=alertmanageroperator msg="CRD API endpoints ready"
level=info ts=2020-12-17T06:40:51.357236946Z caller=operator.go:300 component=thanosoperator msg="connection established" cluster-version=v1.18.8
level=info ts=2020-12-17T06:40:51.357258584Z caller=operator.go:309 component=thanosoperator msg="CRD API endpoints ready"
level=info ts=2020-12-17T06:40:51.358168723Z caller=operator.go:420 component=prometheusoperator msg="connection established" cluster-version=v1.18.8
level=info ts=2020-12-17T06:40:51.35822798Z caller=operator.go:429 component=prometheusoperator msg="CRD API endpoints ready"
level=info ts=2020-12-17T06:40:52.5434552Z caller=operator.go:261 component=thanosoperator msg="successfully synced all caches"
level=info ts=2020-12-17T06:40:53.143358973Z caller=operator.go:277 component=alertmanageroperator msg="successfully synced all caches"
level=warn ts=2020-12-17T06:40:53.14344985Z caller=operator.go:1345 component=alertmanageroperator msg="alertmanager key=monitoring-stack/kube-prometheus-stack-alertmanager, field spec.baseImage is deprecated, 'spec.image' field should be used instead"
level=info ts=2020-12-17T06:40:53.143508338Z caller=operator.go:661 component=alertmanageroperator msg="sync alertmanager" key=monitoring-stack/kube-prometheus-stack-alertmanager
level=info ts=2020-12-17T06:40:53.259671523Z caller=operator.go:359 component=prometheusoperator msg="successfully synced all caches"
level=warn ts=2020-12-17T06:40:53.259754378Z caller=operator.go:1276 component=prometheusoperator msg="prometheus key=monitoring-stack/kube-prometheus-stack-prometheus, field spec.baseImage is deprecated, 'spec.image' field should be used instead"
level=info ts=2020-12-17T06:40:53.259813417Z caller=operator.go:1163 component=prometheusoperator msg="sync prometheus" key=monitoring-stack/kube-prometheus-stack-prometheus
level=info ts=2020-12-17T06:40:53.344039927Z caller=operator.go:661 component=alertmanageroperator msg="sync alertmanager" key=monitoring-stack/kube-prometheus-stack-alertmanager
level=info ts=2020-12-17T06:40:53.575422112Z caller=operator.go:661 component=alertmanageroperator msg="sync alertmanager" key=monitoring-stack/kube-prometheus-stack-alertmanager
I'm currently at OSS Summit EU and won't get to it this week; please open an issue so I'll remember to backport the fix.
I found the solution (not related to prometheus-operator):
kubectl get endpoints kubelet -o yaml > kubelet.yaml
Manually remove the dead nodes from kubelet.yaml, then apply the change:
kubectl replace -f kubelet.yaml
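Putting the steps together, the whole workaround looks roughly like this (a sketch; it assumes the Endpoints object lives in kube-system and that the operator re-adds the current nodes on its next sync):

```bash
# Dump the operator-managed kubelet Endpoints object
kubectl -n kube-system get endpoints kubelet -o yaml > kubelet.yaml

# Edit kubelet.yaml by hand, delete the address entries for the dead masters,
# then push the trimmed object back
kubectl replace -f kubelet.yaml

# Verify that only current nodes remain (the operator should re-sync new ones)
kubectl -n kube-system get endpoints kubelet -o wide
```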