prometheus-operator: All kubelet targets down - 401 Unauthorized?
What did you do?
./contrib/kube-prometheus/hack/cluster-monitoring/deploy
What did you expect to see?
Everything working fine.
What did you see instead? Under which circumstances?
Everything is fine except the kubelet targets: on the Prometheus targets page they are all DOWN with the error `server returned HTTP status 401 Unauthorized`.
Environment
GKE / Ubuntu 17.10
Kubernetes version information:
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.6", GitCommit:"6260bb08c46c31eea6cb538b34a9ceb3e406689c", GitTreeState:"clean", BuildDate:"2017-12-21T06:34:11Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.6-gke.0", GitCommit:"ee9a97661f14ee0b1ca31d6edd30480c89347c79", GitTreeState:"clean", BuildDate:"2018-01-05T03:36:42Z", GoVersion:"go1.8.3b4", Compiler:"gc", Platform:"linux/amd64"}
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 3
- Comments: 19 (5 by maintainers)
For future Googlers: changing the kubelet `ServiceMonitor` to look for the HTTP endpoint on port 10255 worked for me (`prometheus-k8s-service-monitor-kubelet.yaml` for the `hack/` example): `port: https-metrics` changes to `port: http-metrics` (in the code here).

In the latest Helm chart, set `kubelet.serviceMonitor.https=false`: this forces the kubelet exporter to scrape the `http-metrics` endpoint, which should solve the problem.

EDIT: s/http/https/
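A minimal sketch of the two options described above; the release name, namespace, and chart reference (`stable/prometheus-operator`, mentioned later in this thread) are assumptions, not from the original comment:

```bash
# Helm route: tell the chart to scrape the kubelet over HTTP instead of HTTPS.
# "prom" and "monitoring" are placeholder release/namespace names.
helm upgrade --install prom stable/prometheus-operator \
  --namespace monitoring \
  --set kubelet.serviceMonitor.https=false

# Manifest route: in prometheus-k8s-service-monitor-kubelet.yaml, change
# "port: https-metrics" to "port: http-metrics", then re-apply the file.
```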
I had the same problem, using GKE. I solved it by updating the `kube-prometheus-exporter-kubelets` ServiceMonitor resource definition from HTTPS to HTTP, as @vsinha suggested, with a one-liner (update the namespace accordingly; a sketch of such a command follows these comments). After this, the Prometheus server was able to scrape the `kubelet` target as expected.

Running on GKE, I solved a similar issue by adding `--set exporter-kubelets.https=false` to my `helm install` command. See the comment in `helm/exporter-kubelets/values.yaml`.

In case anyone comes across this issue again with AKS and Kubernetes 1.18.4: with the chart `stable/prometheus-operator` in version 9.2.0, removing the suggested change fixes the issue.
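The one-liner itself was not preserved in this copy of the thread; a hedged sketch of what such a ServiceMonitor rewrite could look like, using the resource name from the comment above and `monitoring` as an assumed namespace:

```bash
# Rewrite the kubelet ServiceMonitor from the secure endpoint to the read-only
# HTTP endpoint. Review the resulting YAML before replacing; resource and
# namespace names depend on how the chart was installed.
kubectl --namespace monitoring get servicemonitor kube-prometheus-exporter-kubelets -o yaml \
  | sed 's/port: https-metrics/port: http-metrics/; s/scheme: https/scheme: http/' \
  | kubectl replace -f -
```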
This also applies to AKS; I switched the ServiceMonitor to HTTP as a workaround.
Please note the `hack/` directory that you're executing this from. The kube-prometheus stack expects you to have a properly secured setup, which allows authenticating with ServiceAccount tokens and authorizes against RBAC roles. What this concretely means for a minikube cluster, for example, is documented here: https://github.com/coreos/prometheus-operator/blob/master/contrib/kube-prometheus/hack/cluster-monitoring/minikube-deploy#L3-L12

Basically, your cluster needs to be RBAC enabled and these two kubelet flags need to be enabled:

- `--authentication-token-webhook=true`
- `--extra-config=kubelet.authorization-mode=Webhook`

Feel free to ask any further questions, but this is not an issue with the Prometheus operator or kube-prometheus stack, so I'm closing this here.
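For illustration, a minimal minikube invocation along the lines of the linked `minikube-deploy` snippet; the exact `--extra-config` spellings vary between minikube versions, so treat this as a sketch rather than a copy of that script:

```bash
# RBAC-enabled cluster with kubelet token authentication and webhook authorization.
minikube start \
  --extra-config=apiserver.authorization-mode=RBAC \
  --extra-config=kubelet.authentication-token-webhook=true \
  --extra-config=kubelet.authorization-mode=Webhook
```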
Not sure if this is the right place to leave this, but I'm adding it here as I hit a similar issue and this was the first result. I was using AWS/EKS, but I think this has more to do with Kubernetes v1.11: it seems the read-only port is now disabled by default. I had to re-enable it on all my nodes.
I am using a launch configuration and ASGs, so that would look something like this:
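The commenter's actual snippet was not preserved here; a hedged sketch of what re-enabling the read-only port in a launch configuration's user data might look like, assuming the standard EKS AMI bootstrap script (`my-cluster` is a placeholder):

```bash
#!/bin/bash
# User data for EKS worker nodes: pass --read-only-port=10255 through to the
# kubelet so Prometheus can scrape the unauthenticated HTTP endpoint again.
/etc/eks/bootstrap.sh my-cluster \
  --kubelet-extra-args '--read-only-port=10255'
```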
After this change, everything worked as expected.

Another thing that made the issue more obvious was that in Prometheus, under `/targets`, you could see the connection being refused, along with data missing in Grafana.

It took way too long for me to find this, so hopefully it helps someone else out.
I just confirmed that on AKS, `--authentication-token-webhook` is set to `false`, the kubelet default. https://github.com/Azure/AKS/issues/1087
@gb-ckedzierski, since you're modifying the kubelet config anyway, you should leave the `--read-only-port` disabled and use the secure port with the webhook flags instead (see the sketch after the comments below).

Apparently this is known behaviour. See the kube-prometheus documentation here: https://github.com/coreos/prometheus-operator/blob/master/contrib/kube-prometheus/docs/GKE-cadvisor-support.md

FWIW, this seems to resolve the issue: https://github.com/kubernetes/kubernetes/issues/44330#issuecomment-293287575
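The flag list from that comment was not preserved in this copy of the thread; a hedged sketch of the secure-port approach it describes, using standard kubelet flags (how they reach the kubelet, e.g. a systemd drop-in or a bootstrap script, depends on your distribution):

```bash
# Keep the read-only port off and let Prometheus authenticate with a
# ServiceAccount token against the kubelet's secure port (10250).
KUBELET_EXTRA_ARGS="--read-only-port=0 \
  --authentication-token-webhook=true \
  --authorization-mode=Webhook"
```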