metrics-server: issue with hostname resolution of kubelet, and apiserver unable to communicate with metrics-server clusterIP
The metrics-server is unable to resolve node hostnames when scraping metrics from the kubelet.
E0903 1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:<hostname>: unable to fetch metrics from Kubelet <hostname> (<hostname>): Get https://<hostname>:10250/stats/summary/: dial tcp: lookup <hostname> on 10.96.0.10:53: no such host
I figured it's not resolving the hostname via kube-dns, as mentioned in the following issues: https://github.com/kubernetes-incubator/metrics-server/issues/105#issuecomment-412818944 and https://github.com/kubernetes-incubator/metrics-server/issues/97
I did try editing the deployment with `kubectl -n kube-system edit deploy metrics-server`, but the metrics-server pod entered an error state.
Describing the apiservice v1beta1.metrics.k8s.io shows the message:
no response from https://10.101.248.96:443: Get https://10.101.248.96:443: Proxy Error ( Connection refused )
10.101.248.96 is the clusterIP of the metrics-server.
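For reference, that status message can be inspected with standard kubectl (the resource name is as quoted above):

```sh
kubectl describe apiservice v1beta1.metrics.k8s.io
```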
About this issue
- State: closed
- Created 6 years ago
- Reactions: 8
- Comments: 73 (12 by maintainers)
@amolredhat The `--source` flag is unavailable right now (v0.3.0-alpha.1).
I (finally) got it to work by setting the following args:
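The args block itself wasn't preserved in this scrape; based on the flags discussed throughout this thread, it was presumably along these lines (a sketch, not the poster's exact snippet):

```yaml
# metrics-server container spec (sketch; both flags are discussed in this thread)
command:
  - /metrics-server
  - --kubelet-preferred-address-types=InternalIP
  - --kubelet-insecure-tls   # testing only, see the maintainer note further down
```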
It works like a charm!
The solution proposed by @MIBc works. Change the metrics-server-deployment.yaml file and add:
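The added lines are missing here; judging by @MIBc's fix they make metrics-server prefer node IPs over hostnames, roughly:

```yaml
# metrics-server-deployment.yaml, container spec (sketch)
command:
  - /metrics-server
  - --kubelet-preferred-address-types=InternalIP
```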
The metrics-server is now able to talk to the node (it was failing before because it could not resolve the hostname of the node). There's something strange happening though: I can see the metrics now from HPA, but it still shows an error in the logs.
@juan-vg awesome, this also works for me (metrics-server-amd64:v0.3.0 on k8s 1.10.3). Btw, so as not to duplicate the entrypoint set in the Dockerfile, consider using `args:` instead.
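That is, keep the image's own entrypoint and pass only the flags, for example:

```yaml
args:
  - --kubelet-preferred-address-types=InternalIP
  - --kubelet-insecure-tls
```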
In recent versions of metrics-server, where there is no "command" key or "metrics-server-deployment.yaml" file, the following helped me:
kubectl -n kube-system edit deploy metrics-server
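and append the flags to the container's existing `args:` list. A sketch of the relevant fragment (the pre-existing flags vary by version):

```yaml
spec:
  template:
    spec:
      containers:
        - name: metrics-server
          args:
            - --cert-dir=/tmp      # existing flags, version-dependent
            - --secure-port=4443
            - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
            - --kubelet-insecure-tls
```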
@vikranttkamble you can try `--kubelet-preferred-address-types InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP`
I temporarily fixed this by editing the metrics-server deployment and adding this config under Deployment.spec.template.spec:
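The original config block wasn't preserved; given the hostname-resolution context it was plausibly `hostAliases`, along these lines (an assumption — the node name and IP are placeholders):

```yaml
spec:
  template:
    spec:
      hostAliases:
        - ip: "10.130.0.1"      # placeholder node IP
          hostnames:
            - "nvm250d00"       # placeholder node hostname
```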
It is OK when setting the kubelet flag `--authorization-mode=AlwaysAllow` and the metrics-server flag `--kubelet-insecure-tls`.
If you're going to use InternalIP, you should probably set up your node's serving certs to list the IP as an alternative name. You generally don't want to pass `--kubelet-insecure-tls` except in testing setups.
I'm adding a follow-up comment in case others had the same issues as I did. Here is how I got this working on a kubeadm-bootstrapped cluster:
metrics server changes:
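Presumably the flag changes discussed above; a sketch of the container spec (the image tag is illustrative for the 0.3.x era):

```yaml
containers:
  - name: metrics-server
    image: k8s.gcr.io/metrics-server-amd64:v0.3.0
    command:
      - /metrics-server
      - --kubelet-preferred-address-types=InternalIP
      - --kubelet-insecure-tls
```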
deploy.sh:
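The script body is missing; for metrics-server 0.3.x, deploying was typically just applying the manifests from the repo, e.g.:

```sh
#!/bin/sh
# Apply the 0.3.x manifests (directory name per the metrics-server repo of that era)
kubectl apply -f deploy/1.8+/
```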
kubectl top commands…
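i.e. the standard verification:

```sh
kubectl top nodes
kubectl top pods --all-namespaces
```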
Note: in kubeadm the insecure port is disabled by default, so don’t try setting the kubelet port in the metrics server to 10255 as it will not work!
I want to clarify that my prior comment only helps show stats for nodes; pod stats are still broken.
Snippet of logs from the metrics-server pod:
@veton's answer is the most on-point and up-to-date, thank you! 🙏
@DirectXMan12 Hi, sorry to revive an old issue with a question like this. I agree with this approach, as I try to keep a keen eye on security; however, as a newbie to Kubernetes, I don't know where to begin configuring this. Would you be kind enough to link me somewhere?
Hopefully this will help others who find this issue when Googling around too.
@DirectXMan12 this problem still exists, and usage of `--kubelet-insecure-tls` is discouraged. Furthermore, I haven't found any guide on how to deploy metrics-server the "right way". Can you please reopen this issue until it is fixed by code or documentation?
@itskingori and yes, the insecure flag is not a good long-term solution (@jpetazzo)
@itskingori yeah, it should just work with webhook auth, assuming metrics-server can trust your kubelet. Right now, that trust means that the kubelet's serving certs must be signed by the main cluster CA. We need someone to submit a PR to support a separate kubelet CA, since some clusters use that.
EDIT: #183 implements the “separate kubelet CA” thing. Once it merges, that’ll be a good way forward. Then we just have to convince cluster tools to use an actual CA for their kubelet serving certs.
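For reference, with such support in place the deployment would point metrics-server at the kubelet CA instead of disabling verification — a sketch (the flag name follows the linked PR; the mount path is illustrative):

```yaml
args:
  - --kubelet-certificate-authority=/etc/kubelet-ca/ca.crt   # CA that signed the kubelet serving certs
```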
Has this issue been resolved?
The host resolves names via /etc/hosts. How can this be handled better?
Obviously the issue is in:
lookup <hostname> on <dns-service-ip>: no such host
In my case CoreDNS is used for cluster DNS resolution. By default, CoreDNS (in my case deployed with Kubespray) is set up only for service name resolution, not for pods/nodes.
Then we can look at the default config for CoreDNS.
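For reference, a default Corefile of that era looked roughly like this (a sketch; in later CoreDNS releases the `proxy` plugin became `forward`):

```
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        upstream
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    proxy . /etc/resolv.conf
    cache 30
    reload
    loadbalance
}
```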
This option, `proxy . /etc/resolv.conf`, generally means that your DNS service will use your external nameservers (in my case external nameservers were defined) for node name resolution. So yes, I looked in my DNS logs and found that my DNS servers were receiving those requests.
Eventually, I just added my nodes' hostname records to my external DNS service, and that's it. Metrics are being collected successfully.
This seems to be a blocking issue for 0.3.0 when running a kops deployment on AWS using a private network topology.
Naturally kube-dns can't resolve that hostname. I tried setting `dnsPolicy: Default` in the metrics-server deployment (see the sketch below), which skirts the DNS issue, but then I hit a different error. Not really sure what to do with that. I don't want to start monkeying with my node's certs without knowing exactly what I'm fixing. For now I've had to revert to metrics-server 0.2.1.
In the metrics-server we found the logs below:
E0903 15:36:38.239003 1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:nvm250d00: unable to fetch metrics from Kubelet nvm250d00 (10.130.X.X): Get https://10.130.X.X:10250/stats/summary/: x509: cannot validate certificate for 10.130.X.X because it doesn't contain any IP SANs, unable to fully scrape metrics from source kubelet_summary:nvmbd1aow270d00: unable to fetch metrics from Kubelet
Adding the `--kubelet-preferred-address-types=InternalIP` flag helped me fix metrics-server 0.3.6 after enabling NodeLocal DNS Cache.
Nice! I was reading that same page too. There might be more learnings on the page you linked for me too. Queued for when I have some learning time. 😃
Perhaps, specifically, the `--bootstrap-kubeconfig` flag: https://kubernetes.io/docs/tasks/tls/certificate-rotation/#understanding-the-certificate-rotation-configuration
I had to run this command to change the cert SANs for the API server and to listen on different IPs.
It'd have been nice if that command could also sign the cert with the cluster's CA. Anyway, I see where things are going wrong now and can use your very helpful notes to take me further.
I was able to resolve the certificate issue without disabling TLS. With k8s version 1.13.1, I was able to generate new certs using https://kubernetes.io/docs/tasks/tls/managing-tls-in-a-cluster/ and openssl.
I generated the CSR and private key with openssl and passed that to the certificates API in Kubernetes to approve and sign it. Then I downloaded the signed cert, put it in /var/lib/kubelet/pki/, and added the kubelet TLS cert and key params to /etc/sysconfig/kubelet. And then obviously I had to restart the kubelet service. Now port 10250 on each node has a cert signed by the Kubernetes CA.
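A rough sketch of that flow, under stated assumptions (v1beta1 certificates API as in k8s 1.13, OpenSSL ≥ 1.1.1 for `-addext`, GNU base64; the node name is taken from the log above, the IP is a placeholder):

```sh
# 1. Private key + CSR whose SANs cover the node's hostname and IP
openssl genrsa -out kubelet-serving.key 2048
openssl req -new -key kubelet-serving.key -out kubelet-serving.csr \
  -subj "/CN=system:node:nvm250d00/O=system:nodes" \
  -addext "subjectAltName=DNS:nvm250d00,IP:10.130.0.1"

# 2. Submit the CSR to the certificates API and approve it
cat <<EOF | kubectl apply -f -
apiVersion: certificates.k8s.io/v1beta1
kind: CertificateSigningRequest
metadata:
  name: kubelet-serving-nvm250d00
spec:
  request: $(base64 -w0 < kubelet-serving.csr)
  usages: ["digital signature", "key encipherment", "server auth"]
EOF
kubectl certificate approve kubelet-serving-nvm250d00

# 3. Download the signed cert and install it for the kubelet
kubectl get csr kubelet-serving-nvm250d00 \
  -o jsonpath='{.status.certificate}' | base64 -d > /var/lib/kubelet/pki/kubelet.crt
cp kubelet-serving.key /var/lib/kubelet/pki/kubelet.key

# 4. Point the kubelet at the cert/key (e.g. in /etc/sysconfig/kubelet), then restart:
#    KUBELET_EXTRA_ARGS=--tls-cert-file=/var/lib/kubelet/pki/kubelet.crt --tls-private-key-file=/var/lib/kubelet/pki/kubelet.key
systemctl restart kubelet
```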
However, I did notice some deprecation warnings about using those params in KUBELET_EXTRA_ARGS; I'll investigate the alternative later, as I didn't find a quick solution in my 10s of googling.
@originsmike Another problem after modifying TLS and InternalIP.