metrics-server: metrics-server cannot resolve kubelet hostnames, and the apiserver cannot reach the metrics-server ClusterIP

The metrics-server is unable to resolve node hostnames when scraping metrics from the kubelet.

E0903  1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:<hostname>: unable to fetch metrics from Kubelet <hostname> (<hostname>): Get https://<hostname>:10250/stats/summary/: dial tcp: lookup <hostname> on 10.96.0.10:53: no such host

I figured it's not resolving the hostname via kube-dns.

As mentioned in the following issues: https://github.com/kubernetes-incubator/metrics-server/issues/105#issuecomment-412818944 and https://github.com/kubernetes-incubator/metrics-server/issues/97,

I tried editing the deployment with kubectl -n kube-system edit deploy metrics-server, but the metrics-server pod entered an error state.

Describing the apiservice v1beta1.metrics.k8s.io shows the message:

no response from https://10.101.248.96:443: Get https://10.101.248.96:443: Proxy Error ( Connection refused )

10.101.248.96 is the ClusterIP of the metrics-server service.

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 8
  • Comments: 73 (12 by maintainers)


Most upvoted comments

@amolredhat The --source flag is unavailable right now (v0.3.0-alpha.1)

I (finally) got it to work by setting the following args:

        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP

It works like a charm!

The solution proposed by @MIBc works. Change the metrics-server-deployment.yaml file and add:

        command:
        - /metrics-server
        - --kubelet-preferred-address-types=InternalIP

The metrics-server is now able to talk to the node (it was failing before because it could not resolve the hostname of the node). There’s something strange happening though: I can see the metrics now from the HPA, but it shows an error in the logs:

E0903 11:04:45.914111       1 reststorage.go:98] unable to fetch pod metrics for pod default/my-nginx-84c58b9888-whg8r: no metrics known for pod "default/my-nginx-84c58b9888-whg8r"

@juan-vg awesome, this works for me too (metrics-server-amd64:v0.3.0 on k8s 1.10.3). Btw, so as not to duplicate the entrypoint set in the Dockerfile, consider using args: instead:

        args:
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP

In recent versions of metrics-server, where there is no “command” section or “metrics-server-deployment.yaml”, the following helped me:

  1. Open deployment editor kubectl -n kube-system edit deploy metrics-server
  2. Add a few args:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --v=2
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP

@vikranttkamble you can try --kubelet-preferred-address-types InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
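
In deployment form that suggestion would look roughly like the snippet below (a sketch only, using args: as recommended above; the value is the list from the comment):

        args:
        - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP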

I temporarily fixed this by editing the metrics-server deployment and adding this config under Deployment.spec.template.spec:

      hostAliases:
      - hostnames:
        - k8s-master1
        ip: xxxxx
      - hostnames:
        - k8s-node1
        ip: yyyyy
      - hostnames:
        - k8s-node2
        ip: zzzzz

It is OK when the kubelet flag --authorization-mode=AlwaysAllow and the metrics-server flag --kubelet-insecure-tls are set.
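
For kubelets driven by a config file rather than command-line flags, the equivalent of --authorization-mode=AlwaysAllow would be the sketch below; note that AlwaysAllow disables kubelet authorization checks, so it is best kept to test clusters:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Equivalent of the --authorization-mode=AlwaysAllow kubelet flag:
# every request to the kubelet API is allowed without an authorization check.
authorization:
  mode: AlwaysAllow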

If you’re going to use InternalIP, you should probably set up your node’s serving certs to list the IP as an alternative name. You generally don’t want to pass kubelet-insecure-tls except in testing setups.
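
One way to get the node's serving cert to list its addresses is to let the kubelet request a serving certificate from the cluster's certificates API instead of self-signing one. A minimal sketch of the kubelet config, assuming the resulting CSRs are then approved (e.g. with kubectl certificate approve):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Request a serving certificate from the cluster CA; its SANs include the
# node's hostname and IPs, so metrics-server can verify it without
# --kubelet-insecure-tls. The generated CSRs still need to be approved.
serverTLSBootstrap: true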

I’m adding a followup comment in case others had the same issues as I did. Here is how I got this working on a kubeadm bootstrapped cluster:

metrics server changes:

acabrer@nuc-01:~/git.workspace/metrics-server$ git diff
diff --git a/deploy/1.8+/metrics-server-deployment.yaml b/deploy/1.8+/metrics-server-deployment.yaml
index ad2abaf..bc5e718 100644
--- a/deploy/1.8+/metrics-server-deployment.yaml
+++ b/deploy/1.8+/metrics-server-deployment.yaml
@@ -31,7 +31,14 @@ spec:
       - name: metrics-server
         image: k8s.gcr.io/metrics-server-amd64:v0.3.1
         imagePullPolicy: Always
+        command:
+        - /metrics-server
+        - --kubelet-insecure-tls
+        - --kubelet-preferred-address-types=InternalIP
         volumeMounts:
         - name: tmp-dir
           mountPath: /tmp
-
+      hostAliases:
+      - hostnames:
+        - nuc-01
+        ip: 192.168.1.240
acabrer@nuc-01:~/git.workspace/metrics-server$ hostname -A
nuc-01.int.mrmcmuffinz.com nuc-01 nuc-01
acabrer@nuc-01:~/git.workspace/metrics-server$ hostname -I
192.168.1.240 172.17.0.1 172.16.0.1

deploy.sh:

#!/bin/sh

sudo kubeadm init --pod-network-cidr=172.16.0.0/12 --token-ttl=0 --apiserver-advertise-address=192.168.1.240
rm -f /home/acabrer/.kube/config
mkdir -p /home/acabrer/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown acabrer:acabrer /home/acabrer/.kube/config
kubectl taint nodes --all node-role.kubernetes.io/master-

kubectl apply -f /home/acabrer/scratch/rbac-kdd.yaml
kubectl apply -f /home/acabrer/scratch/calico.yaml

cd /home/acabrer/git.workspace/metrics-server
kubectl apply -f deploy/1.8+/
cd -

kubectl top commands…

acabrer@nuc-01:~$ kubectl top nodes
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
nuc-01   368m         9%     2017Mi          6%
acabrer@nuc-01:~$ kubectl top pods --all-namespaces
NAMESPACE     NAME                              CPU(cores)   MEMORY(bytes)
kube-system   calico-node-bfmkw                 36m          54Mi
kube-system   coredns-576cbf47c7-5r2v9          4m           8Mi
kube-system   coredns-576cbf47c7-7pvx2          3m           8Mi
kube-system   etcd-nuc-01                       33m          34Mi
kube-system   kube-apiserver-nuc-01             53m          406Mi
kube-system   kube-controller-manager-nuc-01    67m          50Mi
kube-system   kube-proxy-fkrb6                  6m           11Mi
kube-system   kube-scheduler-nuc-01             19m          12Mi
kube-system   metrics-server-556f49c7c9-4smvt   2m           13Mi

Note: in kubeadm the insecure port is disabled by default, so don’t try setting the kubelet port in the metrics server to 10255 as it will not work!

I want to clarify that my prior comment will only help show stats for nodes, pod stats are still broken.

kubectl -n kube-system logs metrics-server-7fbd9b8589-hv6qh
I1105 00:05:31.372311       1 serving.go:273] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
[restful] 2018/11/05 00:05:31 log.go:33: [restful/swagger] listing is available at https://:443/swaggerapi
[restful] 2018/11/05 00:05:31 log.go:33: [restful/swagger] https://:443/swaggerui/ is mapped to folder /swagger-ui/
I1105 00:05:31.939521       1 serve.go:96] Serving securely on [::]:443
E1105 00:05:34.537886       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/istio-egressgateway-5b765869bf-l86lv: no metrics known for pod
E1105 00:05:34.537910       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/istio-egressgateway-5b765869bf-xrg2f: no metrics known for pod
E1105 00:05:34.543566       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/istio-ingressgateway-69b597b6bd-qwq78: no metrics known for pod
E1105 00:05:34.549083       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/istio-policy-59b7f4ccd5-kllfx: no metrics known for pod
E1105 00:05:34.554026       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/istio-telemetry-7686cd76bd-j72qd: no metrics known for pod
E1105 00:05:34.554041       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/istio-telemetry-7686cd76bd-rvfgw: no metrics known for pod
E1105 00:05:34.567333       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/knative-ingressgateway-84d56577db-82sml: no metrics known for pod
E1105 00:05:34.567348       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/knative-ingressgateway-84d56577db-cnwsj: no metrics known for pod
E1105 00:05:34.567353       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/knative-ingressgateway-84d56577db-7rzjw: no metrics known for pod
E1105 00:05:34.567357       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/knative-ingressgateway-84d56577db-9956m: no metrics known for pod
E1105 00:05:34.567361       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/knative-ingressgateway-84d56577db-r95mj: no metrics known for pod
E1105 00:05:34.567366       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/knative-ingressgateway-84d56577db-cmlgg: no metrics known for pod
E1105 00:05:34.567370       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/knative-ingressgateway-84d56577db-m7kpx: no metrics known for pod
E1105 00:05:34.567374       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/knative-ingressgateway-84d56577db-tgwfc: no metrics known for pod
E1105 00:05:34.567378       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/knative-ingressgateway-84d56577db-r4dxm: no metrics known for pod
E1105 00:05:34.567382       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/knative-ingressgateway-84d56577db-82wg9: no metrics known for pod
E1105 00:05:49.560880       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/istio-egressgateway-5b765869bf-xrg2f: no metrics known for pod
E1105 00:05:49.560924       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/istio-egressgateway-5b765869bf-l86lv: no metrics known for pod

Snippet of logs from metrics pod.

@veton 's answer is the most on-point and up to date, thank you! 🙏

If you’re going to use InternalIP, you should probably set up your node’s serving certs to list the IP as an alternative name. You generally don’t want to pass kubelet-insecure-tls except in testing setups.

@DirectXMan12 Hi, sorry to revive an old issue with a question like this - I agree with this approach as I try to have a keen eye on security, however, as a newbie to kubernetes, I don’t know where I begin to configure this. Would you be kind enough to link me somewhere?

Hopefully this will help others who find this issue when Googling around too.

@DirectXMan12 this problem still exists and usage of --kubelet-insecure-tls is discouraged. Furthermore, I haven’t found any guide on how to deploy metrics-server the “right way”. Can you please reopen this issue until it is fixed in code or documentation?

@itskingori and yes, the insecure flag is not a good long term solution (@jpetazzo)

@itskingori yeah, it should just work with webhook auth, assuming metrics-server can trust your Kubelet. Right now, that trust means that the kubelet’s serving certs must be signed by the main cluster CA. We need someone to submit a PR to support a separate Kubelet CA, since some clusters use that.

EDIT: #183 implements the “separate kubelet CA” thing. Once it merges, that’ll be a good way forward. Then we just have to convince cluster tools to use an actual CA for their kubelet serving certs.
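
Assuming that work lands as a --kubelet-certificate-authority flag (the flag name and mount path here are assumptions, not something confirmed in this thread), the deployment args might eventually look like:

        args:
        - --kubelet-preferred-address-types=InternalIP
        # Hypothetical: point metrics-server at a dedicated kubelet CA bundle
        # mounted into the pod, instead of disabling TLS verification.
        - --kubelet-certificate-authority=/etc/kubelet-ca/ca.crt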

coredns, and I have correctly set the node’s /etc/hosts, but it still fails:

Has this problem been solved?


The host uses /etc/hosts for resolution. How can this be handled better?

Obviously the issue is in lookup <hostname-ip> on <dns-service-ip> ... no such host

In my case coreDNS is used for cluster DNS resolution. By default coreDNS (in my case deployed with Kubespray) is set up only for service name resolution, not for pods/nodes.

Then we can look at the default config for coreDNS:

kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        log stdout
        health
        kubernetes cluster.local {
          cidrs 10.3.0.0/24
        }
        proxy . /etc/resolv.conf
        cache 30
    }

The proxy . /etc/resolv.conf option generally means that your DNS service will use your external nameservers (in my case external nameservers were defined) for node name resolution.

So, yes, I looked in my DNS logs and found that my DNS server was receiving those requests.

Eventually, I just added my nodes’ hostname records to my external DNS service, and that was it. Metrics are now collected successfully.
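
As an alternative to publishing the node names in an external DNS service, the Corefile itself could answer for them via the hosts plugin. The node names and IPs below are placeholders, so treat this as a sketch rather than a drop-in config:

kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        log stdout
        health
        # Resolve node names locally instead of forwarding them upstream.
        hosts {
            192.168.1.10 k8s-master1
            192.168.1.11 k8s-node1
            192.168.1.12 k8s-node2
            fallthrough
        }
        kubernetes cluster.local {
          cidrs 10.3.0.0/24
        }
        proxy . /etc/resolv.conf
        cache 30
    }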

This seems to be a blocking issue for 0.3.0 when running a kops deployment on AWS using a private network topology.

dial tcp: lookup ip-x-x-x-x.us-west-2.compute.internal on 100.64.0.10:53: no such host

Naturally kubedns can’t resolve that hostname. I tried setting dnsPolicy: Default in the metrics-server deployment, which skirts the DNS issue, but then I see this:

x509: certificate signed by unknown authority

Not really sure what to do with that. I don’t want to start monkeying with my node’s certs without knowing exactly what I’m fixing. For now I’ve had to revert to metrics-server 0.2.1.
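
For reference, the dnsPolicy change mentioned above goes in the pod template of the metrics-server deployment; a minimal sketch:

      template:
        spec:
          # Use the node's /etc/resolv.conf instead of kube-dns, so node
          # hostnames resolve the same way they do on the host itself.
          dnsPolicy: Default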

In the metrics-server we found the logs below.

E0903 15:36:38.239003 1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:nvm250d00: unable to fetch metrics from Kubelet nvm250d00 (10.130.X.X): Get https://10.130.X.X:10250/stats/summary/: x509: cannot validate certificate for 10.130.X.X because it doesn't contain any IP SANs, unable to fully scrape metrics from source kubelet_summary:nvmbd1aow270d00: unable to fetch metrics from Kubelet

Adding --kubelet-preferred-address-types=InternalIP flag helped me to fix metrics-server 0.3.6 after enabling NodeLocal DNS Cache.

Nice! I was reading that same page too. There might be more learnings on the page you linked for me too. Queued for when I have some learning time. 😃

Perhaps, specifically the --bootstrap-kubeconfig flag https://kubernetes.io/docs/tasks/tls/certificate-rotation/#understanding-the-certificate-rotation-configuration

I had to run this command to change the cert SANs for the API server and to listen on different IPs:

kubeadm init phase certs all \
  --apiserver-advertise-address=0.0.0.0 \
  --apiserver-cert-extra-sans=10.244.0.1,11.0.0.10,example.com

It’d have been nice if that command could also sign the cert with the cluster’s CA. Anyway, I see where things are going wrong now and can use your very helpful notes to take me further.

I was able to resolve the certificate issue without disabling TLS. With k8s version 1.13.1, I was able to generate new certs using https://kubernetes.io/docs/tasks/tls/managing-tls-in-a-cluster/ and openssl.

I generated the CSR and private key with openssl and passed that to the certificates API in K8S to approve and sign it. Then I downloaded the signed cert, put it in /var/lib/kubelet/pki/, and added the kubelet TLS cert and key params to /etc/sysconfig/kubelet. Then I had to restart the kubelet service. Now, port 10250 on each node has a K8S-CA-signed cert installed.

However, I did notice some deprecation warnings about using those params in KUBELET_EXTRA_ARGS; I will investigate the alternative later, as I didn’t find a quick solution in my 10s of googling.
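
For what it’s worth, the non-deprecated alternative to passing --tls-cert-file/--tls-private-key-file in KUBELET_EXTRA_ARGS is to set the equivalent fields in the kubelet config file; the file names below are assumptions based on the comment above:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Serve port 10250 with the CA-signed cert placed in /var/lib/kubelet/pki/
# (file names are hypothetical; use whatever the signed cert and key are called).
tlsCertFile: /var/lib/kubelet/pki/kubelet-server.crt
tlsPrivateKeyFile: /var/lib/kubelet/pki/kubelet-server.key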

@originsmike Another problem after modifying TLS and InternalIP:

[root@192 ~]# docker logs -f fa55e7f7343a
I1010 10:40:01.108023       1 serving.go:273] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
[restful] 2018/10/10 10:40:32 log.go:33: [restful/swagger] listing is available at https://:443/swaggerapi
[restful] 2018/10/10 10:40:32 log.go:33: [restful/swagger] https://:443/swaggerui/ is mapped to folder /swagger-ui/
I1010 10:40:33.308883       1 serve.go:96] Serving securely on [::]:443
I1010 10:40:33.609544       1 logs.go:49] http: TLS handshake error from 172.20.0.1:49456: EOF
E1010 10:41:02.208299       1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:master: unable to fetch metrics from Kubelet master (192.168.8.184): request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:metrics-server, verb=get, resource=nodes, subresource=stats)", unable to fully scrape metrics from source kubelet_summary:node: unable to fetch metrics from Kubelet node (192.168.8.185): request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:metrics-server, verb=get, resource=nodes, subresource=stats)"]
E1010 10:41:32.116815       1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:node: unable to fetch metrics from Kubelet node (192.168.8.185): request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:metrics-server, verb=get, resource=nodes, subresource=stats)", unable to fully scrape metrics from source kubelet_summary:master: unable to fetch metrics from Kubelet master (192.168.8.184): request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:metrics-server, verb=get, resource=nodes, subresource=stats)"]
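
Those 403s mean the kubelet is authorizing requests (webhook mode) and the metrics-server service account is not allowed to read the nodes/stats subresource. The usual fix is RBAC granting that access; a sketch (object names here are illustrative, and the shipped manifests may differ):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:metrics-server
rules:
- apiGroups: [""]
  # nodes/stats is the subresource named in the 403 above.
  resources: ["nodes", "nodes/stats", "pods", "namespaces"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system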