metrics-server: metrics-server cannot resolve kubelet hostnames, and the apiserver cannot reach the metrics-server ClusterIP

The metrics-server is unable to resolve node hostnames when scraping metrics from the kubelet.

E0903  1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:<hostname>: unable to fetch metrics from Kubelet <hostname> (<hostname>): Get https://<hostname>:10250/stats/summary/: dial tcp: lookup <hostname> on 10.96.0.10:53: no such host

I figured it's not resolving the hostname via kube-dns.

As mentioned in the following issues: https://github.com/kubernetes-incubator/metrics-server/issues/105#issuecomment-412818944 and https://github.com/kubernetes-incubator/metrics-server/issues/97,

I tried editing the deployment with kubectl -n kube-system edit deploy metrics-server, but the metrics-server pod entered an error state.

Describing the apiservice v1beta1.metrics.k8s.io shows the message:

no response from https://10.101.248.96:443: Get https://10.101.248.96:443: Proxy Error ( Connection refused )

10.101.248.96 is the ClusterIP of the metrics-server service.

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 8
  • Comments: 73 (12 by maintainers)


Most upvoted comments

@amolredhat The --source flag is unavailable right now (v0.3.0-alpha.1)

I (finally) got it to work by setting the following args:

        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP

It works like a charm!

The solution proposed by @MIBc works. Change the metrics-server-deployment.yaml file and add:

        command:
        - /metrics-server
        - --kubelet-preferred-address-types=InternalIP

The metrics-server is now able to talk to the node (it was failing before because it could not resolve the hostname of the node). There’s something strange happening though: I can see the metrics now from the HPA, but it shows an error in the logs:

E0903 11:04:45.914111       1 reststorage.go:98] unable to fetch pod metrics for pod default/my-nginx-84c58b9888-whg8r: no metrics known for pod "default/my-nginx-84c58b9888-whg8r"

@juan-vg awesome, this works for me too (metrics-server-amd64:v0.3.0 on k8s 1.10.3). Btw, so as not to duplicate the entrypoint set in the Dockerfile, consider using args: instead:

        args:
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP

In recent versions of metrics-server, where there is no “command” section or “metrics-server-deployment.yaml”, the following helped me:

  1. Open deployment editor kubectl -n kube-system edit deploy metrics-server
  2. Add a few args:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --v=2
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP

@vikranttkamble you can try --kubelet-preferred-address-types InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
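
In deployment form that suggestion would look roughly like the snippet below (a sketch only, using args: as recommended above; the value is the list from the comment):

        args:
        - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP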

I temporarily fixed this by editing the metrics-server deployment and adding this config under Deployment.spec.template.spec:

      hostAliases:
      - hostnames:
        - k8s-master1
        ip: xxxxx
      - hostnames:
        - k8s-node1
        ip: yyyyy
      - hostnames:
        - k8s-node2
        ip: zzzzz

It is OK when the kubelet flag --authorization-mode=AlwaysAllow and the metrics-server flag --kubelet-insecure-tls are set.
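
For kubelets driven by a config file rather than command-line flags, the equivalent of --authorization-mode=AlwaysAllow would be the sketch below; note that AlwaysAllow disables kubelet authorization checks, so it is best kept to test clusters:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Equivalent of the --authorization-mode=AlwaysAllow kubelet flag:
# every request to the kubelet API is allowed without an authorization check.
authorization:
  mode: AlwaysAllow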

If you’re going to use InternalIP, you should probably set up your node’s serving certs to list the IP as an alternative name. You generally don’t want to pass kubelet-insecure-tls except in testing setups.
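
One way to get the node's serving cert to list its addresses is to let the kubelet request a serving certificate from the cluster's certificates API instead of self-signing one. A minimal sketch of the kubelet config, assuming the resulting CSRs are then approved (e.g. with kubectl certificate approve):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Request a serving certificate from the cluster CA; its SANs include the
# node's hostname and IPs, so metrics-server can verify it without
# --kubelet-insecure-tls. The generated CSRs still need to be approved.
serverTLSBootstrap: true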

I’m adding a followup comment in case others had the same issues as I did. Here is how I got this working on a kubeadm bootstrapped cluster:

metrics server changes:

acabrer@nuc-01:~/git.workspace/metrics-server$ git diff
diff --git a/deploy/1.8+/metrics-server-deployment.yaml b/deploy/1.8+/metrics-server-deployment.yaml
index ad2abaf..bc5e718 100644
--- a/deploy/1.8+/metrics-server-deployment.yaml
+++ b/deploy/1.8+/metrics-server-deployment.yaml
@@ -31,7 +31,14 @@ spec:
       - name: metrics-server
         image: k8s.gcr.io/metrics-server-amd64:v0.3.1
         imagePullPolicy: Always
+        command:
+        - /metrics-server
+        - --kubelet-insecure-tls
+        - --kubelet-preferred-address-types=InternalIP
         volumeMounts:
         - name: tmp-dir
           mountPath: /tmp
-
+      hostAliases:
+      - hostnames:
+        - nuc-01
+        ip: 192.168.1.240
acabrer@nuc-01:~/git.workspace/metrics-server$ hostname -A
nuc-01.int.mrmcmuffinz.com nuc-01 nuc-01
acabrer@nuc-01:~/git.workspace/metrics-server$ hostname -I
192.168.1.240 172.17.0.1 172.16.0.1

deploy.sh:

#!/bin/sh

sudo kubeadm init --pod-network-cidr=172.16.0.0/12 --token-ttl=0 --apiserver-advertise-address=192.168.1.240
rm -f /home/acabrer/.kube/config
mkdir -p /home/acabrer/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown acabrer:acabrer /home/acabrer/.kube/config
kubectl taint nodes --all node-role.kubernetes.io/master-

kubectl apply -f /home/acabrer/scratch/rbac-kdd.yaml
kubectl apply -f /home/acabrer/scratch/calico.yaml

cd /home/acabrer/git.workspace/metrics-server
kubectl apply -f deploy/1.8+/
cd -

kubectl top commands…

acabrer@nuc-01:~$ kubectl top nodes
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
nuc-01   368m         9%     2017Mi          6%
acabrer@nuc-01:~$ kubectl top pods --all-namespaces
NAMESPACE     NAME                              CPU(cores)   MEMORY(bytes)
kube-system   calico-node-bfmkw                 36m          54Mi
kube-system   coredns-576cbf47c7-5r2v9          4m           8Mi
kube-system   coredns-576cbf47c7-7pvx2          3m           8Mi
kube-system   etcd-nuc-01                       33m          34Mi
kube-system   kube-apiserver-nuc-01             53m          406Mi
kube-system   kube-controller-manager-nuc-01    67m          50Mi
kube-system   kube-proxy-fkrb6                  6m           11Mi
kube-system   kube-scheduler-nuc-01             19m          12Mi
kube-system   metrics-server-556f49c7c9-4smvt   2m           13Mi

Note: in kubeadm the insecure port is disabled by default, so don’t try setting the kubelet port in the metrics server to 10255 as it will not work!

I want to clarify that my prior comment will only help show stats for nodes, pod stats are still broken.

kubectl -n kube-system logs metrics-server-7fbd9b8589-hv6qh
I1105 00:05:31.372311       1 serving.go:273] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
[restful] 2018/11/05 00:05:31 log.go:33: [restful/swagger] listing is available at https://:443/swaggerapi
[restful] 2018/11/05 00:05:31 log.go:33: [restful/swagger] https://:443/swaggerui/ is mapped to folder /swagger-ui/
I1105 00:05:31.939521       1 serve.go:96] Serving securely on [::]:443
E1105 00:05:34.537886       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/istio-egressgateway-5b765869bf-l86lv: no metrics known for pod
E1105 00:05:34.537910       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/istio-egressgateway-5b765869bf-xrg2f: no metrics known for pod
E1105 00:05:34.543566       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/istio-ingressgateway-69b597b6bd-qwq78: no metrics known for pod
E1105 00:05:34.549083       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/istio-policy-59b7f4ccd5-kllfx: no metrics known for pod
E1105 00:05:34.554026       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/istio-telemetry-7686cd76bd-j72qd: no metrics known for pod
E1105 00:05:34.554041       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/istio-telemetry-7686cd76bd-rvfgw: no metrics known for pod
E1105 00:05:34.567333       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/knative-ingressgateway-84d56577db-82sml: no metrics known for pod
E1105 00:05:34.567348       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/knative-ingressgateway-84d56577db-cnwsj: no metrics known for pod
E1105 00:05:34.567353       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/knative-ingressgateway-84d56577db-7rzjw: no metrics known for pod
E1105 00:05:34.567357       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/knative-ingressgateway-84d56577db-9956m: no metrics known for pod
E1105 00:05:34.567361       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/knative-ingressgateway-84d56577db-r95mj: no metrics known for pod
E1105 00:05:34.567366       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/knative-ingressgateway-84d56577db-cmlgg: no metrics known for pod
E1105 00:05:34.567370       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/knative-ingressgateway-84d56577db-m7kpx: no metrics known for pod
E1105 00:05:34.567374       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/knative-ingressgateway-84d56577db-tgwfc: no metrics known for pod
E1105 00:05:34.567378       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/knative-ingressgateway-84d56577db-r4dxm: no metrics known for pod
E1105 00:05:34.567382       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/knative-ingressgateway-84d56577db-82wg9: no metrics known for pod
E1105 00:05:49.560880       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/istio-egressgateway-5b765869bf-xrg2f: no metrics known for pod
E1105 00:05:49.560924       1 reststorage.go:144] unable to fetch pod metrics for pod istio-system/istio-egressgateway-5b765869bf-l86lv: no metrics known for pod

Snippet of logs from metrics pod.

@veton 's answer is the most on-point and up to date, thank you! 🙏

If you’re going to use InternalIP, you should probably set up your node’s serving certs to list the IP as an alternative name. You generally don’t want to pass kubelet-insecure-tls except in testing setups.

@DirectXMan12 Hi, sorry to revive an old issue with a question like this - I agree with this approach as I try to have a keen eye on security, however, as a newbie to kubernetes, I don’t know where I begin to configure this. Would you be kind enough to link me somewhere?

Hopefully this will help others who find this issue when Googling around too.

@DirectXMan12 this problem still exists and usage of --kubelet-insecure-tls is discouraged. Furthermore, I haven’t found any guide on how to deploy metrics-server the “right way”. Can you please reopen this issue until it is fixed in code or documentation?

@itskingori and yes, the insecure flag is not a good long term solution (@jpetazzo)

@itskingori yeah, it should just work with webhook auth, assuming metrics-server can trust your Kubelet. Right now, that trust means that the kubelet’s serving certs must be signed by the main cluster CA. We need someone to submit a PR to support a separate Kubelet CA, since some clusters use that.

EDIT: #183 implements the “separate kubelet CA” thing. Once it merges, that’ll be a good way forward. Then we just have to convince cluster tools to use an actual CA for their kubelet serving certs.
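
Assuming that work lands as a --kubelet-certificate-authority flag (the flag name and mount path here are assumptions, not something confirmed in this thread), the deployment args might eventually look like:

        args:
        - --kubelet-preferred-address-types=InternalIP
        # Hypothetical: point metrics-server at a dedicated kubelet CA bundle
        # mounted into the pod, instead of disabling TLS verification.
        - --kubelet-certificate-authority=/etc/kubelet-ca/ca.crt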

coredns, and I have correctly set the node’s /etc/hosts, but it still fails:

Has this problem been solved?


The host uses /etc/hosts for resolution. How can this be handled better?

Obviously the issue is in lookup <hostname-ip> on <dns-service-ip> ... no such host

In my case coreDNS is used for cluster DNS resolution. By default coreDNS (in my case deployed with Kubespray) is set up only for service name resolution, not for pods/nodes.

Then we can look at the default config for coreDNS:

kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        log stdout
        health
        kubernetes cluster.local {
          cidrs 10.3.0.0/24
        }
        proxy . /etc/resolv.conf
        cache 30
    }

The proxy . /etc/resolv.conf option generally means that your DNS service will use your external nameservers (in my case external nameservers were defined) for node name resolution.

So, yes, I looked in my DNS logs and found that my DNS server was receiving those requests.

Eventually, I just added my nodes’ hostname records to my external DNS service, and that was it. Metrics are now collected successfully.
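
As an alternative to publishing the node names in an external DNS service, the Corefile itself could answer for them via the hosts plugin. The node names and IPs below are placeholders, so treat this as a sketch rather than a drop-in config:

kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        log stdout
        health
        # Resolve node names locally instead of forwarding them upstream.
        hosts {
            192.168.1.10 k8s-master1
            192.168.1.11 k8s-node1
            192.168.1.12 k8s-node2
            fallthrough
        }
        kubernetes cluster.local {
          cidrs 10.3.0.0/24
        }
        proxy . /etc/resolv.conf
        cache 30
    }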

This seems to be a blocking issue for 0.3.0 when running a kops deployment on AWS using a private network topology.

dial tcp: lookup ip-x-x-x-x.us-west-2.compute.internal on 100.64.0.10:53: no such host

Naturally kubedns can’t resolve that hostname. I tried setting dnsPolicy: Default in the metrics-server deployment, which skirts the DNS issue, but then I see this:

x509: certificate signed by unknown authority

Not really sure what to do with that. I don’t want to start monkeying with my node’s certs without knowing exactly what I’m fixing. For now I’ve had to revert to metrics-server 0.2.1.
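
For reference, the dnsPolicy change mentioned above goes in the pod template of the metrics-server deployment; a minimal sketch:

      template:
        spec:
          # Use the node's /etc/resolv.conf instead of kube-dns, so node
          # hostnames resolve the same way they do on the host itself.
          dnsPolicy: Default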

In the metrics-server we found the logs below.

E0903 15:36:38.239003 1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:nvm250d00: unable to fetch metrics from Kubelet nvm250d00 (10.130.X.X): Get https://10.130.X.X:10250/stats/summary/: x509: cannot validate certificate for 10.130.X.X because it doesn't contain any IP SANs, unable to fully scrape metrics from source kubelet_summary:nvmbd1aow270d00: unable to fetch metrics from Kubelet

Adding --kubelet-preferred-address-types=InternalIP flag helped me to fix metrics-server 0.3.6 after enabling NodeLocal DNS Cache.

Nice! I was reading that same page too. There might be more learnings on the page you linked for me too. Queued for when I have some learning time. 😃

Perhaps, specifically the --bootstrap-kubeconfig flag https://kubernetes.io/docs/tasks/tls/certificate-rotation/#understanding-the-certificate-rotation-configuration

I had to run this command to change the cert SANs for the API server and to listen on different IPs:

kubeadm init phase certs all \
  --apiserver-advertise-address=0.0.0.0 \
  --apiserver-cert-extra-sans=10.244.0.1,11.0.0.10,example.com

It’d have been nice if that command could also sign the cert with the cluster’s CA. Anyway, I see where things are going wrong now and can use your very helpful notes to take me further.

I was able to resolve the certificate issue without disabling TLS. With k8s version 1.13.1, I was able to generate new certs using https://kubernetes.io/docs/tasks/tls/managing-tls-in-a-cluster/ and openssl.

I generated the CSR and private key with openssl and passed that to the certificates API in K8S to approve and sign it. Then I downloaded the signed cert, put it in /var/lib/kubelet/pki/, and added the kubelet TLS cert and key params to /etc/sysconfig/kubelet. Then I had to restart the kubelet service. Now, port 10250 on each node has a K8S-CA-signed cert installed.

However, I did notice some deprecation warnings about using those params in KUBELET_EXTRA_ARGS; I will investigate the alternative later, as I didn’t find a quick solution in my 10s of googling.
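
For what it’s worth, the non-deprecated alternative to passing --tls-cert-file/--tls-private-key-file in KUBELET_EXTRA_ARGS is to set the equivalent fields in the kubelet config file; the file names below are assumptions based on the comment above:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Serve port 10250 with the CA-signed cert placed in /var/lib/kubelet/pki/
# (file names are hypothetical; use whatever the signed cert and key are called).
tlsCertFile: /var/lib/kubelet/pki/kubelet-server.crt
tlsPrivateKeyFile: /var/lib/kubelet/pki/kubelet-server.key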

@originsmike Another problem after modifying TLS and InternalIP:

[root@192 ~]# docker logs -f fa55e7f7343a
I1010 10:40:01.108023       1 serving.go:273] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
[restful] 2018/10/10 10:40:32 log.go:33: [restful/swagger] listing is available at https://:443/swaggerapi
[restful] 2018/10/10 10:40:32 log.go:33: [restful/swagger] https://:443/swaggerui/ is mapped to folder /swagger-ui/
I1010 10:40:33.308883       1 serve.go:96] Serving securely on [::]:443
I1010 10:40:33.609544       1 logs.go:49] http: TLS handshake error from 172.20.0.1:49456: EOF
E1010 10:41:02.208299       1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:master: unable to fetch metrics from Kubelet master (192.168.8.184): request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:metrics-server, verb=get, resource=nodes, subresource=stats)", unable to fully scrape metrics from source kubelet_summary:node: unable to fetch metrics from Kubelet node (192.168.8.185): request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:metrics-server, verb=get, resource=nodes, subresource=stats)"]
E1010 10:41:32.116815       1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:node: unable to fetch metrics from Kubelet node (192.168.8.185): request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:metrics-server, verb=get, resource=nodes, subresource=stats)", unable to fully scrape metrics from source kubelet_summary:master: unable to fetch metrics from Kubelet master (192.168.8.184): request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:metrics-server, verb=get, resource=nodes, subresource=stats)"]
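
Those 403s mean the kubelet is authorizing requests (webhook mode) and the metrics-server service account is not allowed to read the nodes/stats subresource. The usual fix is RBAC granting that access; a sketch (object names here are illustrative, and the shipped manifests may differ):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:metrics-server
rules:
- apiGroups: [""]
  # nodes/stats is the subresource named in the 403 above.
  resources: ["nodes", "nodes/stats", "pods", "namespaces"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system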