metrics-server: "Failed to scrape node" "failed parsing metrics: expected timestamp or new record, got MNAME"
Symptoms
Summary:
- Intermittently, some nodes do not report metrics. The affected nodes are not always the same, and at any given time between 0 and 3 nodes show an "unknown" status:
$ kubectl top node
NAME                CPU(cores)   CPU%        MEMORY(bytes)   MEMORY%
itaodocker1         668m         12%         11499Mi         9%
kube-poc-compute2   216m         4%          7659Mi          24%
kube-poc-compute3   291m         5%          5642Mi          18%
kube-poc-master1    225m         6%          3752Mi          25%
kube-poc-master2    314m         8%          6230Mi          41%
kube-poc-master3    335m         9%          4210Mi          28%
iqdockerclust1      <unknown>    <unknown>   <unknown>       <unknown>
kube-poc-compute1   <unknown>    <unknown>   <unknown>       <unknown>
kube-poc-compute4   <unknown>    <unknown>   <unknown>       <unknown>
- In the metrics-server logs, from any of the 3 pods, we see error messages like "Failed to scrape node" / "failed parsing metrics: expected timestamp or new record, got MNAME" (see the raw-endpoint check sketched after these logs):
metrics-server-7bdfb7c765-f5lp5 metrics-server E1128 07:29:28.136616 1 scraper.go:140] "Failed to scrape node" err="failed parsing metrics: expected timestamp or new record, got \"MNAME\"" node="kube-poc-compute1"
metrics-server-7bdfb7c765-bkkvs metrics-server E1128 07:29:38.270106 1 scraper.go:140] "Failed to scrape node" err="failed parsing metrics: expected timestamp or new record, got \"MNAME\"" node="kube-poc-compute1"
metrics-server-7bdfb7c765-c94c4 metrics-server E1128 07:29:38.713184 1 scraper.go:140] "Failed to scrape node" err="failed parsing metrics: expected timestamp or new record, got \"MNAME\"" node="kube-poc-compute1"
metrics-server-7bdfb7c765-f5lp5 metrics-server E1128 07:29:48.083484 1 scraper.go:140] "Failed to scrape node" err="failed parsing metrics: expected timestamp or new record, got \"MNAME\"" node="kube-poc-compute1"
metrics-server-7bdfb7c765-bkkvs metrics-server E1128 07:29:58.340301 1 scraper.go:140] "Failed to scrape node" err="failed parsing metrics: expected timestamp or new record, got \"MNAME\"" node="kube-poc-compute1"
metrics-server-7bdfb7c765-c94c4 metrics-server E1128 07:29:58.713545 1 scraper.go:140] "Failed to scrape node" err="failed parsing metrics: expected timestamp or new record, got \"MNAME\"" node="kube-poc-compute1"
metrics-server-7bdfb7c765-c94c4 metrics-server E1128 07:32:58.690976 1 scraper.go:140] "Failed to scrape node" err="failed parsing metrics: expected timestamp or new record, got \"MNAME\"" node="iqdockerclust1"
metrics-server-7bdfb7c765-f5lp5 metrics-server E1128 07:33:08.139105 1 scraper.go:140] "Failed to scrape node" err="failed parsing metrics: expected timestamp or new record, got \"MNAME\"" node="iqdockerclust1"
metrics-server-7bdfb7c765-bkkvs metrics-server E1128 07:33:18.314060 1 scraper.go:140] "Failed to scrape node" err="failed parsing metrics: expected timestamp or new record, got \"MNAME\"" node="iqdockerclust1"
metrics-server-7bdfb7c765-c94c4 metrics-server E1128 07:33:18.708113 1 scraper.go:140] "Failed to scrape node" err="failed parsing metrics: expected timestamp or new record, got \"MNAME\"" node="iqdockerclust1"
metrics-server-7bdfb7c765-f5lp5 metrics-server E1128 07:34:28.098630 1 scraper.go:140] "Failed to scrape node" err="failed parsing metrics: expected timestamp or new record, got \"MNAME\"" node="kube-poc-compute4"
metrics-server-7bdfb7c765-bkkvs metrics-server E1128 07:34:38.291907 1 scraper.go:140] "Failed to scrape node" err="failed parsing metrics: expected timestamp or new record, got \"MNAME\"" node="kube-poc-compute4"
metrics-server-7bdfb7c765-bkkvs metrics-server E1128 07:34:38.326823 1 scraper.go:140] "Failed to scrape node" err="failed parsing metrics: expected timestamp or new record, got \"MNAME\"" node="kube-poc-compute1"
metrics-server-7bdfb7c765-c94c4 metrics-server E1128 07:34:38.695391 1 scraper.go:140] "Failed to scrape node" err="failed parsing metrics: expected timestamp or new record, got \"MNAME\"" node="kube-poc-compute4"
metrics-server-7bdfb7c765-c94c4 metrics-server E1128 07:34:38.741334 1 scraper.go:140] "Failed to scrape node" err="failed parsing metrics: expected timestamp or new record, got \"MNAME\"" node="kube-poc-compute1"
metrics-server-7bdfb7c765-f5lp5 metrics-server E1128 07:34:48.104751 1 scraper.go:140] "Failed to scrape node" err="failed parsing metrics: expected timestamp or new record, got \"MNAME\"" node="kube-poc-compute1"
metrics-server-7bdfb7c765-f5lp5 metrics-server E1128 07:34:48.121529 1 scraper.go:140] "Failed to scrape node" err="failed parsing metrics: expected timestamp or new record, got \"MNAME\"" node="kube-poc-compute4"
metrics-server-7bdfb7c765-bkkvs metrics-server E1128 07:34:58.300311 1 scraper.go:140] "Failed to scrape node" err="failed parsing metrics: expected timestamp or new record, got \"MNAME\"" node="kube-poc-compute4"
metrics-server-7bdfb7c765-bkkvs metrics-server E1128 07:34:58.345051 1 scraper.go:140] "Failed to scrape node" err="failed parsing metrics: expected timestamp or new record, got \"MNAME\"" node="kube-poc-compute1"
metrics-server-7bdfb7c765-c94c4 metrics-server E1128 07:34:58.689066 1 scraper.go:140] "Failed to scrape node" err="failed parsing metrics: expected timestamp or new record, got \"MNAME\"" node="iqdockerclust1"
metrics-server-7bdfb7c765-c94c4 metrics-server E1128 07:34:58.697456 1 scraper.go:140] "Failed to scrape node" err="failed parsing metrics: expected timestamp or new record, got \"MNAME\"" node="kube-poc-compute4"
metrics-server-7bdfb7c765-c94c4 metrics-server E1128 07:34:58.729683 1 scraper.go:140] "Failed to scrape node" err="failed parsing metrics: expected timestamp or new record, got \"MNAME\"" node="kube-poc-compute1"
metrics-server-7bdfb7c765-f5lp5 metrics-server E1128 07:35:08.160690 1 scraper.go:140] "Failed to scrape node" err="failed parsing metrics: expected timestamp or new record, got \"MNAME\"" node="iqdockerclust1"
metrics-server-7bdfb7c765-bkkvs metrics-server E1128 07:35:18.295294 1 scraper.go:140] "Failed to scrape node" err="failed parsing metrics: expected timestamp or new record, got \"MNAME\"" node="iqdockerclust1"
metrics-server-7bdfb7c765-c94c4 metrics-server E1128 07:35:18.693318 1 scraper.go:140] "Failed to scrape node" err="failed parsing metrics: expected timestamp or new record, got \"MNAME\"" node="iqdockerclust1"
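To see the raw payload that metrics-server tries to parse, the kubelet's resource metrics endpoint can be read through the API server proxy. This is only a diagnostic sketch: it assumes the kubectl context is allowed to use the nodes/proxy subresource, and the node name is just taken from the logs above.

$ kubectl get --raw "/api/v1/nodes/kube-poc-compute1/proxy/metrics/resource" | head

On a healthy node this returns plain Prometheus text exposition format; anything else in the body (an error page, truncated or garbled output) would explain the "expected timestamp or new record" parse error.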
What we have tried
- Restarting the kubelet on nodes that cannot be scraped: no significant effect; any apparent improvement may be random, or at best short-lived.
- Rolling back to metrics-server:v0.5.2: no more error logs, kubectl top node is fully normal again, and the bug disappears completely (the image swap is sketched after this list).
- We investigated a possible DNS name resolution issue and found nothing wrong. Moreover, it would not explain why v0.5.2 works and v0.6.2 does not with exactly the same configuration (only the image tag differs).
- Upgrading again to metrics-server:v0.6.2: the bug and the error logs come back. The regression clearly appeared between these two versions.
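For reference, the rollback and re-upgrade were done by switching the image tag on the Deployment; a minimal sketch, assuming the container is named metrics-server as in the manifest further below:

$ kubectl -n kube-system set image deployment/metrics-server \
    metrics-server=registry.k8s.io/metrics-server/metrics-server:v0.5.2
$ kubectl -n kube-system rollout status deployment/metrics-server
# re-upgrade to reproduce the bug
$ kubectl -n kube-system set image deployment/metrics-server \
    metrics-server=registry.k8s.io/metrics-server/metrics-server:v0.6.2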
Versions
- metrics-server v0.6.2
- Kubernetes 1.24.7
- CRI-O 1.24.3
- Calico v3.24.3
- CentOS Stream 8
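For completeness, the kubelet, OS and container runtime versions reported by each node can be cross-checked with a single command (a convenience check, in case one of the failing nodes runs a different kubelet build):

$ kubectl get nodes -o wide
# the VERSION, OS-IMAGE and CONTAINER-RUNTIME columns should match the list above on every node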
Additional info
- We run 3 replicas of the Deployment/metrics-server in the kube-system namespace.
- The affinity.nodeAffinity schedules these pods to the master nodes, as expected.
- After running for 1 hour, none of the 3 pods had crashed.
- There could be similarities with issue #1031. The bug appeared after metrics-server v0.5.2, which works well, and was very prominent with v0.6.1. I was hoping it would be fixed by v0.6.2.
- The following resources and verbs are granted to the ServiceAccount/metrics-server (they can be spot-checked with the kubectl auth can-i sketch after this table):
PolicyRule:
  Resources      Non-Resource URLs  Resource Names  Verbs
  ---------      -----------------  --------------  -----
  nodes/metrics  []                 []              [get list watch]
  nodes/stats    []                 []              [get list watch]
  nodes          []                 []              [get list watch]
  pods/metrics   []                 []              [get list watch]
  pods/stats     []                 []              [get list watch]
  pods           []                 []              [get list watch]
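Whether the ServiceAccount can actually exercise those rules can be spot-checked with impersonation; a quick sketch (the nodes/metrics subresource is what the kubelet checks when metrics-server scrapes it):

$ kubectl auth can-i get nodes --subresource=metrics \
    --as=system:serviceaccount:kube-system:metrics-server
$ kubectl auth can-i list nodes \
    --as=system:serviceaccount:kube-system:metrics-server
# both should answer "yes" given the PolicyRule above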
- Deployment/metrics-server args section. Note that we increased the value of metric-resolution on the last line, but it did not clearly help (the live args can be verified with the jsonpath sketch after this manifest):
spec:
  replicas: 3
  [...]
  template:
    spec:
      containers:
      - name: metrics-server
        image: registry.k8s.io/metrics-server/metrics-server:v0.6.2
        args:
        - --logtostderr
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --kubelet-insecure-tls
        - --metric-resolution=20s
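To confirm which flags the running pods actually received after such edits, the live args can be dumped from the Deployment; a small sketch:

$ kubectl -n kube-system get deployment metrics-server \
    -o jsonpath='{.spec.template.spec.containers[0].args}'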
- As requested, the APIService status (the aggregated metrics API can also be queried directly; see the sketch after this output):
$ kubectl describe apiservice v1beta1.metrics.k8s.io
Name:         v1beta1.metrics.k8s.io
Namespace:
Labels:       addonmanager.kubernetes.io/mode=Reconcile
Annotations:  <none>
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2021-05-03T06:28:21Z
  [...]
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:       metrics-server
    Namespace:  kube-system
    Port:       443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2022-11-08T11:20:14Z
    Message:               all checks passed
    Reason:                Passed
    Status:                True
    Type:                  Available
Events:  <none>
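Besides describe, the aggregated API can be queried directly to see exactly what the API server receives from metrics-server; a minimal check:

$ kubectl get apiservice v1beta1.metrics.k8s.io
$ kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | head -c 300

Nodes whose scrape failed are simply absent from the returned items, which matches the <unknown> rows in kubectl top node above.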
About this issue
- State: closed
- Created 2 years ago
- Reactions: 2
- Comments: 26 (14 by maintainers)
Same issue here, using metrics-server 0.6.2. Kubernetes cluster v1.24.1. Oracle Cloud.
No worries at all. These errors indicate that metrics-server was not able to reach the kubelet on some nodes, so it might be worth looking into the status of the network at the time of the incident.
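If kubelet reachability is the suspect, the endpoint metrics-server scrapes can also be hit directly, bypassing the API server. A rough sketch, assuming kubectl >= 1.24 for kubectl create token; -k mirrors the --kubelet-insecure-tls flag used above, and the node name is only an example:

$ TOKEN=$(kubectl -n kube-system create token metrics-server)
$ NODE_IP=$(kubectl get node kube-poc-compute1 \
    -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}')
$ curl -sk -H "Authorization: Bearer $TOKEN" "https://$NODE_IP:10250/metrics/resource" | head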
It might have been caused by https://github.com/kubernetes/kubernetes/pull/110880 then, since it was fixed in 1.25.