metrics-server: metrics-server is restarting : failed to extract container metrics: proto: wrong wireType = 0 for field CPU

What happened:

I run AKS with Kubernetes 1.20.

What you expected to happen:

metrics-server is restarting in a loop.

unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary: xxxxxxxxxxxxxxxxx: unable to fetch metrics from Kubelet xxxxxxxxxxxxxxxxx000000 (x.y.z.a): request failed - “500 Internal Server Error”, response: “Internal Error: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = failed to convert to cri containerd stats format: failed to decode container metrics for "d07f5d1038b47236dd6247505db6ba01b61f966a190655751b68746c37c56365": failed to extract container metrics: proto: wrong wireType = 0 for field CPU”

Anything else we need to know?:

I have many clusters (all built from the same Terraform configuration), and every one of them shows the same error message and the same behavior: metrics-server keeps restarting.

Environment:

  • Kubernetes distribution : AKS 1.20.5 (same issue with 1.19)
  • Container Network Setup: Calico (same issue with the standard Azure network)
  • Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:31:21Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.5", GitCommit:"54684493f8139456e5d2f963b23cb5003c4d8055", GitTreeState:"clean", BuildDate:"2021-03-22T23:02:59Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}

I’m surprised that metrics-server restarts when it fails to read metrics from a single pod.

How can I find the “bugged” pod from the container ID in this log line: failed to decode container metrics for "d07f5d1038b47236dd6247505db6ba01b61f966a190655751b68746c37c56365": failed to extract container metrics: proto: wrong wireType = 0 for field CPU

regards
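
One way to map the container ID in that log line back to a pod is sketched below. This is a minimal sketch, not a confirmed fix: the function names extract_container_id and find_pod_by_container_id are made up for illustration, and it assumes the container still exists and is reported in the pod's status.containerStatuses (init containers and already-removed containers will not match).

```shell
# Pull the first 64-character hex container ID out of a log line.
extract_container_id() {
  printf '%s\n' "$1" | grep -oE '[0-9a-f]{64}' | head -n 1
}

# Print "namespace/pod" for every pod reporting a container with this ID.
# Needs kubectl access to the cluster; containerID values look like
# "containerd://<id>", so a fixed-string grep on the bare ID is enough.
find_pod_by_container_id() {
  kubectl get pods --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{" "}{.status.containerStatuses[*].containerID}{"\n"}{end}' \
    | grep -F "$1" | cut -d' ' -f1
}
```

Usage would be something like find_pod_by_container_id "$(extract_container_id "$log_line")". If nothing matches, the container may already be gone from the API; inspecting it on the node itself (for example with crictl inspect) is probably the next thing to try.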

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 3
  • Comments: 15 (4 by maintainers)

Most upvoted comments

We are experiencing similar issues. Once the issue starts happening on a node, it continues until metrics-server is restarted, and then it is hit and miss whether the issue jumps to another node. The consistent part of the error message, no matter which node the error happens on, is “failed to extract container metrics: proto: wrong wireType = 0 for field CPU”

We are running AKS with Kubernetes version 1.19.9, and this happens on both Windows and Linux nodes.

I have a similar case, but metrics-server does not seem to be in a restart loop.

$ kubectl get po -n kube-system -o wide | grep met
metrics-server-6dbb45d8c-7dk8z              1/1     Running   0          74m     10.243.0.97    aks-pool0b4ms-42880648-vmss00000k   <none>           <none>

$ stern -n kube-system --tail=50 metrics

metrics-server-6dbb45d8c-7dk8z metrics-server E0506 04:32:07.933485       1 reststorage.go:135] unable to fetch node metrics for node "aks-pool0b4ms-42880648-vmss00000k": no metrics known for node
metrics-server-6dbb45d8c-7dk8z metrics-server E0506 04:32:07.933520       1 reststorage.go:135] unable to fetch node metrics for node "virtual-node-aci-linux": no metrics known for node
metrics-server-6dbb45d8c-7dk8z metrics-server E0506 04:32:37.969101       1 reststorage.go:135] unable to fetch node metrics for node "virtual-node-aci-linux": no metrics known for node
metrics-server-6dbb45d8c-7dk8z metrics-server E0506 04:32:37.969140       1 reststorage.go:135] unable to fetch node metrics for node "aks-pool0b4ms-42880648-vmss00000k": no metrics known for node
metrics-server-6dbb45d8c-7dk8z metrics-server E0506 04:33:07.468151       1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:virtual-node-aci-linux: unable to get valid timestamp for metric point for node "10.243.0.79", discarding data: no non-zero timestamp on either CPU or memory, unable to fully scrape metrics from source kubelet_summary:aks-pool0b4ms-42880648-vmss00000k: unable to fetch metrics from Kubelet aks-pool0b4ms-42880648-vmss00000k (10.243.0.97): request failed - "500 Internal Server Error", response: "Internal Error: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = failed to convert to cri containerd stats format: failed to decode container metrics for \"a34097d1f719c21092987b32a3e2225e1659fda1bcdd4472ce77ed9f364ba38c\": failed to extract container metrics: proto: wrong wireType = 0 for field CPU"]

Messages similar to the ones above appear periodically.

$ kubectl get nodes
NAME                                STATUS   ROLES   AGE     VERSION
aks-pool0b4ms-42880648-vmss00000a   Ready    agent   11d     v1.20.5
aks-pool0b4ms-42880648-vmss00000h   Ready    agent   6d22h   v1.20.5
aks-pool0b4ms-42880648-vmss00000k   Ready    agent   2d22h   v1.20.5
virtual-node-aci-linux              Ready    agent   24d     v1.18.4-vk-azure-aci-v1.3.5
$ kubectl top nodes
W0506 14:48:49.130675    9695 top_node.go:119] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
NAME                                CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%     
aks-pool0b4ms-42880648-vmss00000a   1844m        47%    4502Mi          35%         
aks-pool0b4ms-42880648-vmss00000h   1706m        44%    8583Mi          68%         
aks-pool0b4ms-42880648-vmss00000k   <unknown>                           <unknown>               <unknown>               <unknown>               
virtual-node-aci-linux              <unknown>                           <unknown>               <unknown>               <unknown>    
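
To see which nodes are affected without going through metrics-server, the kubelet Summary API can be queried through the API server proxy. The sketch below assumes the running metrics-server version scrapes /stats/summary (the kubelet_summary source in the log suggests so); check_kubelet_summary is a hypothetical helper name.

```shell
# Print "<node> OK" or "<node> FAILED" depending on whether the kubelet
# Summary API of each node answers through the API server proxy.
check_kubelet_summary() {
  for node in $(kubectl get nodes -o name); do
    name="${node#node/}"   # "node/<name>" -> "<name>"
    if kubectl get --raw "/api/v1/nodes/${name}/proxy/stats/summary" >/dev/null 2>&1; then
      echo "${name} OK"
    else
      echo "${name} FAILED"
    fi
  done
}
```

A node that prints FAILED here should return the same 500 response that metrics-server logs; dropping the redirect to /dev/null shows the full error body.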

Environment

  • AKS 1.20.5
  • Container Network Setup: (CNI standard azure network)
  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:31:21Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.5", GitCommit:"54684493f8139456e5d2f963b23cb5003c4d8055", GitTreeState:"clean", BuildDate:"2021-03-22T23:02:59Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}