kubernetes: Impossibly high cpu usage value from kubelet embedded cadvisor (k8s 1.23.12)

What happened?

I am querying the /stats/summary endpoint of the kubelet's embedded cadvisor to get summary metrics, and I am seeing impossibly high values for the CPU usage metric for containers.

Sample value:

"cpu"=>{
  "time"=>"2022-11-12T09:36:38Z", 
  "usageNanoCores"=>5213910257021878272, 
  "usageCoreNanoSeconds"=>0
},

It looks like an uninitialized memory value, or something similar, might be getting written to the usage variable. The node has only 2 cores available, so this value for a sample container cannot be correct.
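To put the number in perspective, here is a minimal sketch of the arithmetic, assuming usageNanoCores is expressed in billionths of a core (as the field name suggests). The helper names are hypothetical and only for illustration; the reported value and the 2-core node size are taken from this report.

```python
# Assumption: usageNanoCores counts billionths of one CPU core.
NANOCORES_PER_CORE = 1_000_000_000


def nanocores_to_cores(usage_nano_cores: int) -> float:
    """Convert a usageNanoCores reading to whole cores."""
    return usage_nano_cores / NANOCORES_PER_CORE


def exceeds_capacity(usage_nano_cores: int, node_cores: int) -> bool:
    """True when a reading is larger than the node could physically deliver."""
    return usage_nano_cores > node_cores * NANOCORES_PER_CORE


reported = 5213910257021878272  # usageNanoCores from the sample above
node_cores = 2                  # node size stated in this report

print(f"reported usage: {nanocores_to_cores(reported):,.0f} cores")
print(f"exceeds node capacity: {exceeds_capacity(reported, node_cores)}")
```

The reading works out to roughly 5.2 billion cores, while the largest plausible value on this node is 2,000,000,000 nanocores.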

I query the endpoint approximately every minute. Most of the samples look fine, but every so often one comes back with an impossibly high value like the one shown above.

What did you expect to happen?

Normal values within the 2 core limit of the node at least.

How can we reproduce it (as minimally and precisely as possible)?

I created a Kubernetes cluster and deployed this YAML file on it:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpuburner
spec:
  replicas: 2400
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      app: cpuburner
  template:
    metadata:
      labels:
        app: cpuburner
    spec:
      containers:
      - image: alexeiled/stress-ng:0.11.01
        args: ["--cpu", "0", "--cpu-method", "matrixprod", "--timeout", "240s"]
        name: cpuburner
        securityContext:
          allowPrivilegeEscalation: false
      securityContext:
        runAsUser: 2000

Query /stats/summary from within a container and record the responses in which the CPU usage metric has an impossible value.
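The recording step could be sketched roughly as follows. This assumes the response has already been fetched and parsed as JSON, and that it follows the Summary layout (pods, each with a podRef and a list of containers carrying a cpu block). The function name, the pod names, and the "healthy" container reading are hypothetical; only the large value comes from this report.

```python
from typing import Any, Dict, Iterator

NANOCORES_PER_CORE = 1_000_000_000


def find_impossible_cpu_samples(
    summary: Dict[str, Any], node_cores: int
) -> Iterator[Dict[str, Any]]:
    """Yield one record per container whose usageNanoCores exceeds node capacity."""
    limit = node_cores * NANOCORES_PER_CORE
    for pod in summary.get("pods", []):
        pod_ref = pod.get("podRef", {})
        for container in pod.get("containers", []):
            cpu = container.get("cpu") or {}
            usage = cpu.get("usageNanoCores")
            if usage is not None and usage > limit:
                yield {
                    "namespace": pod_ref.get("namespace"),
                    "pod": pod_ref.get("name"),
                    "container": container.get("name"),
                    "time": cpu.get("time"),
                    "usageNanoCores": usage,
                    "usageCoreNanoSeconds": cpu.get("usageCoreNanoSeconds"),
                }


# Hypothetical response fragment: one bad sample (values from this report)
# and one plausible sample (made up for contrast).
example_summary = {
    "pods": [
        {
            "podRef": {"name": "cpuburner-a", "namespace": "default"},
            "containers": [
                {
                    "name": "cpuburner",
                    "cpu": {
                        "time": "2022-11-12T09:36:38Z",
                        "usageNanoCores": 5213910257021878272,
                        "usageCoreNanoSeconds": 0,
                    },
                }
            ],
        },
        {
            "podRef": {"name": "cpuburner-b", "namespace": "default"},
            "containers": [
                {"name": "cpuburner", "cpu": {"usageNanoCores": 850_000_000}}
            ],
        },
    ]
}

bad_samples = list(find_impossible_cpu_samples(example_summary, node_cores=2))
for sample in bad_samples:
    print(sample)
```

Logging usageCoreNanoSeconds and the timestamp alongside the bad reading may help when comparing it against the neighbouring samples.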

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.0", GitCommit:"c2b5237ccd9c0f1d600d3072634ca66cefdf272f", GitTreeState:"clean", BuildDate:"2021-08-04T18:03:20Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.12", GitCommit:"f941a31f4515c5ac03f5fc7ccf9a330e3510b80d", GitTreeState:"clean", BuildDate:"2022-11-09T17:12:33Z", GoVersion:"go1.17.13", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

Azure (AKS)

OS version

<details>

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture

BuildNumber  Caption                   OSArchitecture  Version
19044        Microsoft Windows 10 Pro  64-bit          10.0.19044

</details>



About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 25 (11 by maintainers)

Most upvoted comments

Yes, I still have the repro in my cluster with the above YAML file. It only happens randomly for some cadvisor calls. Most of them return the proper values.

I am going to try using the only_cpu_and_memory=true flag and see if the issue reproduces. It's very inconsistent, so I will update the issue with my findings after running it for at least a couple of days, or sooner if the error surfaces.
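For reference, a small sketch of how the request path with that flag might be built. It assumes the summary endpoint is reached through the API server's node proxy path; querying the kubelet directly would use a different host and path. The helper name and the node name are hypothetical.

```python
from urllib.parse import quote, urlencode


def summary_path(node_name: str, only_cpu_and_memory: bool = False) -> str:
    """Build the node-proxy path for /stats/summary, optionally CPU/memory only."""
    path = f"/api/v1/nodes/{quote(node_name, safe='')}/proxy/stats/summary"
    if only_cpu_and_memory:
        # The flag is passed as a query parameter on the request.
        path += "?" + urlencode({"only_cpu_and_memory": "true"})
    return path


# "example-node" is a placeholder, not a node from this cluster.
print(summary_path("example-node", only_cpu_and_memory=True))
```

With kubectl, a path like this can be passed to `kubectl get --raw`.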