kubernetes: Impossibly high cpu usage value from kubelet embedded cadvisor (k8s 1.23.12)

What happened?

I am querying the /stats/summary endpoint of the kubelet's embedded cAdvisor to get summary metrics, and I am seeing impossibly high values for the container CPU usage metric.

Sample value:

"cpu"=>{
  "time"=>"2022-11-12T09:36:38Z", 
  "usageNanoCores"=>5213910257021878272, 
  "usageCoreNanoSeconds"=>0
},

It looks like an uninitialized memory value, or something similar, might be getting written to the usage variable. The node has only 2 cores available, so this value for a sample container cannot be correct.
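To put the sample in perspective: usageNanoCores is expressed in nanocores (1 core = 10^9 nanocores), so a 2-core node should never report much more than 2 × 10^9. The reported value works out to roughly 5.2 billion cores. A minimal sketch of that arithmetic follows; the helper name `nanocores_to_cores` is made up for illustration and is not part of any Kubernetes API.

```python
NANOCORES_PER_CORE = 1_000_000_000  # 1 core = 1e9 nanocores


def nanocores_to_cores(usage_nano_cores: int) -> float:
    """Convert a usageNanoCores reading to cores (hypothetical helper)."""
    return usage_nano_cores / NANOCORES_PER_CORE


sample = 5213910257021878272  # usageNanoCores value from the report
node_cores = 2                # node capacity stated in the report

cores = nanocores_to_cores(sample)
print(f"reported usage: {cores:.3e} cores")
print(f"node capacity:  {node_cores} cores")
print("plausible" if cores <= node_cores else "implausible")
```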

I query it approximately every minute. Most of the samples look fine, but occasionally one comes back with an impossibly high value like the one above.

What did you expect to happen?

Normal values, at least within the node's 2-core limit.

How can we reproduce it (as minimally and precisely as possible)?

I created a Kubernetes cluster and deployed this YAML file on it:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpuburner
spec:
  replicas: 2400
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      app: cpuburner
  template:
    metadata:
      labels:
        app: cpuburner
    spec:
      containers:
      - image: alexeiled/stress-ng:0.11.01
        args: ["--cpu", "0", "--cpu-method", "matrixprod", "--timeout", "240s"]
        name: cpuburner
        securityContext:
          allowPrivilegeEscalation: false
      securityContext:
        runAsUser: 2000

Query /stats/summary from within a container and record the responses in which the CPU usage metric (usageNanoCores) has an impossible value.
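The recording step could be sketched as below. This is only an illustration under a few assumptions: the field layout (`pods[].podRef`, `pods[].containers[].cpu.usageNanoCores`) follows the kubelet Summary API as I understand it, the function name `find_implausible_cpu` is hypothetical, and the payload is a trimmed, hand-written example rather than a captured response. One way to obtain a real response, assuming API-server proxy access, is `kubectl get --raw "/api/v1/nodes/<node-name>/proxy/stats/summary"`.

```python
import json

NANOCORES_PER_CORE = 1_000_000_000  # 1 core = 1e9 nanocores


def find_implausible_cpu(summary: dict, node_cores: int) -> list:
    """Return (namespace, pod, container, usageNanoCores) tuples for
    container readings that exceed the node's total CPU capacity."""
    limit = node_cores * NANOCORES_PER_CORE
    hits = []
    for pod in summary.get("pods", []):
        ref = pod.get("podRef", {})
        for container in pod.get("containers", []):
            usage = container.get("cpu", {}).get("usageNanoCores")
            if usage is not None and usage > limit:
                hits.append((ref.get("namespace"), ref.get("name"),
                             container.get("name"), usage))
    return hits


# Hand-written example payload. Pod names and the "normal" reading are
# invented; only the large value is taken from the report.
example = json.loads("""
{
  "pods": [
    {"podRef": {"name": "cpuburner-a", "namespace": "default"},
     "containers": [{"name": "cpuburner",
                     "cpu": {"time": "2022-11-12T09:36:38Z",
                             "usageNanoCores": 5213910257021878272,
                             "usageCoreNanoSeconds": 0}}]},
    {"podRef": {"name": "cpuburner-b", "namespace": "default"},
     "containers": [{"name": "cpuburner",
                     "cpu": {"time": "2022-11-12T09:36:38Z",
                             "usageNanoCores": 1500000000,
                             "usageCoreNanoSeconds": 90000000000}}]}
  ]
}
""")

hits = find_implausible_cpu(example, node_cores=2)
for namespace, pod, container, usage in hits:
    print(f"{namespace}/{pod}/{container}: usageNanoCores={usage}")
```

Comparing against the whole node's capacity is a deliberately loose bound: it only flags readings that cannot be right on any container of that node, which is enough to catch the values described here.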

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.0", GitCommit:"c2b5237ccd9c0f1d600d3072634ca66cefdf272f", GitTreeState:"clean", BuildDate:"2021-08-04T18:03:20Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.12", GitCommit:"f941a31f4515c5ac03f5fc7ccf9a330e3510b80d", GitTreeState:"clean", BuildDate:"2022-11-09T17:12:33Z", GoVersion:"go1.17.13", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

Azure (AKS)

OS version

<details>

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture

BuildNumber  Caption                   OSArchitecture  Version
19044        Microsoft Windows 10 Pro  64-bit          10.0.19044

</details>


### Install tools

<details>

</details>


### Container runtime (CRI) and version (if applicable)

<details>

</details>


### Related plugins (CNI, CSI, ...) and versions (if applicable)

<details>

</details>

About this issue

State: closed
Created 2 years ago
Comments: 25 (11 by maintainers)

Commits related to this issue

NR-139168: Fix k8s.container.cpuCoresUtilization metric calculation Background (as observed by folks on slack thread - https://newrelic.slack.com/archives/C043X7A8JRF/p1687808604387229): The 'cpuCore... — committed to newrelic/nri-kubernetes by sachin-shankar a year ago
NR-139168: Fix k8s.container.cpuCoresUtilization metric calculation Background (as observed by folks on slack thread - https://newrelic.slack.com/archives/C043X7A8JRF/p1687808604387229): The 'cpuCore... — committed to newrelic/nri-kubernetes by sachin-shankar a year ago
NR-139168: Fix k8s.container.cpuCoresUtilization metric calculation (#817) * NR-139168: Fix k8s.container.cpuCoresUtilization metric calculation Background (as observed by folks on slack thread - ... — committed to newrelic/nri-kubernetes by sachin-shankar a year ago

Most upvoted comments

Yes, I still have the repro in my cluster with the above YAML file. It only happens randomly for some cadvisor calls. Most of them return the proper values.

bragi92 on Dec 12, 2022

I am going to try the only_cpu_and_memory=true flag and see whether the issue still reproduces. It's very inconsistent, so I will update the issue with my findings once I have run it for at least a couple of days, or sooner if the error surfaces.

bragi92 on Dec 7, 2022