kubernetes: Impossibly high cpu usage value from kubelet embedded cadvisor (k8s 1.23.12)
### What happened?
I am querying the kubelet's embedded cAdvisor `/stats/summary` endpoint for summary metrics and am seeing impossibly high values for the containers' CPU usage metric.
Sample value:
```
"cpu"=>{
  "time"=>"2022-11-12T09:36:38Z",
  "usageNanoCores"=>5213910257021878272,
  "usageCoreNanoSeconds"=>0
},
```
It looks like an uninitialized memory value, or something similar, might be getting written into the usage variable. The node has only 2 cores, so no container on it can use more than 2,000,000,000 nano-cores; a value of about 5.2 × 10^18 (more than 5 billion cores) is clearly incorrect.
I query the endpoint approximately every minute. Most samples look fine, but occasionally one comes back with an impossibly high value like the one above.

### What did you expect to happen?
Values within the node's 2-core limit, at the very least.
### How can we reproduce it (as minimally and precisely as possible)?
I created a Kubernetes cluster and deployed this YAML file to it:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpuburner
spec:
  replicas: 2400
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      app: cpuburner
  template:
    metadata:
      labels:
        app: cpuburner
    spec:
      containers:
        - image: alexeiled/stress-ng:0.11.01
          args: ["--cpu", "0", "--cpu-method", "matrixprod", "--timeout", "240s"]
          name: cpuburner
          securityContext:
            allowPrivilegeEscalation: false
      securityContext:
        runAsUser: 2000
```
Query the `/stats/summary` endpoint from within a container and record the responses in which the CPU usage metric (`usageNanoCores`) has an impossible value.
### Anything else we need to know?
No response
### Kubernetes version

```console
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.0", GitCommit:"c2b5237ccd9c0f1d600d3072634ca66cefdf272f", GitTreeState:"clean", BuildDate:"2021-08-04T18:03:20Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.12", GitCommit:"f941a31f4515c5ac03f5fc7ccf9a330e3510b80d", GitTreeState:"clean", BuildDate:"2022-11-09T17:12:33Z", GoVersion:"go1.17.13", Compiler:"gc", Platform:"linux/amd64"}
```
### Cloud provider

### OS version

<details>

```console
# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
BuildNumber  Caption                   OSArchitecture  Version
19044        Microsoft Windows 10 Pro  64-bit          10.0.19044
```

</details>
### Install tools
<details>
</details>
### Container runtime (CRI) and version (if applicable)
<details>
</details>
### Related plugins (CNI, CSI, ...) and versions (if applicable)
<details>
</details>
About this issue
- State: closed
- Created 2 years ago
- Comments: 25 (11 by maintainers)
Commits related to this issue
- NR-139168: Fix k8s.container.cpuCoresUtilization metric calculation Background (as observed by folks on slack thread - https://newrelic.slack.com/archives/C043X7A8JRF/p1687808604387229): The 'cpuCore... — committed to newrelic/nri-kubernetes by sachin-shankar a year ago
- NR-139168: Fix k8s.container.cpuCoresUtilization metric calculation (#817) * NR-139168: Fix k8s.container.cpuCoresUtilization metric calculation Background (as observed by folks on slack thread - ... — committed to newrelic/nri-kubernetes by sachin-shankar a year ago
Yes, I still have the repro running in my cluster with the YAML file above. The problem happens only intermittently, for some cAdvisor calls; most of them return proper values.
I am going to try the `only_cpu_and_memory=true` flag and see whether the issue reproduces. It's very inconsistent, so I will update the issue with my findings after running it for at least a couple of days, or sooner if the error surfaces.
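For that experiment, the query parameter can be appended to the summary path. This is a minimal Go sketch, assuming the summary is read through the API server's node proxy; `summaryPath` and the node name `node-1` are hypothetical, while `only_cpu_and_memory` is the parameter named in the comment above.

```go
package main

import (
	"fmt"
	"net/url"
)

// summaryPath builds the API-server proxy path for a node's kubelet summary.
// When onlyCPUAndMemory is set it adds only_cpu_and_memory=true, which asks
// the kubelet for a reduced summary limited to CPU and memory stats.
func summaryPath(nodeName string, onlyCPUAndMemory bool) string {
	// Node names are DNS subdomains, so no path escaping is needed here.
	u := url.URL{Path: "/api/v1/nodes/" + nodeName + "/proxy/stats/summary"}
	if onlyCPUAndMemory {
		q := url.Values{}
		q.Set("only_cpu_and_memory", "true")
		u.RawQuery = q.Encode()
	}
	return u.String()
}

func main() {
	// Prints /api/v1/nodes/node-1/proxy/stats/summary?only_cpu_and_memory=true
	fmt.Println(summaryPath("node-1", true))
}
```

The printed path can be passed to `kubectl get --raw` to fetch the reduced summary, and each response can then be checked for impossible `usageNanoCores` values as before.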