psutil: CPU steal stuck at 100%
Occasionally psutil returns a CPU steal time of 100%. The only way to get it back to the correct value is to reboot the system.
cpu:
{
"0": {
"guest": 0.0,
"guest_nice": 0.0,
"idle": 0.0,
"iowait": 0.0,
"irq": 0.0,
"nice": 0.0,
"softirq": 0.0,
"steal": 100.0,
"system": 0.0,
"user": 0.0
},
"1": {
"guest": 0.0,
"guest_nice": 0.0,
"idle": 0.0,
"iowait": 0.0,
"irq": 0.0,
"nice": 0.0,
"softirq": 0.0,
"steal": 100.0,
"system": 0.0,
"user": 0.0
}
}
top - 10:25:55 up 46 days, 20:48, 1 user, load average: 0,34, 0,19, 0,15
Tasks: 120 total, 1 running, 119 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0,0 us, 0,0 sy, 0,0 ni, 99,7 id, 0,3 wa, 0,0 hi, 0,0 si, 0,0 st
KiB Mem : 8173956 total, 496288 free, 6969612 used, 708056 buff/cache
KiB Swap: 2097148 total, 349612 free, 1747536 used. 894700 avail Mem
Running version 5.4.3 on AWS Ubuntu 16.04 xenial.
About this issue
- State: closed
- Created 6 years ago
- Comments: 16 (6 by maintainers)
Commits related to this issue
- Ignore negative deltas in cpu times when calculating percentages. Fixes #1210 — committed to Infinidat/psutil by deleted user 6 years ago
- Ignore negative deltas in cpu times when calculating percentages (#1210) — committed to Infinidat/psutil by deleted user 6 years ago
- Ignore negative deltas in cpu times when calculating percentages (#1210) (#1214) — committed to giampaolo/psutil by wiggin15 6 years ago
- #1210, #1214: update README and give CREDITs to @wiggin15 — committed to giampaolo/psutil by giampaolo 6 years ago
- Re. #1210: add doc warning explaining that cpu_times() values can sometimes go backwards Signed-off-by: Giampaolo Rodola <g.rodola@gmail.com> — committed to giampaolo/psutil by giampaolo 3 years ago
From what I see in the logs requested by @giampaolo, the “steal” value actually decreases every second instead of going up (the values are supposed to be cumulative). Looking at the first two results:
When we calculate the percentage, we divide the difference in the specific field (steal) by the total difference of the CPU times. In this case almost all of the difference is the decrease in steal time, so we return 100%:
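The snippet originally posted at this point is not preserved. As a rough illustration only, here is a simplified model of that arithmetic with made-up counter values. The name `naive_percentages` and the sample dicts are hypothetical, and the clamping to the 0-100 range is an assumption chosen to match the output reported above, not a copy of psutil's code:

```python
def naive_percentages(t1, t2):
    """Percent of elapsed CPU time per field, with no guard against
    counters that go backwards (simplified, hypothetical model)."""
    total_delta = sum(t2.values()) - sum(t1.values())
    result = {}
    for field in t1:
        field_delta = t2[field] - t1[field]
        try:
            percent = 100.0 * field_delta / total_delta
        except ZeroDivisionError:
            percent = 0.0
        # Assumption: results are forced into the valid 0..100 range.
        result[field] = min(100.0, max(0.0, round(percent, 1)))
    return result


# Made-up samples taken one second apart: idle grows by 1 s,
# but the cumulative steal counter drops by 100 s.
t1 = {"user": 50.0, "system": 20.0, "idle": 1000.0, "steal": 500.0}
t2 = {"user": 50.0, "system": 20.0, "idle": 1001.0, "steal": 400.0}

print(naive_percentages(t1, t2))
```

With these numbers the total delta is negative (-99), so the negative steal delta divided by it comes out above 100% and is capped at 100, while the small positive idle delta comes out negative and is floored at 0.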
A decrease in the cumulative steal time should not happen, but apparently can happen erroneously: https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guest/
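To check whether a machine is affected, one could compare two consecutive readings of a per-CPU line from /proc/stat and list the counters that went backwards. This is only a sketch: the helper names are hypothetical and the two sample lines are made up, not taken from the logs in this issue.

```python
# Field order of a "cpuN" line in /proc/stat on Linux.
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")


def parse_cpu_line(line):
    """Turn one 'cpuN ...' line from /proc/stat into {field: ticks}."""
    parts = line.split()
    return dict(zip(FIELDS, (int(v) for v in parts[1:])))


def decreased_fields(before, after):
    """Return the fields whose cumulative counter went backwards."""
    return [f for f in before if f in after and after[f] < before[f]]


# Made-up lines for illustration; real ones would be read from /proc/stat.
first = parse_cpu_line("cpu0 5000 0 2000 100000 30 0 10 50000 0 0")
second = parse_cpu_line("cpu0 5000 0 2000 100100 30 0 10 40000 0 0")

print(decreased_fields(first, second))  # ['steal']
```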
psutil should probably ignore negative differences if values in “/proc/stat” decrease. Something like:
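The snippet proposed here is not preserved either. A minimal sketch of the idea, assuming each per-field delta is clamped at zero before the total is computed (`clamped_percentages` and the sample values are hypothetical, not psutil's API or the actual patch):

```python
def clamped_percentages(t1, t2):
    """Per-field CPU percentages, treating any negative delta as zero."""
    # A counter that went backwards contributes nothing to the interval.
    deltas = {f: max(0.0, t2[f] - t1[f]) for f in t1}
    total = sum(deltas.values())
    if total == 0:
        return {f: 0.0 for f in deltas}
    return {f: round(100.0 * d / total, 1) for f, d in deltas.items()}


# Made-up samples one second apart: steal drops by 100 s, idle grows by 1 s.
t1 = {"user": 50.0, "system": 20.0, "idle": 1000.0, "steal": 500.0}
t2 = {"user": 50.0, "system": 20.0, "idle": 1001.0, "steal": 400.0}

print(clamped_percentages(t1, t2))
```

With the decreasing steal counter ignored, the only remaining delta is idle, so the result is 100% idle and 0% steal, which is in line with what top shows on the affected machine.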
“top” already does this: https://github.com/thlorenz/procps/blob/faa41f864a599854ceafa4ea634b29a6924bbbe6/deps/procps/top/top.c#L5017
Thanks!
I currently own the VPS affected by this problem, in case something needs to be tested on the machine itself.