psutil: CPU steal stuck at 100%

Occasionally psutil reports a CPU steal time of 100%. The only way to get it back to the correct value is to reboot the system.

cpu:
{
  "0": {
    "guest": 0.0,
    "guest_nice": 0.0,
    "idle": 0.0,
    "iowait": 0.0,
    "irq": 0.0,
    "nice": 0.0,
    "softirq": 0.0,
    "steal": 100.0,
    "system": 0.0,
    "user": 0.0
  },
  "1": {
    "guest": 0.0,
    "guest_nice": 0.0,
    "idle": 0.0,
    "iowait": 0.0,
    "irq": 0.0,
    "nice": 0.0,
    "softirq": 0.0,
    "steal": 100.0,
    "system": 0.0,
    "user": 0.0
  }
}

top - 10:25:55 up 46 days, 20:48, 1 user, load average: 0,34, 0,19, 0,15
Tasks: 120 total, 1 running, 119 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0,0 us, 0,0 sy, 0,0 ni, 99,7 id, 0,3 wa, 0,0 hi, 0,0 si, 0,0 st
KiB Mem : 8173956 total, 496288 free, 6969612 used, 708056 buff/cache
KiB Swap: 2097148 total, 349612 free, 1747536 used. 894700 avail Mem

Running psutil version 5.4.3 on AWS, Ubuntu 16.04 (xenial).

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 16 (6 by maintainers)

Most upvoted comments

From what I see in the logs requested by @giampaolo, the “steal” value actually decreases every second instead of going up (the values are supposed to be cumulative, so they should never decrease). Looking at the first two results:

steal=14416055402.18
steal=14395838578.8
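
A quick way to spot this in a log is to compare two consecutive samples field by field. This is only a minimal sketch: it assumes each sample is a plain dict of cumulative counters, and the helper name `decreased_fields` is hypothetical, not part of psutil.

```python
def decreased_fields(before, after):
    """Return {field: delta} for every cumulative counter that went down."""
    return {
        name: after[name] - before[name]
        for name in before
        if name in after and after[name] < before[name]
    }


# The two consecutive steal values quoted above.
first = {"steal": 14416055402.18}
second = {"steal": 14395838578.8}

regressions = decreased_fields(first, second)
print(regressions)  # "steal" is listed with a negative delta
```

An empty result means every counter behaved cumulatively between the two samples.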

When we compute the percentage, we divide the difference in the specific field (steal) by the total difference across all CPU times. In this case almost all of the total difference is the decrease in steal time, so a negative value is divided by a nearly identical negative value and we return 100%:

all_delta = -20216821.48000145
field_delta = -20216823.380001068
field_perc = (100 * -20216823.380001068) / (-20216821.48000145) = 100.00000939811245
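
The arithmetic above can be reproduced with a small sketch. It is not psutil's actual code, just the same division written out. The steal values come from the log; the `other` field is a hypothetical stand-in for the sum of all remaining counters, chosen so the total delta is close to the one quoted above.

```python
def field_percent(before, after, field):
    """Percentage of the total CPU-time delta attributed to one field."""
    all_delta = sum(after.values()) - sum(before.values())
    field_delta = after[field] - before[field]
    if all_delta == 0:
        return 0.0
    return 100.0 * field_delta / all_delta


# "other" is an assumed aggregate of every non-steal counter.
before = {"steal": 14416055402.18, "other": 0.0}
after = {"steal": 14395838578.8, "other": 1.9}

steal_perc = field_percent(before, after, "steal")
print(round(steal_perc, 1))  # 100.0
```

Both deltas are negative, so the signs cancel and the result looks like a perfectly valid percentage.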

A decrease in the cumulative steal time should not happen, but apparently can happen erroneously: https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guest/

psutil should probably ignore negative differences if values in “/proc/stat” decrease. Something like:

all_delta = max(0, all_delta)
field_delta = max(0, field_delta)
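
One way the clamping could look in practice is sketched below. This is a variant of the suggestion above rather than psutil's actual fix: each field's delta is clamped to zero first, and the total is then taken as the sum of the clamped deltas, so a single misbehaving counter cannot drag the total negative. The function name and the `idle`/`user` sample values are hypothetical; only the steal values come from the log.

```python
def cpu_percents(before, after):
    """Per-field CPU percentages, ignoring counters that went backwards."""
    # Clamp every per-field delta at zero before summing.
    deltas = {name: max(0.0, after[name] - before[name]) for name in after}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in deltas}
    return {name: round(100.0 * d / total, 1) for name, d in deltas.items()}


# steal is taken from the log; idle and user are made-up values.
before = {"steal": 14416055402.18, "idle": 1000.0, "user": 50.0}
after = {"steal": 14395838578.8, "idle": 1001.8, "user": 50.1}

result = cpu_percents(before, after)
print(result["steal"])  # 0.0
```

With the clamp in place the decreasing steal counter contributes nothing, and the remaining fields share the interval between them.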

“top” is doing this: https://github.com/thlorenz/procps/blob/faa41f864a599854ceafa4ea634b29a6924bbbe6/deps/procps/top/top.c#L5017

Yes, you need to be a collaborator. I just made you one (then I suppose you can assign the issue to yourself).

Thanks!

I currently own the VPS affected by this problem, in case something needs to be tested on the machine itself.