prometheus: irate() strange behavior when the step between values is not the default scrape interval

Consider the following PromQL query: base_cpu_times{hostname="$hostname", mode="idle"}

Yielding the following values:

values:Array[8]
0:Array[1558252710,10633786.735]
1:Array[1558252740,10633816.69]
2:Array[1558252770,10633846.635]
3:Array[1558252800,10633876.575]
4:Array[1558252830,10633906.524999999]
5:Array[1558252860,10633936.47]
6:Array[1558252890,10633966.42]
7:Array[1558252920,10633996.365]

And now the same query, with irate: irate(base_cpu_times{hostname="$hostname", mode="idle"}[$__interval])

Yielding:

0:Array[1558252710,0.9976666666567325]
1:Array[1558252740,0.9985000000024835]
2:Array[1558252770,0.9981666666766008]
3:Array[1558252800,0.9979999999826153]
4:Array[1558252830,3.195135826765516]
5:Array[1558252860,3.194424690782829]
6:Array[1558252890,3.19442469051791]
7:Array[1558252920,3.193713554270304]

Starting with the 4th sample, irate is going postal. The only explanation I can think of is the difference between the scrape interval (15s) and the actual distance between samples (30s). Does that influence extrapolation or reset detection in the irate function?

Anyway, shouldn't the 4th sample be: (10633906.524999999 - 10633876.575) / (1558252830 - 1558252800) = 0.9983333333?
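To check that arithmetic across the whole series, here is a minimal Python sketch that recomputes a per-second rate from each pair of consecutive raw samples listed above. The helper name simple_irate is hypothetical, and the reset handling (a drop in value counts as a counter reset, and the last value alone is taken as the increase) is my assumption about how irate treats resets, not a copy of the Prometheus implementation. It also assumes that the two most recent samples in each window are exactly the adjacent 30s points shown here, which may not hold if the underlying data is scraped every 15s.

```python
# Raw (timestamp, value) samples copied from the first query result above.
samples = [
    (1558252710, 10633786.735),
    (1558252740, 10633816.69),
    (1558252770, 10633846.635),
    (1558252800, 10633876.575),
    (1558252830, 10633906.524999999),
    (1558252860, 10633936.47),
    (1558252890, 10633966.42),
    (1558252920, 10633996.365),
]


def simple_irate(prev, last):
    """Per-second rate between two consecutive samples (simplified sketch)."""
    (t0, v0), (t1, v1) = prev, last
    # Assumption: a decrease is treated as a counter reset, so the increase
    # is taken to be the last value on its own.
    delta = v1 if v1 < v0 else v1 - v0
    return delta / (t1 - t0)


# One rate per sample, starting from the second one (the first has no predecessor).
rates = [
    (last[0], simple_irate(prev, last))
    for prev, last in zip(samples, samples[1:])
]

for ts, rate in rates:
    print(ts, round(rate, 6))
```

With these inputs every computed rate stays just below 1 per second, including the ones at 1558252830 and later, so nothing in the raw values as listed accounts for the jump to roughly 3.19.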

About this issue

  • State: open
  • Created 5 years ago
  • Comments: 22 (13 by maintainers)

Most upvoted comments

You haven’t posted your curl query, but I assume you are not looking at raw numbers.

I did double-check with raw numbers; I felt the screenshot would be easier to digest than 50kB of JSON objects. Easy enough to fetch more data. prometheus_rate_function_weird.txt

I see similar results if I use rate(), irate() or increase(). Starting to think I should be filing my own bug report, not riding on someone else’s.

Edit: I used rate([5m]) over a five minute interval. Re-uploaded the text file with rate([1m]) instead.

Sorry, my bad. My brain unescaped wrongly. 😮)

Then, indeed, the result you are seeing cannot be explained from the data queried. I still don’t think that irate is implemented wrongly. That bug would have shown up a million times over the last five years. My best guess is that it has to do with remote read. Perhaps the data queried from the remote read backend is slightly different from within the irate evaluation compared to retrieving the raw counter directly. However, I’m not very familiar with the internals of the remote read interface. Perhaps @juliusv can guess in a more educated fashion?