prometheus: Query/Rule Time Regression
What did you do? Upgraded from Prometheus 2.8.1 to 2.16.0
What did you expect to see? Better or the same query/recording rule timing.
What did you see instead? Under which circumstances? Immediately after the upgrade, recording rule evaluation times increased by 3-5x, and other queries took 60-100 ms longer to return results.
Environment
- System information: Linux 4.14.94+ x86_64
- Prometheus version: 2.16.0
I have attached some images to show the differences.
This image shows a few queries on a 2.8 instance.
These are the same queries on the same instance after upgrading to 2.16.0 (notice the load time is 60 ms higher). We noticed this because the prometheus.html console took ~60 seconds to load due to the increased query time (it shows a listing of all Prometheus instances plus their memory, TSDB and ingestion stats).
And here is the same instance showing how a recording rule group went from nearly instant to 5 seconds.
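For anyone who wants to reproduce the comparison without relying on the screenshots, here is a minimal sketch that times an instant query against the Prometheus HTTP API (`/api/v1/query`) and reports the median latency. The helper names (`build_query_url`, `http_fetch`, `time_query`), the server address, and the number of runs are illustrative assumptions, not part of the original report. The metric used in the usage note, `prometheus_rule_group_last_duration_seconds`, is assumed to be exposed by the version under test.

```python
import json
import time
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def http_fetch(url):
    """Fetch a URL and decode the JSON body (needs a reachable server)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def time_query(base_url, promql, fetch=http_fetch, runs=5):
    """Run the query `runs` times and return the median wall-clock seconds.

    `fetch` is injectable so the timing loop can be exercised without
    a live Prometheus server.
    """
    url = build_query_url(base_url, promql)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch(url)
        samples.append(time.perf_counter() - start)
    samples.sort()
    return samples[len(samples) // 2]
```

Running `time_query("http://localhost:9090", "prometheus_rule_group_last_duration_seconds")` against the same instance before and after the upgrade (the address is an assumption) should give comparable numbers. The median is used rather than the mean so that a single slow run does not skew the result.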
About this issue
- State: closed
- Created 4 years ago
- Comments: 58 (58 by maintainers)
@ahurtaud I believe you are probably impacted by #6841 as that is only in 2.17.0. The issue I am seeing is present in 2.16.0 as well.
@bwplotka Yes, reverting fixes the regression I see.