prometheus: Query/Rule Time Regression

What did you do? Upgraded from Prometheus 2.8.1 to 2.16.0

What did you expect to see? Better or the same query/recording rule timing.

What did you see instead? Under which circumstances? Recording rule evaluation times increased 3-5x immediately after the upgrade, and other queries took 60-100ms longer to return results.

Environment

  • System information:

    Linux 4.14.94+ x86_64

  • Prometheus version:

    2.16.0

I have attached some images to show the differences.

This image shows a few queries on a 2.8 instance.

These are the same queries on the same instance after upgrading to 2.16.0 (notice that the load time is 60ms longer). We noticed this because the prometheus.html console, which lists all Prometheus instances along with their memory, TSDB and ingestion stats, took ~60 seconds to load due to the increased query time.

And here is the same instance showing how the evaluation time of a recording rule group went from nearly instant to 5 seconds.
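To compare per-query latency between the 2.8.1 and 2.16.0 instances with numbers rather than screenshots, one option is to time the same instant query repeatedly against each instance through the HTTP API's `/api/v1/query` endpoint. The sketch below is an illustration under stated assumptions, not part of the original report: the helper names `instant_query` and `median_latency_ms`, the repeat count, and the use of the median are all hypothetical choices, and the measured time includes network round-trip overhead, not just query evaluation.

```python
import json
import statistics
import time
import urllib.parse
import urllib.request


def instant_query(base_url: str, promql: str, timeout: float = 30.0) -> dict:
    """Run one instant query via the Prometheus HTTP API (/api/v1/query).

    Hypothetical helper: raises if the API reports a non-success status.
    """
    params = urllib.parse.urlencode({"query": promql})
    url = f"{base_url.rstrip('/')}/api/v1/query?{params}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body}")
    return body


def median_latency_ms(run_query, promql: str, repeats: int = 10) -> float:
    """Call run_query(promql) `repeats` times; return the median wall-clock time in ms.

    The median is used (an assumption) so a single slow outlier does not
    dominate the comparison between two Prometheus versions.
    """
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query(promql)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

For example, `median_latency_ms(lambda q: instant_query("http://localhost:9090", q), "up")` times the `up` query against a (hypothetical) local instance; running the same call against the old and new instances gives a side-by-side figure for the 60-100ms difference described above.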

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 58 (58 by maintainers)

Most upvoted comments

@ahurtaud I believe you are impacted by #6841, as that change is only in 2.17.0. The issue I am seeing is present in 2.16.0 as well.

@bwplotka Yes, reverting fixes the regression I see.