prometheus: TSDB performance regression in 2.21

What did you do?

Ran sum(count_over_time(up[7d])); it takes 21s in 2.21 versus 9s in 2.20.

2.20:

{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {},
        "value": [
          1598887073.694,
          "60537224"
        ]
      }
    ],
    "stats": {
      "timings": {
        "evalTotalTime": 8.329375888,
        "resultSortTime": 0,
        "queryPreparationTime": 0.015605704,
        "innerEvalTime": 8.313736971,
        "execQueueTime": 1.9189e-05,
        "execTotalTime": 8.329413218
      }
    }
  }
}

2.21:

{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {},
        "value": [
          1598887073.694,
          "60518026"
        ]
      }
    ],
    "stats": {
      "timings": {
        "evalTotalTime": 20.547211193,
        "resultSortTime": 0,
        "queryPreparationTime": 0.011826232,
        "innerEvalTime": 20.535358719,
        "execQueueTime": 2.3937e-05,
        "execTotalTime": 20.547251203
      }
    }
  }
}
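
To put a number on the regression, the two execTotalTime values above can be compared directly. The following is a minimal Go sketch: the struct mirrors only the fields needed from the responses shown, and the helper names (queryResponse, execTotalTime, slowdown) are made up for illustration and are not part of any Prometheus client library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// queryResponse mirrors only the parts of the query response used here.
// Hypothetical type for this sketch, not a Prometheus API type.
type queryResponse struct {
	Data struct {
		Stats struct {
			Timings struct {
				ExecTotalTime float64 `json:"execTotalTime"`
			} `json:"timings"`
		} `json:"stats"`
	} `json:"data"`
}

// execTotalTime extracts data.stats.timings.execTotalTime (in seconds)
// from a raw response body.
func execTotalTime(body []byte) (float64, error) {
	var r queryResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return 0, err
	}
	return r.Data.Stats.Timings.ExecTotalTime, nil
}

// slowdown returns how many times slower "after" is than "before".
func slowdown(before, after float64) float64 {
	return after / before
}

func main() {
	// Trimmed copies of the two responses above, keeping only the timing used.
	v220 := []byte(`{"data":{"stats":{"timings":{"execTotalTime":8.329413218}}}}`)
	v221 := []byte(`{"data":{"stats":{"timings":{"execTotalTime":20.547251203}}}}`)

	before, err := execTotalTime(v220)
	if err != nil {
		panic(err)
	}
	after, err := execTotalTime(v221)
	if err != nil {
		panic(err)
	}
	fmt.Printf("2.21 is %.2fx slower than 2.20\n", slowdown(before, after))
}
```

With the timings reported here, the query is roughly 2.5 times slower in 2.21.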

Environment

  • Prometheus version:
$ prometheus --version
prometheus, version 2.21.0-rc.0 (branch: HEAD, revision: 1195cc24e3c8b9af8aeafcfc46473f6486ca3f64)
  build user:       root@1e754dfec932
  build date:       20200827-23:23:27
  go version:       go1.15

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 28 (28 by maintainers)

Most upvoted comments

So actually we were looking at the wrong heap… ;p

The reason is simple: we use the vertical ‘safe’ merge even for series that do not overlap. A fix is in progress.
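
The idea behind that diagnosis can be sketched as follows. This is not the Prometheus TSDB code or the actual fix; it is a simplified Go illustration with hypothetical names (sample, overlaps, mergeSafe, merge) that assumes each input is a time-sorted run of samples. The per-sample merge is only needed when the time ranges overlap; otherwise the runs can simply be concatenated.

```go
package main

import "fmt"

// sample is a simplified (timestamp, value) pair; hypothetical type for this sketch.
type sample struct {
	t int64
	v float64
}

// overlaps reports whether two time-sorted runs share any part of their time range.
func overlaps(a, b []sample) bool {
	if len(a) == 0 || len(b) == 0 {
		return false
	}
	return a[0].t <= b[len(b)-1].t && b[0].t <= a[len(a)-1].t
}

// mergeSafe is the slow path: it walks both runs sample by sample, keeps
// timestamps sorted, and on duplicate timestamps keeps the sample from a.
func mergeSafe(a, b []sample) []sample {
	out := make([]sample, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i].t < b[j].t:
			out = append(out, a[i])
			i++
		case a[i].t > b[j].t:
			out = append(out, b[j])
			j++
		default:
			out = append(out, a[i])
			i++
			j++
		}
	}
	out = append(out, a[i:]...)
	return append(out, b[j:]...)
}

// merge takes the cheap path (plain concatenation in time order) when the
// runs do not overlap, and falls back to mergeSafe only when they do.
func merge(a, b []sample) []sample {
	if overlaps(a, b) {
		return mergeSafe(a, b)
	}
	if len(a) > 0 && len(b) > 0 && b[0].t < a[0].t {
		a, b = b, a
	}
	out := make([]sample, 0, len(a)+len(b))
	out = append(out, a...)
	return append(out, b...)
}

func main() {
	x := []sample{{1, 1}, {2, 2}}
	y := []sample{{3, 3}, {4, 4}}
	z := []sample{{2, 20}, {3, 30}}
	fmt.Println(len(merge(x, y)), "samples via the fast path")
	fmt.Println(len(merge(x, z)), "samples via the safe merge")
}
```

Applying the safe path to every series, including ones that never overlap, pays the per-sample comparison cost for nothing, which is consistent with the slowdown described in this issue.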

The profile was taken on the same machine, and on prod the CPU usage is the same.