prometheus: Rule evaluation & scrapes are frequently blocked or delayed with Prometheus >= v2.33.0
What did you do?
Upgraded Prometheus from v2.32.1 to v2.33.1
What did you expect to see?
Similar performance and smooth graphs.
What did you see instead? Under which circumstances?
It appears that starting with v2.33.0 Prometheus hits some scalability limit for us: either something takes longer than it used to, or something now blocks where it didn't before.
Ever since we upgraded Prometheus from v2.32.1 to v2.33.1 (the same issue occurs with v2.33.4), on our biggest instances every 30 minutes we see that:
- some counter updates are delayed, which looks like scrapes are getting delayed. I only mention counters here because the effect is more visible on counters than on gauges. So either the actual HTTP scrape is delayed OR sample insertion into the TSDB is delayed (or wherever the sample timestamp is set on scrape)
- we see a massive spike in rule evaluation duration
- we see rule evaluation timeouts: `query timed out in expression evaluation`
Still trying to debug it; so far it doesn't seem to be related to:
- queries - we don't see any spike in query volume
- goroutines - the count stays flat, so it's not that Go is running so many goroutines that some get left behind
- CPU or memory - we don't see any elevated resource usage when this happens
- chunk write queue - this is new code added in 2.33 with a default queue size of 1000, and our metrics show that the rate of elements added to the queue spikes to around 250k/s, so I tested Prometheus with a bigger queue size (up to 50M) with no effect on this issue
- query concurrency limit - since that's 20 by default and we usually seem to issue more queries per second than that, I suspected we might be queuing queries too much, but bumping this value up to 96 (on a server with 128 cores) didn't seem to change anything (the flags I used for both experiments are shown below)
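For reference, these are the knobs I tuned for the last two bullet points. Flag names are written from memory, so treat this as a sketch and double-check them against `prometheus --help` (the chunk write queue flag is new in 2.33 and may be hidden):

```shell
# Values used in the experiments above: 96 query slots on a 128-core box and
# a 50M-element chunk write queue (default is 1000); neither changed anything.
prometheus \
  --query.max-concurrency=96 \
  --storage.tsdb.head-chunks-write-queue-size=50000000
```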
Since this is happening every 30 minutes and only seems to affect our biggest instances with ~15M time series, I've checked what else happens every 30 minutes. We do have `--storage.tsdb.min-block-duration=30m` and `--storage.tsdb.max-block-duration=30m` set, mostly to reduce memory usage: we have a fair amount of metrics churn, so more frequent HEAD compaction helps keep memory usage lower than it would be without it.
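One way to confirm that the 30-minute pattern lines up with HEAD compaction is to graph the compaction counter next to the rule group duration gauge. The metric names below come from Prometheus' own instrumentation; treat this as a sketch and verify them on your build:

```promql
# Compactions per 5-minute window; the steps should line up with the spikes.
increase(prometheus_tsdb_compactions_total[5m])

# Last evaluation duration per rule group, over the same time range.
prometheus_rule_group_last_duration_seconds
```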
What I've also noticed is that the number of active TSDB HEAD appenders spikes around the time of this issue. They likely spike whenever there's HEAD/block compaction, so I'm not sure whether that's an effect or the cause. Looking at historical metrics I can see it was always spiking around that time, but with 2.33 the spikes are bigger. See the metrics below:
[screenshots: 19dm12 - v2.32.1, 19dm13 - v2.33.4]
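For anyone wanting to reproduce the graphs above, the appender gauge I'm plotting should be the one below (name taken from Prometheus' own TSDB instrumentation; please correct me if the graphs should use something else):

```promql
# In-flight HEAD appenders; on our instances this spikes every 30 minutes.
prometheus_tsdb_head_active_appenders
```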
I haven't found any useful logs that would point me in another direction so far, and I'm not sure what other metrics might be relevant here. Any tips on further debugging would be very helpful.
Environment
- System information: Linux 5.15.19 x86_64
- Prometheus version: insert output of `prometheus --version` here
- Alertmanager version: insert output of `alertmanager --version` here (if relevant to the issue)
- Prometheus configuration file: insert configuration here
- Alertmanager configuration file: insert configuration here (if relevant to the issue)
- Logs: insert Prometheus and Alertmanager logs relevant to the issue here
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 47 (46 by maintainers)
Commits related to this issue
- Use sync.Mutex for chunk write queue locks. sync.RWMutex seems to be starving get operations when there are a lot of write operations because AFAIK it's a write-preferring lock. During compaction or othe... — committed to prymitive/prometheus by prymitive 2 years ago
- Use a linked list for memSeries.headChunk. Currently memSeries holds a single head chunk in-memory and a slice of mmapped chunks. When append is called memSeries might decide that it needs to create a... — committed to prymitive/prometheus by prymitive a year ago
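To make the "Use a linked list for memSeries.headChunk" commit above easier to follow, here is a minimal, self-contained Go sketch of the idea (toy types, not the actual Prometheus memSeries/memChunk code): cutting a new head chunk becomes a single pointer prepend, and older chunks stay reachable through a prev pointer instead of being appended to a slice under the series lock.

```go
package main

import "fmt"

// headChunk is a simplified stand-in for an in-memory TSDB chunk; the real
// memSeries/memChunk types in Prometheus carry much more state. This only
// sketches the linked-list idea from the commit above.
type headChunk struct {
	minTime, maxTime int64
	samples          []float64
	prev             *headChunk // older chunk, or nil for the oldest
}

// series is a toy version of memSeries holding its newest chunk.
type series struct {
	head *headChunk
}

// cutNewHeadChunk prepends a fresh chunk; the previous head stays reachable
// via prev, so no slice append/copy happens at this point.
func (s *series) cutNewHeadChunk(mint int64) {
	s.head = &headChunk{minTime: mint, prev: s.head}
}

func (s *series) append(t int64, v float64) {
	// Pretend each chunk covers 120 time units before a new one is cut.
	if s.head == nil || t-s.head.minTime > 120 {
		s.cutNewHeadChunk(t)
	}
	s.head.samples = append(s.head.samples, v)
	s.head.maxTime = t
}

func main() {
	var s series
	for t := int64(0); t < 500; t += 30 {
		s.append(t, float64(t))
	}
	// Walk from the newest chunk to the oldest.
	for c := s.head; c != nil; c = c.prev {
		fmt.Printf("chunk [%d, %d] with %d samples\n", c.minTime, c.maxTime, len(c.samples))
	}
}
```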
I think we can start by adding an option to disable the queue (it won't be disabled by default) while we try to improve the performance. But it will go into v2.35.
We suspect it has something to do with expensive rules, so we are investigating along those lines.
They are on my TODO list to review. Apologies for the delays. I hope to get to it sometime near the end of Feb.
Thanks @prymitive for running all the PRs and giving us numbers. Hopefully the updated https://github.com/prometheus/prometheus/pull/10425 now does not cause any issues.
To remove any pain for users, we will disable it by default while still allowing it to be enabled if required. In the meantime we will run it enabled in our clusters and try to reduce the performance issues. We can remove it entirely if we are not able to make it better.
Will do, thanks!