prometheus: Rule evaluation & scrapes are frequently blocked or delayed with Prometheus >= v2.33.0
What did you do?
Upgraded Prometheus from v2.32.1 to v2.33.1
What did you expect to see?
Similar performance and smooth graphs.
What did you see instead? Under which circumstances?
It appears that starting with v2.33.0 Prometheus hits some scalability limit for us: either something takes longer than it used to, or something now blocks where it didn't before.
Ever since we upgraded Prometheus from v2.32.1 to v2.33.1 (the same issue occurs with v2.33.4), on our biggest instances every 30 minutes we see that:
- some counter updates are delayed, which looks like scrapes are getting delayed. I only mention counters here because the effect is more visible on counters than on gauges. So either the actual HTTP scrape is delayed OR sample insertion into the TSDB is delayed (or wherever the sample timestamp is set on scrape)
- we see a massive spike in rule evaluation duration
- we see rule evaluation timeouts: `query timed out in expression evaluation`
Still trying to debug it; so far it doesn't seem to be related to:
- queries - we don't see any spike in query volume
- goroutines - the count stays flat, so it's not that Go is running so many goroutines that some get left behind
- CPU or memory - we don't see any elevated resource usage when this happens
- chunk write queue - this is new code added in 2.33 with a default queue size of 1000, and our metrics show that the rate of elements added to the queue spikes to around 250k/s, so I tested Prometheus with a bigger queue size (up to 50M) with no effect on this issue
- query concurrency limit - since that's 20 by default and we usually seem to issue more queries per second than that, I suspected we might be queuing queries too much, but bumping this value up to 96 (on a server with 128 cores) didn't seem to change anything (the flags I used for both experiments are shown below)
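For reference, these are the knobs I tuned for the last two bullet points. Flag names are written from memory, so treat this as a sketch and double-check them against `prometheus --help` (the chunk write queue flag is new in 2.33 and may be hidden):

```shell
# Values used in the experiments above: 96 query slots on a 128-core box and
# a 50M-element chunk write queue (default is 1000); neither changed anything.
prometheus \
  --query.max-concurrency=96 \
  --storage.tsdb.head-chunks-write-queue-size=50000000
```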
Since this is happening every 30 minutes and only seems to affect our biggest instances with ~15M time series, I've checked what else happens every 30 minutes. We do have `--storage.tsdb.min-block-duration=30m` and `--storage.tsdb.max-block-duration=30m` set, mostly to reduce memory usage: we have a fair amount of metrics churn, so more frequent HEAD compaction helps keep memory usage lower than it would be without it.
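One way to confirm that the 30-minute pattern lines up with HEAD compaction is to graph the compaction counter next to the rule group duration gauge. The metric names below come from Prometheus' own instrumentation; treat this as a sketch and verify them on your build:

```promql
# Compactions per 5-minute window; the steps should line up with the spikes.
increase(prometheus_tsdb_compactions_total[5m])

# Last evaluation duration per rule group, over the same time range.
prometheus_rule_group_last_duration_seconds
```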
What I've also noticed is that the number of active TSDB HEAD appenders spikes around the time of this issue. They likely spike whenever there's HEAD/block compaction, so I'm not sure whether that's an effect or the cause. Looking at historical metrics I can see it was always spiking around that time, but with 2.33 the spikes are bigger. See the metrics below:
[screenshots: 19dm12 - v2.32.1, 19dm13 - v2.33.4]
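For anyone wanting to reproduce the graphs above, the appender gauge I'm plotting should be the one below (name taken from Prometheus' own TSDB instrumentation; please correct me if the graphs should use something else):

```promql
# In-flight HEAD appenders; on our instances this spikes every 30 minutes.
prometheus_tsdb_head_active_appenders
```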
I haven't found any useful logs that would point me in another direction so far, and I'm not sure what other metrics might be relevant here. Any tips on further debugging would be very helpful.
Environment
- System information: Linux 5.15.19 x86_64
- Prometheus version: insert output of `prometheus --version` here
- Alertmanager version: insert output of `alertmanager --version` here (if relevant to the issue)
- Prometheus configuration file: insert configuration here
- Alertmanager configuration file: insert configuration here (if relevant to the issue)
- Logs: insert Prometheus and Alertmanager logs relevant to the issue here
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 47 (46 by maintainers)
Commits related to this issue
- Use sync.Mutex for chunk write queue locks. sync.RWMutex seems to be starving get operations when there are a lot of write operations because AFAIK it's a write-preferring lock. During compaction or othe... — committed to prymitive/prometheus by prymitive 2 years ago
- Use a linked list for memSeries.headChunk. Currently memSeries holds a single head chunk in-memory and a slice of mmapped chunks. When append is called memSeries might decide that it needs to create a... — committed to prymitive/prometheus by prymitive a year ago
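To make the "Use a linked list for memSeries.headChunk" commit above easier to follow, here is a minimal, self-contained Go sketch of the idea (toy types, not the actual Prometheus memSeries/memChunk code): cutting a new head chunk becomes a single pointer prepend, and older chunks stay reachable through a prev pointer instead of being appended to a slice under the series lock.

```go
package main

import "fmt"

// headChunk is a simplified stand-in for an in-memory TSDB chunk; the real
// memSeries/memChunk types in Prometheus carry much more state. This only
// sketches the linked-list idea from the commit above.
type headChunk struct {
	minTime, maxTime int64
	samples          []float64
	prev             *headChunk // older chunk, or nil for the oldest
}

// series is a toy version of memSeries holding its newest chunk.
type series struct {
	head *headChunk
}

// cutNewHeadChunk prepends a fresh chunk; the previous head stays reachable
// via prev, so no slice append/copy happens at this point.
func (s *series) cutNewHeadChunk(mint int64) {
	s.head = &headChunk{minTime: mint, prev: s.head}
}

func (s *series) append(t int64, v float64) {
	// Pretend each chunk covers 120 time units before a new one is cut.
	if s.head == nil || t-s.head.minTime > 120 {
		s.cutNewHeadChunk(t)
	}
	s.head.samples = append(s.head.samples, v)
	s.head.maxTime = t
}

func main() {
	var s series
	for t := int64(0); t < 500; t += 30 {
		s.append(t, float64(t))
	}
	// Walk from the newest chunk to the oldest.
	for c := s.head; c != nil; c = c.prev {
		fmt.Printf("chunk [%d, %d] with %d samples\n", c.minTime, c.maxTime, len(c.samples))
	}
}
```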
I think we can start by adding an option to disable the queue (it won't be disabled by default) while we try to improve the performance. But it will go into v2.35.
We suspect it has something to do with expensive rules, so we are investigating along those lines.
They are on my TODO list to review. Apologies for the delays. I hope to get to it sometime near the end of Feb.
Thanks @prymitive for running all the PRs and giving us numbers. Hopefully the updated https://github.com/prometheus/prometheus/pull/10425 now does not cause any issues.
To remove any pain for users, we will disable it by default while still allowing it to be enabled if required. In the meantime we will run it enabled in our clusters and try to reduce the performance issues. We can remove it entirely if we are not able to make it better.
Will do, thanks!