prometheus: Configurable minValidTime in the head block
Introduction
The head block has a concept of minValidTime, wherein it won’t allow samples more than 1h old to be ingested (these are called out-of-bound samples).
Proposal
Here I propose making minValidTime configurable. This is solely for external users of TSDB and does not affect the functioning of Prometheus.
Sometimes we want to ingest some old data because of long outages lasting >1h. In such cases, minValidTime is a blocker. Making it configurable will allow ingesting old data.
Out of order samples are still discarded.
Changes required
To configure minValidTime
There are 2 possible ways we could make it configurable.
- Introduce a method SetMinValidTime() which will set it to the value passed. But this one is easy to make mistakes with.
- Pass a config, let’s say MinValidTimeGracePeriod, which will accept samples up to MinValidTimeGracePeriod back in time from the current minValidTime that we calculate.
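The second option could be sketched roughly as below. This is only an illustration under assumptions: headBounds and appendable are hypothetical names and not the real tsdb.Head API, timestamps are taken to be milliseconds, and the grace period is simply subtracted from the minValidTime the head already computes.

```go
package main

import "fmt"

// headBounds is a hypothetical stand-in for the head block's bounds.
// It is NOT the real tsdb.Head type.
type headBounds struct {
	minValidTime int64 // ms; the bound the head computes today
	gracePeriod  int64 // ms; hypothetical MinValidTimeGracePeriod (0 = current behaviour)
}

// appendable reports whether a sample with timestamp t (ms) would be
// accepted: anything not older than minValidTime minus the grace period.
func (h headBounds) appendable(t int64) bool {
	return t >= h.minValidTime-h.gracePeriod
}

func main() {
	h := headBounds{minValidTime: 10000, gracePeriod: 3000}
	for _, t := range []int64{6999, 7000, 9999, 10000} {
		fmt.Printf("t=%d accepted=%v\n", t, h.appendable(t))
	}
}
```

With a grace period of 0 the check reduces to the existing minValidTime bound, so callers that do not set the option would see no change.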
To the compaction
Allowing old samples can interfere with head compaction. So the compaction check for Head can be changed
from head.maxt - head.mint >= chunkRange * 3/2
to if len(blocks) > 0 { head.maxt - lastBlock.maxt >= chunkRange * 3/2 } else { old condition }
Vertical compaction and vertical queries need to be enabled to take care of overlapping data.
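A minimal sketch of the changed compaction check, assuming millisecond timestamps and a blocks slice sorted so that the last element is the most recent persisted block. The names blockRange and headCompactable are hypothetical and only illustrate the condition above. The point is presumably that backfilled samples pull head.mint back, so the old check would see a large head span much earlier than intended.

```go
package main

import "fmt"

// blockRange is a hypothetical summary of a persisted block's time range (ms).
type blockRange struct{ mint, maxt int64 }

// headCompactable sketches the proposed condition. When blocks exist, the
// head span is measured from the last block's maxt instead of head.mint.
func headCompactable(headMint, headMaxt, chunkRange int64, blocks []blockRange) bool {
	threshold := chunkRange * 3 / 2
	if len(blocks) > 0 {
		last := blocks[len(blocks)-1]
		return headMaxt-last.maxt >= threshold
	}
	// No blocks yet: fall back to the old condition.
	return headMaxt-headMint >= threshold
}

func main() {
	const chunkRange = int64(2 * 60 * 60 * 1000) // 2h in ms
	blocks := []blockRange{{mint: 7200000, maxt: 14400000}}

	// head.mint was pulled back to 0 by old samples.
	fmt.Println(headCompactable(0, 20000000, chunkRange, nil))
	fmt.Println(headCompactable(0, 20000000, chunkRange, blocks))
}
```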
Is it safe
Memory: Usage is proportional to the number of series. With m-mapping in place, taking in lots of old samples is not a problem.
Data: The vertical query and vertical compaction take care of everything. The user must be aware of what they are getting into.
About this issue
- State: closed
- Created 4 years ago
- Reactions: 2
- Comments: 40 (40 by maintainers)
So, point 2 above is unavoidable, i.e., overlapping blocks are possible after a restart even if there was no sample in the grace period. I don’t see how it can be tackled without an inefficient lookup in the blocks for many samples or a modification in the WAL record (out of scope for this). So I think this trade-off is something we have to live with if we want a configurable min valid time. @brian-brazil @bwplotka
The PR that we plan to do does not solve all the problems of the full https://github.com/prometheus/prometheus/issues/535; however, it’s a good first step. I would focus on “short term backfill capabilities”, and then we are already planning with @codesome and @pracucci to think about some compaction planner / logic to fix the large blocks problem that has to be solved for https://github.com/prometheus/prometheus/issues/535
Let’s focus on this first step only for now I think 🤗
Yes, this is an amazing step forward towards backfillable Remote Write 👍