VictoriaMetrics: query results may incorrectly overlap time series
Describe the bug
If the query step is much greater than the data interval (e.g. > 4x), and two series are adjacent in time but not overlapping, the query output or aggregation may incorrectly show the time series as overlapping.
A typical example is build_info{version="..."}. A new version is deployed to instances, which stop updating build_info{version="1.0"} and start updating build_info{version="1.1"}. Due to this bug, at low zoom resolution there will be a point in time where count(build_info{instance="..."}) returns 2, even though there is no overlap in the raw data points.
The equivalent query on Prometheus (Thanos) does not exhibit the problem.
To Reproduce
Raw datapoints:
build_info{instance="foo"}[20m]
time version
2020-07-22 00:45:10 20.05.2
2020-07-22 00:46:10 20.05.2
2020-07-22 00:47:10 20.05.2
2020-07-22 00:48:10 20.05.2
2020-07-22 00:51:56 20.05.3
2020-07-22 00:52:56 20.05.3
2020-07-22 00:58:56 20.05.3
2020-07-22 01:02:11 20.05.3
query, step 60 (no overlap)
build_info{instance="foo"}
2020-07-22 00:47:00 20.05.2
2020-07-22 00:48:00 20.05.2
2020-07-22 00:49:00 20.05.2
2020-07-22 00:52:00 20.05.3
2020-07-22 00:53:00 20.05.3
2020-07-22 00:54:00 20.05.3
query, step 240
build_info{instance="foo"}
2020-07-22 00:48:00 20.05.2
2020-07-22 00:52:00 20.05.2 ** overlap **
2020-07-22 00:52:00 20.05.3 ** overlap **
2020-07-22 00:56:00 20.05.3
2020-07-22 01:00:00 20.05.3
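The step 240 result can be reproduced with a deliberately simplified model. This is a sketch, not VictoriaMetrics' actual algorithm: it assumes a series is reported at evaluation time t whenever it has any raw point in the window (t - step, t]. The names `ts`, `RAW` and `naive_eval` are made up for illustration; the data is the raw datapoints listed above.

```python
from datetime import datetime, timedelta

def ts(s):
    """Parse a timestamp in the format used in this report."""
    return datetime.strptime(s, "%Y-%m-%d %H:%M:%S")

# Raw datapoints from the report, keyed by the version label.
RAW = {
    "20.05.2": [ts("2020-07-22 00:45:10"), ts("2020-07-22 00:46:10"),
                ts("2020-07-22 00:47:10"), ts("2020-07-22 00:48:10")],
    "20.05.3": [ts("2020-07-22 00:51:56"), ts("2020-07-22 00:52:56"),
                ts("2020-07-22 00:58:56"), ts("2020-07-22 01:02:11")],
}

def naive_eval(raw, start, end, step_seconds):
    """Assumed model: a series is present at t if it has a raw point in (t - step, t]."""
    step = timedelta(seconds=step_seconds)
    out = {}
    t = start
    while t <= end:
        out[t] = [version for version, points in raw.items()
                  if any(t - step < p <= t for p in points)]
        t += step
    return out

result = naive_eval(RAW, ts("2020-07-22 00:48:00"), ts("2020-07-22 01:00:00"), 240)
for t, versions in result.items():
    # A list with two versions means the two series are reported as overlapping.
    print(t, versions)
```

Under this assumption, the last 20.05.2 point (00:48:10) and the first 20.05.3 point (00:51:56) both fall into the window ending at 00:52:00, which matches the overlap shown above.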
Expected behavior
If two series are not overlapping in time by raw data, the query should not treat them as overlapping when evaluating one interval in the output.
An example implementation would be to treat “start” and “end” points of a series differently when quantizing raw data points into time buckets: include the series in the bucket if 1) raw points are continuous in the bucket range, or 2) the series starts in the bucket range. Therefore, if a series ends in a bucket range, it is not included. It’s similar to the concept of an open-ended range.
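A minimal sketch of that rule, applied offline to the raw datapoints from this report. It assumes the bucket for evaluation time t is (t - step, t] and that the full raw series is known in advance, so "ends in the bucket" simply means the series' last raw point falls inside it. The names `ts`, `RAW`, `series_in_bucket` and `bucketed_eval` are hypothetical and not part of VictoriaMetrics.

```python
from datetime import datetime, timedelta

def ts(s):
    """Parse a timestamp in the format used in this report."""
    return datetime.strptime(s, "%Y-%m-%d %H:%M:%S")

# Raw datapoints from the report, keyed by the version label.
RAW = {
    "20.05.2": [ts("2020-07-22 00:45:10"), ts("2020-07-22 00:46:10"),
                ts("2020-07-22 00:47:10"), ts("2020-07-22 00:48:10")],
    "20.05.3": [ts("2020-07-22 00:51:56"), ts("2020-07-22 00:52:56"),
                ts("2020-07-22 00:58:56"), ts("2020-07-22 01:02:11")],
}

def series_in_bucket(points, lo, hi):
    """Decide whether a series belongs to the bucket (lo, hi]."""
    if not any(lo < p <= hi for p in points):
        return False                  # no raw points in the bucket at all
    starts_here = min(points) > lo    # rule 2: the series starts in the bucket
    ends_here = max(points) <= hi     # the last raw point is in the bucket
    # Rule 1: the series continues past the bucket, i.e. it does not end here.
    return starts_here or not ends_here

def bucketed_eval(raw, start, end, step_seconds):
    """Evaluate every bucket between start and end with the start/end rule."""
    step = timedelta(seconds=step_seconds)
    out = {}
    t = start
    while t <= end:
        out[t] = [version for version, points in raw.items()
                  if series_in_bucket(points, t - step, t)]
        t += step
    return out

fixed = bucketed_eval(RAW, ts("2020-07-22 00:48:00"), ts("2020-07-22 01:00:00"), 240)
for t, versions in fixed.items():
    print(t, versions)
```

With step 240, the bucket ending at 00:52:00 now contains only 20.05.3, because 20.05.2 ends there without starting there. A real query engine does not know the future of a series, so it would need some way to tell an ended series from one that is still being written; this sketch does not attempt that.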
Screenshots
Example graph showing artificial spikes in count(build_info) when there is a deployment causing the version label to change:

Version
victoria-metrics-20200815-125320-tags-v1.40.0-0-ged00eb3f3
About this issue
- State: closed
- Created 4 years ago
- Comments: 18 (5 by maintainers)
Commits related to this issue
- app/vmselect/promql: improve time series staleness detection This should prevent from double counting for time series at the time when it changes label. The most common case is in K8S, which changes ... — committed to VictoriaMetrics/VictoriaMetrics by valyala 4 years ago
- app/vmselect/promql: an attempt to improve heuristics for dropping trailing data points in time series Now trailing data points are additionally dropped for time series with a single raw sample Upda... — committed to VictoriaMetrics/VictoriaMetrics by valyala 4 years ago
- app/vmselect/promql: do not drop trailing datapoints for instant queries Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/845 Updates https://github.com/VictoriaMetrics/VictoriaMetri... — committed to VictoriaMetrics/VictoriaMetrics by valyala 4 years ago
- all: add support for Prometheus staleness markers Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/748 Updates ... — committed to VictoriaMetrics/VictoriaMetrics by valyala 3 years ago
- lib/promscrape: stop scrapers for the removed targets before starting scrapers for the added targets This should prevent from possible time series overlap when old target is substituted by new target... — committed to VictoriaMetrics/VictoriaMetrics by valyala 3 years ago
Thank you. For discrepancies like this, it would be nice for VM to have unit tests against the output of the Prometheus query library.
@belm0, thanks for the detailed bug report and the proposed solution! The solution looks good. We’ll try implementing it and see how it works.
Now I see the opposite problem, where series unexpectedly disappear before they end (for example at head of the series).
I think it’s related to my comment on the commit about correctness of the 90% heuristic.