thanos: Store: `ExpandPostings` returning postings for time series not matched by matchers
Thanos, Prometheus and Golang version used: main 0651f334324572266844f73882e61d746ec7e91b
Object Storage Provider: N/A
What happened:
When using Thanos from main (version stated above) with the Thanos Prometheus engine, we sporadically see a `vector cannot contain metrics with the same labelset` error for specific queries. Running `sum by (__name__) (some_metric{some="label"})` for the metrics that triggered the error, we sometimes got additional time series for metrics different from `some_metric`.
Investigating further, we noticed that the bucket’s `Series` call returns some thousand series instead of the two series we would expect according to the matchers in the request. Debugging led us to `ExpandPostings` of the `blockSeriesClient`, which sometimes seems to return postings for time series that are not matched by the provided matchers. We were able to narrow it down to https://github.com/thanos-io/thanos/pull/6420, and removing this PR resolved the issue. We haven’t investigated further yet and don’t know which part of the PR is causing the issue, but we assume it might be a race condition, since it only happens sporadically.
What you expected to happen:
Don’t see the `vector cannot contain metrics with the same labelset` error or metrics for different time series being returned.
How to reproduce it (as minimally and precisely as possible): We were able to reproduce it by running a store against our production GCP bucket for specific tenants with index cache enabled, but weren’t able to reproduce it with a bucket test yet.
Full logs to relevant components: N/A
Anything else we need to know: N/A
Environment: It happens for both the production deployment and locally when pointing stores at our production buckets.
About this issue
- State: closed
- Created a year ago
- Reactions: 2
- Comments: 22 (18 by maintainers)
Tested and it works 😄 thanks!
Please try out v0.32.3 as well! Thanks!
Are you able to test with a nightly? We merged a fix for deduplication recently that was not yet released.
As @fpetkovski and you just pointed out, this might be fixed by #6575. While we were never able to reproduce the issue with a test, we can check whether it is still reproducible with the setup where we saw it before.