prometheus: write error, out-of-order series added with label set
1. Prometheus 2.6.0 (in Docker), using remote_write to InfluxDB (also in Docker).
2. Prometheus kept running out of memory (OOM). Running `count by (__name__)({__name__=~".+"}) > 10000` returned the metric `node_cpu_seconds_total`.
3. Stopped Prometheus, logged in to InfluxDB and ran `drop measurement node_cpu_seconds_total`, then added the following to prometheus.yaml: `metric_relabel_configs:
- source_labels: [__name__]
  regex: 'node_cpu_seconds_total'
  action: drop`
4. Started Prometheus and saw: prometheus | level=error ts=2019-05-31T08:36:52.501Z caller=db.go:363 component=tsdb msg="compaction failed" err="persist head block: write compaction: add series: out-of-order series added with label set \"{}\""
5. Updated Prometheus to version 2.10.0; the error is still the same.
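For reference, the relabel rule from step 3 has to sit under a scrape job to take effect. The following is a minimal sketch of that placement, not the reporter's actual file: the job name and target are hypothetical, and only the three rule lines come from the report.

```yaml
# prometheus.yaml (sketch; job_name and targets are assumptions, not from the report)
scrape_configs:
  - job_name: 'node'                      # hypothetical job name
    static_configs:
      - targets: ['node-exporter:9100']   # hypothetical target
    # metric_relabel_configs is applied to scraped samples before ingestion
    metric_relabel_configs:
      # Drop every sample whose metric name is node_cpu_seconds_total
      - source_labels: [__name__]
        regex: 'node_cpu_seconds_total'
        action: drop
```

A rule like this only affects samples scraped after the change. It does not remove series that are already in the head block or on disk, so on its own it would not be expected to clear an existing compaction error.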
About this issue
- State: closed
- Created 5 years ago
- Reactions: 2
- Comments: 50 (23 by maintainers)
Halted for now; I won't be able to look at it before 2.20. This issue is old and was not introduced in 2.19, as the TSDB changes in 2.19 do not interact with this part.
The "maxt of file files not set" and "out of sequence mmap chunks" errors are unrelated to this issue; they are different problems that need investigation. Can you open a separate issue for each of them, with logs? Thanks!
We looked at this in the bug scrub. This is related to a bug that has since been fixed, and the issue remained open in case we wished to do something about bad blocks produced by that bug. Given the lack of confirmed reports in supported settings in over a year, such handling does not seem to be required. In addition, this issue has been derailed by unrelated TSDB support questions. Accordingly, we're going to close this.
@brokencode64 That is not what this issue is about, if you can reproduce on the latest version please open a new issue.
As far as we are aware, we require POSIX compliance, which exceedingly few network filesystems have. At this point we've had reports of basically every form of networked filesystem having an issue for someone. However, that's not to say that some NFS implementation may not actually be sufficiently correct to avoid issues (and I suspect at least one is).
If you can show that some networked filesystem is not disobeying POSIX semantics or otherwise doing odd stuff (e.g. creating files without being asked to) but still having issues, then we should update the wording accordingly.
@brian-brazil Trying to get some clarification. Elsewhere the docs say POSIX compliance is the important thing. Here you’re saying that POSIX compliance doesn’t actually matter if it’s some sort of network filesystem (even if it’s not NFS specifically). I know this sounds pedantic but, if you confirm POSIX compliance doesn’t matter, I’ll update the language in the docs with a PR to say something like “local-only POSIX-compliant.” My team has been going in circles on this.
It can be reproduced by running this test (remove the `t.Skip()` before running; run it multiple times if it doesn't fail on the first attempt): https://github.com/prometheus/prometheus/blob/30505a202a4c33ceeb10bfb3ba01e371a2a10906/tsdb/head_test.go#L1879
I plan to investigate this next week.
Using `metric_relabel_configs` to drop the label gives the same error.