thanos: thanos compactor crashes with "write compaction: chunk 8 not found: reference sequence 0 out of range"
Thanos, Prometheus and Golang version used
thanos, version 0.5.0 (branch: HEAD, revision: 72820b3f41794140403fd04d6da82299f2c16447)
build user: circleci@eeac5eb36061
build date: 20190606-10:53:12
go version: go1.12.5
What happened
thanos compactor crashes with "write compaction: chunk 8 not found: reference sequence 0 out of range"
What you expected to happen
Should work fine 😃
How to reproduce it (as minimally and precisely as possible):
Not sure 😕
Full logs to relevant components
Out of the list of objects dumped along with the error message, I found one block without chunks:
$ gsutil ls -r gs://REDACTED/01DBZNNTM2557YW8T35RBM676P
gs://REDACTED/01DBZNNTM2557YW8T35RBM676P/:
gs://REDACTED/01DBZNNTM2557YW8T35RBM676P/index
gs://REDACTED/01DBZNNTM2557YW8T35RBM676P/meta.json
meta.json contents:
{
  "version": 1,
  "ulid": "01DBZNNTM2557YW8T35RBM676P",
  "minTime": 1557993600000,
  "maxTime": 1558022400000,
  "stats": {
    "numSamples": 35591778,
    "numSeries": 39167,
    "numChunks": 299446
  },
  "compaction": {
    "level": 2,
    "sources": [
      "01DB04S079GCEBKMTWZBH8HQA3",
      "01DB0BMQG3W7M12M8DE3V9QW5C",
      "01DB0JGEQD5RCZ50JAS2NENHQ6",
      "01DB0SC6071QW08JWQVG000AKF"
    ],
    "parents": [
      {
        "ulid": "01DB04S079GCEBKMTWZBH8HQA3",
        "minTime": 1557993600000,
        "maxTime": 1558000800000
      },
      {
        "ulid": "01DB0BMQG3W7M12M8DE3V9QW5C",
        "minTime": 1558000800000,
        "maxTime": 1558008000000
      },
      {
        "ulid": "01DB0JGEQD5RCZ50JAS2NENHQ6",
        "minTime": 1558008000000,
        "maxTime": 1558015200000
      },
      {
        "ulid": "01DB0SC6071QW08JWQVG000AKF",
        "minTime": 1558015200000,
        "maxTime": 1558022400000
      }
    ]
  },
  "thanos": {
    "labels": {
      "environment": "devint",
      "instance_number": "1",
      "location": "REDACTED",
      "prometheus": "monitoring/prometheus-operator-prometheus",
      "prometheus_replica": "prometheus-prometheus-operator-prometheus-1",
      "stack": "data"
    },
    "downsample": {
      "resolution": 0
    },
    "source": "compactor"
  }
}
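For reference, minTime and maxTime in meta.json are Unix timestamps in milliseconds. A small sketch to decode them (using the values from the meta.json above, which show the block covers an 8-hour window, consistent with a level-2 compaction of four 2-hour blocks):

```python
from datetime import datetime, timezone

def ms_to_utc(ms: int) -> str:
    """Convert a millisecond Unix timestamp (as used in meta.json) to a UTC string."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")

print(ms_to_utc(1557993600000))  # block minTime -> 2019-05-16 08:00:00 UTC
print(ms_to_utc(1558022400000))  # block maxTime -> 2019-05-16 16:00:00 UTC
```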
Apparently this is an object created by the compactor.
We had been running the compactor for some time, but after a while it started crashing constantly due to a lack of local disk storage. After an extended period of crashing, storage was added and the compactor was able to make progress until it encountered the problem described here.
I guess the important question is how such an object ended up in the bucket, although I also wonder whether it is possible for Thanos to ignore such objects and keep processing the rest of the data (while exposing information about bad objects in metrics).
I'd guess the block was somehow created during the constant crashes we had earlier, but I have nothing to support that.
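To find other blocks in the same broken state, one option is to scan a flat object listing (e.g. the output of `gsutil ls -r`, with the `gs://BUCKET/` prefix stripped) for blocks that have a meta.json but no files under chunks/. A minimal sketch, not an official Thanos tool:

```python
def find_blocks_without_chunks(object_paths):
    """Group object paths by block ULID (the top-level directory) and flag
    blocks that have a meta.json but no chunks/ files at all."""
    blocks = {}
    for path in object_paths:
        parts = path.strip("/").split("/")
        if len(parts) < 2:
            continue  # skip the bare block-directory entry itself
        ulid, rest = parts[0], "/".join(parts[1:])
        blocks.setdefault(ulid, set()).add(rest)
    return sorted(
        ulid for ulid, files in blocks.items()
        if "meta.json" in files and not any(f.startswith("chunks/") for f in files)
    )

# Example listing modeled on the gsutil output above (second block is healthy).
listing = [
    "01DBZNNTM2557YW8T35RBM676P/index",
    "01DBZNNTM2557YW8T35RBM676P/meta.json",
    "01DB04S079GCEBKMTWZBH8HQA3/meta.json",
    "01DB04S079GCEBKMTWZBH8HQA3/index",
    "01DB04S079GCEBKMTWZBH8HQA3/chunks/000001",
]
print(find_blocks_without_chunks(listing))  # ['01DBZNNTM2557YW8T35RBM676P']
```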
Anything else we need to know
#688 describes a similar issue, although it concerns a much older Thanos version than we use here. We've been running 0.5, and 0.4 before that. I'm not sure, but it is possible that 0.3.2 (compactor) was used at the beginning.
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 21 (11 by maintainers)
Commits related to this issue
- Added single block.MetaFetcher logic for resilient sync of meta files. This replaces man 4 inconsistent meta.json syncs places in other components. Fixes: https://github.com/thanos-io/thanos/issues/... — committed to thanos-io/thanos by bwplotka 5 years ago
- Use block.MetaFetcher in Compactor. Fixes: https://github.com/thanos-io/thanos/issues/1335 Fixes: https://github.com/thanos-io/thanos/issues/1919 Fixes: https://github.com/thanos-io/thanos/issues/130... — committed to thanos-io/thanos by bwplotka 5 years ago
- Use block.MetaFetcher in Compactor. (#1937) Fixes: https://github.com/thanos-io/thanos/issues/1335 Fixes: https://github.com/thanos-io/thanos/issues/1919 Fixes: https://github.com/thanos-io/thanos/... — committed to thanos-io/thanos by bwplotka 4 years ago
I'm seeing the same error myself for blocks uploaded with thanos-compactor 0.6.0 (and then processed by 0.6.0). Backend storage is a Ceph cluster via the Swift API.
thanos-compactor has uploaded a compacted block yesterday:
and now it’s choking on that block:
First, I'd expect it to survive broken blocks, but what's more concerning is that the block had been uploaded successfully before (unless those warning messages are just for show and something is indeed wrong).
What’s uploaded:
and meta.json:
Hi, I am having this issue with Thanos 0.8.1. I have tried moving the directories for the blocks it complains about out of the bucket, but every time I run the compactor it just finds more to be sad about 😦
This is crashing the compactor on 0.8.1 even with the --wait flag. Any help to further debug this would be appreciated! (@bwplotka maybe?)
You should delete the blocks which contain duplicated data and leave only one copy. It's up to you to decide which one that is (: It sounds like you need to delete the one you mentioned, but please double-check.
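When deciding which of the duplicated blocks to keep, comparing their meta.json files can help. A rough sketch below prefers the highest compaction level, then the most samples; these tie-break rules are only a suggestion, not Thanos policy, and you should still verify that the chosen block actually has its chunks/ directory before deleting the others:

```python
def pick_block_to_keep(metas):
    """Given parsed meta.json dicts of duplicated blocks, return the ULID of
    the block to keep: highest compaction level, then most samples."""
    return max(
        metas,
        key=lambda m: (m["compaction"]["level"], m["stats"].get("numSamples", 0)),
    )["ulid"]

# Hypothetical meta.json excerpts for two copies of the same data.
metas = [
    {"ulid": "BLOCK_A", "compaction": {"level": 1}, "stats": {"numSamples": 100}},
    {"ulid": "BLOCK_B", "compaction": {"level": 2}, "stats": {"numSamples": 100}},
]
print(pick_block_to_keep(metas))  # BLOCK_B
```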