thanos: compact: fails with "invalid plan block" error

These chunks are from an old pod. We’re using Deployments, so that replica label no longer exists, in case this helps with debugging.

level=info ts=2018-05-24T22:01:03.10004204Z caller=compact.go:245 msg="starting compact node"
level=info ts=2018-05-24T22:01:03.100144202Z caller=compact.go:131 msg="start sync of metas"
level=debug ts=2018-05-24T22:01:03.426927339Z caller=compact.go:165 msg="download meta" block=01CE75YJXFASBNC40BYG8EHD1N
level=debug ts=2018-05-24T22:01:03.528081002Z caller=compact.go:165 msg="download meta" block=01CE7CTA5BG05CXPDZTTPQZ1WX
level=debug ts=2018-05-24T22:01:03.549836998Z caller=compact.go:165 msg="download meta" block=01CE7KP1DCEYPB1J7SYKNKSXB8
level=debug ts=2018-05-24T22:01:03.572966127Z caller=compact.go:165 msg="download meta" block=01CE7THRND9X4YAHZJP17QP9MY
level=debug ts=2018-05-24T22:01:03.623778574Z caller=compact.go:165 msg="download meta" block=01CE81DFXA83PF5ZA34XJH1GJH
level=debug ts=2018-05-24T22:01:03.645886122Z caller=compact.go:165 msg="download meta" block=01CE88975E33ARVHPQNEX8K8TJ
level=debug ts=2018-05-24T22:01:03.704790623Z caller=compact.go:165 msg="download meta" block=01CE8F4YDGM3CC57HKXFBBSWQ2
level=debug ts=2018-05-24T22:01:03.75381572Z caller=compact.go:165 msg="download meta" block=01CE8P0NMQ16JDFD619390KJQH
level=debug ts=2018-05-24T22:01:03.784011615Z caller=compact.go:165 msg="download meta" block=01CE8WWCW8VR2H1YZ92HTK0WS2
level=debug ts=2018-05-24T22:01:03.806186002Z caller=compact.go:165 msg="download meta" block=01CE8WWCWQK9GZE8K6XWRZ9ME1
level=debug ts=2018-05-24T22:01:03.870331206Z caller=compact.go:165 msg="download meta" block=01CE93R44DQWHR2CFG0CXTP5QJ
level=debug ts=2018-05-24T22:01:03.898889804Z caller=compact.go:165 msg="download meta" block=01CE93R44V6DFZEQY1F4F3NVQH
level=debug ts=2018-05-24T22:01:03.919810068Z caller=compact.go:165 msg="download meta" block=01CE9AKVC52RKEJ7Y3EN4VPEFT
level=debug ts=2018-05-24T22:01:03.940709351Z caller=compact.go:165 msg="download meta" block=01CE9AKVCGCSX7VNDS6HGR9CTY
level=debug ts=2018-05-24T22:01:04.026893239Z caller=compact.go:165 msg="download meta" block=01CE9HFJMGYHTQFP2SWM78H8DG
level=debug ts=2018-05-24T22:01:04.049993424Z caller=compact.go:165 msg="download meta" block=01CE9HFJMJ8GB9JX7FB3CJ9094
level=debug ts=2018-05-24T22:01:04.068987694Z caller=compact.go:165 msg="download meta" block=01CE9RB9WG5C11SHFMBP9524W0
level=debug ts=2018-05-24T22:01:04.090585575Z caller=compact.go:165 msg="download meta" block=01CE9RB9WJ501MZ1FFFG0VH0F7
level=debug ts=2018-05-24T22:01:04.123175132Z caller=compact.go:165 msg="download meta" block=01CE9Z714G2BARSFT68HHKAT4M
level=debug ts=2018-05-24T22:01:04.148953081Z caller=compact.go:165 msg="download meta" block=01CE9Z714JHRXJ7BA4D0J8WAAV
level=info ts=2018-05-24T22:01:04.168912292Z caller=compact.go:137 msg="start of GC"


level=error ts=2018-05-24T22:01:27.662643337Z caller=main.go:147 msg="running command failed" err="compaction: invalid plan block /var/thanos/compact/compact/0@{monitor=\"prometheus\",replica=\"prometheus-thanos-1-86d6c8569d-x22rl\"}/01CE75YJXFASBNC40BYG8EHD1N: No chunks are out of order, but found some outsider blocks. (Blocks that outside of block time range): 4160. Complete: 0"

01CE75YJXFASBNC40BYG8EHD1N/meta.json

{
	"version": 1,
	"ulid": "01CE75YJXFASBNC40BYG8EHD1N",
	"minTime": 1527091200000,
	"maxTime": 1527098400000,
	"stats": {
		"numSamples": 1879211741,
		"numSeries": 2858110,
		"numChunks": 16550746
	},
	"compaction": {
		"level": 1,
		"sources": [
			"01CE75YJXFASBNC40BYG8EHD1N"
		]
	},
	"thanos": {
		"labels": {
			"monitor": "prometheus",
			"replica": "prometheus-thanos-1-86d6c8569d-x22rl"
		},
		"downsample": {
			"resolution": 0
		}
	}
}
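For anyone reading the meta.json above: minTime and maxTime are Unix timestamps in milliseconds. A minimal sketch for turning them into UTC times and a block duration follows; the helper name block_range is made up for illustration and is not part of Thanos or Prometheus.

```python
from datetime import datetime, timezone


def block_range(meta):
    """Return (start, end, duration_hours) for a parsed meta.json dict.

    block_range is a hypothetical helper; it only reads the minTime and
    maxTime fields, which are milliseconds since the Unix epoch.
    """
    start = datetime.fromtimestamp(meta["minTime"] / 1000, tz=timezone.utc)
    end = datetime.fromtimestamp(meta["maxTime"] / 1000, tz=timezone.utc)
    return start, end, (end - start).total_seconds() / 3600


# Values copied from the meta.json of block 01CE75YJXFASBNC40BYG8EHD1N.
meta = {"minTime": 1527091200000, "maxTime": 1527098400000}
start, end, hours = block_range(meta)
print(start.isoformat(), end.isoformat(), hours)
# prints: 2018-05-23T16:00:00+00:00 2018-05-23T18:00:00+00:00 2.0
```

So the failing block is a plain two-hour, level 1 block.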

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 1
  • Comments: 24 (15 by maintainers)

Most upvoted comments

Thanks all for helping with this issue!

Thanos now handles these blocks gracefully (it repairs them in place). The TSDB is not yet fixed (since the fix is not straightforward), but when the issue occurs, Thanos now handles it well.

Feel free to use the master-2018-06-15-5b66b72 docker tag or above. A new release candidate will be produced soon.

Thanks to @BenoitKnecht, who figured out the root cause, and @clmssz for reporting it! Nice work!

Thanks to @BenoitKnecht, we found the root cause and a fix is in progress.

@povilasv I’m not sure what you mean by the outsiders being before minTime. You seem to be in the exact same situation I’m in: the outsider’s minTime (c.MinTime) is exactly equal to the block’s maxTime (maxTime). So all the outsiders “overflow” the block at the end, not at the start.
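To make the start-versus-end distinction concrete, here is a rough sketch of the comparison. It is not the check Thanos or the TSDB actually runs; overflow_side and utc are hypothetical helpers, and treating a chunk that begins exactly at the block’s maxTime as overflowing at the end is an assumption taken from the description above.

```python
from datetime import datetime, timezone


def overflow_side(chunk_min, chunk_max, block_min, block_max):
    """Return 'none', 'start', 'end' or 'both' for a chunk relative to its block.

    Assumption: a chunk whose minTime is at or after the block's maxTime
    counts as overflowing at the end (the case described in this thread).
    """
    at_start = chunk_min < block_min
    at_end = chunk_min >= block_max or chunk_max > block_max
    if at_start and at_end:
        return "both"
    if at_start:
        return "start"
    if at_end:
        return "end"
    return "none"


def utc(*args):
    """Build a timezone-aware UTC datetime (illustrative helper)."""
    return datetime(*args, tzinfo=timezone.utc)


# Block 01CEW00QHRRQ52S6YC9QKH9M00 and posting 11265764 from the listing in this comment.
side = overflow_side(
    chunk_min=utc(2018, 5, 31, 18, 0, 0),
    chunk_max=utc(2018, 5, 31, 18, 39, 40),
    block_min=utc(2018, 5, 29, 12, 0, 0),
    block_max=utc(2018, 5, 31, 18, 0, 0),
)
print(side)  # prints: end
```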

To make things easier, I’ve been using this script to inspect Prometheus blocks locally, rather than using thanos bucket verify to check the blocks after they’ve been uploaded. It gives me this kind of output on affected blocks.

Interestingly, out of the four Prometheus instances that I run (all identical, based on the prom/prometheus:v2.2.1 Docker image), only one is affected by this issue, and not on every block:

$ ./chunks /var/lib/prometheus/data/01C*
block=01CD222J966Z2JJMS1AZ8JNTBF, MinTime=2018-05-07 00:00:00 +0000 UTC, MaxTime=2018-05-09 06:00:00 +0000 UTC, compaction=4
block=01CD7VF55R54PV4DEHGGC2KZ2H, MinTime=2018-05-09 06:00:00 +0000 UTC, MaxTime=2018-05-11 12:00:00 +0000 UTC, compaction=4
block=01CDDMVSD5B82B27A8XNARER90, MinTime=2018-05-11 12:00:00 +0000 UTC, MaxTime=2018-05-13 18:00:00 +0000 UTC, compaction=4
block=01CDKE8D2PSZFFJ8GRX47DE12D, MinTime=2018-05-13 18:00:00 +0000 UTC, MaxTime=2018-05-16 00:00:00 +0000 UTC, compaction=4
block=01CDS7N2T3XFSME66KD1AMH6PW, MinTime=2018-05-16 00:00:00 +0000 UTC, MaxTime=2018-05-18 06:00:00 +0000 UTC, compaction=4
block=01CDZ11MMPDGRFMQCMS4RXFBGK, MinTime=2018-05-18 06:00:00 +0000 UTC, MaxTime=2018-05-20 12:00:00 +0000 UTC, compaction=4
block=01CE4TEA4E80PPRJQ0GHP1TADE, MinTime=2018-05-20 12:00:00 +0000 UTC, MaxTime=2018-05-22 18:00:00 +0000 UTC, compaction=4
block=01CEAKTWDT2W08HY4PFHMJTYAV, MinTime=2018-05-22 18:00:00 +0000 UTC, MaxTime=2018-05-25 00:00:00 +0000 UTC, compaction=4
block=01CEGD7FBRP4SXHT39XMTBGET5, MinTime=2018-05-25 00:00:00 +0000 UTC, MaxTime=2018-05-27 06:00:00 +0000 UTC, compaction=4
block=01CEP6M3153AGS6XZM6Q9P1APV, MinTime=2018-05-27 06:00:00 +0000 UTC, MaxTime=2018-05-29 12:00:00 +0000 UTC, compaction=4
block=01CEW00QHRRQ52S6YC9QKH9M00, MinTime=2018-05-29 12:00:00 +0000 UTC, MaxTime=2018-05-31 18:00:00 +0000 UTC, compaction=4
posting=11265764, MinTime=2018-05-31 18:00:00 +0000 UTC, maxTime=2018-05-31 18:39:40 +0000 UTC
posting=11266807, MinTime=2018-05-31 18:00:00 +0000 UTC, maxTime=2018-05-31 18:39:40 +0000 UTC
posting=11267482, MinTime=2018-05-31 18:00:00 +0000 UTC, maxTime=2018-05-31 18:39:40 +0000 UTC
posting=11268158, MinTime=2018-05-31 18:00:00 +0000 UTC, maxTime=2018-05-31 18:39:40 +0000 UTC
posting=11268834, MinTime=2018-05-31 18:00:00 +0000 UTC, maxTime=2018-05-31 18:39:40 +0000 UTC
posting=11270881, MinTime=2018-05-31 18:00:00 +0000 UTC, maxTime=2018-05-31 18:39:40 +0000 UTC
posting=11271865, MinTime=2018-05-31 18:00:00 +0000 UTC, maxTime=2018-05-31 18:39:40 +0000 UTC
posting=11273787, MinTime=2018-05-31 18:00:00 +0000 UTC, maxTime=2018-05-31 18:39:40 +0000 UTC
posting=11304482, MinTime=2018-05-31 18:00:00 +0000 UTC, maxTime=2018-05-31 18:39:40 +0000 UTC
posting=11310999, MinTime=2018-05-31 18:00:00 +0000 UTC, maxTime=2018-05-31 18:39:40 +0000 UTC
posting=11317304, MinTime=2018-05-31 18:00:00 +0000 UTC, maxTime=2018-05-31 18:39:40 +0000 UTC
posting=12321976, MinTime=2018-05-31 18:00:00 +0000 UTC, maxTime=2018-05-31 18:39:40 +0000 UTC
block=01CF1SDBE6J6EMHNK799TP4ZFX, MinTime=2018-05-31 18:00:00 +0000 UTC, MaxTime=2018-06-03 00:00:00 +0000 UTC, compaction=4
posting=11658250, MinTime=2018-06-03 00:00:00 +0000 UTC, maxTime=2018-06-03 00:39:40 +0000 UTC
posting=11659307, MinTime=2018-06-03 00:00:00 +0000 UTC, maxTime=2018-06-03 00:39:40 +0000 UTC
posting=11659989, MinTime=2018-06-03 00:00:00 +0000 UTC, maxTime=2018-06-03 00:39:40 +0000 UTC
posting=11660671, MinTime=2018-06-03 00:00:00 +0000 UTC, maxTime=2018-06-03 00:39:40 +0000 UTC
posting=11661354, MinTime=2018-06-03 00:00:00 +0000 UTC, maxTime=2018-06-03 00:39:40 +0000 UTC
posting=11663421, MinTime=2018-06-03 00:00:00 +0000 UTC, maxTime=2018-06-03 00:39:40 +0000 UTC
posting=11664414, MinTime=2018-06-03 00:00:00 +0000 UTC, maxTime=2018-06-03 00:39:40 +0000 UTC
posting=11666356, MinTime=2018-06-03 00:00:00 +0000 UTC, maxTime=2018-06-03 00:39:40 +0000 UTC
posting=11698438, MinTime=2018-06-03 00:00:00 +0000 UTC, maxTime=2018-06-03 00:39:40 +0000 UTC
posting=11705086, MinTime=2018-06-03 00:00:00 +0000 UTC, maxTime=2018-06-03 00:39:40 +0000 UTC
posting=11711510, MinTime=2018-06-03 00:00:00 +0000 UTC, maxTime=2018-06-03 00:39:40 +0000 UTC
posting=12735981, MinTime=2018-06-03 00:00:00 +0000 UTC, maxTime=2018-06-03 00:39:40 +0000 UTC
block=01CF7JSYPJWXJ0F3HAZZWGYRPG, MinTime=2018-06-03 00:00:00 +0000 UTC, MaxTime=2018-06-05 06:00:00 +0000 UTC, compaction=4
posting=11349917, MinTime=2018-06-05 06:00:00 +0000 UTC, maxTime=2018-06-05 06:39:40 +0000 UTC
posting=11350973, MinTime=2018-06-05 06:00:00 +0000 UTC, maxTime=2018-06-05 06:39:40 +0000 UTC
posting=11351656, MinTime=2018-06-05 06:00:00 +0000 UTC, maxTime=2018-06-05 06:39:40 +0000 UTC
posting=11352339, MinTime=2018-06-05 06:00:00 +0000 UTC, maxTime=2018-06-05 06:39:40 +0000 UTC
posting=11353023, MinTime=2018-06-05 06:00:00 +0000 UTC, maxTime=2018-06-05 06:39:40 +0000 UTC
posting=11355086, MinTime=2018-06-05 06:00:00 +0000 UTC, maxTime=2018-06-05 06:39:40 +0000 UTC
posting=11356084, MinTime=2018-06-05 06:00:00 +0000 UTC, maxTime=2018-06-05 06:39:40 +0000 UTC
posting=11358021, MinTime=2018-06-05 06:00:00 +0000 UTC, maxTime=2018-06-05 06:39:40 +0000 UTC
posting=11391043, MinTime=2018-06-05 06:00:00 +0000 UTC, maxTime=2018-06-05 06:39:40 +0000 UTC
posting=11397659, MinTime=2018-06-05 06:00:00 +0000 UTC, maxTime=2018-06-05 06:39:40 +0000 UTC
posting=11404046, MinTime=2018-06-05 06:00:00 +0000 UTC, maxTime=2018-06-05 06:39:40 +0000 UTC
posting=12413590, MinTime=2018-06-05 06:00:00 +0000 UTC, maxTime=2018-06-05 06:39:40 +0000 UTC
block=01CF9GJVFAS4AJF7M9CJ9QV10E, MinTime=2018-06-06 00:00:00 +0000 UTC, MaxTime=2018-06-06 02:00:00 +0000 UTC, compaction=1
block=01CF9GK7ENZF10H6GARMXS7XPQ, MinTime=2018-06-05 06:00:00 +0000 UTC, MaxTime=2018-06-06 00:00:00 +0000 UTC, compaction=3
block=01CF9QEJMQ35ZB60DK72KHBD4N, MinTime=2018-06-06 02:00:00 +0000 UTC, MaxTime=2018-06-06 04:00:00 +0000 UTC, compaction=1

So again, I’m not sure if this is a Prometheus issue (meaning it should never produce blocks with chunks that stick out at the end), or if it’s a Thanos issue (in the sense that it should not error out on blocks with that particular type of outsiders, but just treat them like any other block).
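If it helps anyone triage a listing like the one above, the following sketch counts the posting lines under each block. The line format is inferred purely from the sample output, and attributing each posting line to the block line printed just before it is an assumption (it is consistent with each posting’s MinTime matching that block’s MaxTime). count_outsiders is a hypothetical name, not part of the script.

```python
import re

# Patterns inferred from the sample output only; the real script may differ.
BLOCK_RE = re.compile(r"^block=(\w+), MinTime=(.+), MaxTime=(.+), compaction=(\d+)$")
POSTING_RE = re.compile(r"^posting=(\d+), MinTime=(.+), maxTime=(.+)$")


def count_outsiders(lines):
    """Map each block ULID to the number of posting lines that follow it."""
    counts = {}
    current = None
    for line in lines:
        line = line.strip()
        block = BLOCK_RE.match(line)
        if block:
            current = block.group(1)
            counts[current] = 0
        elif POSTING_RE.match(line) and current is not None:
            # Assumption: a posting line belongs to the preceding block line.
            counts[current] += 1
    return counts


sample = """\
block=01CEW00QHRRQ52S6YC9QKH9M00, MinTime=2018-05-29 12:00:00 +0000 UTC, MaxTime=2018-05-31 18:00:00 +0000 UTC, compaction=4
posting=11265764, MinTime=2018-05-31 18:00:00 +0000 UTC, maxTime=2018-05-31 18:39:40 +0000 UTC
posting=11266807, MinTime=2018-05-31 18:00:00 +0000 UTC, maxTime=2018-05-31 18:39:40 +0000 UTC
block=01CF9GJVFAS4AJF7M9CJ9QV10E, MinTime=2018-06-06 00:00:00 +0000 UTC, MaxTime=2018-06-06 02:00:00 +0000 UTC, compaction=1
""".splitlines()

affected = {ulid: n for ulid, n in count_outsiders(sample).items() if n > 0}
print(affected)  # prints: {'01CEW00QHRRQ52S6YC9QKH9M00': 2}
```

Piping the full output through something like this gives a quick list of affected blocks without reading every posting line.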