prometheus: Compaction fails when total symbol size exceeds 2^32 bytes

What did you do? Compacted seven 2-day blocks into one larger block with Thanos. duration=300

What did you expect to see? The seven 2-day blocks are compacted successfully.

What did you see instead? Under which circumstances? The normal compaction process halts and throws this error: "2 errors: populate block: add series: read symbols: invalid checksum; read symbols: invalid checksum"

Environment: 4C16G. Block index file sizes: 1.5G, 4.2G, 4.13G, 3.74G, 5.24G, 5.25G, 5.06G

  • System information:


  • Prometheus version: 1.8.2

  • Logs:

caller=compact.go:416 msg="critical error detected; halting" err="compaction: group 300@14327834599727325097: compact blocks [xx,xx,xx,xx,xx,xx,xx]: 2 errors: populate block: add series: read symbols: invalid checksum; read symbols: invalid checksum"

code info:
thanos compact/compact.go 
     comp.Compact(dir,plan,nil)

tsdb/compact.go
  L380  Compact
  L424 c.write(dest,meta,blocks)
         c.populateBlock
              indexw.AddSeries(ref,sl,chks...)   <- suspected failure point

About this issue

  • State: open
  • Created 3 years ago
  • Comments: 22 (22 by maintainers)

Most upvoted comments

I think we should also discuss whether to make this a known limitation by design, document it, and keep it as is. Anything beyond the 64GB limit (or even half of it) is basically physically un-queryable (it takes too long to traverse all postings) and should be sharded into multiple blocks by series dimension.

Also I think the original error report is not valid anymore as Thanos handles this in a clean way, by ignoring that compaction group and allowing users to shard manually (until auto vertical block sharding is done).

Hi LeviHarrison, I think I have found the reason for my “invalid checksum”: the symbols length field in the index is 4 bytes, but our symbol data is 4388357229 bytes long, so the length is truncated to fit into 4 bytes and the crc32 check fails in the end. Code:

prometheus/tsdb/index/index.go
  finishSymbols
       w.buf1.PutBE32int(int(w.f.pos - w.toc.Symbols - 4))   ## w.f.pos=4388357238, w.toc.Symbols=5

Btw, thank you very much for your help and proposal.