thanos: v0.12.0 regression: partial blocks are never deleted; log spam from double deletion marking.

It was hard to track down, but it looks like four unexpected things are happening on v0.12.0:

  1. Partial blocks (no meta.json) are never cleaned up. (screenshot attachment)

  2. The logs are spammed with double deletion-marking warnings for the partial-upload delete attempts (one per partial block):

level=info ts=2020-04-17T16:50:14.305651352Z caller=clean.go:52 msg="deleted aborted partial upload" block=01E5Q7MP483QSDSX036X4TZ5SD thresholdAge=48h0m0s
level=info ts=2020-04-17T16:50:14.305659796Z caller=clean.go:47 msg="found partially uploaded block; marking for deletion" block=01E50RCWDZ5P83NRH4R0GXVGCC
level=warn ts=2020-04-17T16:50:14.312057891Z caller=block.go:140 msg="requested to mark for deletion, but file already exists; this should not happen; investigate" err="file 01E50RCWDZ5P83NRH4R0GXVGCC/deletion-mark.json already exists in bucket"
level=info ts=2020-04-17T16:50:14.312073215Z caller=clean.go:52 msg="deleted aborted partial upload" block=01E50RCWDZ5P83NRH4R0GXVGCC thresholdAge=48h0m0s
level=info ts=2020-04-17T16:50:14.312081259Z caller=clean.go:47 msg="found partially uploaded block; marking for deletion" block=01E5695C03VK6FB89G49T5MRB4
level=warn ts=2020-04-17T16:50:14.317892887Z caller=block.go:140 msg="requested to mark for deletion, but file already exists; this should not happen; investigate" err="file 01E5695C03VK6FB89G49T5MRB4/deletion-mark.json already exists in bucket"
level=info ts=2020-04-17T16:50:14.317907097Z caller=clean.go:52 msg="deleted aborted partial upload" block=01E5695C03VK6FB89G49T5MRB4 thresholdAge=48h0m0s
level=info ts=2020-04-17T16:50:14.317914967Z caller=clean.go:47 msg="found partially uploaded block; marking for deletion" block=01E5J2V84SKW9GYMH8B88M5NR9
  3. The logs are similarly spammed with double deletion-marking warnings for garbage-collection attempts (thousands of them as well):
level=warn ts=2020-04-16T23:01:18.533895718Z caller=block.go:140 msg="requested to mark for deletion, but file already exists; this should not happen; investigate" err="file 01E57S72A216JY3H6JXXK1GDE5/deletion-mark.json already exists in bucket"
level=info ts=2020-04-16T23:01:18.533922849Z caller=compact.go:271 msg="marking outdated block for deletion" block=01E5F2TGZ6FD6KS5WDMKBKGKP5
level=warn ts=2020-04-16T23:01:18.540796799Z caller=block.go:140 msg="requested to mark for deletion, but file already exists; this should not happen; investigate" err="file 01E5F2TGZ6FD6KS5WDMKBKGKP5/deletion-mark.json already exists in bucket"
level=info ts=2020-04-16T23:01:18.540817511Z caller=compact.go:271 msg="marking outdated block for deletion" block=01E58037KF7XF4TFMDQZYFKJCT
level=warn ts=2020-04-16T23:01:18.546292591Z caller=block.go:140 msg="requested to mark for deletion, but file already exists; this should not happen; investigate" err="file 01E58037KF7XF4TFMDQZYFKJCT/deletion-mark.json already exists in bucket"
level=info ts=2020-04-16T23:01:18.546319501Z caller=compact.go:271 msg="marking outdated block for deletion" block=01E58037KFCFY22BZZR4RJKG7G
  4. (theoretically) A malformed deletion mark can cause the same problem: log spam and the inability to delete the block.

We found the root cause of 1, 2, and 4; we are still looking for 3, so help wanted (: Item 3 is not very critical beyond the log spam.

Root cause explained in the diagram: (diagram attachment)
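
The "file already exists" warnings in items 2 and 3 suggest the marking step is not idempotent. A minimal sketch of an idempotent deletion-marking routine, using a hypothetical in-memory bucket rather than Thanos's real objstore API:

```go
package main

import (
	"fmt"
	"sync"
)

// inMemBucket is a hypothetical stand-in for an object-store bucket.
type inMemBucket struct {
	mu      sync.Mutex
	objects map[string][]byte
}

func (b *inMemBucket) Exists(name string) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	_, ok := b.objects[name]
	return ok
}

func (b *inMemBucket) Upload(name string, data []byte) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.objects[name] = data
}

// markForDeletion writes <block>/deletion-mark.json, treating an
// already-existing mark as a no-op success instead of a warning,
// so repeated marking attempts produce no log spam.
func markForDeletion(b *inMemBucket, blockID string) (created bool) {
	markPath := blockID + "/deletion-mark.json"
	if b.Exists(markPath) {
		return false // already marked; nothing to do
	}
	b.Upload(markPath, []byte(`{"id":"`+blockID+`","version":1}`))
	return true
}

func main() {
	b := &inMemBucket{objects: map[string][]byte{}}
	fmt.Println(markForDeletion(b, "01E5Q7MP483QSDSX036X4TZ5SD")) // true: mark created
	fmt.Println(markForDeletion(b, "01E5Q7MP483QSDSX036X4TZ5SD")) // false: already marked
}
```

Note the existence check and upload are not atomic against a shared bucket; two concurrent compactor runs could still race, so treating the "already exists" result as success (not a warning) is the robust half of the fix.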

A fix is in progress. Let's make sure it lands in v0.12.1.

cc @kakkoyun @squat @pstibrany @pracucci

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 8
  • Comments: 26 (12 by maintainers)

Commits related to this issue

Most upvoted comments

@bwplotka This is still happening when applying optional retention, using v0.15. The same happens for many, many more blocks; I've left just one or two for easier reading:

level=info ts=2020-10-09T07:04:47.978659482Z caller=compact.go:949 msg="start of GC"
level=info ts=2020-10-09T07:04:47.979054896Z caller=compact.go:961 msg="start of compactions"
level=info ts=2020-10-09T07:04:48.2379625Z caller=compact.go:993 msg="compaction iterations done"
level=info ts=2020-10-09T07:04:48.238018236Z caller=compact.go:315 msg="start first pass of downsampling"
level=info ts=2020-10-09T07:04:50.678208522Z caller=fetcher.go:453 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=2.440176163s cached=170 returned=151 partial=0
level=info ts=2020-10-09T07:04:50.68453155Z caller=compact.go:329 msg="start second pass of downsampling"
level=info ts=2020-10-09T07:04:52.814812001Z caller=fetcher.go:453 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=2.130245473s cached=170 returned=151 partial=0
level=info ts=2020-10-09T07:04:52.821493208Z caller=compact.go:336 msg="downsampling iterations done"
level=info ts=2020-10-09T07:04:55.386447541Z caller=fetcher.go:453 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=2.564924279s cached=170 returned=151 partial=0
level=info ts=2020-10-09T07:04:55.391310955Z caller=retention.go:30 msg="start optional retention"
level=info ts=2020-10-09T07:04:55.391339335Z caller=retention.go:39 msg="applying retention: marking block for deletion" id=01DDW1GEAWR32RGFJGH7RKBFHJ maxTime="2019-06-21 00:00:00 +0000 UTC"
level=warn ts=2020-10-09T07:04:55.400021215Z caller=block.go:132 msg="requested to mark for deletion, but file already exists; this should not happen; investigate" err="file 01DDW1GEAWR32RGFJGH7RKBFHJ/deletion-mark.json already exists in bucket"
level=info ts=2020-10-09T07:04:55.400047177Z caller=retention.go:39 msg="applying retention: marking block for deletion" id=01DE169YS1TGQ7S6G7X3R5H4EY maxTime="2019-06-23 00:00:00 +0000 UTC"
level=warn ts=2020-10-09T07:04:55.407440159Z caller=block.go:132 msg="requested to mark for deletion, but file already exists; this should not happen; investigate" err="file 01DE169YS1TGQ7S6G7X3R5H4EY/deletion-mark.json already exists in bucket"
level=info ts=2020-10-09T07:04:55.40746471Z caller=retention.go:39 msg="applying retention: marking block for deletion" id=01DE7JD6F3YN7KC7A7VKT58CNS maxTime="2019-06-25 14:00:00 +0000 UTC"
level=info ts=2020-10-09T07:04:56.10602337Z caller=retention.go:45 msg="optional retention apply done"
level=info ts=2020-10-09T07:04:56.106035196Z caller=clean.go:33 msg="started cleaning of aborted partial uploads"
level=info ts=2020-10-09T07:04:56.106044972Z caller=clean.go:60 msg="cleaning of aborted partial uploads done"
level=info ts=2020-10-09T07:04:56.1060543Z caller=blocks_cleaner.go:43 msg="started cleaning of blocks marked for deletion"
level=info ts=2020-10-09T07:04:56.106088212Z caller=blocks_cleaner.go:57 msg="cleaning of blocks marked for deletion done"
level=info ts=2020-10-09T07:05:46.310145394Z caller=fetcher.go:453 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=484.017868ms cached=170 returned=170 partial=0
level=info ts=2020-10-09T07:06:46.431037505Z caller=fetcher.go:453 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=604.914414ms cached=170 returned=170 partial=0
level=info ts=2020-10-09T07:07:46.145759802Z caller=fetcher.go:453 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=319.631793ms cached=170 returned=170 partial=0
level=info ts=2020-10-09T07:08:47.035944733Z caller=fetcher.go:453 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=1.209826117s cached=170 returned=170 partial=0

We are in the process of cutting it. Once done, we will let you know so you can test with us whether everything works (:

On Tue, 21 Apr 2020 at 12:53, Stefan Bühler notifications@github.com wrote:

Yes, I did experience this (about 15G of logs on a 20G rootfs; compact has its own partition) with v0.12.0.

While https://github.com/thanos-io/thanos/commits/release-0.12 seems to indicate that v0.12.1 has been released, there is no tag and no binaries (and no release on https://github.com/thanos-io/thanos/releases). Any idea when those will surface?
