thanos: sidecar: Handle or do not allow `delete_series` for Thanos with object store backup

Basically, the delete series API (https://github.com/prometheus/prometheus/blob/master/docs/querying/api.md#delete-series; a minimal example call is sketched after this list) is unsafe to run with a Thanos sidecar that uploads blocks to the bucket, because:

  • Prometheus rewrites ALL blocks because they are immutable. This causes the Thanos sidecar to upload the new ones (without removing the old ones), which causes the compactor to stop because of the overlap.
  • Race conditions: a rewrite can happen during a Thanos shipper upload.
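
For reference, a minimal sketch of the kind of call involved, assuming a Prometheus 2.x server started with `--web.enable-admin-api`; the address and selector are placeholders:

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

func main() {
	// Selecting the series to delete writes tombstones; the affected
	// (immutable) blocks are rewritten once the tombstones are applied,
	// which is the rewrite that races with the sidecar upload described above.
	v := url.Values{}
	v.Set("match[]", `bad_metric{job="example"}`)

	resp, err := http.Post(
		"http://localhost:9090/api/v1/admin/tsdb/delete_series?"+v.Encode(),
		"application/x-www-form-urlencoded", nil)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status) // 204 No Content on success
}
```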

It is also worth noting that running delete_series on a local Prometheus with a Thanos backup does not make much sense: most blocks are likely already uploaded by the time Prometheus replaces them with rewritten ones, so the deleted series will only disappear locally. On top of that, every older block in the bucket will still contain that series.

Solution:

  • We don’t want to replace the blocks in the sidecar. We want to keep the whole process coordination-free, so only the compactor is allowed to remove blocks on a regular basis (as part of the compaction process).
  • Instead of only recording which ULIDs were uploaded in the Thanos meta, we could also record their time ranges, and ignore or error out on rewritten blocks that have a new ULID but the same time range (see the sketch below).
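
A rough sketch of that time-range check, with illustrative types rather than the actual shipper metadata format:

```go
package main

import "fmt"

// uploadedBlock is an illustrative record of what the shipper meta could
// store per uploaded block: its ULID plus the time range it covers.
type uploadedBlock struct {
	ULID    string
	MinTime int64 // milliseconds since epoch
	MaxTime int64
}

// isRewrite reports whether candidate covers exactly the same time range as
// an already-uploaded block but under a different ULID, i.e. the signature of
// a block rewritten by delete_series / tombstone cleanup.
func isRewrite(candidate uploadedBlock, uploaded []uploadedBlock) bool {
	for _, u := range uploaded {
		if u.ULID != candidate.ULID &&
			u.MinTime == candidate.MinTime &&
			u.MaxTime == candidate.MaxTime {
			return true
		}
	}
	return false
}

func main() {
	uploaded := []uploadedBlock{{ULID: "OLD_ULID", MinTime: 1000, MaxTime: 2000}}
	rewritten := uploadedBlock{ULID: "NEW_ULID", MinTime: 1000, MaxTime: 2000}
	fmt.Println(isRewrite(rewritten, uploaded)) // true -> skip or error out instead of uploading
}
```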

Thanks [slack]@jean-louis for spotting this!

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 15 (9 by maintainers)

Most upvoted comments

Can we first enumerate what the use cases for delete series are? I thought the main one was to reduce the amount of unnecessary metrics after they are stored in Prometheus, but with Thanos, space is not that much of a problem.

This is generally true, but there is another dimension to consider: memory. We have recently been experiencing problems with the thanos store processes being OOMKilled. The underlying cause is the number of series we have in the object bucket, which is significant as a result of a long period of time in which we (inadvertently) had been uploading metrics with high-cardinality dimensions. Now that we have identified the issue, we’d like to be able to remove those metrics from storage in order to reduce the memory footprint of the store processes, but alas, I haven’t found a way to do so yet that does not involve removing most of the blocks.
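
One way to see which metrics contribute the most series is the TSDB stats endpoint that recent Prometheus versions expose; a minimal sketch, assuming the endpoint is reachable at the placeholder address below:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// tsdbStatus models just the part of /api/v1/status/tsdb we care about:
// the series count per metric name.
type tsdbStatus struct {
	Data struct {
		SeriesCountByMetricName []struct {
			Name  string `json:"name"`
			Value int    `json:"value"`
		} `json:"seriesCountByMetricName"`
	} `json:"data"`
}

func main() {
	resp, err := http.Get("http://localhost:9090/api/v1/status/tsdb")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var st tsdbStatus
	if err := json.NewDecoder(resp.Body).Decode(&st); err != nil {
		panic(err)
	}
	// The endpoint returns only the biggest contributors, highest counts first.
	for _, m := range st.Data.SeriesCountByMetricName {
		fmt.Printf("%-60s %d series\n", m.Name, m.Value)
	}
}
```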

Cool, have you opened an issue on the Prometheus GitHub?

Let’s figure out how to unblock you. I think that all of those issues, like out-of-order label sets etc., are nice to know about and super important to solve, but fixing them is generally not necessary for overall Thanos compaction. It seems like the solution here would be to just alert on this (produce a metric, log it) but continue the compaction, as it’s critical for system health. I will try to find some time to produce a fix for this.
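
A minimal sketch of that "metric + log, but keep compacting" idea; the metric and function names are illustrative, not actual Thanos code:

```go
package main

import (
	"log"

	"github.com/prometheus/client_golang/prometheus"
)

// blockIssues counts blocks that had non-critical problems (e.g. an
// out-of-order label set) and were skipped rather than halting compaction.
var blockIssues = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "compactor_skipped_block_issues_total",
	Help: "Blocks skipped due to non-critical issues during compaction.",
})

func init() { prometheus.MustRegister(blockIssues) }

// compactAll keeps compacting even when an individual block is malformed:
// it records a metric and a log line (which an alert can fire on) and moves
// on, instead of stopping the whole compactor.
func compactAll(blocks []string) {
	for _, b := range blocks {
		if err := verifyBlock(b); err != nil {
			blockIssues.Inc()
			log.Printf("skipping block %s: %v", b, err)
			continue
		}
		compact(b)
	}
}

// verifyBlock and compact are placeholders standing in for the real
// verification and compaction steps.
func verifyBlock(string) error { return nil }
func compact(string)           {}

func main() { compactAll([]string{"blockA", "blockB"}) }
```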

Not sure if delete series helps here, as you would have something continuously producing broken series (: so you would end up deleting this series every 3h (:

@caarlos0 you can do that just fine - but it’s better to turn off the compactor temporarily for that. The concern is that the compactor might be compacting that very block into a bigger block, in which case the deletion would not help.
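
A rough sketch of such a manual block deletion, assuming the compactor has already been stopped; the `Bucket` interface and the in-memory bucket are stand-ins for a real object-store client, not the actual Thanos objstore API:

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Bucket is a stand-in for an object-store client (S3, GCS, ...).
type Bucket interface {
	List(ctx context.Context, prefix string) ([]string, error)
	Delete(ctx context.Context, name string) error
}

// mapBucket is a toy in-memory bucket, only here to make the sketch runnable.
type mapBucket map[string][]byte

func (b mapBucket) List(_ context.Context, prefix string) ([]string, error) {
	var out []string
	for name := range b {
		if strings.HasPrefix(name, prefix) {
			out = append(out, name)
		}
	}
	return out, nil
}

func (b mapBucket) Delete(_ context.Context, name string) error {
	delete(b, name)
	return nil
}

// deleteBlock removes every object under one block directory (meta.json,
// index, chunks/...) identified by the block's ULID.
func deleteBlock(ctx context.Context, bkt Bucket, ulid string) error {
	objs, err := bkt.List(ctx, ulid+"/")
	if err != nil {
		return err
	}
	for _, o := range objs {
		if err := bkt.Delete(ctx, o); err != nil {
			return fmt.Errorf("delete %s: %w", o, err)
		}
	}
	return nil
}

func main() {
	bkt := mapBucket{
		"01BROKENBLOCK/meta.json":     nil,
		"01BROKENBLOCK/index":         nil,
		"01BROKENBLOCK/chunks/000001": nil,
		"01HEALTHYBLOCK/meta.json":    nil,
	}
	if err := deleteBlock(context.Background(), bkt, "01BROKENBLOCK"); err != nil {
		panic(err)
	}
	fmt.Println("remaining objects:", len(bkt)) // 1
}
```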

The reason I deleted a series was that the client was initially configured incorrectly and sent metrics with the wrong tags to Prometheus. This caused graphs to be broken for that host.

After reconfiguration it was fine, but somehow I need to get rid of the broken samples in the TSDB.