prometheus: Compaction runs forever and WAL grows until OOM

Bug Report

This issue is similar to https://github.com/prometheus/prometheus/issues/6408 and https://github.com/prometheus/prometheus/issues/6595.

What did you do? I am running Prometheus with default retention and TSDB settings. It is scraping ~4k metrics per second.

What did you expect to see? I expected the WAL to stay small and new compacted TSDB blocks to be created.

What did you see instead? Under which circumstances? Under a steady load of ~4k metrics/second, compaction seems to run forever, so the WAL and RAM keep growing until OOM. Because of this, checkpointing also fails, Prometheus crashes uncleanly, and the cycle repeats.

Environment

  • System information:

Production: Linux 4.9.0-4-amd64 x86_64
Reproduced on: Darwin 18.7.0 x86_64

  • Prometheus version:

Production: Docker image of 2.15.2
Reproduced with: 669592a2c4d59697ce3f654db2c1e7d5e3d42714

  • Alertmanager version:

not relevant

  • Prometheus configuration file:

not relevant

  • Alertmanager configuration file:

not relevant

  • Reproduced with code:
package main

import (
	"github.com/prometheus/prometheus/tsdb"
)

func main() {
	// Open the existing TSDB directory in read-only mode.
	db, err := tsdb.OpenDBReadOnly("data", nil)
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// Flush the WAL into a new block; this is the call that never returns.
	err = db.FlushWAL("data")
	if err != nil {
		panic(err)
	}
}

  • Logs:

No relevant errors or messages in the log; no output is printed while compaction runs forever at 100% of one CPU core.

  • Probable root cause:

During my reproduction I let FlushWAL run until the RAM usage of my little program no longer went down (starting at ~8 GB after the TSDB and WAL were loaded, down to ~500 MB). Then I paused the program in the debugger and observed the following situation:

I am running into the case of batchNames staying empty on every iteration of the top-level for-loop here: https://github.com/prometheus/prometheus/blob/669592a2c4d59697ce3f654db2c1e7d5e3d42714/tsdb/index/index.go#L815-L827

The first label position is 20814 while maxPostings is 19476, which makes the inner for-loop break on its first iteration, leaving names[] untouched and batchNames empty; the outer loop therefore never makes progress and is stuck in an endless loop (sketched below).
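
For illustration, here is a simplified, hypothetical model of that batching loop (not the actual index.go code): writeLabelBatches, startPos, and maxPostings below are stand-ins for the real structures, and the values come from the observation above.

package main

import "fmt"

// writeLabelBatches is a simplified, hypothetical stand-in for the batching
// loop in tsdb/index/index.go. startPos maps each label name to its postings
// offset; a batch only accepts names whose offset fits under maxPostings.
func writeLabelBatches(names []string, startPos map[string]uint64, maxPostings uint64) error {
	for len(names) > 0 {
		var batchNames []string
		for _, name := range names {
			if startPos[name] > maxPostings {
				// If the very first name is already past maxPostings
				// (20814 > 19476 in this report), we break immediately
				// and batchNames stays empty.
				break
			}
			batchNames = append(batchNames, name)
		}
		// ... write out the batch ...
		// names only shrinks by the batch size, so an empty batch means
		// the outer loop never makes progress: the endless loop seen here.
		names = names[len(batchNames):]
	}
	return nil
}

func main() {
	// Values from the report; this call spins forever at 100% of one core.
	err := writeLabelBatches([]string{"some_label"}, map[string]uint64{"some_label": 20814}, 19476)
	fmt.Println(err)
}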

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 2
  • Comments: 18 (9 by maintainers)

Most upvoted comments

Let’s add an error if we enter this endless loop.
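
In terms of the hypothetical writeLabelBatches sketch above (the real check would live in tsdb/index/index.go), such a guard could sit right after the inner loop:

		// After the inner loop has built batchNames:
		if len(batchNames) == 0 {
			// Nothing fit under maxPostings, so we would spin forever;
			// fail loudly with the offending offsets instead.
			return fmt.Errorf("corrupted index: label %q starts at %d, beyond maxPostings %d",
				names[0], startPos[names[0]], maxPostings)
		}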

I can reproduce with different values: w.labelNames[names[0]] = 22808, maxPostings = 22611.

We now (2.16.0) refuse to add such bad data (assuming that labels are sorted). We will still run forever if the data is corrupt. I agree we should address this, but in a way that causes minimal impact.
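
To illustrate the kind of check meant here (a hypothetical sketch, not the actual 2.16.0 code): the writer rejects entries that arrive out of sorted order instead of accepting data that would later corrupt the postings offsets.

package main

import "fmt"

// writerSketch is a hypothetical stand-in for the index writer. It refuses
// symbols that arrive out of sorted order rather than writing bad data.
type writerSketch struct {
	lastSymbol string
}

func (w *writerSketch) addSymbol(sym string) error {
	if w.lastSymbol != "" && sym <= w.lastSymbol {
		return fmt.Errorf("symbol %q out of order (last was %q)", sym, w.lastSymbol)
	}
	w.lastSymbol = sym
	// ... write sym to the index ...
	return nil
}

func main() {
	w := &writerSketch{}
	fmt.Println(w.addSymbol("instance")) // accepted
	fmt.Println(w.addSymbol("app"))      // rejected: out of order
}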

Regarding the usage of tsdb: we are aware that people are using the package. Some discussion is happening on the mailing list: https://groups.google.com/d/msgid/prometheus-developers/55776f4a-b073-4bc1-a470-622d9c318344%40googlegroups.com