prometheus: Compaction runs forever and WAL grows until OOM

Bug Report

This issue is similar to https://github.com/prometheus/prometheus/issues/6408 and https://github.com/prometheus/prometheus/issues/6595.

What did you do? I am running Prometheus with default retention and TSDB settings. It is scraping ~4k metrics per second.

What did you expect to see? I expected the WAL to stay small and new compacted TSDB blocks to be created.

What did you see instead? Under which circumstances? Under a steady load of ~4k metrics/second, compaction seems to run forever, so the WAL and RAM keep growing until OOM. Because of this, checkpointing also fails, Prometheus crashes uncleanly, and the cycle repeats.

Environment

  • System information:

Production: Linux 4.9.0-4-amd64 x86_64
Reproduced on: Darwin 18.7.0 x86_64

  • Prometheus version:

Production: Docker image of 2.15.2
Reproduced with: 669592a2c4d59697ce3f654db2c1e7d5e3d42714

  • Alertmanager version:

not relevant

  • Prometheus configuration file:

not relevant

  • Alertmanager configuration file:

not relevant

  • Reproduced with code:
package main

import (
	"github.com/prometheus/prometheus/tsdb"
)

func main() {
	// Open the existing TSDB directory in read-only mode.
	db, err := tsdb.OpenDBReadOnly("data", nil)
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// Flush the WAL into a new block; this is the call that never returns.
	err = db.FlushWAL("data")
	if err != nil {
		panic(err)
	}
}

  • Logs:

No relevant errors or messages in the log; no output is printed while compaction runs forever at 100% of one CPU core.

  • Probable root cause:

During my reproduction I let FlushWAL run until the RAM usage of my little program no longer went down (starting at ~8 GB after the TSDB and WAL were loaded, down to ~500 MB). Then I paused the program in the debugger and observed the following situation:

I am running into the case of batchNames staying empty on every iteration of the top-level for-loop here: https://github.com/prometheus/prometheus/blob/669592a2c4d59697ce3f654db2c1e7d5e3d42714/tsdb/index/index.go#L815-L827

The first label position is 20814 while maxPostings is 19476, which makes the inner for-loop break on its first iteration, leaving names[] untouched and batchNames empty; the outer loop therefore never makes progress and is stuck in an endless loop (sketched below).
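
For illustration, here is a simplified, hypothetical model of that batching loop (not the actual index.go code): writeLabelBatches, startPos, and maxPostings below are stand-ins for the real structures, and the values come from the observation above.

package main

import "fmt"

// writeLabelBatches is a simplified, hypothetical stand-in for the batching
// loop in tsdb/index/index.go. startPos maps each label name to its postings
// offset; a batch only accepts names whose offset fits under maxPostings.
func writeLabelBatches(names []string, startPos map[string]uint64, maxPostings uint64) error {
	for len(names) > 0 {
		var batchNames []string
		for _, name := range names {
			if startPos[name] > maxPostings {
				// If the very first name is already past maxPostings
				// (20814 > 19476 in this report), we break immediately
				// and batchNames stays empty.
				break
			}
			batchNames = append(batchNames, name)
		}
		// ... write out the batch ...
		// names only shrinks by the batch size, so an empty batch means
		// the outer loop never makes progress: the endless loop seen here.
		names = names[len(batchNames):]
	}
	return nil
}

func main() {
	// Values from the report; this call spins forever at 100% of one core.
	err := writeLabelBatches([]string{"some_label"}, map[string]uint64{"some_label": 20814}, 19476)
	fmt.Println(err)
}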

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 2
  • Comments: 18 (9 by maintainers)

Most upvoted comments

Let’s add an error if we enter this endless loop.
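
In terms of the hypothetical writeLabelBatches sketch above (the real check would live in tsdb/index/index.go), such a guard could sit right after the inner loop:

		// After the inner loop has built batchNames:
		if len(batchNames) == 0 {
			// Nothing fit under maxPostings, so we would spin forever;
			// fail loudly with the offending offsets instead.
			return fmt.Errorf("corrupted index: label %q starts at %d, beyond maxPostings %d",
				names[0], startPos[names[0]], maxPostings)
		}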

I can reproduce with different values: w.labelNames[names[0]] = 22808, maxPostings = 22611.

We now (2.16.0) refuse to add such bad data (assuming that labels are sorted). We will still run forever if the data is corrupt. I agree we should address this, but in a way that causes minimal impact.
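
To illustrate the kind of check meant here (a hypothetical sketch, not the actual 2.16.0 code): the writer rejects entries that arrive out of sorted order instead of accepting data that would later corrupt the postings offsets.

package main

import "fmt"

// writerSketch is a hypothetical stand-in for the index writer. It refuses
// symbols that arrive out of sorted order rather than writing bad data.
type writerSketch struct {
	lastSymbol string
}

func (w *writerSketch) addSymbol(sym string) error {
	if w.lastSymbol != "" && sym <= w.lastSymbol {
		return fmt.Errorf("symbol %q out of order (last was %q)", sym, w.lastSymbol)
	}
	w.lastSymbol = sym
	// ... write sym to the index ...
	return nil
}

func main() {
	w := &writerSketch{}
	fmt.Println(w.addSymbol("instance")) // accepted
	fmt.Println(w.addSymbol("app"))      // rejected: out of order
}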

Regarding the usage of tsdb: we are aware that people are using the package. Some discussion is happening on the mailing list: https://groups.google.com/d/msgid/prometheus-developers/55776f4a-b073-4bc1-a470-622d9c318344%40googlegroups.com