prometheus: Compaction runs forever and WAL grows until OOM
Bug Report
This issue is similar to https://github.com/prometheus/prometheus/issues/6408 and https://github.com/prometheus/prometheus/issues/6595.
What did you do? I am running Prometheus with default retention and TSDB settings. It is scraping ~4k metrics per second.
What did you expect to see? I expected the WAL to stay small and new compacted TSDB blocks to be created.
What did you see instead? Under which circumstances? Under a steady load of ~4k metrics/second, compaction seems to run forever, causing the WAL and RAM to fill up until OOM. Because of this, checkpointing also fails, Prometheus crashes uncleanly, and the cycle repeats.
Environment
- System information:
Production: Linux 4.9.0-4-amd64 x86_64
Reproduced on: Darwin 18.7.0 x86_64
- Prometheus version:
Production: Docker image of 2.15.2
Reproduced with: 669592a2c4d59697ce3f654db2c1e7d5e3d42714
- Alertmanager version:
not relevant
- Prometheus configuration file:
not relevant
- Alertmanager configuration file:
not relevant
- Reproduced with code:
package main

import (
    "github.com/prometheus/prometheus/tsdb"
)

func main() {
    // Open the TSDB directory in read-only mode, so a running
    // Prometheus is not needed to trigger the problem.
    db, err := tsdb.OpenDBReadOnly("data", nil)
    if err != nil {
        panic(err)
    }
    defer db.Close()

    // FlushWAL writes the current WAL contents out as a block. With the
    // affected data directory this call never returns and pins one core.
    err = db.FlushWAL("data")
    if err != nil {
        panic(err)
    }
}
- Logs:
No relevant errors or messages in the log; no output is printed while compaction runs forever at 100% of one CPU core.
- Probable root cause:
During my reproduction I let FlushWAL run until the RAM usage of my little program stopped going down (starting at ~8 GB after the TSDB and WAL load, down to ~500 MB). Then I paused the program in the debugger and found the following situation:
I am running into the case of batchNames staying empty on every iteration of the top-level for-loop here: https://github.com/prometheus/prometheus/blob/669592a2c4d59697ce3f654db2c1e7d5e3d42714/tsdb/index/index.go#L815-L827
For the first remaining label name, w.labelNames[names[0]] is 20814 while maxPostings is 19476. That makes the inner for-loop break on its first iteration, leaving names untouched and batchNames empty, so the outer loop never makes progress and spins in an endless loop.
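To make the failure mode concrete, here is a simplified, self-contained sketch of that batching loop (illustrative only, not the actual index.go code: labelCounts stands in for w.labelNames, and the constants are the values observed above):

package main

import "fmt"

func main() {
    names := []string{"huge_label"}
    labelCounts := map[string]int{"huge_label": 20814}
    maxPostings := 19476

    for len(names) > 0 { // outer loop: one batch per iteration
        var batchNames []string
        c := 0
        for len(names) > 0 {
            // If even the first remaining name exceeds the budget,
            // break immediately with an empty batch...
            if labelCounts[names[0]]+c > maxPostings {
                break
            }
            batchNames = append(batchNames, names[0])
            c += labelCounts[names[0]]
            names = names[1:]
        }
        // ...and since nothing consumed names, the outer loop repeats
        // with identical state: an endless busy loop.
        fmt.Println("batch:", batchNames) // prints "batch: []" forever
    }
}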
About this issue
- State: closed
- Created 4 years ago
- Reactions: 2
- Comments: 18 (9 by maintainers)
Let’s add an error if we enter this endless loop.
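Continuing the simplified sketch above, such a guard could look like this (a hypothetical fragment, not the actual patch):

// Placed right after the inner loop: if the batch made no progress,
// fail fast with an error instead of spinning forever.
if len(batchNames) == 0 {
    return fmt.Errorf("corrupted index: postings count %d for label %q exceeds maxPostings %d",
        labelCounts[names[0]], names[0], maxPostings)
}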
I can reproduce with different values: w.labelNames[names[0]] = 22808, maxPostings = 22611.
We now (2.16.0) refuse to add such bad data (the check assumes that labels are sorted). We will still run forever if existing data is corrupt. I agree we should address this, but in a way that causes minimal impact.
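For context, a minimal sketch of the kind of ingestion-time check meant here (illustrative, not the actual Prometheus code):

import (
    "fmt"

    "github.com/prometheus/prometheus/pkg/labels"
)

// validateLabels is a hypothetical example: reject label sets whose
// names are not strictly sorted (unsorted or duplicated names), so bad
// data never reaches the index writer.
func validateLabels(ls labels.Labels) error {
    for i := 1; i < len(ls); i++ {
        if ls[i].Name <= ls[i-1].Name {
            return fmt.Errorf("label set %v is unsorted or has duplicate names", ls)
        }
    }
    return nil
}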
Regarding the usage of tsdb: we are aware that people are using the package. Some discussion is happening on the mailing list: https://groups.google.com/d/msgid/prometheus-developers/55776f4a-b073-4bc1-a470-622d9c318344%40googlegroups.com