prometheus: Prometheus disk getting full with WAL files as compaction is failing
What did you do?
Prometheus is started using the following options:
/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --web.listen-address=0.0.0.0:9090 \
    --web.read-timeout=5m \
    --web.max-connections=512 \
    --storage.tsdb.path=/appdata/cpro/prometheus \
    --storage.tsdb.retention=15d \
    --alertmanager.notification-queue-capacity=10000 \
    --alertmanager.timeout=10s \
    --query.timeout=2m \
    --query.lookback-delta=5m \
    --query.max-concurrency=20 \
    --log.level=info \
    --storage.remote.flush-deadline=1m \
    --storage.tsdb.max-block-duration=2d \
    --storage.tsdb.min-block-duration=2h \
    --web.enable-lifecycle \
    --web.console.libraries=/usr/share/prometheus/console_libraries \
    --web.console.templates=/usr/share/prometheus/consoles \
    --storage.tsdb.retention.time=7d \
    --storage.tsdb.retention.size=512MB
What did you expect to see?
Compaction to happen properly, and disk usage to be limited to the configured values.
What did you see instead? Under which circumstances?
WAL files keep increasing. Disk usage reached 98% capacity.
Environment
Linux
- System information: Linux SPS-SM-oame-0 3.10.0-1062.12.1.el7.x86_64 #1 SMP Thu Dec 12 06:44:49 EST 2019 x86_64 x86_64 x86_64 GNU/Linux
- Prometheus version: prometheus, version 2.11.1 (branch: HEAD, revision: e5b22494857deca4b806f74f6e3a6ee30c251763) build user: root@d94406f2bb6f build date: 20190710-13:51:17 go version: go1.12.7
- Alertmanager version: not provided
- Prometheus configuration file: not provided
- Alertmanager configuration file: not provided
- Logs:
The following error is logged after each two-hourly compaction:
2020-05-26T01:00:09.346514+00:00 hostname-redacted prometheus: level=info ts=2020-05-26T01:00:09.342Z caller=compact.go:495 component=tsdb msg="write block" mint=1590444000000 maxt=1590451200000 ulid=01E977WKVK944WQ7NC34X2SWCV duration=9.17090938s
2020-05-26T01:00:10.394046+00:00 hostname-redacted prometheus: level=info ts=2020-05-26T01:00:10.390Z caller=head.go:586 component=tsdb msg="head GC completed" duration=824.8421ms
2020-05-26T01:00:13.050596+00:00 hostname-redacted prometheus: level=error ts=2020-05-26T01:00:13.047Z caller=db.go:377 component=tsdb msg="compaction failed" err="reload blocks: head truncate failed: create checkpoint: read segments: corruption in segment /appdata/cpro/prometheus/wal/00000174 at 1802240: unexpected non-zero byte in padded page"
2020-05-26T03:00:08.845965+00:00 hostname-redacted prometheus: level=info ts=2020-05-26T03:00:08.833Z caller=compact.go:495 component=tsdb msg="write block" mint=1590451200000 maxt=1590458400000 ulid=01E97ERB340DSHVFRKCWRQVVXJ duration=8.66957819s
2020-05-26T03:00:09.771702+00:00 hostname-redacted prometheus: level=info ts=2020-05-26T03:00:09.771Z caller=head.go:586 component=tsdb msg="head GC completed" duration=639.749706ms
2020-05-26T03:00:12.464615+00:00 hostname-redacted prometheus: level=error ts=2020-05-26T03:00:12.463Z caller=db.go:377 component=tsdb msg="compaction failed" err="reload blocks: head truncate failed: create checkpoint: read segments: corruption in segment /appdata/cpro/prometheus/wal/00000174 at 1802240: unexpected non-zero byte in padded page"
2020-05-26T05:00:09.686095+00:00 hostname-redacted prometheus: level=info ts=2020-05-26T05:00:09.685Z caller=compact.go:495 component=tsdb msg="write block" mint=1590458400000 maxt=1590465600000 ulid=01E97NM2A3Y0BYSM3NRE806N8Q duration=9.528196164s
2020-05-26T05:00:10.706117+00:00 hostname-redacted prometheus: level=info ts=2020-05-26T05:00:10.705Z caller=head.go:586 component=tsdb msg="head GC completed" duration=655.617278ms
2020-05-26T05:00:13.346782+00:00 hostname-redacted prometheus: level=error ts=2020-05-26T05:00:13.343Z caller=db.go:377 component=tsdb msg="compaction failed" err="reload blocks: head truncate failed: create checkpoint: read segments: corruption in segment /appdata/cpro/prometheus/wal/00000174 at 1802240: unexpected non-zero byte in padded page"
Please let us know if any other logs are needed.
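To confirm that every failure points at the same location, the error lines above can be parsed mechanically. The sketch below is illustrative only: the regular expression is written against the message wording in these particular log lines, and `corruption_sites` is a hypothetical helper name. It also assumes the 32 KiB WAL page size described in the Prometheus TSDB WAL format documentation, in order to show where the reported offset falls within a page.

```python
import re
from collections import Counter

# Pattern written against the error text in the logs above; other
# Prometheus versions may word the message differently (assumption).
CORRUPTION_RE = re.compile(
    r"corruption in segment (?P<segment>\S+) at (?P<offset>\d+):"
)

# WAL segments are written in 32 KiB pages (per the TSDB WAL format docs).
WAL_PAGE_SIZE = 32 * 1024


def corruption_sites(lines):
    """Count (segment path, byte offset) pairs in 'compaction failed' lines."""
    sites = Counter()
    for line in lines:
        if 'msg="compaction failed"' not in line:
            continue
        match = CORRUPTION_RE.search(line)
        if match:
            sites[(match.group("segment"), int(match.group("offset")))] += 1
    return sites


# One of the error lines from the report, with the syslog prefix dropped.
sample = [
    'level=error ts=2020-05-26T01:00:13.047Z caller=db.go:377 component=tsdb '
    'msg="compaction failed" err="reload blocks: head truncate failed: '
    'create checkpoint: read segments: corruption in segment '
    '/appdata/cpro/prometheus/wal/00000174 at 1802240: '
    'unexpected non-zero byte in padded page"',
]

for (segment, offset), count in corruption_sites(sample).items():
    page, within = divmod(offset, WAL_PAGE_SIZE)
    print(f"{segment} offset={offset} page={page} offset_in_page={within} seen={count}")
```

All three failures in the log report the same site: segment 00000174 at offset 1802240, which is exactly 55 × 32768, so the reader fails at the start of a page rather than at a moving position.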
About this issue
- State: closed
- Created 4 years ago
- Comments: 20 (15 by maintainers)
@syepes It has not been identified as a code bug (yet); it is most likely disk corruption, so it will persist for as long as the disk is faulty.
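Since the logged error occurs during head truncation (`create checkpoint`), the log suggests that WAL truncation is not completing, which would match the steady growth described in the report. A minimal sketch for tracking that growth is given here. It assumes the usual layout of all-digit segment files directly under the `wal` directory; `wal_usage` is a hypothetical helper, not a Prometheus tool.

```python
from pathlib import Path


def wal_usage(wal_dir):
    """Return (segment_count, total_bytes, first_segment, last_segment).

    Assumes segments are regular files with all-digit names (e.g. 00000174);
    checkpoint directories and any other entries are ignored.
    """
    segments = sorted(
        p for p in Path(wal_dir).iterdir() if p.is_file() and p.name.isdigit()
    )
    total = sum(p.stat().st_size for p in segments)
    first = segments[0].name if segments else None
    last = segments[-1].name if segments else None
    return len(segments), total, first, last
```

For the setup in this report it could be called as `wal_usage("/appdata/cpro/prometheus/wal")`, the `--storage.tsdb.path` value with `/wal` appended. If the first segment number never advances between compaction cycles while the count keeps rising, old segments are not being truncated.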