VictoriaMetrics: vmagent: drop corrupted data blocks on start up
There seems to be a problem with how vmagent v1.52.0 handles its cached (buffered) data on disk:
/usr/sbin/vmagent \
--remoteWrite.flushInterval=5s --httpListenAddr=127.0.0.1:8429 \
--promscrape.config=/data/vmagent/promscrape.json --promscrape.disableKeepAlive=false \
--promscrape.disableCompression --remoteWrite.url=http://vmdbA:8428/api/v1/write \
--remoteWrite.tmpDataPath=/data/vmagent
...
2021-01-22T15:06:38.344Z info VictoriaMetrics/app/vmagent/main.go:88 starting vmagent at "127.0.0.1:8429"...
2021-01-22T15:06:38.350Z info VictoriaMetrics/lib/memory/memory.go:43 limiting caches to 161940359577 bytes, leaving 107960239719 bytes to the OS according to -memory.allowedPercent=60
2021-01-22T15:06:38.350Z error VictoriaMetrics/lib/persistentqueue/persistentqueue.go:214 cannot read metainfo for persistent queue from "/data/vmagent/persistent-queue/1_28103F80D27BCD31/metainfo.json": invalid data read from "/data/vmagent/persistent-queue/1_28103F80D27BCD31/metainfo.json": readerOffset=536871040 cannot exceed writerOffset=16701900; re-creating "/data/vmagent/persistent-queue/1_28103F80D27BCD31"
2021-01-22T15:06:38.364Z info VictoriaMetrics/lib/persistentqueue/fastqueue.go:51 opened fast persistent queue at "/data/vmagent/persistent-queue/1_28103F80D27BCD31" with maxInmemoryBlocks=200, it contains 0 pending bytes
2021-01-22T15:06:38.364Z info VictoriaMetrics/app/vmagent/remotewrite/client.go:130 initialized client for -remoteWrite.url="1:secret-url"
2021-01-22T15:06:38.366Z info VictoriaMetrics/app/vmagent/main.go:111 started vmagent in 0.021 seconds
2021-01-22T15:06:38.366Z info VictoriaMetrics/lib/promscrape/scraper.go:91 reading Prometheus configs from "/data/vmagent/promscrape.json"
2021-01-22T15:06:38.366Z info VictoriaMetrics/lib/httpserver/httpserver.go:82 starting http server at http://127.0.0.1:8429/
2021-01-22T15:06:38.366Z info VictoriaMetrics/lib/httpserver/httpserver.go:83 pprof handlers are exposed at http://127.0.0.1:8429/debug/pprof/
2021-01-22T15:06:38.371Z info VictoriaMetrics/lib/promscrape/scraper.go:344 static_configs: added targets: 3, removed targets: 0; total targets: 3
queue/persistentqueue.go:512 FATAL: too big block size read from "/data/vmagent/persistent-queue/1_28103F80D27BCD31/0000000000000000": 8255741331277644361 bytes; cannot exceed 33554432 bytes
panic: FATAL: too big block size read from "/data/vmagent/persistent-queue/1_28103F80D27BCD31/0000000000000000": 8255741331277644361 bytes; cannot exceed 33554432 bytes
...
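For context, here is a minimal Go sketch (not the actual VictoriaMetrics code) of the kind of sanity check that produces the "too big block size" error above. It assumes each queue block is stored as an 8-byte size header followed by the payload, so a corrupted header decodes to an absurd size that trips the 32 MiB cap; the header layout is an assumption for illustration only.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
	"log"
)

// maxBlockSize matches the 33554432-byte cap reported in the log above.
const maxBlockSize = 32 * 1024 * 1024

// readBlock reads one length-prefixed block. The 8-byte little-endian size
// header is an assumed on-disk layout, not the real persistent-queue format.
func readBlock(r io.Reader) ([]byte, error) {
	var hdr [8]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return nil, fmt.Errorf("cannot read block header: %w", err)
	}
	size := binary.LittleEndian.Uint64(hdr[:])
	if size > maxBlockSize {
		return nil, fmt.Errorf("too big block size: %d bytes; cannot exceed %d bytes", size, maxBlockSize)
	}
	buf := make([]byte, size)
	if _, err := io.ReadFull(r, buf); err != nil {
		return nil, fmt.Errorf("cannot read block body: %w", err)
	}
	return buf, nil
}

func main() {
	// Simulate a corrupted header: random bytes decode to a huge block size.
	corrupted := []byte{0xC9, 0x5D, 0x09, 0x17, 0x2B, 0x7D, 0x94, 0x72}
	if _, err := readBlock(bytes.NewReader(corrupted)); err != nil {
		log.Printf("corrupted queue detected: %v", err)
	}
}
```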
The vmagent scrapes:
- 480 metrics every 10s from vmagent itself
- 311 metrics every 2s from dcgm-exporter
- 534 metrics every 1s from node-exporter
so 63,720,000 data pairs a day (~26 MB?).
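As a sanity check on that figure, a small Go snippet reproducing the arithmetic; the ~0.4 bytes per sample used for the size estimate is an assumption about on-disk compression, not a measured value.

```go
package main

import "fmt"

func main() {
	// Samples per second from the three scrape targets listed above.
	perSecond := 480.0/10 + 311.0/2 + 534.0/1 // 48 + 155.5 + 534 = 737.5
	perDay := perSecond * 86400               // 63,720,000 samples/day
	fmt.Printf("samples/day: %.0f\n", perDay)
	// Rough size estimate, assuming ~0.4 bytes per compressed sample.
	fmt.Printf("approx. size: %.1f MB\n", perDay*0.4/1e6) // ~25.5 MB
}
```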
About this issue
- State: closed
- Created 3 years ago
- Comments: 15 (6 by maintainers)
Commits related to this issue
- lib/persistentqueue: delete corrupted persistent queue instead of throwing a fatal error Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1030 — committed to VictoriaMetrics/VictoriaMetrics by valyala 3 years ago
vmagent should automatically recover when reading from a corrupted persistent queue starting from v1.58.0. Closing the issue as fixed.

@jelmd, vmagent should recover when reading from a corrupted persistent queue at -remoteWrite.tmpDataPath starting from the commit 95dbebf51214ae459537e8b11e14720a3a587784. This commit will be included in the next release.

Hm, while VM itself is resistant to power-reset cases, vmagent unfortunately is not. It needs a proper shutdown procedure to finish on-disk writes. I guess it may be marked as an enhancement. Thanks for the report!
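For illustration, a minimal Go sketch of the recovery behaviour the fix describes: instead of aborting the process, a corrupted queue directory is deleted and re-created empty. openQueue, the queue type, and the directory handling are hypothetical placeholders, not the real lib/persistentqueue API.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
)

// queue and openQueue are illustrative stand-ins; openQueue here just checks
// that metainfo.json is readable, standing in for the real integrity checks.
type queue struct{ dir string }

func openQueue(dir string) (*queue, error) {
	if _, err := os.ReadFile(filepath.Join(dir, "metainfo.json")); err != nil {
		return nil, fmt.Errorf("cannot read metainfo: %w", err)
	}
	return &queue{dir: dir}, nil
}

// mustOpenQueue sketches the post-v1.58.0 behaviour: a corrupted queue is
// dropped and re-created empty, so only the not-yet-delivered buffered data
// is lost instead of the whole vmagent process crashing on startup.
func mustOpenQueue(dir string) *queue {
	q, err := openQueue(dir)
	if err == nil {
		return q
	}
	log.Printf("cannot open persistent queue at %q: %s; re-creating it", dir, err)
	if err := os.RemoveAll(dir); err != nil {
		log.Fatalf("cannot remove corrupted queue dir %q: %s", dir, err)
	}
	if err := os.MkdirAll(dir, 0o755); err != nil {
		log.Fatalf("cannot re-create queue dir %q: %s", dir, err)
	}
	// An empty queue is represented here by the bare directory; the real
	// implementation writes fresh metainfo and chunk files at this point.
	return &queue{dir: dir}
}

func main() {
	q := mustOpenQueue("/tmp/vmagent-demo-queue")
	fmt.Println("queue ready at", q.dir)
}
```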