VictoriaMetrics: vmagent: drop corrupted data blocks on startup

There seems to be a problem with how vmagent v1.52.0 handles its cached on-disk data:

/usr/sbin/vmagent \
--remoteWrite.flushInterval=5s --httpListenAddr=127.0.0.1:8429 \
--promscrape.config=/data/vmagent/promscrape.json --promscrape.disableKeepAlive=false \
--promscrape.disableCompression --remoteWrite.url=http://vmdbA:8428/api/v1/write \
--remoteWrite.tmpDataPath=/data/vmagent
...
2021-01-22T15:06:38.344Z    info    VictoriaMetrics/app/vmagent/main.go:88  starting vmagent at "127.0.0.1:8429"...
2021-01-22T15:06:38.350Z    info    VictoriaMetrics/lib/memory/memory.go:43 limiting caches to 161940359577 bytes, leaving 107960239719 bytes to the OS according to -memory.allowedPercent=60
2021-01-22T15:06:38.350Z    error   VictoriaMetrics/lib/persistentqueue/persistentqueue.go:214  cannot read metainfo for persistent queue from "/data/vmagent/persistent-queue/1_28103F80D27BCD31/metainfo.json": invalid data read from "/data/vmagent/persistent-queue/1_28103F80D27BCD31/metainfo.json": readerOffset=536871040 cannot exceed writerOffset=16701900; re-creating "/data/vmagent/persistent-queue/1_28103F80D27BCD31"
2021-01-22T15:06:38.364Z    info    VictoriaMetrics/lib/persistentqueue/fastqueue.go:51 opened fast persistent queue at "/data/vmagent/persistent-queue/1_28103F80D27BCD31" with maxInmemoryBlocks=200, it contains 0 pending bytes
2021-01-22T15:06:38.364Z    info    VictoriaMetrics/app/vmagent/remotewrite/client.go:130   initialized client for -remoteWrite.url="1:secret-url"
2021-01-22T15:06:38.366Z    info    VictoriaMetrics/app/vmagent/main.go:111 started vmagent in 0.021 seconds
2021-01-22T15:06:38.366Z    info    VictoriaMetrics/lib/promscrape/scraper.go:91    reading Prometheus configs from "/data/vmagent/promscrape.json"
2021-01-22T15:06:38.366Z    info    VictoriaMetrics/lib/httpserver/httpserver.go:82 starting http server at http://127.0.0.1:8429/
2021-01-22T15:06:38.366Z    info    VictoriaMetrics/lib/httpserver/httpserver.go:83 pprof handlers are exposed at http://127.0.0.1:8429/debug/pprof/
2021-01-22T15:06:38.371Z    info    VictoriaMetrics/lib/promscrape/scraper.go:344   static_configs: added targets: 3, removed targets: 0; total targets: 3
queue/persistentqueue.go:512    FATAL: too big block size read from "/data/vmagent/persistent-queue/1_28103F80D27BCD31/0000000000000000": 8255741331277644361 bytes; cannot exceed 33554432 bytes
panic: FATAL: too big block size read from "/data/vmagent/persistent-queue/1_28103F80D27BCD31/0000000000000000": 8255741331277644361 bytes; cannot exceed 33554432 bytes
...
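The fatal error above comes from a sanity check on the size prefix of a queue block: once the file is corrupted, the decoded size becomes an absurd value (8255741331277644361 bytes) far above the 33554432-byte (32 MiB) limit. The following is a hypothetical sketch of such a check; `readBlockSize` and its header layout are illustrative assumptions, not vmagent's actual code.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// maxBlockSize mirrors the 32 MiB per-block limit seen in the log
// (33554432 bytes).
const maxBlockSize = 32 * 1024 * 1024

// readBlockSize decodes a hypothetical 8-byte little-endian size prefix
// and rejects it if it exceeds the limit. On a corrupted file the random
// bytes decode to a huge bogus size, which is how the check trips.
func readBlockSize(header []byte) (uint64, error) {
	if len(header) < 8 {
		return 0, fmt.Errorf("short header: %d bytes", len(header))
	}
	size := binary.LittleEndian.Uint64(header[:8])
	if size > maxBlockSize {
		return 0, fmt.Errorf("too big block size: %d bytes; cannot exceed %d bytes", size, maxBlockSize)
	}
	return size, nil
}

func main() {
	// Eight arbitrary bytes standing in for a corrupted header: they
	// decode to an enormous size and are rejected.
	corrupted := []byte{0x49, 0x4e, 0x56, 0x41, 0x4c, 0x49, 0x44, 0x72}
	if _, err := readBlockSize(corrupted); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

In v1.52.0 this check panicked instead of recovering, which is the crux of the report.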

The vmagent scrapes:

  • 480 metrics every 10s from vmagent itself
  • 311 metrics every 2s from dcgm-exporter
  • 534 metrics every 1s from node-exporter

so roughly 63,720,000 samples per day (~26 MB?).
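The daily-sample arithmetic can be reproduced with a short sketch (the per-target figures are taken from the list above):

```go
package main

import "fmt"

// dailySamples returns the number of samples scraped per day from a
// target that exports `metrics` series every `intervalSec` seconds.
func dailySamples(metrics, intervalSec int) int {
	return metrics * (86400 / intervalSec)
}

func main() {
	total := dailySamples(480, 10) + // vmagent self-scrape, every 10s
		dailySamples(311, 2) + // dcgm-exporter, every 2s
		dailySamples(534, 1) // node-exporter, every 1s
	fmt.Println(total) // 63720000
}
```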

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 15 (6 by maintainers)

Most upvoted comments

vmagent automatically recovers when reading from a corrupted persistent queue starting from v1.58.0. Closing the issue as fixed.

@jelmd , vmagent recovers when reading from a corrupted persistent queue at -remoteWrite.tmpDataPath starting from commit 95dbebf51214ae459537e8b11e14720a3a587784. This commit will be included in the next release.

Hm, while VM itself is resilient to power resets, vmagent unfortunately is not. It needs a proper shutdown procedure to finish its on-disk writes. I guess it may be marked as an enhancement. Thanks for the report!