VictoriaMetrics: vmagent: drop corrupted data blocks on startup

There seems to be a problem with how vmagent v1.52.0 handles its cached on-disk data:

/usr/sbin/vmagent \
--remoteWrite.flushInterval=5s --httpListenAddr=127.0.0.1:8429 \
--promscrape.config=/data/vmagent/promscrape.json --promscrape.disableKeepAlive=false \
--promscrape.disableCompression --remoteWrite.url=http://vmdbA:8428/api/v1/write \
--remoteWrite.tmpDataPath=/data/vmagent
...
2021-01-22T15:06:38.344Z    info    VictoriaMetrics/app/vmagent/main.go:88  starting vmagent at "127.0.0.1:8429"...
2021-01-22T15:06:38.350Z    info    VictoriaMetrics/lib/memory/memory.go:43 limiting caches to 161940359577 bytes, leaving 107960239719 bytes to the OS according to -memory.allowedPercent=60
2021-01-22T15:06:38.350Z    error   VictoriaMetrics/lib/persistentqueue/persistentqueue.go:214  cannot read metainfo for persistent queue from "/data/vmagent/persistent-queue/1_28103F80D27BCD31/metainfo.json": invalid data read from "/data/vmagent/persistent-queue/1_28103F80D27BCD31/metainfo.json": readerOffset=536871040 cannot exceed writerOffset=16701900; re-creating "/data/vmagent/persistent-queue/1_28103F80D27BCD31"
2021-01-22T15:06:38.364Z    info    VictoriaMetrics/lib/persistentqueue/fastqueue.go:51 opened fast persistent queue at "/data/vmagent/persistent-queue/1_28103F80D27BCD31" with maxInmemoryBlocks=200, it contains 0 pending bytes
2021-01-22T15:06:38.364Z    info    VictoriaMetrics/app/vmagent/remotewrite/client.go:130   initialized client for -remoteWrite.url="1:secret-url"
2021-01-22T15:06:38.366Z    info    VictoriaMetrics/app/vmagent/main.go:111 started vmagent in 0.021 seconds
2021-01-22T15:06:38.366Z    info    VictoriaMetrics/lib/promscrape/scraper.go:91    reading Prometheus configs from "/data/vmagent/promscrape.json"
2021-01-22T15:06:38.366Z    info    VictoriaMetrics/lib/httpserver/httpserver.go:82 starting http server at http://127.0.0.1:8429/
2021-01-22T15:06:38.366Z    info    VictoriaMetrics/lib/httpserver/httpserver.go:83 pprof handlers are exposed at http://127.0.0.1:8429/debug/pprof/
2021-01-22T15:06:38.371Z    info    VictoriaMetrics/lib/promscrape/scraper.go:344   static_configs: added targets: 3, removed targets: 0; total targets: 3
queue/persistentqueue.go:512    FATAL: too big block size read from "/data/vmagent/persistent-queue/1_28103F80D27BCD31/0000000000000000": 8255741331277644361 bytes; cannot exceed 33554432 bytes
panic: FATAL: too big block size read from "/data/vmagent/persistent-queue/1_28103F80D27BCD31/0000000000000000": 8255741331277644361 bytes; cannot exceed 33554432 bytes
...
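The fatal error above comes from a sanity check on the size prefix of a queue block: once the file is corrupted, the decoded size becomes an absurd value (8255741331277644361 bytes) far above the 33554432-byte (32 MiB) limit. The following is a hypothetical sketch of such a check; `readBlockSize` and its header layout are illustrative assumptions, not vmagent's actual code.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// maxBlockSize mirrors the 32 MiB per-block limit seen in the log
// (33554432 bytes).
const maxBlockSize = 32 * 1024 * 1024

// readBlockSize decodes a hypothetical 8-byte little-endian size prefix
// and rejects it if it exceeds the limit. On a corrupted file the random
// bytes decode to a huge bogus size, which is how the check trips.
func readBlockSize(header []byte) (uint64, error) {
	if len(header) < 8 {
		return 0, fmt.Errorf("short header: %d bytes", len(header))
	}
	size := binary.LittleEndian.Uint64(header[:8])
	if size > maxBlockSize {
		return 0, fmt.Errorf("too big block size: %d bytes; cannot exceed %d bytes", size, maxBlockSize)
	}
	return size, nil
}

func main() {
	// Eight arbitrary bytes standing in for a corrupted header: they
	// decode to an enormous size and are rejected.
	corrupted := []byte{0x49, 0x4e, 0x56, 0x41, 0x4c, 0x49, 0x44, 0x72}
	if _, err := readBlockSize(corrupted); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

In v1.52.0 this check panicked instead of recovering, which is the crux of the report.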

The vmagent scrapes:

  • 480 metrics every 10s from vmagent itself
  • 311 metrics every 2s from dcgm-exporter
  • 534 metrics every 1s from node-exporter

so roughly 63,720,000 samples per day (~26 MB?).
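The daily-sample arithmetic can be reproduced with a short sketch (the per-target figures are taken from the list above):

```go
package main

import "fmt"

// dailySamples returns the number of samples scraped per day from a
// target that exports `metrics` series every `intervalSec` seconds.
func dailySamples(metrics, intervalSec int) int {
	return metrics * (86400 / intervalSec)
}

func main() {
	total := dailySamples(480, 10) + // vmagent self-scrape, every 10s
		dailySamples(311, 2) + // dcgm-exporter, every 2s
		dailySamples(534, 1) // node-exporter, every 1s
	fmt.Println(total) // 63720000
}
```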

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 15 (6 by maintainers)

Most upvoted comments

vmagent automatically recovers when reading from a corrupted persistent queue starting from v1.58.0. Closing the issue as fixed.

@jelmd , vmagent recovers when reading from a corrupted persistent queue at -remoteWrite.tmpDataPath starting from commit 95dbebf51214ae459537e8b11e14720a3a587784. This commit will be included in the next release.

Hm, while VM itself is resilient to power resets, vmagent unfortunately is not. It needs a proper shutdown procedure to finish its on-disk writes. I guess it may be marked as an enhancement. Thanks for the report!