VictoriaMetrics: vmstorage is stuck at some disk intensive operation

Describe the bug I should first say that it's a bizarre case: there is a cluster with 6 shards of vmstorage components. At some point, when we decided to shrink memory for a Pod a bit (60GB --> 50GB, because we had added more shards before), one of the vmstorage nodes got stuck in some disk operation and stopped receiving data (or started receiving significantly less than it should). Returning the memory back didn't help. This pod blocks data ingestion from vmagent: if we shut this vmstorage down, ingestion recovers, but as soon as it is back, ingestion is blocked again. (Screenshot attached to the original issue.)

Pprof pprof.zip (attached)

To Reproduce Don't know, to be honest.

Expected behavior No disk operation should block the ingestion of data.

Screenshots Ten screenshots (I/O and ingestion panels) are attached to the original issue.

Version

/ # /vmstorage-prod --version
vmstorage-20220801-081914-tags-v1.79.1-cluster-0-g8b4726ed7

Used command-line flags

    Args:
      --retentionPeriod=6
      --storageDataPath=/storage
      --dedup.minScrapeInterval=15s
      --envflag.enable=true
      --envflag.prefix=VM_
      --http.maxGracefulShutdownDuration=10s
      --http.shutdownDelay=5s
      --loggerFormat=json
      --loggerLevel=INFO
      --loggerOutput=stdout
      --memory.allowedPercent=60
      --search.maxUniqueTimeseries=10000000

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 16

Most upvoted comments

TSID cache size on restart

The cache structure in VictoriaMetrics works in different modes; the mode is picked based on how much memory the cache entries occupy. One of the modes works in a split fashion, where the cache entries are split into two parts: current and previous. These parts interchange once the current part is full. The split mode may also be enabled at cache load time if all the space dedicated to the cache is already occupied, so the oldest cache part may be evicted right at load time. Another detail is that entry eviction from the caches happens every 5 minutes, as well as on first load. It is likely that some entries were evicted at load time as well.
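A minimal Go sketch of that two-generation (current/previous) design, as a hypothetical simplification for illustration only, not the actual VictoriaMetrics code (see lib/workingsetcache/cache.go for the real implementation):

    package main

    import (
        "fmt"
        "sync"
    )

    // wsCache is a simplified two-generation working-set cache.
    // Lookups check curr first, then prev; a hit in prev promotes the
    // entry back into curr. When curr fills up, rotate() drops prev and
    // demotes curr, so anything not touched in the last generation is evicted.
    type wsCache struct {
        mu         sync.Mutex
        curr, prev map[string][]byte
        maxEntries int
    }

    func newWSCache(maxEntries int) *wsCache {
        return &wsCache{
            curr:       make(map[string][]byte),
            prev:       make(map[string][]byte),
            maxEntries: maxEntries,
        }
    }

    func (c *wsCache) Get(k string) ([]byte, bool) {
        c.mu.Lock()
        defer c.mu.Unlock()
        if v, ok := c.curr[k]; ok {
            return v, true
        }
        if v, ok := c.prev[k]; ok {
            c.set(k, v) // promote the hot entry into the current generation
            return v, true
        }
        return nil, false
    }

    func (c *wsCache) Set(k string, v []byte) {
        c.mu.Lock()
        defer c.mu.Unlock()
        c.set(k, v)
    }

    func (c *wsCache) set(k string, v []byte) {
        if len(c.curr) >= c.maxEntries {
            c.rotate()
        }
        c.curr[k] = v
    }

    // rotate makes curr the new prev and drops the old prev.
    func (c *wsCache) rotate() {
        c.prev = c.curr
        c.curr = make(map[string][]byte)
    }

    func main() {
        c := newWSCache(2)
        c.Set("a", []byte("1"))
        c.Set("b", []byte("2"))
        c.Set("c", []byte("3")) // curr is full: "a" and "b" move to prev
        _, ok := c.Get("a")     // hit via prev; promoted back into curr
        fmt.Println(ok)         // true
    }

If both halves are persisted on shutdown but the loader has to fit them back under the same capacity constraints, the older half is the first to be evicted, which lines up with the halving discussed in question 1 below.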

TSID cache utilization and resets

Please see the previous response. More details about the cache structure can be found here: https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/lib/workingsetcache/cache.go

TSID cache lookup

Please find a detailed explanation here: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007#issuecomment-1032080840

Adding new shards

There is no way so far except manual migration, and even that won't help, since you would need different retention periods on the storage nodes to make it work. The delete API is also not recommended in this case, since it doesn't actually delete the data, it only marks it as deleted.

But still, adding new shards would improve the overall performance of the cluster. If you read this comment carefully, you'll see that cache lookups may be improved by extending the capacity of the indexdb/dataBlocks cache. This can be done either by increasing the available memory (for the process or for this specific cache) or by adding more shards to the cluster.
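For illustration, assuming this vmstorage build supports the per-cache sizing flags (they were added in the v1.7x line, so v1.79.1 should have them; verify with /vmstorage-prod --help), the Args could be extended like this, with purely illustrative values:

    Args:
      --memory.allowedPercent=60
      # assumed flags; sizes are examples, not recommendations
      --storage.cacheSizeIndexDBDataBlocks=8GB
      --storage.cacheSizeStorageTSID=4GB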

Alright, I have some thoughts to share and also some questions regarding the vmstorage cache.

  • First, I should say that some vmstorage nodes are I/O-bound on reads from the filesystem, which we can clearly see in the screenshots above.
  • I was pointed several times to the caches, in particular to the TSID cache (which can be the bottleneck under a high churn rate or in the case of rerouting), because, as I understand it, the TSID cache is consulted before registering a new metric in the storage. This part is clear.
  • Also, I can clearly see that ingestion into the nodes with less data (the new shards) is much faster (the TSID cache miss rate is much lower) than into the nodes with more data. Here I have 3 nodes with 4-5 TB of data each (6 months of data) and 3 nodes with 1 TB each (2 months of data), which were recently added to scale the cluster up and spread the load.

From here, I have several questions regarding the TSID cache:

1. TSID cache size on restart

If I restart one vmstorage pod, I see the following picture. Saving the cache:

 {"ts":"2022-08-26T11:21:59.203Z",....,"msg":"saving MetricName->TSID cache to \"/storage/cache/metricName_tsid\"..."}
{"ts":"2022-08-26T11:21:59.581Z",....,"msg":"saved MetricName->TSID cache to \"/storage/cache/metricName_tsid\" in 0.377 seconds; entriesCount: 3198508; sizeBytes: 1660420096"}
{"ts":"2022-08-26T11:21:59.583Z",....,"msg":"saving MetricID->TSID cache to \"/storage/cache/metricID_tsid\"..."}
{"ts":"2022-08-26T11:21:59.602Z",....,"msg":"saved MetricID->TSID cache to \"/storage/cache/metricID_tsid\" in 0.019 seconds; entriesCount: 348323; sizeBytes: 67108864"}

Restoring the cache:

{"ts":"2022-08-26T11:22:23.517Z",....,"msg":"loading MetricName->TSID cache from \"/storage/cache/metricName_tsid\"..."}
{"ts":"2022-08-26T11:22:24.280Z",....,"msg":"loaded MetricName->TSID cache from \"/storage/cache/metricName_tsid\" in 0.763 seconds; entriesCount: 1599126; sizeBytes: 830078976"}
{"ts":"2022-08-26T11:22:24.280Z",....,"msg":"loading MetricID->TSID cache from \"/storage/cache/metricID_tsid\"..."}
{"ts":"2022-08-26T11:22:24.480Z",....,"msg":"loaded MetricID->TSID cache from \"/storage/cache/metricID_tsid\" in 0.200 seconds; entriesCount: 174161; sizeBytes: 33554432"}

As you can see, the cache size differs between the persisting and restoring phases; it is basically half on restore (1,599,126 entries loaded vs. 3,198,508 saved). Does that mean vmstorage does not restore the whole cache, or is the metadata just wrong? I can see the same picture in the metrics: it saves ~3 million entries but restores only half.

Another unclear point concerns the metric vm_cache_size_bytes{type="storage/tsid"}: its value seems to match the entry count of the /storage/cache/metricName_tsid folder, but then which metric covers /storage/cache/metricID_tsid?

2. TSID cache utilization and resets

Is there a cache reset every 5 minutes? (See the screenshot in the original issue.)
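Per the explanation above, this would be the periodic generation rotation rather than a full reset. Extending the hypothetical wsCache sketch from earlier (add "time" to its imports), the sawtooth in the metric would come from something like:

    // startRotation drops the previous generation on a fixed period,
    // producing the ~5min sawtooth in the cache-size metrics.
    // Hypothetical helper, not the actual VictoriaMetrics code.
    func startRotation(c *wsCache, period time.Duration) {
        go func() {
            t := time.NewTicker(period)
            defer t.Stop()
            for range t.C {
                c.mu.Lock()
                c.rotate() // old prev is dropped, curr becomes prev
                c.mu.Unlock()
            }
        }()
    }

    // usage: startRotation(c, 5*time.Minute)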

3. TSID cache lookup

Another thing that I would like to understand is how the TSID lookup works in case of a cache miss: which files does vmstorage try to read? I thought that a forced merge of old partitions (> 2 months) on the heavy nodes would help and speed up TSID lookups, but it seems we got an even slower ingestion rate after that (more I/O reads).
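Conceptually, on a cache miss the lookup has to fall back to the on-disk indexdb, which is where the extra read I/O on the heavy shards comes from. A hedged sketch of that fallback pattern, reusing the hypothetical wsCache from above (searchIndexDB is a stand-in stub, not an actual VictoriaMetrics function):

    // searchIndexDB is a stub for the expensive on-disk indexdb lookup;
    // in reality this reads and searches index blocks from disk.
    func searchIndexDB(metricName string) ([]byte, error) {
        return []byte("tsid-for-" + metricName), nil
    }

    // getTSID resolves a metric name to its TSID: in-memory cache first,
    // then the on-disk indexdb. The miss path is the one that shows up
    // as read I/O on the overloaded shards.
    func getTSID(c *wsCache, metricName string) ([]byte, error) {
        if tsid, ok := c.Get(metricName); ok {
            return tsid, nil // fast path: no disk access
        }
        tsid, err := searchIndexDB(metricName) // slow path: disk reads
        if err != nil {
            return nil, err
        }
        c.Set(metricName, tsid) // cache it for subsequent rows
        return tsid, nil
    }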

4. Adding new shards

If we are adding more shards to a growing cluster, is there a way to rebalance data between the shards? Obviously the new shards will be light and fast, while the old ones will still be slow. From here I only see two options to make them equal (for the second one, see the vmctl sketch after this list):

  • wait for the retention period so the old data is dropped from the old shards
  • set up a new cluster and migrate the data there via vmctl

Is there another way?

We have 3 overloaded shards and 3 that are almost empty.
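For the migration option, a hedged sketch of what the vmctl invocation might look like (the addresses are illustrative; vm-native streams data from the old cluster's vmselect into the new cluster's vminsert):

    ./vmctl vm-native \
      --vm-native-src-addr=http://vmselect-old:8481/select/0/prometheus \
      --vm-native-dst-addr=http://vminsert-new:8480/insert/0/prometheus \
      --vm-native-filter-match='{__name__!=""}' \
      --vm-native-filter-time-start=2022-02-26T00:00:00Z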

“We recommend to run a cluster with big number of small vmstorage nodes instead of a cluster with small number of big vmstorage nodes. This increases chances that the cluster remains available and stable when some of vmstorage nodes are temporarily unavailable during maintenance events such as upgrades, configuration changes or migrations.”

See more details here https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#capacity-planning

I think we can close this issue 😃

Ok! Thank you for the good conversation!