VictoriaMetrics: OOM -killed vmstorage causing by expensive queries

Version: v1.27.3 VM used as remote storage for Prometheus in k8s • setup is following: vmselect x1(8Gi), vminsert x1 (2Gi), vmstorage x1(16Gi) • number of active timeseries(vm_cache_entries{type=“storage/hour_metric_ids”}) is ~3kk • max_allowed_memory is calculated properly for containers Queries for time-range more than 2d causing OOMs in vmstorage, which results in crashloops for vminsert and vmselect. The query example is following: sum(rate(http_response_time_seconds_sum{job=~\"|skipper-prod|vesld\",env=~\".*\",path!~\".+/(health|healthcheck)\",namespace=~\"kube-system|prod\",cluster=~\".*\"}[10m])) by (job, env) /sum(rate(http_response_time_seconds_count{job=~\"skipper-prod|vesld\",env=~\".*\",path!~\".+/(health|healthcheck)\",namespace=~\"kube-system|prod\",cluster=~\".*\"}[10m])) by (job, env) @tenmozes @valyala @hagen1778 has been in touched with folks about this Screen Shot 2019-09-24 at 1 53 13 PM

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 16

Commits related to this issue

Most upvoted comments

FYI, the following patch reduced memory usage when processing heavy queries in vmselect - 56dff57f77e19b342b8b4d5f4795a4b9fb30412c . The upcoming release will contain this patch. As for memory usage for vmstorage nodes, it should be reduced too for heavy queries in the upcoming release (v1.28.0) comparing to v.1.27.3 due to re-worked and optimized path when looking up time series with label filters.