VictoriaMetrics: query_range becomes slower and slower

Describe the bug

  1. query_range becomes slower and slower over the course of the day, but recovers at 08:00 every day (00:00 UTC)
  2. vminsert constantly runs out of memory (OOM)

Expected behavior

query_range should stay consistently fast instead of becoming slower over the course of the day.
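To quantify the degradation, here is a minimal latency probe (my own sketch, not part of the original report) that times a Prometheus-compatible /api/v1/query_range request against vmselect. The vmselect host, the accountID 0 tenant path, and the sample query are assumptions; adjust them for your setup.

package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

func main() {
	// Assumed vmselect address and tenant path (accountID=0); replace with your own.
	base := "http://vmselect-host:8481/select/0/prometheus/api/v1/query_range"

	params := url.Values{}
	params.Set("query", `sum(rate(http_requests_total[5m]))`) // placeholder query
	params.Set("start", fmt.Sprint(time.Now().Add(-time.Hour).Unix()))
	params.Set("end", fmt.Sprint(time.Now().Unix()))
	params.Set("step", "60")

	started := time.Now()
	resp, err := http.Get(base + "?" + params.Encode())
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	n, _ := io.Copy(io.Discard, resp.Body) // read the full body so the timing covers the whole response

	fmt.Printf("status=%d bytes=%d elapsed=%s\n", resp.StatusCode, n, time.Since(started))
}

Running this periodically makes the gradual slowdown and the daily 00:00 UTC recovery visible as a series of elapsed values.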

Screenshots

[screenshot attached in the original issue]

Version

./vmstorage-prod -version
vmstorage-20200623-210915-heads-cluster-0-g46c5c077

Used command-line flags

./vminsert-prod -influxSkipSingleField -insert.maxQueueDuration 10s -storageNode=10.15.38.29:8400 -storageNode=10.15.38.30:8400 -storageNode=10.15.59.47:8400 -storageNode=10.15.59.48:8400 -storageNode=10.15.59.49:8400 -storageNode=10.15.34.61:8400 -storageNode=10.15.38.19:8400 -storageNode=10.15.38.20:8400 -storageNode=10.15.53.33:8400 -storageNode=10.15.59.20:8400 -storageNode=10.15.36.41:8400
./vmselect-prod -search.maxQueryLen 32768 -storageNode=10.15.78.25:8401 -storageNode=10.15.77.18:8401 -storageNode=10.15.77.19:8401 -storageNode=10.15.77.20:8401 -storageNode=10.15.68.27:8401 -storageNode=10.15.78.44:8401 -storageNode=10.15.81.42:8401 -storageNode=10.15.81.46:8401 -storageNode=10.15.87.23:8401 -storageNode=10.15.49.19:8401 -storageNode=10.15.58.61:8401 -storageNode=10.15.38.29:8401 -storageNode=10.15.38.30:8401 -storageNode=10.15.59.47:8401 -storageNode=10.15.59.48:8401 -storageNode=10.15.59.49:8401 -storageNode=10.15.34.61:8401 -storageNode=10.15.38.19:8401 -storageNode=10.15.38.20:8401 -storageNode=10.15.53.33:8401 -storageNode=10.15.59.20:8401 -storageNode=10.15.36.41:8401
./vmstorage-prod -storageDataPath /data1/vmdata -retentionPeriod 6 -search.maxUniqueTimeseries 5000000

Additional context

  1. We have 11 vmstorage instances; the total ingestion rate is ~2.6M points/s.
  2. vmstorage log, filtered with grep '^2020-07.*error' app.log: vmstorage.error.log
  3. vminsert log, filtered with grep -E '2020-07-01.*(error|warn)' vminsert.log: vminsert.error.log.tar.gz
  4. vmselect log, filtered with grep -E '2020-07-01.*(warn|error)' vmselect.log | grep -v 'slow query according' | grep -Ev 'VictoriaMetrics/app/vmselect/main.go:31[78]': vmselect.error.log.tar.gz
  5. vmstorage CPU profiles: pprof.vmstorage-prod.samples.cpu.001.pb.gz (04:47 UTC) and pprof.vmstorage-prod.samples.cpu.002.pb.gz (05:40 UTC); a collection sketch follows this list
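For completeness, profiles like the ones attached above can be collected from vmstorage's /debug/pprof endpoint. Below is a minimal sketch; the host, the port 8482 (the usual default HTTP port for cluster vmstorage), and the output file name are my assumptions, not values from the original report.

package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	// Assumed vmstorage HTTP address; adjust host and port (-httpListenAddr) as needed.
	const profileURL = "http://10.15.38.29:8482/debug/pprof/profile?seconds=30"

	resp, err := http.Get(profileURL) // blocks for ~30s while the CPU profile is captured
	if err != nil {
		fmt.Fprintln(os.Stderr, "fetch profile:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	out, err := os.Create("pprof.vmstorage-prod.samples.cpu.pb.gz")
	if err != nil {
		fmt.Fprintln(os.Stderr, "create file:", err)
		os.Exit(1)
	}
	defer out.Close()

	if _, err := io.Copy(out, resp.Body); err != nil {
		fmt.Fprintln(os.Stderr, "save profile:", err)
		os.Exit(1)
	}
	fmt.Println("saved; inspect with: go tool pprof pprof.vmstorage-prod.samples.cpu.pb.gz")
}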

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 25 (16 by maintainers)

Most upvoted comments

FYI, the commit that removes the dmis.Has calls from the data ingestion path has been included in the v1.38.0 release.

@n4mine, thanks for the idea of removing the dmis.Has check inside Storage.add! It turned out to be quite easy to implement by simply resetting the MetricName->TSID cache after time series deletion. This guarantees that the cache won't contain entries for deleted time series. See commit fe58462bef9f6c211a036fa1e4f9cf3ced4b9ad4.
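As a rough illustration of the idea (a minimal sketch, not the actual VictoriaMetrics code; the type names, fields, and helpers here are made up), resetting the MetricName->TSID cache on series deletion is what lets the ingestion path trust a cache hit without a dmis.Has-style "is this metricID deleted?" check:

package main

import "sync"

// TSID is a simplified stand-in for the real VictoriaMetrics TSID type.
type TSID struct{ MetricID uint64 }

// storage is a toy model of the part of Storage relevant to this issue.
type storage struct {
	mu               sync.Mutex
	metricNameToTSID map[string]TSID // stand-in for the MetricName->TSID cache
}

// add resolves the TSID for a metric name on the ingestion path. Because the
// cache is reset whenever series are deleted, a cache hit can be used directly;
// no per-sample deleted-metricIDs lookup is required on the hot path.
func (s *storage) add(metricName string) TSID {
	s.mu.Lock()
	defer s.mu.Unlock()
	if tsid, ok := s.metricNameToTSID[metricName]; ok {
		return tsid // safe: deleted series can no longer be present in the cache
	}
	tsid := s.createTSID(metricName) // slow path: look up / create the TSID in the index
	s.metricNameToTSID[metricName] = tsid
	return tsid
}

// deleteSeries marks series as deleted and resets the cache, guaranteeing that
// stale entries for deleted series cannot be served by later add calls.
func (s *storage) deleteSeries(metricNames []string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	_ = metricNames // marking the series as deleted in the index is omitted here
	s.metricNameToTSID = make(map[string]TSID) // reset the MetricName->TSID cache
}

// createTSID is a placeholder for the real index lookup/creation logic.
func (s *storage) createTSID(metricName string) TSID {
	return TSID{MetricID: uint64(len(metricName))}
}

func main() {
	s := &storage{metricNameToTSID: make(map[string]TSID)}
	_ = s.add(`cpu_usage{host="a"}`)
	s.deleteSeries([]string{`cpu_usage{host="a"}`})
	_ = s.add(`cpu_usage{host="a"}`) // takes the slow path again and repopulates the cache
}

The trade-off is that a deletion invalidates the whole cache, so ingestion briefly pays the slow path for all series, but it keeps the per-sample hot path free of deleted-series checks.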