VictoriaMetrics: query_range becomes slower and slower

Describe the bug

  1. query_range becomes slower and slower over the course of the day, but recovers at 08:00 every day (00:00 UTC)
  2. vminsert constantly runs out of memory (OOM)

Expected behavior

query_range should stay consistently fast instead of becoming slower over the course of the day.
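To quantify the degradation, here is a minimal latency probe (my own sketch, not part of the original report) that times a Prometheus-compatible /api/v1/query_range request against vmselect. The vmselect host, the accountID 0 tenant path, and the sample query are assumptions; adjust them for your setup.

package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

func main() {
	// Assumed vmselect address and tenant path (accountID=0); replace with your own.
	base := "http://vmselect-host:8481/select/0/prometheus/api/v1/query_range"

	params := url.Values{}
	params.Set("query", `sum(rate(http_requests_total[5m]))`) // placeholder query
	params.Set("start", fmt.Sprint(time.Now().Add(-time.Hour).Unix()))
	params.Set("end", fmt.Sprint(time.Now().Unix()))
	params.Set("step", "60")

	started := time.Now()
	resp, err := http.Get(base + "?" + params.Encode())
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	n, _ := io.Copy(io.Discard, resp.Body) // read the full body so the timing covers the whole response

	fmt.Printf("status=%d bytes=%d elapsed=%s\n", resp.StatusCode, n, time.Since(started))
}

Running this periodically makes the gradual slowdown and the daily 00:00 UTC recovery visible as a series of elapsed values.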

Screenshots

[screenshot attached in the original issue]

Version

./vmstorage-prod -version
vmstorage-20200623-210915-heads-cluster-0-g46c5c077

Used command-line flags

./vminsert-prod -influxSkipSingleField -insert.maxQueueDuration 10s -storageNode=10.15.38.29:8400 -storageNode=10.15.38.30:8400 -storageNode=10.15.59.47:8400 -storageNode=10.15.59.48:8400 -storageNode=10.15.59.49:8400 -storageNode=10.15.34.61:8400 -storageNode=10.15.38.19:8400 -storageNode=10.15.38.20:8400 -storageNode=10.15.53.33:8400 -storageNode=10.15.59.20:8400 -storageNode=10.15.36.41:8400
./vmselect-prod -search.maxQueryLen 32768 -storageNode=10.15.78.25:8401 -storageNode=10.15.77.18:8401 -storageNode=10.15.77.19:8401 -storageNode=10.15.77.20:8401 -storageNode=10.15.68.27:8401 -storageNode=10.15.78.44:8401 -storageNode=10.15.81.42:8401 -storageNode=10.15.81.46:8401 -storageNode=10.15.87.23:8401 -storageNode=10.15.49.19:8401 -storageNode=10.15.58.61:8401 -storageNode=10.15.38.29:8401 -storageNode=10.15.38.30:8401 -storageNode=10.15.59.47:8401 -storageNode=10.15.59.48:8401 -storageNode=10.15.59.49:8401 -storageNode=10.15.34.61:8401 -storageNode=10.15.38.19:8401 -storageNode=10.15.38.20:8401 -storageNode=10.15.53.33:8401 -storageNode=10.15.59.20:8401 -storageNode=10.15.36.41:8401
./vmstorage-prod -storageDataPath /data1/vmdata -retentionPeriod 6 -search.maxUniqueTimeseries 5000000

Additional context

  1. We have 11 vmstorage instances; the total ingestion rate is ~2.6M points/s.
  2. vmstorage log, filtered with grep '^2020-07.*error' app.log: vmstorage.error.log
  3. vminsert log, filtered with grep -E '2020-07-01.*(error|warn)' vminsert.log: vminsert.error.log.tar.gz
  4. vmselect log, filtered with grep -E '2020-07-01.*(warn|error)' vmselect.log | grep -v 'slow query according' | grep -Ev 'VictoriaMetrics/app/vmselect/main.go:31[78]': vmselect.error.log.tar.gz
  5. vmstorage CPU profiles: pprof.vmstorage-prod.samples.cpu.001.pb.gz (04:47 UTC) and pprof.vmstorage-prod.samples.cpu.002.pb.gz (05:40 UTC); a collection sketch follows this list
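For completeness, profiles like the ones attached above can be collected from vmstorage's /debug/pprof endpoint. Below is a minimal sketch; the host, the port 8482 (the usual default HTTP port for cluster vmstorage), and the output file name are my assumptions, not values from the original report.

package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	// Assumed vmstorage HTTP address; adjust host and port (-httpListenAddr) as needed.
	const profileURL = "http://10.15.38.29:8482/debug/pprof/profile?seconds=30"

	resp, err := http.Get(profileURL) // blocks for ~30s while the CPU profile is captured
	if err != nil {
		fmt.Fprintln(os.Stderr, "fetch profile:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	out, err := os.Create("pprof.vmstorage-prod.samples.cpu.pb.gz")
	if err != nil {
		fmt.Fprintln(os.Stderr, "create file:", err)
		os.Exit(1)
	}
	defer out.Close()

	if _, err := io.Copy(out, resp.Body); err != nil {
		fmt.Fprintln(os.Stderr, "save profile:", err)
		os.Exit(1)
	}
	fmt.Println("saved; inspect with: go tool pprof pprof.vmstorage-prod.samples.cpu.pb.gz")
}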

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 25 (16 by maintainers)

Most upvoted comments

FYI, the commit that removes the dmis.Has calls from the data ingestion path has been included in the v1.38.0 release.

@n4mine, thanks for the idea of removing the dmis.Has check inside Storage.add! It turned out to be quite easy to implement by simply resetting the MetricName->TSID cache after time series deletion. This guarantees that the cache won't contain entries for deleted time series. See commit fe58462bef9f6c211a036fa1e4f9cf3ced4b9ad4.
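As a rough illustration of the idea (a minimal sketch, not the actual VictoriaMetrics code; the type names, fields, and helpers here are made up), resetting the MetricName->TSID cache on series deletion is what lets the ingestion path trust a cache hit without a dmis.Has-style "is this metricID deleted?" check:

package main

import "sync"

// TSID is a simplified stand-in for the real VictoriaMetrics TSID type.
type TSID struct{ MetricID uint64 }

// storage is a toy model of the part of Storage relevant to this issue.
type storage struct {
	mu               sync.Mutex
	metricNameToTSID map[string]TSID // stand-in for the MetricName->TSID cache
}

// add resolves the TSID for a metric name on the ingestion path. Because the
// cache is reset whenever series are deleted, a cache hit can be used directly;
// no per-sample deleted-metricIDs lookup is required on the hot path.
func (s *storage) add(metricName string) TSID {
	s.mu.Lock()
	defer s.mu.Unlock()
	if tsid, ok := s.metricNameToTSID[metricName]; ok {
		return tsid // safe: deleted series can no longer be present in the cache
	}
	tsid := s.createTSID(metricName) // slow path: look up / create the TSID in the index
	s.metricNameToTSID[metricName] = tsid
	return tsid
}

// deleteSeries marks series as deleted and resets the cache, guaranteeing that
// stale entries for deleted series cannot be served by later add calls.
func (s *storage) deleteSeries(metricNames []string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	_ = metricNames // marking the series as deleted in the index is omitted here
	s.metricNameToTSID = make(map[string]TSID) // reset the MetricName->TSID cache
}

// createTSID is a placeholder for the real index lookup/creation logic.
func (s *storage) createTSID(metricName string) TSID {
	return TSID{MetricID: uint64(len(metricName))}
}

func main() {
	s := &storage{metricNameToTSID: make(map[string]TSID)}
	_ = s.add(`cpu_usage{host="a"}`)
	s.deleteSeries([]string{`cpu_usage{host="a"}`})
	_ = s.add(`cpu_usage{host="a"}`) // takes the slow path again and repopulates the cache
}

The trade-off is that a deletion invalidates the whole cache, so ingestion briefly pays the slow path for all series, but it keeps the per-sample hot path free of deleted-series checks.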