prometheus: Memory Leak in v2.17.1: tsdb.txRing consumes much heap memory
What did you do?
We launched a Prometheus 2.17.1 instance with docker and have been running it for 5 ~ 7 days
What did you expect to see?
The memory usage is as stable as other Prometheus instances
What did you see instead? Under which circumstances?
The available memory of the node which hosts the Prometheus 2.17.1 instance gets decreasing like below:

The decreasing green and purple line represents metrics of nodes that host Prometheus 2.17.1, other lines are nodes hosting Prometheus 2.13.
Heap dump of green host

I understand that there is overhead in resource usage as release note says, but this is more than we expected.
Environment
-
System information: Linux 3.10.0-514.10.2.el7.x86_64 x86_64
-
Prometheus version:
prometheus, version 2.17.1 (branch: HEAD, revision: ae041f97cfc6f43494bed65ec4ea4e3a0cf2ac69)
build user: root@806b02dfe114
build date: 20200326-16:18:19
go version: go1.13.9
- Prometheus configuration file:
prometheus.yml
global:
scrape_interval: 10s
scrape_timeout: 5s
evaluation_interval: 10s
external_labels:
global:
scrape_interval: 10s
scrape_timeout: 5s
evaluation_interval: 10s
external_labels:
cluster_name: XXX
phase: beta
role: HBaseUnit
alerting:
alertmanagers:
- static_configs:
- targets:
- XXX
scheme: http
timeout: 10s
api_version: v1
rule_files:
- /prometheus/alerts.yml
scrape_configs:
- job_name: hbase
honor_timestamps: true
scrape_interval: 10s
scrape_timeout: 5s
metrics_path: /metrics
scheme: http
file_sd_configs:
- files:
- /prometheus/hbases.yml
refresh_interval: 5m
relabel_configs:
- source_labels: [__address__]
separator: ;
regex: (.*):\d+
target_label: instance
replacement: ${1}
action: replace
- job_name: namenode
honor_timestamps: true
scrape_interval: 10s
scrape_timeout: 5s
metrics_path: /metrics
scheme: http
file_sd_configs:
- files:
- /prometheus/namenodes.yml
refresh_interval: 5m
relabel_configs:
- source_labels: [__address__]
separator: ;
regex: (.*):\d+
target_label: instance
replacement: ${1}
action: replace
- job_name: datanode
honor_timestamps: true
scrape_interval: 10s
scrape_timeout: 5s
metrics_path: /metrics
scheme: http
file_sd_configs:
- files:
- /prometheus/datanodes.yml
refresh_interval: 5m
relabel_configs:
- source_labels: [__address__]
separator: ;
regex: (.*):\d+
target_label: instance
replacement: ${1}
action: replace
- job_name: node
honor_timestamps: true
scrape_interval: 10s
scrape_timeout: 5s
metrics_path: /metrics
scheme: http
file_sd_configs:
- files:
- /prometheus/nodes.yml
refresh_interval: 5m
relabel_configs:
- source_labels: [__address__]
separator: ;
regex: (.*):\d+
target_label: instance
replacement: ${1}
action: replace
- logs
log
level=info ts=2020-04-09T06:56:28.464Z caller=main.go:635 msg="Server is ready to receive web requests."
level=info ts=2020-04-09T06:56:28.992Z caller=main.go:788 msg="Loading configuration file" filename=/prometheus/prometheus.yml
level=info ts=2020-04-09T06:56:28.996Z caller=main.go:816 msg="Completed loading of configuration file" filename=/prometheus/prometheus.yml
level=info ts=2020-04-09T09:56:35.227Z caller=compact.go:495 component=tsdb msg="write block" mint=1586415394174 maxt=1586419200000 ulid=01E5F5XB279BDHD5BYMGMWCQJE duration=211.398821ms
level=info ts=2020-04-09T09:56:35.273Z caller=head.go:662 component=tsdb msg="head GC completed" duration=10.911142ms
level=info ts=2020-04-09T11:00:00.561Z caller=compact.go:495 component=tsdb msg="write block" mint=1586419200000 maxt=1586426400000 ulid=01E5F9HF4953H200V0NGJM8DG0 duration=295.795606ms
level=info ts=2020-04-09T11:00:00.642Z caller=head.go:662 component=tsdb msg="head GC completed" duration=12.895769ms
level=info ts=2020-04-09T13:00:00.711Z caller=compact.go:495 component=tsdb msg="write block" mint=1586426400000 maxt=1586433600000 ulid=01E5FGD6E70NPBQ5Z8BR8XE07K duration=384.109789ms
level=info ts=2020-04-09T13:00:00.760Z caller=head.go:662 component=tsdb msg="head GC completed" duration=14.807925ms
level=info ts=2020-04-09T15:00:00.598Z caller=compact.go:495 component=tsdb msg="write block" mint=1586433600000 maxt=1586440800000 ulid=01E5FQ8XN65NV1HHMZHMQY00JF duration=303.040387ms
level=info ts=2020-04-09T15:00:00.665Z caller=head.go:662 component=tsdb msg="head GC completed" duration=14.016044ms
level=info ts=2020-04-09T17:00:00.609Z caller=compact.go:495 component=tsdb msg="write block" mint=1586440800000 maxt=1586448000000 ulid=01E5FY4MWB8JT70G37MWN86A24 duration=341.741478ms
level=info ts=2020-04-09T17:00:00.689Z caller=head.go:662 component=tsdb msg="head GC completed" duration=14.505675ms
level=info ts=2020-04-09T17:00:01.673Z caller=head.go:732 component=tsdb msg="WAL checkpoint complete" first=0 last=1 duration=983.625939ms
level=info ts=2020-04-09T17:00:02.232Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1586415394174 maxt=1586433600000 ulid=01E5FY4PA0QK840RA6TN3GAG0X sources="[01E5F5XB279BDHD5BYMGMWCQJE 01E5F9HF4953H200V0NGJM8DG0 01E5FGD6E70NPBQ5Z8BR8XE07K]" duration=504.241448ms
level=info ts=2020-04-09T19:00:00.571Z caller=compact.go:495 component=tsdb msg="write block" mint=1586448000000 maxt=1586455200000 ulid=01E5G50C5B5GPSBSXD26EYZNH6 duration=272.270768ms
level=info ts=2020-04-09T19:00:00.660Z caller=head.go:662 component=tsdb msg="head GC completed" duration=11.768384ms
level=info ts=2020-04-09T21:00:00.748Z caller=compact.go:495 component=tsdb msg="write block" mint=1586455200000 maxt=1586462400000 ulid=01E5GBW3CAGVY5SAMP7Q0JKW6X duration=482.354982ms
level=info ts=2020-04-09T21:00:00.830Z caller=head.go:662 component=tsdb msg="head GC completed" duration=12.574958ms
level=info ts=2020-04-09T21:00:03.607Z caller=head.go:732 component=tsdb msg="WAL checkpoint complete" first=2 last=3 duration=2.776596565s
level=info ts=2020-04-09T21:00:04.183Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1586433600000 maxt=1586455200000 ulid=01E5GBW6NMC0P5NCWGN1RZ2W8B sources="[01E5FQ8XN65NV1HHMZHMQY00JF 01E5FY4MWB8JT70G37MWN86A24 01E5G50C5B5GPSBSXD26EYZNH6]" duration=546.982619ms
level=info ts=2020-04-09T23:00:00.559Z caller=compact.go:495 component=tsdb msg="write block" mint=1586462400000 maxt=1586469600000 ulid=01E5GJQTMT7KBYSTCTF94F1WX0 duration=276.892134ms
level=info ts=2020-04-09T23:00:00.643Z caller=head.go:662 component=tsdb msg="head GC completed" duration=12.686807ms
level=info ts=2020-04-10T01:00:00.844Z caller=compact.go:495 component=tsdb msg="write block" mint=1586469600000 maxt=1586476800000 ulid=01E5GSKHXG3CTVAM8H2ZZX8FRQ duration=540.344496ms
level=info ts=2020-04-10T01:00:00.913Z caller=head.go:662 component=tsdb msg="head GC completed" duration=16.168625ms
level=info ts=2020-04-10T01:00:02.200Z caller=head.go:732 component=tsdb msg="WAL checkpoint complete" first=4 last=5 duration=1.286600341s
level=info ts=2020-04-10T03:00:00.521Z caller=compact.go:495 component=tsdb msg="write block" mint=1586476800000 maxt=1586484000000 ulid=01E5H0F949JVQ8WZH5A18BWVZV duration=255.834275ms
level=info ts=2020-04-10T03:00:00.578Z caller=head.go:662 component=tsdb msg="head GC completed" duration=11.634575ms
level=info ts=2020-04-10T03:00:01.164Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1586455200000 maxt=1586476800000 ulid=01E5H0F9FKV0141FTDJD453X4S sources="[01E5GBW3CAGVY5SAMP7Q0JKW6X 01E5GJQTMT7KBYSTCTF94F1WX0 01E5GSKHXG3CTVAM8H2ZZX8FRQ]" duration=537.117355ms
level=info ts=2020-04-10T05:00:00.540Z caller=compact.go:495 component=tsdb msg="write block" mint=1586484000000 maxt=1586491200000 ulid=01E5H7B0CCP3BA24YKZ52VYQ9V duration=271.39912ms
level=info ts=2020-04-10T05:00:00.605Z caller=head.go:662 component=tsdb msg="head GC completed" duration=16.021546ms
level=info ts=2020-04-10T05:00:01.565Z caller=head.go:732 component=tsdb msg="WAL checkpoint complete" first=6 last=7 duration=959.823352ms
level=info ts=2020-04-10T07:00:00.554Z caller=compact.go:495 component=tsdb msg="write block" mint=1586491200000 maxt=1586498400000 ulid=01E5HE6QMM9EXP4XFB9WFMB89A duration=278.264692ms
level=info ts=2020-04-10T07:00:00.632Z caller=head.go:662 component=tsdb msg="head GC completed" duration=14.531457ms
level=info ts=2020-04-10T09:00:00.705Z caller=compact.go:495 component=tsdb msg="write block" mint=1586498400000 maxt=1586505600000 ulid=01E5HN2EX7TNRQP3JQN8AC2Z20 duration=409.230718ms
level=info ts=2020-04-10T09:00:00.780Z caller=head.go:662 component=tsdb msg="head GC completed" duration=12.824101ms
level=info ts=2020-04-10T09:00:02.134Z caller=head.go:732 component=tsdb msg="WAL checkpoint complete" first=8 last=9 duration=1.353600004s
level=info ts=2020-04-10T09:00:02.670Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1586476800000 maxt=1586498400000 ulid=01E5HN2GSFRPSSYD28V5Z5QKBQ sources="[01E5H0F949JVQ8WZH5A18BWVZV 01E5H7B0CCP3BA24YKZ52VYQ9V 01E5HE6QMM9EXP4XFB9WFMB89A]" duration=446.638371ms
level=info ts=2020-04-10T09:00:07.759Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1586433600000 maxt=1586498400000 ulid=01E5HN2HDV821KWVP21S7QR8E3 sources="[01E5GBW6NMC0P5NCWGN1RZ2W8B 01E5H0F9FKV0141FTDJD453X4S 01E5HN2GSFRPSSYD28V5Z5QKBQ]" duration=4.883802261s
level=info ts=2020-04-10T11:00:00.568Z caller=compact.go:495 component=tsdb msg="write block" mint=1586505600000 maxt=1586512800000 ulid=01E5HVY64N69P7T15PC35KPE51 duration=290.282037ms
level=info ts=2020-04-10T11:00:00.648Z caller=head.go:662 component=tsdb msg="head GC completed" duration=22.199752ms
level=info ts=2020-04-10T11:00:04.026Z caller=compact.go:441 component=tsdb msg="compact blocks" count=2 mint=1586415394174 maxt=1586498400000 ulid=01E5HVY6HP17ZCFQ0E4J2S336B sources="[01E5FY4PA0QK840RA6TN3GAG0X 01E5HN2HDV821KWVP21S7QR8E3]" duration=3.332164559s
level=info ts=2020-04-10T13:00:00.545Z caller=compact.go:495 component=tsdb msg="write block" mint=1586512800000 maxt=1586520000000 ulid=01E5J2SXCDF4VST7N40X3JC13X duration=276.77561ms
level=info ts=2020-04-10T13:00:00.625Z caller=head.go:662 component=tsdb msg="head GC completed" duration=12.129034ms
level=info ts=2020-04-10T13:00:02.213Z caller=head.go:732 component=tsdb msg="WAL checkpoint complete" first=10 last=11 duration=1.587435813s
level=info ts=2020-04-10T15:00:00.562Z caller=compact.go:495 component=tsdb msg="write block" mint=1586520000000 maxt=1586527200000 ulid=01E5J9NMN9TEHDHZ454KDQMAM1 duration=265.139045ms
level=info ts=2020-04-10T15:00:00.655Z caller=head.go:662 component=tsdb msg="head GC completed" duration=11.98756ms
level=info ts=2020-04-10T15:00:01.212Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1586498400000 maxt=1586520000000 ulid=01E5J9NN1WB20G3CFNXKWB3TE2 sources="[01E5HN2EX7TNRQP3JQN8AC2Z20 01E5HVY64N69P7T15PC35KPE51 01E5J2SXCDF4VST7N40X3JC13X]" duration=511.890297ms
level=info ts=2020-04-10T17:00:00.518Z caller=compact.go:495 component=tsdb msg="write block" mint=1586527200000 maxt=1586534400000 ulid=01E5JGHBWHDJQXA4AW8RSAA210 duration=245.695543ms
level=info ts=2020-04-10T17:00:00.582Z caller=head.go:662 component=tsdb msg="head GC completed" duration=12.033627ms
level=info ts=2020-04-10T17:00:01.610Z caller=head.go:732 component=tsdb msg="WAL checkpoint complete" first=12 last=13 duration=1.027540263s
level=info ts=2020-04-10T19:00:00.531Z caller=compact.go:495 component=tsdb msg="write block" mint=1586534400000 maxt=1586541600000 ulid=01E5JQD34JPMVSS2KEMKWMFJDR duration=256.81357ms
level=info ts=2020-04-10T19:00:00.590Z caller=head.go:662 component=tsdb msg="head GC completed" duration=12.890328ms
level=info ts=2020-04-10T21:00:00.541Z caller=compact.go:495 component=tsdb msg="write block" mint=1586541600000 maxt=1586548800000 ulid=01E5JY8TCG3TAA0H73AB0VW69B duration=269.069884ms
level=info ts=2020-04-10T21:00:00.611Z caller=head.go:662 component=tsdb msg="head GC completed" duration=12.311055ms
level=info ts=2020-04-10T21:00:01.616Z caller=head.go:732 component=tsdb msg="WAL checkpoint complete" first=14 last=15 duration=1.005203416s
level=info ts=2020-04-10T21:00:02.239Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1586520000000 maxt=1586541600000 ulid=01E5JY8VRQEA0DN9SEMV4YJRBD sources="[01E5J9NMN9TEHDHZ454KDQMAM1 01E5JGHBWHDJQXA4AW8RSAA210 01E5JQD34JPMVSS2KEMKWMFJDR]" duration=551.691552ms
level=info ts=2020-04-10T23:00:00.558Z caller=compact.go:495 component=tsdb msg="write block" mint=1586548800000 maxt=1586556000000 ulid=01E5K54HMZ5N76K87YJYMTDD2C duration=270.259282ms
level=info ts=2020-04-10T23:00:00.610Z caller=head.go:662 component=tsdb msg="head GC completed" duration=16.298414ms
level=info ts=2020-04-11T01:00:00.566Z caller=compact.go:495 component=tsdb msg="write block" mint=1586556000000 maxt=1586563200000 ulid=01E5KC08WER6C1K3GNQD6XZTTE duration=296.577851ms
level=info ts=2020-04-11T01:00:00.655Z caller=head.go:662 component=tsdb msg="head GC completed" duration=13.069606ms
level=info ts=2020-04-11T01:00:01.710Z caller=head.go:732 component=tsdb msg="WAL checkpoint complete" first=16 last=17 duration=1.055042039s
level=info ts=2020-04-11T03:00:00.525Z caller=compact.go:495 component=tsdb msg="write block" mint=1586563200000 maxt=1586570400000 ulid=01E5KJW04QCKGRBEJT17BTYZEP duration=245.678439ms
level=info ts=2020-04-11T03:00:00.587Z caller=head.go:662 component=tsdb msg="head GC completed" duration=12.15654ms
level=info ts=2020-04-11T03:00:01.173Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1586541600000 maxt=1586563200000 ulid=01E5KJW0FS9WXR0JGRSZ2NFKXQ sources="[01E5JY8TCG3TAA0H73AB0VW69B 01E5K54HMZ5N76K87YJYMTDD2C 01E5KC08WER6C1K3GNQD6XZTTE]" duration=539.164725ms
level=info ts=2020-04-11T03:00:02.458Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1586498400000 maxt=1586563200000 ulid=01E5KJW16EJYJ156FVQGXDWZW1 sources="[01E5J9NN1WB20G3CFNXKWB3TE2 01E5JY8VRQEA0DN9SEMV4YJRBD 01E5KJW0FS9WXR0JGRSZ2NFKXQ]" duration=1.100249096s
level=info ts=2020-04-11T05:00:00.610Z caller=compact.go:495 component=tsdb msg="write block" mint=1586570400000 maxt=1586577600000 ulid=01E5KSQQCP0VCWETE7KVWZTC7A duration=331.473667ms
level=info ts=2020-04-11T05:00:00.668Z caller=head.go:662 component=tsdb msg="head GC completed" duration=12.668092ms
level=info ts=2020-04-11T05:00:05.190Z caller=head.go:732 component=tsdb msg="WAL checkpoint complete" first=18 last=19 duration=4.521528576s
level=info ts=2020-04-11T07:00:00.573Z caller=compact.go:495 component=tsdb msg="write block" mint=1586577600000 maxt=1586584800000 ulid=01E5M0KEMVWB2MAFX9H3SZKW4F duration=290.448449ms
level=info ts=2020-04-11T07:00:00.637Z caller=head.go:662 component=tsdb msg="head GC completed" duration=13.051877ms
level=info ts=2020-04-11T09:00:00.585Z caller=compact.go:495 component=tsdb msg="write block" mint=1586584800000 maxt=1586592000000 ulid=01E5M7F5WP2FNV70YPT3J7BN1W duration=307.030812ms
level=info ts=2020-04-11T09:00:00.691Z caller=head.go:662 component=tsdb msg="head GC completed" duration=16.774636ms
level=info ts=2020-04-11T09:00:04.295Z caller=head.go:732 component=tsdb msg="WAL checkpoint complete" first=20 last=21 duration=3.60420494s
level=info ts=2020-04-11T09:00:04.754Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1586563200000 maxt=1586584800000 ulid=01E5M7F9V6ECA1T20X8ZF2F6JK sources="[01E5KJW04QCKGRBEJT17BTYZEP 01E5KSQQCP0VCWETE7KVWZTC7A 01E5M0KEMVWB2MAFX9H3SZKW4F]" duration=427.524147ms
level=info ts=2020-04-11T11:00:00.611Z caller=compact.go:495 component=tsdb msg="write block" mint=1586592000000 maxt=1586599200000 ulid=01E5MEAX53CRB6RMF1SWPG9HYS duration=320.496039ms
level=info ts=2020-04-11T11:00:00.704Z caller=head.go:662 component=tsdb msg="head GC completed" duration=13.672307ms
level=info ts=2020-04-11T13:00:00.567Z caller=compact.go:495 component=tsdb msg="write block" mint=1586599200000 maxt=1586606400000 ulid=01E5MN6MCXH3STPX68NMVTADHX duration=281.787174ms
level=info ts=2020-04-11T13:00:00.645Z caller=head.go:662 component=tsdb msg="head GC completed" duration=14.487547ms
level=info ts=2020-04-11T13:00:01.687Z caller=head.go:732 component=tsdb msg="WAL checkpoint complete" first=22 last=23 duration=1.04214388s
level=info ts=2020-04-11T15:00:00.600Z caller=compact.go:495 component=tsdb msg="write block" mint=1586606400000 maxt=1586613600000 ulid=01E5MW2BMQ5DD91FVWJPDSXQAJ duration=320.318101ms
level=info ts=2020-04-11T15:00:00.673Z caller=head.go:662 component=tsdb msg="head GC completed" duration=12.950098ms
level=info ts=2020-04-11T15:00:01.202Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1586584800000 maxt=1586606400000 ulid=01E5MW2C2NJMWVKNCVWQB0804W sources="[01E5M7F5WP2FNV70YPT3J7BN1W 01E5MEAX53CRB6RMF1SWPG9HYS 01E5MN6MCXH3STPX68NMVTADHX]" duration=477.553058ms
level=info ts=2020-04-11T17:00:00.540Z caller=compact.go:495 component=tsdb msg="write block" mint=1586613600000 maxt=1586620800000 ulid=01E5N2Y2WTYZ5GNND425BGWS92 duration=258.781623ms
level=info ts=2020-04-11T17:00:00.607Z caller=head.go:662 component=tsdb msg="head GC completed" duration=13.228643ms
level=info ts=2020-04-11T17:00:02.013Z caller=head.go:732 component=tsdb msg="WAL checkpoint complete" first=24 last=25 duration=1.405864056s
level=info ts=2020-04-11T19:00:00.535Z caller=compact.go:495 component=tsdb msg="write block" mint=1586620800000 maxt=1586628000000 ulid=01E5N9ST52Y53S9E263FGJEW4P duration=244.692082ms
level=info ts=2020-04-11T19:00:00.596Z caller=head.go:662 component=tsdb msg="head GC completed" duration=12.475107ms
level=info ts=2020-04-11T21:00:00.572Z caller=compact.go:495 component=tsdb msg="write block" mint=1586628000000 maxt=1586635200000 ulid=01E5NGNHD9ZSCJ1GH2WG9BW4S1 duration=275.046584ms
level=info ts=2020-04-11T21:00:00.670Z caller=head.go:662 component=tsdb msg="head GC completed" duration=14.832234ms
level=info ts=2020-04-11T21:00:01.827Z caller=head.go:732 component=tsdb msg="WAL checkpoint complete" first=26 last=27 duration=1.156987358s
level=info ts=2020-04-11T21:00:02.378Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1586606400000 maxt=1586628000000 ulid=01E5NGNJYTZR8SF50RXE1M7RFS sources="[01E5MW2BMQ5DD91FVWJPDSXQAJ 01E5N2Y2WTYZ5GNND425BGWS92 01E5N9ST52Y53S9E263FGJEW4P]" duration=496.225641ms
level=info ts=2020-04-11T21:00:03.469Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1586563200000 maxt=1586628000000 ulid=01E5NGNKHGTE4T1K71V5W5PRHQ sources="[01E5M7F9V6ECA1T20X8ZF2F6JK 01E5MW2C2NJMWVKNCVWQB0804W 01E5NGNJYTZR8SF50RXE1M7RFS]" duration=988.676023ms
level=info ts=2020-04-11T23:00:00.645Z caller=compact.go:495 component=tsdb msg="write block" mint=1586635200000 maxt=1586642400000 ulid=01E5NQH8M6XHTGWTSFJ6FK6RGW duration=383.393233ms
level=info ts=2020-04-11T23:00:00.722Z caller=head.go:662 component=tsdb msg="head GC completed" duration=12.137665ms
level=info ts=2020-04-12T01:00:00.563Z caller=compact.go:495 component=tsdb msg="write block" mint=1586642400000 maxt=1586649600000 ulid=01E5NYCZW61TEPEC61KE2FPA0S duration=300.947362ms
level=info ts=2020-04-12T01:00:00.656Z caller=head.go:662 component=tsdb msg="head GC completed" duration=13.484397ms
level=info ts=2020-04-12T01:00:01.668Z caller=head.go:732 component=tsdb msg="WAL checkpoint complete" first=28 last=29 duration=1.011910668s
level=info ts=2020-04-12T03:00:00.530Z caller=compact.go:495 component=tsdb msg="write block" mint=1586649600000 maxt=1586656800000 ulid=01E5P58Q48526X306P4TSC50HH duration=266.676365ms
level=info ts=2020-04-12T03:00:00.590Z caller=head.go:662 component=tsdb msg="head GC completed" duration=13.118641ms
level=info ts=2020-04-12T03:00:01.323Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1586628000000 maxt=1586649600000 ulid=01E5P58QFZ3MXAHXKN8HBKEXW8 sources="[01E5NGNHD9ZSCJ1GH2WG9BW4S1 01E5NQH8M6XHTGWTSFJ6FK6RGW 01E5NYCZW61TEPEC61KE2FPA0S]" duration=683.315282ms
level=info ts=2020-04-12T05:00:00.559Z caller=compact.go:495 component=tsdb msg="write block" mint=1586656800000 maxt=1586664000000 ulid=01E5PC4ECMF9SQ6AD56XNW8CEN duration=282.801305ms
level=info ts=2020-04-12T05:00:00.646Z caller=head.go:662 component=tsdb msg="head GC completed" duration=17.083725ms
level=info ts=2020-04-12T05:00:01.758Z caller=head.go:732 component=tsdb msg="WAL checkpoint complete" first=30 last=31 duration=1.112168292s
level=info ts=2020-04-12T07:00:00.521Z caller=compact.go:495 component=tsdb msg="write block" mint=1586664000000 maxt=1586671200000 ulid=01E5PK05MFDVJFNRFEH8TENJMF duration=250.599261ms
level=info ts=2020-04-12T07:00:00.630Z caller=head.go:662 component=tsdb msg="head GC completed" duration=14.933374ms
level=info ts=2020-04-12T09:00:00.538Z caller=compact.go:495 component=tsdb msg="write block" mint=1586671200000 maxt=1586678400000 ulid=01E5PSVWWV2H0CGFGTTYPHC3Q5 duration=255.01679ms
level=info ts=2020-04-12T09:00:00.600Z caller=head.go:662 component=tsdb msg="head GC completed" duration=13.479274ms
level=info ts=2020-04-12T09:00:01.569Z caller=head.go:732 component=tsdb msg="WAL checkpoint complete" first=32 last=33 duration=969.078953ms
level=info ts=2020-04-12T09:00:02.225Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1586649600000 maxt=1586671200000 ulid=01E5PSVY6X5V74Z536THZVYVVS sources="[01E5P58Q48526X306P4TSC50HH 01E5PC4ECMF9SQ6AD56XNW8CEN 01E5PK05MFDVJFNRFEH8TENJMF]" duration=596.843821ms
level=info ts=2020-04-12T11:00:00.589Z caller=compact.go:495 component=tsdb msg="write block" mint=1586678400000 maxt=1586685600000 ulid=01E5Q0QM598MC7XJKRY98DSX9D duration=291.822225ms
level=info ts=2020-04-12T11:00:00.670Z caller=head.go:662 component=tsdb msg="head GC completed" duration=14.153142ms
level=info ts=2020-04-12T13:00:00.554Z caller=compact.go:495 component=tsdb msg="write block" mint=1586685600000 maxt=1586692800000 ulid=01E5Q7KBCYPQJDTR3JG6AMNC61 duration=267.991397ms
level=info ts=2020-04-12T13:00:00.657Z caller=head.go:662 component=tsdb msg="head GC completed" duration=38.274303ms
level=info ts=2020-04-12T13:00:01.594Z caller=head.go:732 component=tsdb msg="WAL checkpoint complete" first=34 last=35 duration=937.719041ms
level=info ts=2020-04-12T15:00:00.528Z caller=compact.go:495 component=tsdb msg="write block" mint=1586692800000 maxt=1586700000000 ulid=01E5QEF2MD1A46DZVXZ1CD0V6Z duration=258.518766ms
level=info ts=2020-04-12T15:00:00.609Z caller=head.go:662 component=tsdb msg="head GC completed" duration=18.926233ms
level=info ts=2020-04-12T15:00:01.179Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1586671200000 maxt=1586692800000 ulid=01E5QEF31B1QGNPDDVM588K5QM sources="[01E5PSVWWV2H0CGFGTTYPHC3Q5 01E5Q0QM598MC7XJKRY98DSX9D 01E5Q7KBCYPQJDTR3JG6AMNC61]" duration=495.42025ms
level=info ts=2020-04-12T15:00:02.547Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1586628000000 maxt=1586692800000 ulid=01E5QEF3PP52S2Y0RAMSF8P3WF sources="[01E5P58QFZ3MXAHXKN8HBKEXW8 01E5PSVY6X5V74Z536THZVYVVS 01E5QEF31B1QGNPDDVM588K5QM]" duration=1.180887044s
level=info ts=2020-04-12T15:00:09.969Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1586498400000 maxt=1586692800000 ulid=01E5QEF50FGF4H3TBF08WX5RDK sources="[01E5KJW16EJYJ156FVQGXDWZW1 01E5NGNKHGTE4T1K71V5W5PRHQ 01E5QEF3PP52S2Y0RAMSF8P3WF]" duration=7.265406704s
level=info ts=2020-04-12T17:00:00.557Z caller=compact.go:495 component=tsdb msg="write block" mint=1586700000000 maxt=1586707200000 ulid=01E5QNASX3HGJDSM9NAAJKGE8A duration=265.955143ms
level=info ts=2020-04-12T17:00:00.642Z caller=head.go:662 component=tsdb msg="head GC completed" duration=12.238233ms
level=info ts=2020-04-12T17:00:02.291Z caller=head.go:732 component=tsdb msg="WAL checkpoint complete" first=36 last=37 duration=1.649002346s
level=info ts=2020-04-12T19:00:00.557Z caller=compact.go:495 component=tsdb msg="write block" mint=1586707200000 maxt=1586714400000 ulid=01E5QW6H4N35HXZRWPNP6MFCRT duration=279.712918ms
level=info ts=2020-04-12T19:00:00.623Z caller=head.go:662 component=tsdb msg="head GC completed" duration=20.789317ms
level=info ts=2020-04-12T21:00:00.541Z caller=compact.go:495 component=tsdb msg="write block" mint=1586714400000 maxt=1586721600000 ulid=01E5R328CYE288PJVBTMSFS055 duration=255.324412ms
level=info ts=2020-04-12T21:00:00.623Z caller=head.go:662 component=tsdb msg="head GC completed" duration=19.113249ms
level=info ts=2020-04-12T21:00:04.803Z caller=head.go:732 component=tsdb msg="WAL checkpoint complete" first=38 last=39 duration=4.179966922s
level=info ts=2020-04-12T21:00:05.284Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1586692800000 maxt=1586714400000 ulid=01E5R32CTZV1PRX26ZFMAF1Q4W sources="[01E5QEF2MD1A46DZVXZ1CD0V6Z 01E5QNASX3HGJDSM9NAAJKGE8A 01E5QW6H4N35HXZRWPNP6MFCRT]" duration=453.332068ms
level=info ts=2020-04-12T23:00:00.664Z caller=compact.go:495 component=tsdb msg="write block" mint=1586721600000 maxt=1586728800000 ulid=01E5R9XZMW7KYE40Y806ENZJWT duration=379.646866ms
level=info ts=2020-04-12T23:00:00.771Z caller=head.go:662 component=tsdb msg="head GC completed" duration=14.007677ms
level=info ts=2020-04-13T01:00:00.532Z caller=compact.go:495 component=tsdb msg="write block" mint=1586728800000 maxt=1586736000000 ulid=01E5RGSPWTNW6BRCPBPP10QH57 duration=250.2027ms
level=info ts=2020-04-13T01:00:00.589Z caller=head.go:662 component=tsdb msg="head GC completed" duration=13.116897ms
level=info ts=2020-04-13T01:00:01.629Z caller=head.go:732 component=tsdb msg="WAL checkpoint complete" first=40 last=41 duration=1.039923768s
level=info ts=2020-04-13T03:00:01.077Z caller=compact.go:495 component=tsdb msg="write block" mint=1586736000000 maxt=1586743200000 ulid=01E5RQNE4X01Q9A37J2H47MJB9 duration=718.468703ms
level=info ts=2020-04-13T03:00:01.383Z caller=head.go:662 component=tsdb msg="head GC completed" duration=16.516799ms
level=info ts=2020-04-13T03:00:03.509Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1586714400000 maxt=1586736000000 ulid=01E5RQNF8QPG83BS60V5VZFZ4D sources="[01E5R328CYE288PJVBTMSFS055 01E5R9XZMW7KYE40Y806ENZJWT 01E5RGSPWTNW6BRCPBPP10QH57]" duration=2.077578659s
level=info ts=2020-04-13T05:00:00.562Z caller=compact.go:495 component=tsdb msg="write block" mint=1586743200000 maxt=1586750400000 ulid=01E5RYH5CREK5M9QY9QR51RDPW duration=281.488602ms
level=info ts=2020-04-13T05:00:00.624Z caller=head.go:662 component=tsdb msg="head GC completed" duration=12.129385ms
level=info ts=2020-04-13T05:00:06.492Z caller=head.go:732 component=tsdb msg="WAL checkpoint complete" first=42 last=43 duration=5.867861541s
level=info ts=2020-04-13T07:00:00.533Z caller=compact.go:495 component=tsdb msg="write block" mint=1586750400000 maxt=1586757600000 ulid=01E5S5CWMMA41JSMTQ2XE1G8Y5 duration=257.191143ms
level=info ts=2020-04-13T07:00:00.592Z caller=head.go:662 component=tsdb msg="head GC completed" duration=12.999914ms
level=info ts=2020-04-13T09:00:00.670Z caller=compact.go:495 component=tsdb msg="write block" mint=1586757600000 maxt=1586764800000 ulid=01E5SC8KWY2PPH9RQ2JQAVHSTD duration=384.003132ms
level=info ts=2020-04-13T09:00:00.763Z caller=head.go:662 component=tsdb msg="head GC completed" duration=12.408818ms
level=info ts=2020-04-13T09:00:11.203Z caller=head.go:732 component=tsdb msg="WAL checkpoint complete" first=44 last=45 duration=10.440015486s
level=info ts=2020-04-13T09:00:13.775Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1586736000000 maxt=1586757600000 ulid=01E5SC8YKHGA4NA1D05EN0AVVC sources="[01E5RQNE4X01Q9A37J2H47MJB9 01E5RYH5CREK5M9QY9QR51RDPW 01E5S5CWMMA41JSMTQ2XE1G8Y5]" duration=2.526729037s
level=info ts=2020-04-13T09:00:18.918Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1586692800000 maxt=1586757600000 ulid=01E5SC91DVMYZJ4Y0YXMDNTDNH sources="[01E5R32CTZV1PRX26ZFMAF1Q4W 01E5RQNF8QPG83BS60V5VZFZ4D 01E5SC8YKHGA4NA1D05EN0AVVC]" duration=4.778432448s
level=info ts=2020-04-13T11:00:01.150Z caller=compact.go:495 component=tsdb msg="write block" mint=1586764800000 maxt=1586772000000 ulid=01E5SK4B4K75WSNR27ZWYXFGCY duration=874.910078ms
level=info ts=2020-04-13T11:00:01.207Z caller=head.go:662 component=tsdb msg="head GC completed" duration=13.962196ms
level=info ts=2020-04-13T13:00:00.539Z caller=compact.go:495 component=tsdb msg="write block" mint=1586772000000 maxt=1586779200000 ulid=01E5ST02CS8QQZAS28REPSGF2M duration=258.113641ms
level=info ts=2020-04-13T13:00:00.600Z caller=head.go:662 component=tsdb msg="head GC completed" duration=12.940461ms
level=info ts=2020-04-13T13:00:07.950Z caller=head.go:732 component=tsdb msg="WAL checkpoint complete" first=46 last=47 duration=7.349600168s
level=info ts=2020-04-13T15:00:00.690Z caller=compact.go:495 component=tsdb msg="write block" mint=1586779200000 maxt=1586786400000 ulid=01E5T0VSNR6BKS8V6DPTAAYK5J duration=377.901866ms
level=info ts=2020-04-13T15:00:00.762Z caller=head.go:662 component=tsdb msg="head GC completed" duration=12.070248ms
level=info ts=2020-04-13T15:00:02.811Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1586757600000 maxt=1586779200000 ulid=01E5T0VT4W0RQBEGX3FS9J808V sources="[01E5SC8KWY2PPH9RQ2JQAVHSTD 01E5SK4B4K75WSNR27ZWYXFGCY 01E5ST02CS8QQZAS28REPSGF2M]" duration=1.973302844s
level=info ts=2020-04-13T17:00:00.983Z caller=compact.go:495 component=tsdb msg="write block" mint=1586786400000 maxt=1586793600000 ulid=01E5T7QGWRPZRBDW6GYYD05FAS duration=674.875345ms
level=info ts=2020-04-13T17:00:01.535Z caller=head.go:662 component=tsdb msg="head GC completed" duration=13.795139ms
level=info ts=2020-04-13T17:00:08.426Z caller=head.go:732 component=tsdb msg="WAL checkpoint complete" first=48 last=49 duration=6.89093912s
level=info ts=2020-04-13T19:00:00.509Z caller=compact.go:495 component=tsdb msg="write block" mint=1586793600000 maxt=1586800800000 ulid=01E5TEK84J9NCHWEZ9VRG08A3T duration=235.257692ms
level=info ts=2020-04-13T19:00:00.579Z caller=head.go:662 component=tsdb msg="head GC completed" duration=12.68045ms
level=info ts=2020-04-13T21:00:00.601Z caller=compact.go:495 component=tsdb msg="write block" mint=1586800800000 maxt=1586808000000 ulid=01E5TNEZCCZDBTZXYTJV92M68P duration=332.537807ms
level=info ts=2020-04-13T21:00:00.708Z caller=head.go:662 component=tsdb msg="head GC completed" duration=12.395655ms
level=info ts=2020-04-13T21:00:08.960Z caller=head.go:732 component=tsdb msg="WAL checkpoint complete" first=50 last=51 duration=8.251654282s
level=info ts=2020-04-13T21:00:10.378Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1586779200000 maxt=1586800800000 ulid=01E5TNF80SX7CJJQ1V5BGEJYPA sources="[01E5T0VSNR6BKS8V6DPTAAYK5J 01E5T7QGWRPZRBDW6GYYD05FAS 01E5TEK84J9NCHWEZ9VRG08A3T]" duration=1.346677985s
level=info ts=2020-04-13T23:00:00.570Z caller=compact.go:495 component=tsdb msg="write block" mint=1586808000000 maxt=1586815200000 ulid=01E5TWAPMQC0H35EPXA3E8ZCK6 duration=263.520418ms
level=info ts=2020-04-13T23:00:00.635Z caller=head.go:662 component=tsdb msg="head GC completed" duration=12.705482ms
level=info ts=2020-04-14T01:00:00.555Z caller=compact.go:495 component=tsdb msg="write block" mint=1586815200000 maxt=1586822400000 ulid=01E5V36DWE1CZPWSX3SCK21YHS duration=284.500905ms
level=info ts=2020-04-14T01:00:00.779Z caller=head.go:662 component=tsdb msg="head GC completed" duration=11.965455ms
level=info ts=2020-04-14T01:00:07.997Z caller=head.go:732 component=tsdb msg="WAL checkpoint complete" first=52 last=53 duration=7.218466282s
level=info ts=2020-04-14T03:00:00.541Z caller=compact.go:495 component=tsdb msg="write block" mint=1586822400000 maxt=1586829600000 ulid=01E5VA254QJ9SB7MQY6DZG1HDD duration=262.050415ms
level=info ts=2020-04-14T03:00:00.600Z caller=head.go:662 component=tsdb msg="head GC completed" duration=14.400483ms
level=info ts=2020-04-14T03:00:03.386Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1586800800000 maxt=1586822400000 ulid=01E5VA25HEZD7G1Z1WV8J453NJ sources="[01E5TNEZCCZDBTZXYTJV92M68P 01E5TWAPMQC0H35EPXA3E8ZCK6 01E5V36DWE1CZPWSX3SCK21YHS]" duration=2.68210276s
level=info ts=2020-04-14T03:00:09.152Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1586757600000 maxt=1586822400000 ulid=01E5VA28AQW4G557DPMZH6PBJS sources="[01E5T0VT4W0RQBEGX3FS9J808V 01E5TNF80SX7CJJQ1V5BGEJYPA 01E5VA25HEZD7G1Z1WV8J453NJ]" duration=5.608522712s
level=info ts=2020-04-14T05:00:00.574Z caller=compact.go:495 component=tsdb msg="write block" mint=1586829600000 maxt=1586836800000 ulid=01E5VGXWD4JZZJDT6VWC4H4Q4K duration=266.825995ms
level=info ts=2020-04-14T05:00:00.664Z caller=head.go:662 component=tsdb msg="head GC completed" duration=21.997169ms
level=info ts=2020-04-14T05:00:08.329Z caller=head.go:732 component=tsdb msg="WAL checkpoint complete" first=54 last=55 duration=7.660927137s
level=info ts=2020-04-14T07:00:00.685Z caller=compact.go:495 component=tsdb msg="write block" mint=1586836800000 maxt=1586844000000 ulid=01E5VQSKN6PH6M54891CT7YW8D duration=390.620325ms
level=info ts=2020-04-14T07:00:00.752Z caller=head.go:662 component=tsdb msg="head GC completed" duration=12.113668ms
level=info ts=2020-04-14T09:00:00.654Z caller=compact.go:495 component=tsdb msg="write block" mint=1586844000000 maxt=1586851200000 ulid=01E5VYNAW56CWSR3ZVMNXYJR57 duration=392.614807ms
level=info ts=2020-04-14T09:00:00.764Z caller=head.go:662 component=tsdb msg="head GC completed" duration=12.353077ms
level=info ts=2020-04-14T09:00:07.655Z caller=head.go:732 component=tsdb msg="WAL checkpoint complete" first=56 last=57 duration=6.881659425s
level=info ts=2020-04-14T09:00:10.171Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1586822400000 maxt=1586844000000 ulid=01E5VYNJ5AGAHKTF6S40SE2C8W sources="[01E5VA254QJ9SB7MQY6DZG1HDD 01E5VGXWD4JZZJDT6VWC4H4Q4K 01E5VQSKN6PH6M54891CT7YW8D]" duration=2.44318066s
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 49 (33 by maintainers)
After digging into the issue, I have found a way to reproduce this. First, with a unit test, while trying to investigate possible concurrency issue: #7134 , and then with a real-life scenario.
Both cases led to a low watermark of 1.
Real life reproducer:
Make sure that:
There is no data directory
Nothing listens to localhost:2020 (and therefore scrape will fail)
Launch prometheus:
What is happening
As head is empty, we use a initAppender:
https://github.com/prometheus/prometheus/blob/209d4bb8a1491f4535cc6d991681e7dc03bb1d56/tsdb/head.go#L852-L858
Which means that there is no appender until we run Add:
https://github.com/prometheus/prometheus/blob/209d4bb8a1491f4535cc6d991681e7dc03bb1d56/tsdb/head.go#L813-L815
As the scrape is failing, Add is NEVER called on that initAppender.
BUT, an appenderId was already created:
https://github.com/prometheus/prometheus/blob/209d4bb8a1491f4535cc6d991681e7dc03bb1d56/tsdb/head.go#L847
That appenderId is never cleared, because without Add, no Appender, and Commit and Rollback do nothing.
https://github.com/prometheus/prometheus/blob/209d4bb8a1491f4535cc6d991681e7dc03bb1d56/tsdb/head.go#L830-L832
And the low watermark is blocked at 1.
Is this what is going on here?
The wal replay logs provided here show that:
case 1: there is 1 segment, and replay takes 332.425µs case 2: there are 2 segments, and replay takes 450.026µs
That is quite low and might indicate that indeed, WAL were empty, therefore this race condition could have happened.
Prometheus v2.17.2 has been released with a fix.
And thank you for the detailed report & screenshots.
Thank you for your great work!
This actually would explain the pattern that we’re seeing as well. Newly created prometheuses or ones where the data is wiped on startup show this behaviour for us. I’m almost certain that the ones that have data directories do not.
prometheus_tsdb_isolation_low_watermark value is 1 here as well.
We have 2 alerting rules and 0 recording rules. Almost no people were sending a query to the instance.
It was 1 all the time. We restarted the instance since it got down due to OOM, and after that the low watermark started growing.