prometheus: Prometheus raises "out of bounds" errors for all targets after resuming the Linux system from suspend

What did you do?

After suspending the system and resuming it, Prometheus reports the following errors and cannot scrape any new metrics unless the Prometheus service is restarted.

What did you expect to see?

Prometheus should continue to scrape new metrics.

What did you see instead? Under which circumstances?

See the logs below. In the web UI, I got the following:

[image: Prometheus web UI screenshot]

Environment

  • System information:
Linux t470p 5.4.79-1-lts #1 SMP Sun, 22 Nov 2020 14:22:21 +0000 x86_64 GNU/Linux
  • Prometheus version:
prometheus, version 2.22.2 (branch: tarball, revision: 2.22.2)
  build user:       someone@builder
  build date:       20201117-18:44:08
  go version:       go1.15.5
  platform:         linux/amd64
# prometheus.service

# /usr/lib/systemd/system/prometheus.service
[Unit]
Description=Prometheus service
Requires=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Restart=on-failure
WorkingDirectory=/usr/share/prometheus
EnvironmentFile=-/etc/conf.d/prometheus
ExecStart=/usr/bin/prometheus --config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus/dat>
ExecReload=/bin/kill -HUP $MAINPID
LimitNOFILE=65535
NoNewPrivileges=true
ProtectHome=true
ProtectSystem=full
ProtectHostname=true
ProtectControlGroups=true
ProtectKernelModules=true
ProtectKernelTunables=true
LockPersonality=true
RestrictRealtime=yes
RestrictNamespaces=yes
MemoryDenyWriteExecute=yes
PrivateDevices=yes
CapabilityBoundingSet=

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/prometheus.service.d/prometheus.conf
[Service]
ExecStart=
ExecStart=/usr/bin/prometheus --config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/home/prometheus $PROME>
ProtectHome=False
  • Prometheus configuration file:
---
global:
  scrape_interval: 300s
  evaluation_interval: 10s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
scrape_configs:
  - job_name: 'node_exporter'
    static_configs:
      - targets:
          - localhost:9100
        labels:
          uid: t470
  • Logs:
Dec 01 09:46:29 t470p prometheus[1629652]: level=info ts=2020-12-01T01:46:29.714Z caller=head.go:889 component=tsdb msg="WAL checkpoint complete" first=1668 last=1669 duration=32.511167ms
Dec 01 09:46:29 t470p prometheus[1629652]: level=info ts=2020-12-01T01:46:29.763Z caller=head.go:809 component=tsdb msg="Head GC completed" duration=739.34µs
Dec 01 09:46:29 t470p prometheus[1629652]: level=info ts=2020-12-01T01:46:29.812Z caller=head.go:809 component=tsdb msg="Head GC completed" duration=802.06µs
Dec 01 09:46:29 t470p prometheus[1629652]: level=info ts=2020-12-01T01:46:29.812Z caller=checkpoint.go:96 component=tsdb msg="Creating checkpoint" from_segment=1670 to_segment=1671 mint=1606780800000
Dec 01 09:46:29 t470p prometheus[1629652]: level=info ts=2020-12-01T01:46:29.822Z caller=head.go:889 component=tsdb msg="WAL checkpoint complete" first=1670 last=1671 duration=10.152588ms
Dec 01 09:46:47 t470p prometheus[1629652]: level=warn ts=2020-12-01T01:46:47.293Z caller=scrape.go:1378 component="scrape manager" scrape_pool=ssh target="http://127.0.0.1:9115/probe?module=ssh_banner&target=172.20.149.141%3A22" msg="Error on ingesting samples that are too old or are too far into the future" num_dropped=6
Dec 01 09:46:47 t470p prometheus[1629652]: level=warn ts=2020-12-01T01:46:47.293Z caller=scrape.go:1145 component="scrape manager" scrape_pool=ssh target="http://127.0.0.1:9115/probe?module=ssh_banner&target=xxxx%3A22" msg="Append failed" err="out of bounds"
Dec 01 09:46:47 t470p prometheus[1629652]: level=warn ts=2020-12-01T01:46:47.293Z caller=scrape.go:1094 component="scrape manager" scrape_pool=ssh target="http://127.0.0.1:9115/probe?module=ssh_banner&target=xxx%3A22" msg="Appending scrape report failed" err="out of bounds"
Dec 01 09:46:56 t470p prometheus[1629652]: level=warn ts=2020-12-01T01:46:56.162Z caller=scrape.go:1378 component="scrape manager" scrape_pool=blackbox target="http://127.0.0.1:9115/probe?module=http_2xx&target=http%3A%2F%2Fxxx" msg="Error on ingesting samples that are too old or are too far into the future" num_dropped=17
Dec 01 09:46:56 t470p prometheus[1629652]: level=warn ts=2020-12-01T01:46:56.162Z caller=scrape.go:1145 component="scrape manager" scrape_pool=blackbox target="http://127.0.0.1:9115/probe?module=http_2xx&target=http%3A%2F%2Fwww.baidu.com" msg="Append failed" err="out of bounds"
Dec 01 09:46:56 t470p prometheus[1629652]: level=warn ts=2020-12-01T01:46:56.162Z caller=scrape.go:1094 component="scrape manager" scrape_pool=blackbox target="http://127.0.0.1:9115/probe?module=http_2xx&target=http%3A%2F%2Fwww.baidu.com" msg="Appending scrape report failed" err="out of bounds"
Dec 01 09:47:01 t470p prometheus[1629652]: level=warn ts=2020-12-01T01:47:01.836Z caller=scrape.go:1378 component="scrape manager" scrape_pool=gitea target=http://localhost:10080/metrics msg="Error on ingesting samples that are too old or are too far into the future" num_dropped=67
Dec 01 09:47:01 t470p prometheus[1629652]: level=warn ts=2020-12-01T01:47:01.836Z caller=scrape.go:1145 component="scrape manager" scrape_pool=gitea target=http://localhost:10080/metrics msg="Append failed" err="out of bounds"

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 5
  • Comments: 17 (9 by maintainers)

Most upvoted comments

FWIW, same issue on macOS: I’m experimenting with Prometheus in a container on my Mac, so it goes to sleep at night, and when it wakes up, I get this error too. What’s even more problematic is that the issue persists even after stopping and restarting the container: some state in storage seems to be corrupted and survives restarts.

I have changed my mind and released v2.25.2 with this fix. Thanks all!

I am having the exact same issue:

  • macOS 11.1
  • Docker Desktop 3.1
  • Docker-Compose

The host's localtime is shared with all containers as a read-only (ro) volume: /private/etc/localtime:/etc/localtime:ro.
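
For reference, that mount sits under the service's volumes key in the Compose file. A minimal sketch (the service name monitor and the image tag are taken from the logs below; the config-mount path is illustrative, not my exact file):

version: "3.8"
services:
  monitor:
    image: prom/prometheus:v2.24.1
    volumes:
      # illustrative config mount; the real file layout differs
      - ./prometheus:/etc/prometheus:ro
      # host clock shared read-only with the container
      - /private/etc/localtime:/etc/localtime:ro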

Below are the docker logs from the container:

  monitor     | level=info ts=2021-02-04T17:32:39.671Z caller=main.go:326 msg="No time or size retention was set so using the default time retention" duration=15d
  monitor     | level=info ts=2021-02-04T17:32:39.672Z caller=main.go:364 msg="Starting Prometheus" version="(version=2.24.1, branch=HEAD, revision=e4487274853c587717006eeda8804e597d120340)"
  monitor     | level=info ts=2021-02-04T17:32:39.672Z caller=main.go:369 build_context="(go=go1.15.6, user=root@0b5231a0de0f, date=20210120-00:09:36)"
  monitor     | level=info ts=2021-02-04T17:32:39.672Z caller=main.go:370 host_details="(Linux 4.19.121-linuxkit #1 SMP Tue Dec 1 17:50:32 UTC 2020 x86_64 monitor (none))"
  monitor     | level=info ts=2021-02-04T17:32:39.672Z caller=main.go:371 fd_limits="(soft=1048576, hard=1048576)"
  monitor     | level=info ts=2021-02-04T17:32:39.672Z caller=main.go:372 vm_limits="(soft=unlimited, hard=unlimited)"
  monitor     | level=info ts=2021-02-04T17:32:39.695Z caller=web.go:530 component=web msg="Start listening for connections" address=0.0.0.0:9090
  monitor     | level=info ts=2021-02-04T17:32:39.720Z caller=main.go:738 msg="Starting TSDB ..."
  monitor     | level=info ts=2021-02-04T17:32:39.729Z caller=tls_config.go:227 component=web msg="TLS is disabled." http2=false
  monitor     | level=info ts=2021-02-04T17:32:39.778Z caller=head.go:645 component=tsdb msg="Replaying on-disk memory mappable chunks if any"
  monitor     | level=info ts=2021-02-04T17:32:39.779Z caller=head.go:659 component=tsdb msg="On-disk memory mappable chunks replay completed" duration=1.1577ms
  monitor     | level=info ts=2021-02-04T17:32:39.779Z caller=head.go:665 component=tsdb msg="Replaying WAL, this may take a while"
  monitor     | level=info ts=2021-02-04T17:32:39.784Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
  monitor     | level=info ts=2021-02-04T17:32:39.784Z caller=head.go:722 component=tsdb msg="WAL replay completed" checkpoint_replay_duration=163.1µs wal_replay_duration=4.9535ms total_replay_duration=6.377ms
  monitor     | level=info ts=2021-02-04T17:32:39.792Z caller=main.go:758 fs_type=EXT4_SUPER_MAGIC
  monitor     | level=info ts=2021-02-04T17:32:39.793Z caller=main.go:761 msg="TSDB started"
  monitor     | level=info ts=2021-02-04T17:32:39.793Z caller=main.go:887 msg="Loading configuration file" filename=/etc/prometheus/root.yml
  monitor     | level=info ts=2021-02-04T17:32:39.837Z caller=main.go:918 msg="Completed loading of configuration file" filename=/etc/prometheus/root.yml totalDuration=44.0719ms remote_storage=29.8µs web_handler=8.5µs query_engine=9.4µs scrape=2.7028ms scrape_sd=13.3593ms notify=11.4µs notify_sd=16µs rules=16.9668ms
  monitor     | level=info ts=2021-02-04T17:32:39.837Z caller=main.go:710 msg="Server is ready to receive web requests."
  monitor     | level=info ts=2021-02-04T21:00:58.179Z caller=compact.go:507 component=tsdb msg="write block" mint=1612459968023 maxt=1612461600000 ulid=01EXQDJ6T6K3D9YMVM709VS4ZZ duration=125.4913ms
  monitor     | level=info ts=2021-02-04T21:00:58.255Z caller=head.go:824 component=tsdb msg="Head GC completed" duration=75.2094ms
  monitor     | level=info ts=2021-02-04T21:00:58.558Z caller=compact.go:507 component=tsdb msg="write block" mint=1612461603023 maxt=1612468800000 ulid=01EXQDJ70MZ8XHJF6EB8YRJTN8 duration=297.3878ms
  monitor     | level=info ts=2021-02-04T21:00:58.560Z caller=head.go:824 component=tsdb msg="Head GC completed" duration=1.3618ms
  monitor     | level=warn ts=2021-02-04T21:01:05.233Z caller=scrape.go:1094 component="scrape manager" scrape_pool=proxy-exporter target=http://exporter:9113/prometheus msg="Appending scrape report failed" err="out of bounds"
  monitor     | level=warn ts=2021-02-04T21:01:09.803Z caller=scrape.go:1094 component="scrape manager" scrape_pool=proxy target=http://example.test:80/stub_status msg="Appending scrape report failed" err="out of bounds"
  monitor     | level=warn ts=2021-02-04T21:01:09.899Z caller=scrape.go:1094 component="scrape manager" scrape_pool=monitor target=https://localhost:9090/metrics msg="Appending scrape report failed" err="out of bounds"
  monitor     | level=warn ts=2021-02-04T21:01:10.825Z caller=scrape.go:1094 component="scrape manager" scrape_pool=controller target=http://traefik.example.test:8082/metrics msg="Appending scrape report failed" err="out of bounds"
  monitor     | level=warn ts=2021-02-04T21:01:12.617Z caller=scrape.go:1094 component="scrape manager" scrape_pool=backend target=https://example.test:443/status%3Fjson msg="Appending scrape report failed" err="out of bounds"
  monitor     | level=warn ts=2021-02-04T21:01:12.674Z caller=scrape.go:1094 component="scrape manager" scrape_pool=dashboard target=https://dashboard.example.test:443/api/health msg="Appending scrape report failed" err="out of bounds"
  monitor     | level=warn ts=2021-02-04T21:01:13.071Z caller=scrape.go:1094 component="scrape manager" scrape_pool=alerts target=https://alerts.example.test:443/-/healthy msg="Appending scrape report failed" err="out of bounds"
  monitor     | level=warn ts=2021-02-04T21:01:17.920Z caller=scrape.go:1094 component="scrape manager" scrape_pool=admin target=https://hi.example.test:443/health msg="Appending scrape report failed" err="out of bounds"
  monitor     | level=warn ts=2021-02-04T21:01:20.229Z caller=scrape.go:1094 component="scrape manager" scrape_pool=proxy-exporter target=http://exporter:9113/prometheus msg="Appending scrape report failed" err="out of bounds"
  monitor     | level=warn ts=2021-02-04T21:01:25.256Z caller=scrape.go:1094 component="scrape manager" scrape_pool=proxy target=http://bell.example:80/stub_status msg="Appending scrape report failed" err="out of bounds"
  monitor     | level=warn ts=2021-02-04T21:01:28.136Z caller=scrape.go:1094 component="scrape manager" scrape_pool=dashboard target=https://dashboard.example.test:443/api/health msg="Appending scrape report failed" err="out of bounds"
  monitor     | level=warn ts=2021-02-04T21:01:40.257Z caller=scrape.go:1094 component="scrape manager" scrape_pool=proxy target=http://example.test:80/stub_status msg="Appending scrape report failed" err="out of bounds"
  monitor     | level=warn ts=2021-02-04T21:01:40.352Z caller=scrape.go:1094 component="scrape manager" scrape_pool=monitor target=https://localhost:9090/metrics msg="Appending scrape report failed" err="out of bounds"
  monitor     | level=warn ts=2021-02-04T21:01:41.277Z caller=scrape.go:1094 component="scrape manager" scrape_pool=controller target=http://traefik.example.test:8082/metrics msg="Appending scrape report failed" err="out of bounds"

I’m hopeful the linked PR #8601 will fix this. (Note: if you want to test it, you likely need to suspend for more than 10 minutes.)

If you shift time backwards, this is expected; I don't think there is a reasonable way we can deal with that.

This will be released in the Prometheus 2.26 series.

If you run an earlier version, a workaround is to start Prometheus with the --no-scrape.adjust-timestamps flag, as sketched below.
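
For the systemd setup shown in the report above, that means appending the flag to the ExecStart override in the drop-in, roughly like this (a sketch: keep whatever other flags and environment-file arguments you already pass; they are omitted here for brevity):

# /etc/systemd/system/prometheus.service.d/prometheus.conf
[Service]
ExecStart=
ExecStart=/usr/bin/prometheus --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/home/prometheus \
    --no-scrape.adjust-timestamps

Then run systemctl daemon-reload and restart the service.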

I experience the same issue (in various environments). It looks like restarting Prometheus helps, but it's not clear how big the time jump must be to trigger the problem. @brian-brazil @roidelapluie Any suggestions on that? I'm asking because I plan to restart Prometheus automatically when a time change occurs, so I need to know what time difference my script should react to. Maybe there is another way to detect the problem from a script?