prometheus: Ingestion stops, probably due to deadlocked series maintenance

hi folks,

I updated Prometheus from 0.16.2 to 0.17.0 and tried to reuse the old Prometheus configuration and data, but the status page shows the error above and I can't get any samples.

my configuration is very simple

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    scrape_timeout: 10s

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    target_groups:
      - targets: ['localhost:9090']

  - job_name: 'overwritten-default'
    scrape_interval: 5s
    scrape_timeout: 10s
    consul_sd_configs:
      - server: 'consul server'

    relabel_configs:
      - source_labels: ['__meta_consul_service_id']
        regex:         '(.*)'
        target_label:  'job'
        replacement:   '$1'
        action:        'replace'
      - source_labels: ['__meta_consul_service_address','__meta_consul_service_port']
        separator:     ';'
        regex:         '(.*);(.*)'
        target_label:  '__address__'
        replacement:   '$1:$2'
        action:        'replace'
      - source_labels: ['__meta_consul_service_id']
        regex:         '^prometheus_.*'
        action:        'keep'

There are no useful debug logs or hints. What am I missing?

thanks.

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 34 (16 by maintainers)

Most upvoted comments

In case it gets into that state again, a goroutine dump would be great. Then we could see which goroutine is deadlocked, if any. You get it with

curl 'http://your-prometheus-server:9090/debug/pprof/goroutine?debug=2'

Another explanation would be that your server is stuck writing a checkpoint file, e.g. because the underlying disk is very slow or blocked. (Perhaps that could happen on Amazon or other cloud providers if you are running out of IOPS quota?)
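
A full goroutine dump can run to thousands of lines, so a quick summary can help spot a likely deadlock before reading the stacks by hand. The following is only a rough sketch, not part of Prometheus: it assumes the usual `debug=2` header format (`goroutine N [state, M minutes]:`), and the function name `summarize` and the script name used below are made up for illustration. Goroutines that have been in a lock-related state such as `semacquire` for many minutes are the ones worth looking at first.

```python
import re
import sys
from collections import Counter

# Header line of each goroutine in a debug=2 dump, for example:
#   goroutine 42 [semacquire, 12 minutes]:
# The Go runtime only prints the "N minutes" part once a goroutine
# has been blocked for at least a minute.
HEADER = re.compile(
    r"^goroutine (\d+) [^\[]*\[([^\],]+)(?:, (\d+) minutes)?[^\]]*\]:"
)


def summarize(dump):
    """Return (Counter of wait states, long waits sorted longest first).

    Each long-wait entry is a tuple (minutes, goroutine_id, state).
    """
    states = Counter()
    long_waits = []
    for line in dump.splitlines():
        match = HEADER.match(line)
        if not match:
            continue  # stack frame or blank line, not a goroutine header
        goroutine_id = int(match.group(1))
        state = match.group(2)
        minutes = match.group(3)
        states[state] += 1
        if minutes is not None:
            long_waits.append((int(minutes), goroutine_id, state))
    long_waits.sort(reverse=True)
    return states, long_waits


if __name__ == "__main__":
    # Reads the dump from stdin so it can be piped straight from curl.
    states, long_waits = summarize(sys.stdin.read())
    print("goroutines per state:")
    for state, count in states.most_common():
        print(f"  {count:6d}  {state}")
    print("longest waits:")
    for minutes, goroutine_id, state in long_waits[:20]:
        print(f"  goroutine {goroutine_id}: {state}, {minutes} minutes")
```

Saved as, say, summarize_goroutines.py (a hypothetical name), it could be fed directly from the command above:

curl -s 'http://your-prometheus-server:9090/debug/pprof/goroutine?debug=2' | python3 summarize_goroutines.py

The summary only narrows things down. The full stack traces of the long-waiting goroutines are still what you would want to attach to the issue.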