prometheus: "Context deadline exceeded" for responsive target

What did you do?

Tried to run a Prometheus server in a Docker container and scrape metrics from a node_exporter instance running in a separate but linked container (see the Environment section).

Used a basic config which scrapes the linked container every 10 seconds via its container-name DNS address.

Noticed that this very basic target was showing as DOWN in the Prometheus dashboard's Targets view, with the error “context deadline exceeded”.

I attached to the running Prometheus container and timed a manual scrape of the target using time and wget; this worked as expected:

$ docker exec prometheus time wget -qO- http://node_exporter:9100/metrics
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0.000302033
go_gc_duration_seconds{quantile="0.25"} 0.0005395150000000001
... (other metrics) ...
real	0m 0.01s
user	0m 0.00s
sys	0m 0.00s
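
Since a single manual scrape returns almost instantly, it may help to repeat the same scrape a number of times to see whether the latency is only intermittent (a quick sketch reusing the command above; the iteration count is arbitrary):

# Time 20 consecutive manual scrapes of the target from inside the Prometheus container.
for i in $(seq 1 20); do
  docker exec prometheus time wget -qO- http://node_exporter:9100/metrics > /dev/null
done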

Created a separate target group with scrape_interval and scrape_timeout both increased to 30s, still targeting the same node_exporter instance. Scrapes for this group are successful but appear to take a long time (Last Scrape is reported as > 36s ago, which I assume means the scrape took upwards of 6 seconds from Prometheus’ perspective).

[screenshot of the Targets page]
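
To cross-check how long scrapes take from Prometheus' own point of view, the scrape_duration_seconds series that Prometheus records for every target can be queried, for example through the HTTP API (a sketch; assumes port 9090 is published as in the Environment section below):

# Returns the most recent scrape duration, in seconds, for each target.
wget -qO- 'http://localhost:9090/api/v1/query?query=scrape_duration_seconds'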

What did you expect to see?

Both target groups reporting as UP and scrapes completing quickly

What did you see instead? Under which circumstances?

DOWN with “context deadline exceeded” for the 10s scrape group

Environment

Recreate my exact environment:

# Run node_exporter (the Prometheus server itself is started further down)
docker run --name "node_exporter" -d prom/node-exporter

# Write config into file for mounting into prometheus container
mkdir -p /tmp/prometheus/
cat << EOF > /tmp/prometheus/prometheus.yml
---
global:
  scrape_interval: 10s
  scrape_timeout: 10s

scrape_configs:
  - job_name: 'defaults'
    static_configs:
      - targets: ['node_exporter:9100']
        labels:
          group: 'defaults'
          
  - job_name: 'increased_timeout'
    scrape_interval: 30s
    scrape_timeout: 30s
    static_configs:
      - targets: ['node_exporter:9100']
        labels:
          group: 'increased_timeout'
EOF

docker run --name "prometheus" \
  -d \
  -p 9090:9090 \
  --link node_exporter:node_exporter \
  -v /tmp/prometheus/prometheus.yml:/prometheus.yml \
  -v /tmp/prometheus/data:/data \
  prom/prometheus:v1.5.2 \
  -storage.local.path=/data \
  -config.file=/prometheus.yml

# Open http://localhost:9090 in your browser and look at the Targets section.

# Stop and remove containers:
docker stop node_exporter prometheus
docker rm node_exporter prometheus
  • Prometheus version:
$ docker run -ti prom/prometheus:v1.5.2 -version
prometheus, version 1.5.2 (branch: master, revision: bd1182d29f462c39544f94cc822830e1c64cf55b)
  build user:       root@1a01c5f68840
  build date:       20170210-16:23:28
  go version:       go1.7.5
  • Prometheus configuration file:
---
global:
  scrape_interval: 10s
  scrape_timeout: 10s

scrape_configs:
  - job_name: 'defaults'
    static_configs:
      - targets: ['node_exporter:9100']
        labels:
          group: 'defaults'
          
  - job_name: 'increased_timeout'
    scrape_interval: 30s
    scrape_timeout: 30s
    static_configs:
      - targets: ['node_exporter:9100']
        labels:
          group: 'increased_timeout'
  • Logs:
time="2017-03-01T11:53:25Z" level=info msg="Starting prometheus (version=1.5.2, branch=master, revision=bd1182d29f462c39544f94cc822830e1c64cf55b)" source="main.go:75" 
time="2017-03-01T11:53:25Z" level=info msg="Build context (go=go1.7.5, user=root@1a01c5f68840, date=20170210-16:23:28)" source="main.go:76" 
time="2017-03-01T11:53:25Z" level=info msg="Loading configuration file /prometheus.yml" source="main.go:248" 
time="2017-03-01T11:53:26Z" level=info msg="Loading series map and head chunks..." source="storage.go:373" 
time="2017-03-01T11:53:26Z" level=info msg="1434 series loaded." source="storage.go:378" 
time="2017-03-01T11:53:26Z" level=info msg="Listening on :9090" source="web.go:259" 
time="2017-03-01T11:53:26Z" level=info msg="Starting target manager..." source="targetmanager.go:61" 
time="2017-03-01T11:58:26Z" level=info msg="Checkpointing in-memory metrics and chunks..." source="persistence.go:612" 
time="2017-03-01T11:58:26Z" level=info msg="Done checkpointing in-memory metrics and chunks in 107.457463ms." source="persistence.go:639" 
time="2017-03-01T12:03:26Z" level=info msg="Checkpointing in-memory metrics and chunks..." source="persistence.go:612" 
time="2017-03-01T12:03:26Z" level=info msg="Done checkpointing in-memory metrics and chunks in 97.668263ms." source="persistence.go:639"
...

Nothing in the logs seems relevant.
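
Since nothing obvious shows up at the default log level, one option would be to rerun the server with debug logging enabled (a sketch; assumes the 1.x -log.level flag, everything else unchanged from the run command above):

# Same Prometheus run command as in the Environment section, plus debug logging.
docker run --name "prometheus" -d -p 9090:9090 \
  --link node_exporter:node_exporter \
  -v /tmp/prometheus/prometheus.yml:/prometheus.yml \
  -v /tmp/prometheus/data:/data \
  prom/prometheus:v1.5.2 \
  -storage.local.path=/data \
  -config.file=/prometheus.yml \
  -log.level=debug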

Hopefully I’m not doing anything obviously wrong!

Thanks

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 32 (6 by maintainers)

Most upvoted comments

I had the same problem and solved it by opening TCP ports 9100 and 9101 in the AWS security groups. I hope this helps you as well.
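
For reference, the equivalent ingress rule via the AWS CLI might look something like this (a sketch; the security group ID and source CIDR are placeholders for your own values):

# Allow inbound TCP 9100 (node_exporter) from the scraper's network.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 9100 \
  --cidr 10.0.0.0/16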

Ran into a similar problem on Docker 17.03-17.05 CE.

These versions of Docker have a bug that makes connecting the service/container to the host bridge network harder than it should be.

Work-around:

$ docker network connect bridge <cid_of_prometheus>
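
Whether the container actually ended up attached to the bridge network can be checked afterwards (standard Docker CLI):

# The Prometheus container should now be listed under "Containers".
docker network inspect bridge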

I am experiencing the same issue.

CONTAINER ID   NAME         CPU %   MEM USAGE / LIMIT     MEM %   NET I/O          BLOCK I/O    PIDS
c4120abe7244   prometheus   0.03%   21.57MiB / 15.58GiB   0.14%   44.7kB / 128kB   393kB / 0B   21

My Docker container has a 15 GB memory limit; could adding more memory solve the issue?

Happening on CentOS 7.3 in AWS without using Docker. This is a very basic setup like the OP reported, with nothing fancy going on, except that I am running everything natively. I would be happy to provide any details that might help with resolution.

Hi, I’ve had the same problem, but noticed my Docker environment was very slow and swapping quite a lot. After adding more RAM and restarting Docker, the problem is gone for the moment.
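
For the swap-related reports above, a quick way to check whether the Docker host is under memory pressure is with standard Linux tools (nothing Prometheus-specific here):

# Show free memory and swap usage, then sample paging/CPU activity for 5 seconds.
free -m
vmstat 1 5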