prometheus: Prometheus can't access docker.sock
Sorry if this is not the right place. I have been going in circles, reading the docs and how other people have done it, and nothing seems to work. IRC seems pretty dead and unresponsive. =/ I hope someone can point me in the right direction.
What did you do?
Followed the documentation on Prometheus and Docker Swarm, using: https://prometheus.io/blog/2015/06/01/advanced-service-discovery/ and https://prometheus.io/docs/guides/dockerswarm/ and https://github.com/prometheus/prometheus/blob/release-2.22/documentation/examples/prometheus-dockerswarm.yml
What did you expect to see?
Prometheus should be able to access docker.sock
What did you see instead? Under which circumstances?
level=error ts=2020-11-15T19:36:07.282Z caller=refresh.go:98 component="discovery manager scrape" discovery=dockerswarm msg="Unable to refresh target groups" err="error while listing swarm services: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get \"http://%2Fvar%2Frun%2Fdocker.sock/v1.24/tasks\": dial unix /var/run/docker.sock: connect: permission denied"
Environment
CentOS, Docker swarm, using prom/prometheus:latest
- System information:
# docker --version
Docker version 19.03.13, build 4484c46d9d
# uname -srm
Linux 4.18.0-193.28.1.el8_2.x86_64 x86_64
- Prometheus version:
# docker exec -ti 636014888e15 /bin/sh
/prometheus $ prometheus --version
prometheus, version 2.22.1 (branch: HEAD, revision: 00f16d1ac3a4c94561e5133b821d8e4d9ef78ec2)
build user: root@516b109b1732
build date: 20201105-14:02:25
go version: go1.15.3
platform: linux/amd64
- Prometheus configuration file:
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'dockmaster'
    static_configs:
    - targets: ['redacted:9100']
  - job_name: 'vms'
    static_configs:
    - targets: ['redacted:9100', 'redacted:9100']
  - job_name: 'home-servers'
    static_configs:
    - targets: ['redacted:9100', 'redacted:9100']
  - job_name: 'pis'
    static_configs:
    - targets: ['redacted']
  - job_name: 'docker'
    dockerswarm_sd_configs:
      - host: unix:///var/run/docker.sock
        role: nodes
    relabel_configs:
      # Fetch metrics on port 9323.
      - source_labels: [__meta_dockerswarm_node_address]
        target_label: __address__
        replacement: $1:9323
  - job_name: 'dockerswarm'
    dockerswarm_sd_configs:
      - host: unix:///var/run/docker.sock
        role: tasks
    relabel_configs:
      # Only keep containers that should be running.
      - source_labels: [__meta_dockerswarm_task_desired_state]
        regex: running
        action: keep
- Prometheus Docker compose file:
version: '3.3'
services:
  private:
    image: prom/prometheus
    networks:
      - dockadmin_rp
      - private
    volumes:
      - /srv/data/prometheus/config:/etc/prometheus
      - /srv/data/prometheus/data:/prometheus
      - /var/run/docker.sock:/var/run/docker.sock:0444
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - node.labels.type == private
      labels:
        traefik.enable: "true"
        traefik.docker.network: "dockadmin_rp"
        traefik.http.routers.pm02.service: "pm02"
        traefik.http.routers.pm02.tls: "true"
        traefik.http.routers.pm02.tls.certResolver: "default"
        traefik.http.routers.pm02.middlewares: "defaultsecheaders@file, pmauth"
        traefik.http.routers.pm02.rule: "Host(`pm02.mydomain.xyz`)"
        traefik.http.middlewares.pmauth.basicauth.usersfile: "/auth/prometheus"
        traefik.http.services.pm02.loadbalancer.server.port: 9090
  cadvisor:
    image: gcr.io/google-containers/cadvisor
    command: -docker_only
    networks:
      - private
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    deploy:
      mode: global
      restart_policy:
        condition: on-failure
networks:
  private:
  dockadmin_rp:
    external: true
- Logs:
level=error ts=2020-11-15T19:45:07.282Z caller=refresh.go:98 component="discovery manager scrape" discovery=dockerswarm msg="Unable to refresh target groups" err="error while listing swarm nodes: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get \"http://%2Fvar%2Frun%2Fdocker.sock/v1.24/nodes\": dial unix /var/run/docker.sock: connect: permission denied"
About this issue
- State: closed
- Created 4 years ago
- Reactions: 1
- Comments: 37 (13 by maintainers)
I'll just leave this example here for future reference, using Prometheus with a Docker socat proxy:
prometheus.yml
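A minimal sketch of such a setup, not necessarily the commenter's exact files: a small sidecar exposes the Docker socket over TCP on a private overlay network, and Prometheus points `dockerswarm_sd_configs` at that address instead of mounting the socket itself. The service name `dockerproxy`, the `alpine/socat` image, port `2375`, and the manager placement constraint are assumptions chosen for illustration.

```yaml
# Stack file (illustrative sketch only)
version: '3.3'
services:
  dockerproxy:                      # hypothetical service name
    image: alpine/socat             # assumed image; its entrypoint is socat
    # Forward TCP connections on 2375 to the host's Docker socket.
    command: tcp-listen:2375,fork,reuseaddr unix-connect:/var/run/docker.sock
    networks:
      - private                     # do NOT publish this port: it grants full Docker API access
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      placement:
        constraints:
          - node.role == manager    # swarm node/service/task listings are served by managers
  prometheus:
    image: prom/prometheus
    networks:
      - private
    volumes:
      - /srv/data/prometheus/config:/etc/prometheus
      - /srv/data/prometheus/data:/prometheus
networks:
  private:
---
# prometheus.yml (only the relevant job shown)
scrape_configs:
  - job_name: 'dockerswarm'
    dockerswarm_sd_configs:
      - host: tcp://dockerproxy:2375   # assumed proxy address instead of unix:///var/run/docker.sock
        role: tasks
    relabel_configs:
      # Only keep containers that should be running.
      - source_labels: [__meta_dockerswarm_task_desired_state]
        regex: running
        action: keep
```

With this layout Prometheus keeps running as its default non-root user, and only the proxy container touches the socket. Anything else attached to the `private` network can reach the Docker API too, so that network should stay small.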
IMO, this alone would be enough for security-conscious people to do their own research. I do not think there is a golden solution to this, but with this note everyone can use their own judgment. In production environments everyone has their own considerations to make.
That being said, what should be added to the documentation?
Any suggestions?
I guess at this point this thread and https://groups.google.com/g/prometheus-users/c/EuEW0qRzXvg/m/0aqKh_ZABQAJ?pli=1 contain enough data. Someone has to make a decision, and that is up to neither me nor @Ruppsn, so I am leaving it to the Prometheus maintainers.
I will be more than glad to put up that PR once a decision has been taken.
Ack, thanks for your input.
I will see if I have some time to get to it over the weekend and get that PR up 😃
It depends on what counts as a standard configuration. It works for me with the socket, and I do not run Prometheus as root. Any improvement to the documentation is welcome.
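One way socket access can work without root, sketched under assumptions rather than taken from the commenter's setup: the socket is typically owned by `root:docker` with mode `0660`, while the `prom/prometheus` image runs as `nobody`, so the container process needs the host's `docker` group ID. The GID `991` below is a placeholder (it can be looked up on the host with `getent group docker`), and UID `65534` is assumed to be `nobody` inside the image.

```yaml
# Stack file fragment (illustrative sketch only)
version: '3.3'
services:
  prometheus:
    image: prom/prometheus
    # UID 65534 = nobody (assumed image default); 991 = placeholder for the host's docker GID.
    # Alternative with a larger attack surface: user: root
    user: "65534:991"
    volumes:
      - /srv/data/prometheus/config:/etc/prometheus
      - /srv/data/prometheus/data:/prometheus
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      placement:
        constraints:
          - node.role == manager    # swarm listings are only served by manager nodes
```

Because the group ID differs between hosts, this only holds if every node the service can be scheduled on uses the same `docker` GID.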