prometheus: Prometheus can't access docker.sock

Sorry if this is not the right place. I've been going in circles, reading the docs and how other people have done it, but nothing seems to work, and IRC seems pretty dead and unresponsive. =/ I hope someone can point me in the right direction.

What did you do?

Followed the documentation on Prometheus and Docker Swarm: https://prometheus.io/blog/2015/06/01/advanced-service-discovery/, https://prometheus.io/docs/guides/dockerswarm/, and https://github.com/prometheus/prometheus/blob/release-2.22/documentation/examples/prometheus-dockerswarm.yml

What did you expect to see?

Prometheus should be able to access docker.sock

What did you see instead? Under which circumstances?

level=error ts=2020-11-15T19:36:07.282Z caller=refresh.go:98 component="discovery manager scrape" discovery=dockerswarm msg="Unable to refresh target groups" err="error while listing swarm services: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get \"http://%2Fvar%2Frun%2Fdocker.sock/v1.24/tasks\": dial unix /var/run/docker.sock: connect: permission denied"

Environment

CentOS, Docker swarm, using prom/prometheus:latest

  • System information:
# docker --version
Docker version 19.03.13, build 4484c46d9d

# uname -srm
Linux 4.18.0-193.28.1.el8_2.x86_64 x86_64
  • Prometheus version:
# docker exec -ti 636014888e15 /bin/sh
/prometheus $ prometheus --version
prometheus, version 2.22.1 (branch: HEAD, revision: 00f16d1ac3a4c94561e5133b821d8e4d9ef78ec2)
  build user:       root@516b109b1732
  build date:       20201105-14:02:25
  go version:       go1.15.3
  platform:         linux/amd64
  • Prometheus configuration file:
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'dockmaster'

    static_configs:
    - targets: ['redacted:9100']

  - job_name: 'vms'

    static_configs:
      - targets: ['redacted:9100', 'redacted:9100']

  - job_name: 'home-servers'

    static_configs:
      - targets: ['redacted:9100', 'redacted:9100']

  - job_name: 'pis'

    static_configs:
      - targets: ['redacted']

  - job_name: 'docker'
    dockerswarm_sd_configs:
      - host: unix:///var/run/docker.sock
        role: nodes
    relabel_configs:
      # Fetch metrics on port 9323.
      - source_labels: [__meta_dockerswarm_node_address]
        target_label: __address__
        replacement: $1:9323

  - job_name: 'dockerswarm'
    dockerswarm_sd_configs:
      - host: unix:///var/run/docker.sock
        role: tasks
    relabel_configs:
      # Only keep containers that should be running.
      - source_labels: [__meta_dockerswarm_task_desired_state]
        regex: running
        action: keep

  • Prometheus Docker compose file:
version: '3.3'

services:
  private:
    image: prom/prometheus
    networks:
      - dockadmin_rp
      - private
    volumes:
      - /srv/data/prometheus/config:/etc/prometheus
      - /srv/data/prometheus/data:/prometheus
      - /var/run/docker.sock:/var/run/docker.sock:0444
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - node.labels.type == private 
      labels:
        traefik.enable: "true"
        traefik.docker.network: "dockadmin_rp"
        traefik.http.routers.pm02.service: "pm02"
        traefik.http.routers.pm02.tls: "true"
        traefik.http.routers.pm02.tls.certResolver: "default"
        traefik.http.routers.pm02.middlewares: "defaultsecheaders@file, pmauth"
        traefik.http.routers.pm02.rule: "Host(`pm02.mydomain.xyz`)"
        traefik.http.middlewares.pmauth.basicauth.usersfile: "/auth/prometheus"
        traefik.http.services.pm02.loadbalancer.server.port: 9090 

  cadvisor:
    image: gcr.io/google-containers/cadvisor
    command: -docker_only
    networks:
      - private
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    deploy:
      mode: global
      restart_policy:
        condition: on-failure


networks:
  private:
  dockadmin_rp:
    external: true

  • Logs:
level=error ts=2020-11-15T19:45:07.282Z caller=refresh.go:98 component="discovery manager scrape" discovery=dockerswarm msg="Unable to refresh target groups" err="error while listing swarm nodes: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get \"http://%2Fvar%2Frun%2Fdocker.sock/v1.24/nodes\": dial unix /var/run/docker.sock: connect: permission denied"
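
For readers hitting the same error: the `$` prompt in the `docker exec` session above suggests the container runs as an unprivileged user, while the Docker socket on the host is typically owned by root and a `docker` group, so the bind mount alone may not grant access. One workaround sometimes used is to run the container as root. A minimal sketch, assuming the stack file layout above and a hypothetical service name `prometheus`, might look like this:

```yaml
version: '3.3'

services:
  # "prometheus" is a hypothetical service name used for illustration.
  prometheus:
    image: prom/prometheus
    # Assumption: running as root lets the process open the socket.
    # This is a security trade-off, not a recommendation.
    user: root
    volumes:
      # Read-only bind mount of the host's Docker socket.
      - /var/run/docker.sock:/var/run/docker.sock:ro
```

This trades away isolation, since any process that can talk to the socket can effectively control the Docker daemon. A socket proxy that exposes only the needed endpoints is the more conservative route.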

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 37 (13 by maintainers)

Most upvoted comments

I'll just leave this example here for future reference, using Prometheus with a Docker socket proxy (socat):

version: "3.9"

networks:
  socat:
  traefik:
    name: "traefik-gateway-test_default"
    external: true

volumes:
  prometheus_data:

configs:
  prometheus-config:
    name: $PROM_CONFIG_NAME
    file: ./prometheus.yml

services:

  prometheus:
    image: prom/prometheus:v2.30.1
    networks:
      - traefik
      - socat
    configs:
      - source: prometheus-config
        target: /configs/prometheus.yml
        mode: 0444
    command:
      - "--config.file=/configs/prometheus.yml"
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/usr/share/prometheus/console_libraries'
      - '--web.console.templates=/usr/share/prometheus/consoles'
    volumes:
      - prometheus_data:/prometheus
    deploy:
      placement:
        constraints:
          - node.labels.prometheus==yes
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.prometheus.entrypoints=http"
        - "traefik.http.routers.prometheus.rule=Host(`example.com`)"
        - "traefik.http.services.prometheus.loadbalancer.server.port=9090"

  docker-api-socat:
    image: tecnativa/docker-socket-proxy:0.1
    networks:
      - socat
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      NODES: 1
      NETWORKS: 1
      SERVICES: 1
      TASKS: 1
    logging:
      # Send socat logs to a black hole (we don't need them)
      driver: none
    deploy:
      mode: global
      resources:
        reservations:
          memory: 5M
          cpus: '0.05'
        limits:
          memory: 10M
          cpus: '0.1'
      update_config:
        parallelism: 1
        order: start-first
        failure_action: rollback
      rollback_config:
        parallelism: 1
        order: start-first
      placement:
        constraints:
          - node.role == manager

prometheus.yml

scrape_configs:
  # Make Prometheus scrape itself for metrics.
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']

  # Create a job for Docker daemons.
  - job_name: 'docker'
    dockerswarm_sd_configs:
      - host: tcp://tasks.docker-api-socat:2375
        role: nodes
    relabel_configs:
      # Fetch metrics on port 9323.
      - source_labels: [__meta_dockerswarm_node_address]
        target_label: __address__
        replacement: $1:9323
      # Set hostname as instance label
      - source_labels: [__meta_dockerswarm_node_hostname]
        target_label: instance
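
The same proxy endpoint can presumably also serve the `tasks` role from the original report, since the proxy above enables `NODES`, `NETWORKS`, `SERVICES`, and `TASKS`. A sketch of such a job, assuming it is appended under the existing `scrape_configs:` key and that the proxy is reachable at the same address as above:

```yaml
  # Discover Swarm tasks through the socket proxy instead of the raw socket.
  - job_name: 'dockerswarm'
    dockerswarm_sd_configs:
      - host: tcp://tasks.docker-api-socat:2375
        role: tasks
    relabel_configs:
      # Only keep containers that should be running.
      - source_labels: [__meta_dockerswarm_task_desired_state]
        regex: running
        action: keep
```

The relabel rule is copied from the `dockerswarm` job in the original configuration. Only the `host` value changes, from the Unix socket to the proxy's TCP address.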

NOTE: When using Docker Swarm, bind-mounting the Docker socket into the container won’t by itself give Prometheus access to it. Consider using a Docker socket proxy, or read this thread for other alternatives.

IMO: This alone would be enough for people concerned about security to do their own research. I do not think there is a golden solution to this, but with this note everyone can use their own judgment. In prod environments everyone has their own considerations to make.

That being said, what should be added to the documentation?

NOTE: When using Docker Swarm, bind-mounting the Docker socket into the container won’t by itself give Prometheus access to it. Consider using a Docker socket proxy, or read this thread for other alternatives.

Any suggestions?

I guess at this point this thread and https://groups.google.com/g/prometheus-users/c/EuEW0qRzXvg/m/0aqKh_ZABQAJ?pli=1 have enough data. Someone has to make a decision, and it is not up to me or @Ruppsn, so I'm leaving it to the Prometheus maintainers.

I will be more than glad to put up that PR once a decision has been taken.

ack, thanks for your input.

As I said, we accept PRs to improve the documentation.

I'll see if I have some time to get around to it over the weekend and get that PR up 😃

It depends on what counts as a standard configuration. It works for me with the socket, and I do not run Prometheus as root. Any improvement to the documentation is welcome.