prometheus: Prometheus can't access docker.sock

Sorry if this is not the right place. I've been going in circles, reading the docs and how other people have done it, and nothing seems to work. IRC seems pretty dead and unresponsive. =/ I hope someone can point me in the right direction.

What did you do?

Followed the documentation on Prometheus and Docker Swarm: https://prometheus.io/blog/2015/06/01/advanced-service-discovery/, https://prometheus.io/docs/guides/dockerswarm/ and https://github.com/prometheus/prometheus/blob/release-2.22/documentation/examples/prometheus-dockerswarm.yml

What did you expect to see?

Prometheus should be able to access docker.sock

What did you see instead? Under which circumstances?

level=error ts=2020-11-15T19:36:07.282Z caller=refresh.go:98 component="discovery manager scrape" discovery=dockerswarm msg="Unable to refresh target groups" err="error while listing swarm services: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get \"http://%2Fvar%2Frun%2Fdocker.sock/v1.24/tasks\": dial unix /var/run/docker.sock: connect: permission denied"

Environment

CentOS, Docker swarm, using prom/prometheus:latest

System information:

# docker --version
Docker version 19.03.13, build 4484c46d9d

# uname -srm
Linux 4.18.0-193.28.1.el8_2.x86_64 x86_64

Prometheus version:

# docker exec -ti 636014888e15 /bin/sh
/prometheus $ prometheus --version
prometheus, version 2.22.1 (branch: HEAD, revision: 00f16d1ac3a4c94561e5133b821d8e4d9ef78ec2)
  build user:       root@516b109b1732
  build date:       20201105-14:02:25
  go version:       go1.15.3
  platform:         linux/amd64

Prometheus configuration file:

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'dockmaster'

    static_configs:
    - targets: ['redacted:9100']

  - job_name: 'vms'

    static_configs:
      - targets: ['redacted:9100', 'redacted:9100']

  - job_name: 'home-servers'

    static_configs:
      - targets: ['redacted:9100', 'redacted:9100']

  - job_name: 'pis'

    static_configs:
      - targets: ['redacted']

  - job_name: 'docker'
    dockerswarm_sd_configs:
      - host: unix:///var/run/docker.sock
        role: nodes
    relabel_configs:
      # Fetch metrics on port 9323.
      - source_labels: [__meta_dockerswarm_node_address]
        target_label: __address__
        replacement: $1:9323

  - job_name: 'dockerswarm'
    dockerswarm_sd_configs:
      - host: unix:///var/run/docker.sock
        role: tasks
    relabel_configs:
      # Only keep containers that should be running.
      - source_labels: [__meta_dockerswarm_task_desired_state]
        regex: running
        action: keep

Prometheus Docker compose file:

version: '3.3'

services:
  private:
    image: prom/prometheus
    networks:
      - dockadmin_rp
      - private
    volumes:
      - /srv/data/prometheus/config:/etc/prometheus
      - /srv/data/prometheus/data:/prometheus
      - /var/run/docker.sock:/var/run/docker.sock:0444
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - node.labels.type == private 
      labels:
        traefik.enable: "true"
        traefik.docker.network: "dockadmin_rp"
        traefik.http.routers.pm02.service: "pm02"
        traefik.http.routers.pm02.tls: "true"
        traefik.http.routers.pm02.tls.certResolver: "default"
        traefik.http.routers.pm02.middlewares: "defaultsecheaders@file, pmauth"
        traefik.http.routers.pm02.rule: "Host(`pm02.mydomain.xyz`)"
        traefik.http.middlewares.pmauth.basicauth.usersfile: "/auth/prometheus"
        traefik.http.services.pm02.loadbalancer.server.port: 9090 

  cadvisor:
    image:  gcr.io/google-containers/cadvisor
    command: -docker_only
    networks:
      - private
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    deploy:
      mode: global
      restart_policy:
        condition: on-failure


networks:
  private:
  dockadmin_rp:
    external: true

Logs:

level=error ts=2020-11-15T19:45:07.282Z caller=refresh.go:98 component="discovery manager scrape" discovery=dockerswarm msg="Unable to refresh target groups" err="error while listing swarm nodes: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get \"http://%2Fvar%2Frun%2Fdocker.sock/v1.24/nodes\": dial unix /var/run/docker.sock: connect: permission denied"
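
A first step in narrowing down an error like the one above is to compare who owns the socket on the host with the user Prometheus runs as inside the container. The following is only a minimal sketch: `describe_socket` is a hypothetical helper name, and it assumes a `stat` that supports `-c` (GNU coreutils or BusyBox).

```shell
#!/bin/sh
# describe_socket PATH
# Hypothetical helper: prints the numeric owner, numeric group and octal
# permission bits of PATH, so they can be compared with the uid/gid of
# the process that is trying to connect.
describe_socket() {
    stat -c 'uid=%u gid=%g mode=%a' "$1"
}
```

On the host, `describe_socket /var/run/docker.sock` shows the socket's ownership and mode. Running `docker exec <container> id` against the Prometheus container then shows the uid and gids to compare with.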

About this issue

State: closed
Created 4 years ago
Reactions: 1
Comments: 37 (13 by maintainers)

Most upvoted comments

I'll leave this example here for future reference, using Prometheus with a Docker socket proxy (socat):

version: "3.9"

networks:
  socat:
  traefik:
    name: "traefik-gateway-test_default"
    external: true

volumes:
  prometheus_data:

configs:
  prometheus-config:
    name: $PROM_CONFIG_NAME
    file: ./prometheus.yml

services:

  prometheus:
    image: prom/prometheus:v2.30.1
    networks:
      - traefik
      - socat
    configs:
      - source: prometheus-config
        target: /configs/prometheus.yml
        mode: 0444
    command:
      - "--config.file=/configs/prometheus.yml"
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/usr/share/prometheus/console_libraries'
      - '--web.console.templates=/usr/share/prometheus/consoles'
    volumes:
      - prometheus_data:/prometheus
    deploy:
      placement:
        constraints:
          - node.labels.prometheus==yes
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.prometheus.entrypoints=http"
        - "traefik.http.routers.prometheus.rule=Host(`example.com`)"
        - "traefik.http.services.prometheus.loadbalancer.server.port=9090"

  docker-api-socat:
    image: tecnativa/docker-socket-proxy:0.1
    networks:
      - socat
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      NODES: 1
      NETWORKS: 1
      SERVICES: 1
      TASKS: 1
    logging:
      # Socat logs are sent to a black hole (we don't need them)
      driver: none
    deploy:
      mode: global
      resources:
        reservations:
          memory: 5M
          cpus: '0.05'
        limits:
          memory: 10M
          cpus: '0.1'
      update_config:
        parallelism: 1
        order: start-first
        failure_action: rollback
      rollback_config:
        parallelism: 1
        order: start-first
      placement:
        constraints:
          - node.role == manager

prometheus.yml

scrape_configs:
  # Make Prometheus scrape itself for metrics.
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']

  # Create a job for Docker daemons.
  - job_name: 'docker'
    dockerswarm_sd_configs:
      - host: tcp://tasks.docker-api-socat:2375
        role: nodes
    relabel_configs:
      # Fetch metrics on port 9323.
      - source_labels: [__meta_dockerswarm_node_address]
        target_label: __address__
        replacement: $1:9323
      # Set hostname as instance label
      - source_labels: [__meta_dockerswarm_node_hostname]
        target_label: instance

brunocascio on Nov 1, 2021
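
To check a proxy setup like the one above before pointing Prometheus at it, one option is to request the same API resources that the proxy's `NODES`, `NETWORKS`, `SERVICES` and `TASKS` settings enable. This is a sketch under assumptions: `proxy_url` and `check_proxy` are hypothetical helper names, the `v1.24` prefix is taken from the request path in the error log of this issue, and `curl` must be available in whichever container you run it from (the Prometheus image may not ship it).

```shell
#!/bin/sh
# proxy_url HOST_PORT RESOURCE
# Hypothetical helper: builds the Docker Engine API URL for one resource
# behind the socket proxy. The v1.24 prefix mirrors the error log above.
proxy_url() {
    printf 'http://%s/v1.24/%s\n' "$1" "$2"
}

# check_proxy HOST_PORT
# Requests each resource the example proxy enables and prints the HTTP
# status code returned for it.
check_proxy() {
    for resource in nodes networks services tasks; do
        status=$(curl -s -o /dev/null -w '%{http_code}' \
            "$(proxy_url "$1" "$resource")")
        printf '%s %s\n' "$resource" "$status"
    done
}
```

Run from a container attached to the `socat` network, `check_proxy tasks.docker-api-socat:2375` should print one line per resource. Any status other than 200 suggests the proxy is unreachable or is not allowing that endpoint.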

NOTE: When using Docker Swarm, bind-mounting the Docker socket into the container won't give Prometheus access to it. Please consider using a Docker socket proxy, or read this thread for other alternatives.

IMO: This alone would be enough for people who care about security to do their own research. I do not think there is a golden solution to this, but with this note everyone can use their own brain. In prod environments everyone has their own considerations to make.

Ruppsn on May 5, 2021

That being said, what should be added to the documentation?

NOTE: When using Docker Swarm, bind-mounting the Docker socket into the container won't give Prometheus access to it. Please consider using a Docker socket proxy, or read this thread for other alternatives.

Any suggestions?

I guess at this point this thread and https://groups.google.com/g/prometheus-users/c/EuEW0qRzXvg/m/0aqKh_ZABQAJ?pli=1 have enough data. Someone has to make a decision, and that is up to neither me nor @Ruppsn, so I'm leaving it to the Prometheus maintainers.

I will be more than glad to put up that PR once a decision has been taken.

4s3ti on May 5, 2021

ack, thanks for your input.

roidelapluie on May 5, 2021

As I said, we accept PRs to improve the documentation.

I'll see if I have some time to get to it over the weekend and get that PR up 😃

4s3ti on Apr 21, 2021

It depends on what counts as a standard configuration. It works for me with the socket, and I do not run Prometheus as root. Any improvement to the documentation is welcome.

roidelapluie on Apr 21, 2021
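
Whether direct socket access works without root comes down to ordinary file permissions: on Linux, connecting to a Unix socket requires write permission on the socket file. The sketch below evaluates the owner/group/other bits for a given process identity. `can_write` is a hypothetical helper, it ignores ACLs, capabilities and SELinux, and it expects the mode as at least three octal digits. The uid 65534 and gid 994 in the usage note are illustrative assumptions, not values from this thread.

```shell
#!/bin/sh
# can_write UID "GIDS" FILE_UID FILE_GID MODE
# Hypothetical helper: prints "yes" if a process with the given uid and
# space-separated gids gets write permission on a file with the given
# owner, group and octal mode, and "no" otherwise.
can_write() {
    uid=$1; gids=$2; f_uid=$3; f_gid=$4; mode=$5

    # Root bypasses the permission bits.
    if [ "$uid" -eq 0 ]; then
        echo yes
        return 0
    fi

    # Keep the last three octal digits, so 0660 and 660 behave the same.
    perms=${mode#"${mode%???}"}
    owner_digit=${perms%??}
    other_digit=${perms#??}
    group_digit=${perms#?}
    group_digit=${group_digit%?}

    # Check whether any of the process gids matches the file's group.
    in_group=0
    for g in $gids; do
        if [ "$g" -eq "$f_gid" ]; then
            in_group=1
        fi
    done

    # Exactly one class applies: owner first, then group, then other.
    if [ "$uid" -eq "$f_uid" ]; then
        digit=$owner_digit
    elif [ "$in_group" -eq 1 ]; then
        digit=$group_digit
    else
        digit=$other_digit
    fi

    # Bit value 2 is the write bit.
    if [ $(( digit & 2 )) -ne 0 ]; then
        echo yes
    else
        echo no
    fi
}
```

For a socket owned by uid 0 and gid 994 with mode 660, `can_write 65534 "65534" 0 994 660` prints `no`, while `can_write 65534 "65534 994" 0 994 660` prints `yes`. A setup in which the container user shares the socket's group is therefore one plausible reason it can work without root, although the comment above does not say which setup was used.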