blackbox_exporter: ICMP probes fail continuously after several target hosts go down and come back up, until blackbox-exporter is manually restarted.

Host operating system: output of uname -a

Linux prometheus 4.4.0-134-generic #160-Ubuntu SMP Wed Aug 15 14:58:00 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

blackbox_exporter version: output of blackbox_exporter -version

blackbox_exporter, version 0.12.0 (branch: HEAD, revision: 4a22506cf0cf139d9b2f9cde099f0012d9fcabde) build user: root@634195974c8e build date: 20180227-11:50:29 go version: go1.10

What is the blackbox.yml module config.

modules:
  icmp:
    prober: icmp
    timeout: 2s
    icmp:
      preferred_ip_protocol: ip4

What is the prometheus.yml scrape config.

scrape_configs:
  - job_name: 'icmp-ping'
    metrics_path: /probe
    params:
      module: [icmp]
    scrape_interval: 5s
    scrape_timeout: 2s
    file_sd_configs:
      - files:
        - '/etc/prometheus/targets/ping-hosts.yml'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 'prometheus.domain.zz:9115'

What logging output did you get from adding &debug=true to the probe URL?

Logs for the probe:
ts=2018-09-19T10:54:04.147594552Z caller=main.go:116 module=icmp target=probed-host.domain.zz level=info msg="Beginning probe" probe=icmp timeout_seconds=1.5
ts=2018-09-19T10:54:04.147696813Z caller=utils.go:42 module=icmp target=probed-host.domain.zz level=info msg="Resolving target address" preferred_ip_protocol=ip4
ts=2018-09-19T10:54:04.148567095Z caller=utils.go:65 module=icmp target=probed-host.domain.zz level=info msg="Resolved target address" ip=192.168.100.49
ts=2018-09-19T10:54:04.148667279Z caller=icmp.go:71 module=icmp target=probed-host.domain.zz level=info msg="Creating socket"
ts=2018-09-19T10:54:04.14885651Z caller=icmp.go:117 module=icmp target=probed-host.domain.zz level=info msg="Creating ICMP packet" seq=61478 id=7522
ts=2018-09-19T10:54:04.148950165Z caller=icmp.go:129 module=icmp target=probed-host.domain.zz level=info msg="Writing out packet"
ts=2018-09-19T10:54:04.149184806Z caller=icmp.go:157 module=icmp target=probed-host.domain.zz level=info msg="Waiting for reply packets"
ts=2018-09-19T10:54:05.647899261Z caller=icmp.go:162 module=icmp target=probed-host.domain.zz level=warn msg="Timeout reading from socket" err="read ip 0.0.0.0: raw-read ip4 0.0.0.0: i/o timeout"
ts=2018-09-19T10:54:05.648033921Z caller=main.go:129 module=icmp target=probed-host.domain.zz level=error msg="Probe failed" duration_seconds=1.5003776850000001



Metrics that would have been returned:
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.000923902
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 1.5003776850000001
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 0



Module configuration:
prober: icmp
timeout: 2s
icmp:
  preferred_ip_protocol: ip4

What did you do that produced an error?

I restarted the OpenVPN client on the hypervisor host that runs the virtual machine with Prometheus and blackbox-exporter. The blackbox-exporter target file has around 70 entries, more than 50 of them behind that VPN connection.

What did you expect to see?

Some failed probes during the VPN restart on the hypervisor, and then successful probes again.

What did you see instead?

Probes kept failing for tens of minutes, until I manually restarted blackbox-exporter.

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 21 (8 by maintainers)

Most upvoted comments

Update for us here in case it helps other people googling for it:

This was caused by the payload for the blackbox ICMP probe being 36 bytes. When we increased it to 64 bytes (using the payload_size parameter), our probes were successful.
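For anyone wanting to try the same workaround, a module along these lines might be a starting point. This is only a sketch: the module name icmp_large is made up for illustration, the value 64 assumes the commenter's "64 bytes" maps directly onto payload_size, and the option is assumed to be available in your blackbox_exporter version (the 0.12.0 build shown in the report may not support it).

```yaml
modules:
  icmp_large:            # hypothetical module name, not from the original report
    prober: icmp
    timeout: 2s
    icmp:
      preferred_ip_protocol: ip4
      # Assumption: 64 is the value the commenter used; adjust to whatever
      # size your network path actually lets through.
      payload_size: 64
```

The Prometheus scrape config would then need module: [icmp_large] in its params to select this module.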

Well, we don’t have much control over fixing Amazon networking, I think. If you compare packets emitted by the standard Linux ping utility and by blackbox_exporter, you will surely see a difference in the id header field.

The question of whether the ping utility is RFC compliant or not remains open :)
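To make the id field comparison above more concrete, here is a rough sketch of how an ICMP echo request is laid out and where the id and seq fields sit, so captured packets from ping and from blackbox_exporter can be compared field by field. The id and seq values are taken from the debug log in the original report; the payload string is an assumption about what blackbox_exporter sends by default (it happens to give the 36-byte total mentioned earlier), and the function names are purely illustrative.

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMPv4 type for echo request (RFC 792)


def icmp_checksum(data: bytes) -> int:
    """Internet checksum: one's-complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, payload: bytes) -> bytes:
    """Build type(1) code(1) checksum(2) id(2) seq(2) followed by the payload."""
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, ident, seq)
    csum = icmp_checksum(header + payload)
    return struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, csum, ident, seq) + payload


def parse_echo(packet: bytes) -> dict:
    """Extract the 8-byte ICMP echo header fields from raw ICMP bytes."""
    icmp_type, code, csum, ident, seq = struct.unpack("!BBHHH", packet[:8])
    return {"type": icmp_type, "code": code, "checksum": csum,
            "id": ident, "seq": seq, "length": len(packet)}


# id/seq from the probe log above; the payload is an assumed default.
packet = build_echo_request(ident=7522, seq=61478,
                            payload=b"Prometheus Blackbox Exporter")
fields = parse_echo(packet)
print(fields)
```

Feeding the ICMP portion of a captured packet (without the IP header) into parse_echo should show which id each tool puts on the wire, which is the field the comment above suspects of being treated differently.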

I have a similar error, though it may be caused by something else. For me, the only way I can get the icmp probe to succeed is by trying it against a target of 127.0.0.1 or localhost.

Any other IP address, either within the local LAN or outside it, seems to fail.

I’ve also added the capability with: sudo setcap cap_net_raw+ep /usr/local/bin/blackbox_exporter

And I’ve instructed systemd to run it as root.

[Unit]
Description=Blackbox Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=root
Group=root
Type=simple
ExecStart=/usr/local/bin/blackbox_exporter --config.file /etc/blackbox_exporter/blackbox.yml

[Install]
WantedBy=multi-user.target

uname -a

Linux nuc 4.13.0-1024-oem #27-Ubuntu SMP Fri Apr 13 08:27:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

blackbox_exporter version: output of blackbox_exporter -version

blackbox_exporter, version 0.12.0 (branch: HEAD, revision: 4a22506cf0cf139d9b2f9cde099f0012d9fcabde)
  build user:       root@634195974c8e
  build date:       20180227-11:50:29
  go version:       go1.10

What is the blackbox.yml module config.

modules:
    http_2xx:
        prober: http
        timeout: 5s
        http:
            valid_status_codes: []
            method: GET
    icmp_ipv4:
        prober: icmp
        timeout: 15s
        icmp:
            preferred_ip_protocol: "ip4"
            source_ip_address: "127.0.0.1"

What is the prometheus.yml scrape config.

global:
    scrape_interval: 15s
    evaluation_interval: 30s
    # scrape_timeout is set to the global default (10s).
    external_labels:
        monitor: nuc

rule_files:
    - "rules.d/*.rules"

scrape_configs:
    - job_name: "prometheus"
      scrape_interval: 5s
      static_configs:
          - targets: ["localhost:9090"]

    - job_name: "node_exporter"
      scrape_interval: 5s
      static_configs:
          - targets: ["localhost:9100"]

    - job_name: "blackbox"
      scrape_interval: 5s
      metrics_path: /probe
      params:
          module: [http_2xx]
      static_configs:
          - targets:
                - google.com
      relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: 127.0.0.1:9115 # The blackbox exporter's real hostname:port.

    - job_name: "pingtime"
      scrape_interval: 1s
      metrics_path: /probe
      params:
          module: [icmp_ipv4]
      static_configs:
          - targets:
                - 127.0.0.1
                - 192.168.10.1
                - google.com
      relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: 127.0.0.1:9115 # The blackbox exporter's real hostname:port.

What logging output did you get from adding &debug=true to the probe URL?

Logs for the probe:
ts=2018-10-09T21:30:58.758299096Z caller=main.go:116 module=icmp_ipv4 target=192.168.10.1 level=info msg="Beginning probe" probe=icmp timeout_seconds=9.5
ts=2018-10-09T21:30:58.758346367Z caller=utils.go:42 module=icmp_ipv4 target=192.168.10.1 level=info msg="Resolving target address" preferred_ip_protocol=ip4
ts=2018-10-09T21:30:58.758360112Z caller=utils.go:65 module=icmp_ipv4 target=192.168.10.1 level=info msg="Resolved target address" ip=192.168.10.1
ts=2018-10-09T21:30:58.758368729Z caller=icmp.go:68 module=icmp_ipv4 target=192.168.10.1 level=info msg="Using source address" srcIP=127.0.0.1
ts=2018-10-09T21:30:58.758378392Z caller=icmp.go:71 module=icmp_ipv4 target=192.168.10.1 level=info msg="Creating socket"
ts=2018-10-09T21:30:58.758408948Z caller=icmp.go:117 module=icmp_ipv4 target=192.168.10.1 level=info msg="Creating ICMP packet" seq=875 id=3048
ts=2018-10-09T21:30:58.758419928Z caller=icmp.go:129 module=icmp_ipv4 target=192.168.10.1 level=info msg="Writing out packet"
ts=2018-10-09T21:30:58.758453495Z caller=icmp.go:157 module=icmp_ipv4 target=192.168.10.1 level=info msg="Waiting for reply packets"
ts=2018-10-09T21:31:08.258410105Z caller=icmp.go:162 module=icmp_ipv4 target=192.168.10.1 level=warn msg="Timeout reading from socket" err="read ip 0.0.0.0: raw-read ip4 0.0.0.0: i/o timeout"
ts=2018-10-09T21:31:08.25850825Z caller=main.go:129 module=icmp_ipv4 target=192.168.10.1 level=error msg="Probe failed" duration_seconds=9.500172787



Metrics that would have been returned:
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 1.1342e-05
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 9.500172787
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 0



Module configuration:
prober: icmp
timeout: 15s
icmp:
  preferred_ip_protocol: ip4
  source_ip_address: 127.0.0.1

This is my systemd service:

root@prometheus:~# cat /etc/systemd/system/blackbox_exporter.service 
# Ansible managed

[Unit]
Description=prometheus blackbox_exporter
After=syslog.target
After=network.target

[Service]
Type=simple
User=blackbox
Group=blackbox
AmbientCapabilities=CAP_NET_RAW
ExecStart=/opt/blackbox_exporter-0.12.0.linux-amd64/blackbox_exporter --config.file=/etc/blackbox/blackbox.yml
StandardOutput=syslog
StandardError=syslog
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target