telegraf: telegraf --test shows Ceph output, but actual Telegraf run returns failed to find sockets at path '/var/run/ceph': Failed to read socket directory '/var/run/ceph': open /var/run/ceph: permission denied
Relevant telegraf.conf:
# # Collects performance metrics from the MON and OSD nodes in a Ceph storage cluster.
[[inputs.ceph]]
# ## This is the recommended interval to poll. Too frequent and you will lose
# ## data points due to timeouts during rebalancing and recovery
interval = '1m'
#
# ## All configuration values are optional, defaults are shown below
#
# ## location of ceph binary
# ceph_binary = "/usr/bin/ceph"
#
# ## directory in which to look for socket files
# socket_dir = "/var/run/ceph"
#
# ## prefix of MON and OSD socket files, used to determine socket type
# mon_prefix = "ceph-mon"
# osd_prefix = "ceph-osd"
#
# ## suffix used to identify socket files
# socket_suffix = "asok"
#
# ## Ceph user to authenticate as
# ceph_user = "client.admin"
#
# ## Ceph configuration to use to locate the cluster
# ceph_config = "/etc/ceph/ceph.conf"
#
# ## Whether to gather statistics via the admin socket
# gather_admin_socket_stats = true
#
# ## Whether to gather statistics via ceph commands
gather_cluster_stats = true
System info:
Telegraf 1.10.0 (git: HEAD fe33ee8) Proxmox 5.3 (based on Debian Stretch) sysstat version 11.4.3
Contents of /var/run/ceph - public seems to have both read and execute bits on the relevant socket files. (However, I noticed the directory itself isn’t open to public?)
root@syd1:/var/run/ceph# ls -lah
total 0
drwxrwx--- 2 ceph ceph 160 Mar 14 07:48 .
drwxr-xr-x 27 root root 1.3K Mar 16 05:30 ..
srwxr-xr-x 1 ceph ceph 0 Mar 14 07:45 ceph-mgr.syd1.asok
srwxr-xr-x 1 ceph ceph 0 Mar 14 07:45 ceph-mon.syd1.asok
srwxr-xr-x 1 ceph ceph 0 Mar 14 07:48 ceph-osd.0.asok
srwxr-xr-x 1 ceph ceph 0 Mar 14 07:48 ceph-osd.1.asok
srwxr-xr-x 1 ceph ceph 0 Mar 14 07:48 ceph-osd.2.asok
srwxr-xr-x 1 ceph ceph 0 Mar 14 07:48 ceph-osd.3.asok
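The listing suggests why the socket file modes alone don't help: the directory itself is drwxrwx--- (0770, owned by ceph:ceph), so a process that is neither the ceph user nor in the ceph group cannot list it, whatever the modes on the files inside. Below is a minimal Python sketch of that check. It assumes the service runs as a non-root user named telegraf (the issue does not confirm this), and the helper can_list_dir is hypothetical and simplified: it ignores root, ACLs and capabilities.

```python
import stat


def can_list_dir(mode, owner, group, user, user_groups):
    """Return True if `user` may list and traverse a directory with these bits.

    Simplified POSIX check: picks the owner, group or other permission class
    and requires both the read and execute bits from that class.
    """
    if user == owner:
        need = stat.S_IRUSR | stat.S_IXUSR
    elif group in user_groups:
        need = stat.S_IRGRP | stat.S_IXGRP
    else:
        need = stat.S_IROTH | stat.S_IXOTH
    return (mode & need) == need


DIR_MODE = 0o770  # drwxrwx--- as shown for /var/run/ceph

# Assumed user "telegraf" outside the ceph group: falls into "other", which has no bits set.
print(can_list_dir(DIR_MODE, "ceph", "ceph", "telegraf", {"telegraf"}))          # False
# Same user after being added to the ceph group: the group class has r-x (and w).
print(can_list_dir(DIR_MODE, "ceph", "ceph", "telegraf", {"telegraf", "ceph"}))  # True
```

Under these assumptions, the "permission denied" in the log comes from opening the directory, before any socket file is touched.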
Steps to reproduce:
- Install Telegraf from InfluxData repositories.
- Edit /etc/telegraf/telegraf.conf and enable ceph input plugin, and various options (per above).
- Attempt to restart telegraf using systemctl restart telegraf
- Run telegraf --test to check syntax.
- Run journalctl -u telegraf to check status of telegraf.
Expected behavior:
Ceph data that is shown in telegraf --test should also be populated into InfluxDB output.
No permission errors should be seen in Telegraf logs around Ceph.
Actual behavior:
telegraf --test does show ceph data in output.
However, no ceph data is populated into InfluxDB.
journalctl shows error messages around Ceph permissions:
Mar 16 02:32:00 syd1 telegraf[577778]: 2019-03-15T15:32:00Z E! [inputs.ceph]: Error in plugin: failed to find sockets at path '/var/run/ceph': Failed to read socket directory '/var/run/ceph': open /var/run/ceph: permission denied
Additional info:
I did find this earlier issue around Ceph permissions and Telegraf:
https://github.com/influxdata/telegraf/issues/1657
which mentions this Ceph PR to add a config option for socket permissions:
https://github.com/ceph/ceph/pull/11684
However, as per above, it seems like my actual socket files have public read/execute already set?
Also it’s odd that telegraf --test returns Ceph output. (Not sure if it’s something related to Proxmox, which doesn’t include sudo by default.) What user does telegraf --test run under?
(Interestingly, the above Ceph admin socket option is apparently not that well documented; see this Medium article.)
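One plausible explanation for the difference, stated as an assumption rather than something this report confirms: telegraf --test typed at the root prompt runs as root, while the packaged systemd service runs as a dedicated unprivileged user. A quick way to compare the two contexts is to run the same small check under each account (for example via su, since sudo isn't installed here). The sketch below is a hypothetical helper, not part of Telegraf; it reports who the process is running as and whether that user can list the socket directory.

```python
import os
import pwd


def describe_access(path):
    """Report the current (real) user and whether it can list `path`."""
    uid = os.getuid()
    try:
        user = pwd.getpwuid(uid).pw_name
    except KeyError:
        # No passwd entry for this uid; fall back to the numeric id.
        user = str(uid)
    # R_OK is needed to read directory entries, X_OK to traverse into the directory.
    can_list = os.access(path, os.R_OK | os.X_OK)
    return {"user": user, "path": path, "can_list": can_list}


if __name__ == "__main__":
    # /var/run/ceph is the plugin's default socket_dir from the config above.
    print(describe_access("/var/run/ceph"))
```

If the root run reports can_list as True and the service-user run reports False, that would match the behaviour described here: test output from the shell, but a permission error in the service log.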
About this issue
- State: closed
- Created 5 years ago
- Comments: 21 (9 by maintainers)
In the comment above I tried copying the keyring and ceph.conf to the /etc/telegraf/ directory, setting permissions and all the stuff… Didn’t work.
To summarize my final setup…
Telegraf is in the ceph group:
The ownership of the telegraf keyring is set to the telegraf user:
The ceph.conf is holding the client.telegraf section and pointing to the keyring:
The telegraf.conf is also holding the client.telegraf user and the ceph config file:
Results:
I also added --name client.telegraf to the commands in the ceph.go plugin and built it on my machine. Results are the same as above without the change, even though telegraf is running the check with the --name client.telegraf argument. It is still ignoring the client.telegraf user in ceph.conf and trying to find the keyring…
Any suggestions?