telegraf: Telegraf stops publishing metrics to InfluxDB; All plugins take too long to collect
Bug report
After a seemingly random amount of time, Telegraf stops publishing metrics to InfluxDB over UDP. I have been experiencing this issue since Nov 2016 on both Telegraf 1.3.x and 1.4.4 on FreeBSD, on three separate servers. In the telegraf log, all collectors start to fail with
Error in plugin [inputs.$name]: took longer to collect than collection interval (10s)
I can’t see anything unusual or interesting in Telegraf’s internal metrics.
This same issue has been reported in #3318, #2183, #2919, #2780 and #2870, but all of those issues were either abandoned by the reporter or conflated with several separate problems. I am opening a new issue for my specific problem; if it’s a duplicate (#3318 seems to be the closest), please feel free to close it.
Relevant telegraf.conf:
System info:
Telegraf v1.4.4 (git: unknown unknown)
running on FreeBSD 10.3-RELEASE-p24
Steps to reproduce:
- service telegraf start
- wait a $random time period
Expected behavior:
Telegraf publishes metrics to InfluxDB server over UDP
Actual behavior:
Telegraf stops publishing metrics seemingly randomly, all input plugins start to fail with:
2018-01-02T17:16:50Z E! Error in plugin [inputs.processes]: took longer to collect than collection interval (10s)
2018-01-02T17:16:51Z E! Error in plugin [inputs.apache]: took longer to collect than collection interval (10s)
2018-01-02T17:16:51Z E! Error in plugin [inputs.mem]: took longer to collect than collection interval (10s)
2018-01-02T17:16:51Z E! Error in plugin [inputs.internal]: took longer to collect than collection interval (10s)
2018-01-02T17:16:51Z E! Error in plugin [inputs.system]: took longer to collect than collection interval (10s)
2018-01-02T17:16:51Z E! Error in plugin [inputs.swap]: took longer to collect than collection interval (10s)
2018-01-02T17:16:52Z E! Error in plugin [inputs.cpu]: took longer to collect than collection interval (10s)
2018-01-02T17:16:58Z E! Error: statsd message queue full. We have dropped 1 messages so far. You may want to increase allowed_pending_messages in the config
2018-01-02T17:17:00Z E! Error in plugin [inputs.phpfpm]: took longer to collect than collection interval (10s)
2018-01-02T17:17:00Z E! Error in plugin [inputs.statsd]: took longer to collect than collection interval (10s)
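The statsd "message queue full" line describes a bounded buffer that drops incoming messages once it is full. The Go sketch below models that behaviour under an assumption: a buffered channel whose capacity plays the role of `allowed_pending_messages`, fed by a non-blocking send. `boundedQueue` and `offer` are hypothetical names, not the statsd plugin's code.

```go
package main

import "fmt"

// boundedQueue (hypothetical) hands packets from a socket reader to a parser
// through a fixed-capacity buffer. When the parser falls behind and the
// buffer is full, new packets are dropped instead of blocking the reader.
type boundedQueue struct {
	in      chan string
	dropped int
}

func newBoundedQueue(capacity int) *boundedQueue {
	return &boundedQueue{in: make(chan string, capacity)}
}

// offer tries a non-blocking send and reports whether the packet was queued.
func (q *boundedQueue) offer(pkt string) bool {
	select {
	case q.in <- pkt:
		return true
	default:
		q.dropped++
		return false
	}
}

func main() {
	q := newBoundedQueue(2) // stands in for allowed_pending_messages = 2
	for _, pkt := range []string{"a:1|c", "b:1|c", "c:1|c"} {
		q.offer(pkt)
	}
	fmt.Printf("queued=%d dropped=%d\n", len(q.in), q.dropped) // queued=2 dropped=1
}
```

If the consumer of such a queue is itself stuck (which all plugins timing out at once would suggest), raising the capacity only delays the point at which drops begin.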
Additional info:
About this issue
- State: closed
- Created 6 years ago
- Comments: 31 (6 by maintainers)
I wouldn’t expect socket_listener to complain, since it is a “service input”. This means that instead of being called each interval, it is event driven: it adds new metrics when it receives one on its socket. I suspect that, even though there is no log message, you cannot send items to the socket_listener and have them delivered to an output.