telegraf: MySQL "collection took longer than expected" error after update from '1.25.3-1' to '1.26.0-1'
Relevant telegraf.conf
[global_tags]
[agent]
hostname = "it-dads-glr-cmn-01"
omit_hostname = false
interval = "60s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
logfile = "/var/log/telegraf/telegraf.log"
debug = true
quiet = false
[outputs.graphite]
graphite_tag_support = false
prefix = "telegraf"
servers = ["it-carbon-relay10.domain.de:2003", "it-carbon-relay11.domain.de:2003"]
template = "host.measurement.tags.field"
timeout = 10
[inputs.cpu]
percpu = false
totalcpu = true
[inputs.disk]
ignore_fs = ["afs", "aufs", "auristorfs", "beegfs", "cvmfs2", "devfs", "devtmpfs", "fuse.gvfsd-fuse", "fuse", "gpfs", "iso9660", "nfs", "nfs4", "overlay", "shm", "squashfs", "tmpfs"]
[inputs.diskio.tagdrop]
name = ["loop*"]
[inputs.net]
interfaces = ["bond*", "data*", "em*", "en*", "eth*", "ib*", "lo", "p[0-9]p[0-9]"]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.processes]]
[[inputs.swap]]
[[inputs.system]]
[[inputs.mysql]]
interval = "300s"
servers = ["metricbeat@tcp(127.0.0.1:4306)/",
"metricbeat@tcp(127.0.0.1:5306)/",
"metricbeat@tcp(127.0.0.1:6306)/",
"metricbeat@tcp(127.0.0.1:7306)/",
"metricbeat@tcp(127.0.0.1:8306)/",
"metricbeat@tcp(127.0.0.1:9306)/",
"metricbeat@tcp(127.0.0.1:10306)/",
"metricbeat@tcp(127.0.0.1:11306)/",
"metricbeat@tcp(127.0.0.1:12306)/"]
perf_events_statements_digest_text_limit = 120
perf_events_statements_limit = 250
perf_events_statements_time_limit = 86400
table_schema_databases = []
gather_table_schema = false
gather_process_list = true
gather_user_statistics = true
gather_info_schema_auto_inc = true
gather_innodb_metrics = true
gather_slave_status = true
gather_binary_logs = false
gather_table_io_waits = true
gather_table_lock_waits = true
gather_index_io_waits = true
gather_event_waits = true
gather_file_events_stats = false
gather_perf_events_statements = false
interval_slow = "60m"
mariadb_dialect = true
[[outputs.prometheus_client]]
listen = ":9273"
ip_range = ["192.168.46.58/32", "127.0.0.1/32"]
expiration_interval = "90s"
Logs from Telegraf
2023-03-21T05:29:07Z D! [outputs.prometheus_client] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:29:07Z D! [outputs.graphite] Wrote batch of 33 metrics in 12.592805ms
2023-03-21T05:29:07Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:29:17Z D! [outputs.prometheus_client] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:29:17Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:29:27Z D! [outputs.prometheus_client] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:29:27Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:29:37Z D! [outputs.prometheus_client] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:29:37Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:29:47Z D! [outputs.prometheus_client] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:29:47Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:29:57Z D! [outputs.prometheus_client] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:29:57Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:30:00Z D! [inputs.mysql] Previous collection has not completed; scheduled collection skipped
2023-03-21T05:30:00Z D! [inputs.disk] [SystemPS] => unable to get disk usage ("/sys/kernel/debug/tracing"): permission denied
2023-03-21T05:30:00Z D! [inputs.disk] [SystemPS] => unable to get disk usage ("/var/lib/containers/storage/overlay"): permission denied
2023-03-21T05:30:00Z W! [inputs.mysql] Collection took longer than expected; not complete after interval of 5m0s
2023-03-21T05:30:07Z D! [outputs.prometheus_client] Wrote batch of 33 metrics in 1.54032ms
2023-03-21T05:30:07Z D! [outputs.prometheus_client] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:30:07Z D! [outputs.graphite] Wrote batch of 33 metrics in 13.107871ms
2023-03-21T05:30:07Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:30:17Z D! [outputs.prometheus_client] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:30:17Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:30:27Z D! [outputs.prometheus_client] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:30:27Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:30:37Z D! [outputs.prometheus_client] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:30:37Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:30:47Z D! [outputs.prometheus_client] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:30:47Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:30:57Z D! [outputs.prometheus_client] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:30:57Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
System info
1.26.0, Ubuntu 22.04
Docker
No response
Steps to reproduce
- Configure a mysql input (a minimal config sketch is shown after these steps)
- Update Telegraf from 1.25.3 to 1.26.0
- Wait a few hours
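For reference, a minimal config sketch matching these steps; the DSN is a hypothetical placeholder, not one of the instances from my setup above:
[agent]
interval = "60s"
# placeholder DSN; point this at any reachable MySQL/MariaDB instance
[[inputs.mysql]]
interval = "300s"
servers = ["user:password@tcp(127.0.0.1:3306)/"]
gather_process_list = true
gather_user_statistics = true
mariadb_dialect = true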
Expected behavior
MySQL metrics collection should keep working continuously, as it did with 1.25.3.
Actual behavior
After a couple of hours Telegraf can't collect MySQL metrics anymore.
Other inputs are still working.
Additional info
Hello,
It makes no difference if I increase the interval for the mysql input.
With 1.25.3 the interval was one minute and there was no problem.
After the upgrade to 1.26.0 the problem persists even with the interval raised to five minutes.
2023-03-21T09:45:00Z D! [inputs.mysql] Previous collection has not completed; scheduled collection skipped
2023-03-21T09:45:00Z W! [inputs.mysql] Collection took longer than expected; not complete after interval of 5m0s
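Since raising the interval does not help, a possible mitigation I have not actually tried (so treat it as a sketch, not a confirmed fix) would be to add go-sql-driver timeouts to the DSNs, so that a single hung query cannot block the collector forever:
[[inputs.mysql]]
# untested sketch: go-sql-driver DSN parameters; timeout is the dial timeout,
# readTimeout/writeTimeout bound individual reads and writes
servers = ["metricbeat@tcp(127.0.0.1:4306)/?timeout=5s&readTimeout=60s&writeTimeout=30s"]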
As soon as the problem starts, restarting Telegraf takes much longer (roughly systemd's default 90 s stop timeout, which suggests Telegraf no longer shuts down cleanly).
While the problem exists
:~# time systemctl restart telegraf;
real 1m30.335s
user 0m0.007s
sys 0m0.008s
Normal restart
:~# time systemctl restart telegraf;
real 0m0.117s
user 0m0.008s
sys 0m0.009s
After downgrading to 1.25.3 again, the problem is gone.
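If it helps with debugging, one thing that could be added to the config (it is not part of my setup above) is the internal plugin, which reports per-input gather times and would show exactly when the mysql gather starts to hang:
# emits internal_gather metrics with a gather_time_ns field per input plugin
[[inputs.internal]]
collect_memstats = false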
Best regards, Patrick
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 25 (13 by maintainers)
Released the fixes with 1.26.1. Will close this for now; @Foxeronie, if you are still having trouble please reopen!
Hello,
I’ve been running the artifact from https://github.com/influxdata/telegraf/pull/13002 since Friday now. I can confirm the issue seems to be fixed; I have not had any gaps in my polling since then.
I was writing my post while you were already asking me to test the other artifact. 😃 That's why I deleted my comment.
Tests already started. 👍
Hi @srebhan!
The artifact from https://github.com/influxdata/telegraf/pull/12919 reproduces the issue too.
If you need more information to debug just ask.
Best regards, Patrick
@Foxeronie and @N3v3R3nD sorry for causing trouble in your installations! Can you guys please try the artifact in PR #12919 and see if it improves the situation!?
Alright, both external dependencies were the same in v1.25.0 and v1.26.0:
I have found 3 commits related to the mysql plugin that are in master but not in the v1.25.0 release branch. Here is what I would ask: please try the artifacts in https://github.com/influxdata/telegraf/pull/12929 first:
Thanks!