telegraf: MySQL "Collection took longer than expected" error after update from 1.25.3-1 to 1.26.0-1

Relevant telegraf.conf

[global_tags]
[agent]
  hostname = "it-dads-glr-cmn-01"
  omit_hostname = false
  interval = "60s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  logfile = "/var/log/telegraf/telegraf.log"
  debug = true
  quiet = false
[outputs.graphite]
graphite_tag_support = false
prefix = "telegraf"
servers = ["it-carbon-relay10.domain.de:2003", "it-carbon-relay11.domain.de:2003"]
template = "host.measurement.tags.field"
timeout = 10
[inputs.cpu]
percpu = false
totalcpu = true
[inputs.disk]
ignore_fs = ["afs", "aufs", "auristorfs", "beegfs", "cvmfs2", "devfs", "devtmpfs", "fuse.gvfsd-fuse", "fuse", "gpfs", "iso9660", "nfs", "nfs4", "overlay", "shm", "squashfs", "tmpfs"]
[inputs.diskio.tagdrop]
name = ["loop*"]
[inputs.net]
interfaces = ["bond*", "data*", "em*", "en*", "eth*", "ib*", "lo", "p[0-9]p[0-9]"]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.processes]]
[[inputs.swap]]
[[inputs.system]]
[[inputs.mysql]]
  interval = "300s"
  servers = ["metricbeat@tcp(127.0.0.1:4306)/",
             "metricbeat@tcp(127.0.0.1:5306)/",
             "metricbeat@tcp(127.0.0.1:6306)/",
             "metricbeat@tcp(127.0.0.1:7306)/",
             "metricbeat@tcp(127.0.0.1:8306)/",
             "metricbeat@tcp(127.0.0.1:9306)/",
             "metricbeat@tcp(127.0.0.1:10306)/",
             "metricbeat@tcp(127.0.0.1:11306)/",
             "metricbeat@tcp(127.0.0.1:12306)/"]
  perf_events_statements_digest_text_limit  = 120
  perf_events_statements_limit              = 250
  perf_events_statements_time_limit         = 86400
  table_schema_databases                    = []
  gather_table_schema                       = false
  gather_process_list                       = true
  gather_user_statistics                    = true
  gather_info_schema_auto_inc               = true
  gather_innodb_metrics                     = true
  gather_slave_status                       = true
  gather_binary_logs                        = false
  gather_table_io_waits                     = true
  gather_table_lock_waits                   = true
  gather_index_io_waits                     = true
  gather_event_waits                        = true
  gather_file_events_stats                  = false
  gather_perf_events_statements             = false
  interval_slow                             = "60m"
  mariadb_dialect                           = true
[[outputs.prometheus_client]]
  listen = ":9273"
  ip_range = ["192.168.46.58/32", "127.0.0.1/32"]
  expiration_interval = "90s"

Logs from Telegraf

2023-03-21T05:29:07Z D! [outputs.prometheus_client] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:29:07Z D! [outputs.graphite] Wrote batch of 33 metrics in 12.592805ms
2023-03-21T05:29:07Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:29:17Z D! [outputs.prometheus_client] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:29:17Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:29:27Z D! [outputs.prometheus_client] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:29:27Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:29:37Z D! [outputs.prometheus_client] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:29:37Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:29:47Z D! [outputs.prometheus_client] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:29:47Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:29:57Z D! [outputs.prometheus_client] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:29:57Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:30:00Z D! [inputs.mysql] Previous collection has not completed; scheduled collection skipped
2023-03-21T05:30:00Z D! [inputs.disk] [SystemPS] => unable to get disk usage ("/sys/kernel/debug/tracing"): permission denied
2023-03-21T05:30:00Z D! [inputs.disk] [SystemPS] => unable to get disk usage ("/var/lib/containers/storage/overlay"): permission denied
2023-03-21T05:30:00Z W! [inputs.mysql] Collection took longer than expected; not complete after interval of 5m0s
2023-03-21T05:30:07Z D! [outputs.prometheus_client] Wrote batch of 33 metrics in 1.54032ms
2023-03-21T05:30:07Z D! [outputs.prometheus_client] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:30:07Z D! [outputs.graphite] Wrote batch of 33 metrics in 13.107871ms
2023-03-21T05:30:07Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:30:17Z D! [outputs.prometheus_client] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:30:17Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:30:27Z D! [outputs.prometheus_client] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:30:27Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:30:37Z D! [outputs.prometheus_client] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:30:37Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:30:47Z D! [outputs.prometheus_client] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:30:47Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:30:57Z D! [outputs.prometheus_client] Buffer fullness: 0 / 10000 metrics
2023-03-21T05:30:57Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
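
Only the window around one skipped mysql collection is shown above; the rest can be pulled from the logfile configured in the agent section, for example:

:~# grep 'inputs.mysql' /var/log/telegraf/telegraf.log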

System info

1.26.0, Ubuntu 22.04

Docker

No response

Steps to reproduce

  1. Configure a mysql input
  2. Update Telegraf from 1.25.3 to 1.26.0 (a sketch of the upgrade step follows this list)
  3. Wait a few hours
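
Roughly, the upgrade step amounts to something like this (a sketch, assuming the InfluxData apt repository is already configured; the pinned versions are the ones from the title):

:~# apt-get update
:~# apt-get install telegraf=1.26.0-1
:~# systemctl restart telegraf

Downgrading back works the same way with telegraf=1.25.3-1 plus the --allow-downgrades flag.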

Expected behavior

MySQL metrics collection should keep working continuously, as it did with 1.25.3.

Actual behavior

After a couple of hours, Telegraf can't collect MySQL metrics anymore.

Other inputs are still working.

Additional info

Hello,

It makes no difference if I increase the interval for the mysql input.

With 1.25.3 the interval was one minute, and there was no problem with that.

After the upgrade to 1.26.0, the problem still exists even with the interval increased to five minutes.

2023-03-21T09:45:00Z D! [inputs.mysql] Previous collection has not completed; scheduled collection skipped
2023-03-21T09:45:00Z W! [inputs.mysql] Collection took longer than expected; not complete after interval of 5m0s
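
For what it's worth, a single collection of just the mysql input can also be run in the foreground, outside the service, to see whether one collection on its own already hangs (a sketch, assuming the default config path):

:~# telegraf --config /etc/telegraf/telegraf.conf --input-filter mysql --test

If that command already blocks for minutes, the hang is in the plugin's collection itself rather than in the agent's scheduling.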

As soon as the problem starts, restarting Telegraf takes much longer.

Restart while the problem exists

:~# time systemctl restart telegraf;

real	1m30.335s
user	0m0.007s
sys	0m0.008s

Normal restart

:~# time systemctl restart telegraf;

real	0m0.117s
user	0m0.008s
sys	0m0.009s
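
The roughly 90 seconds in the slow case look like systemd's default stop timeout rather than a clean shutdown; my assumption is that Telegraf does not exit on SIGTERM while the mysql collection hangs and is only stopped after the timeout. The effective timeout of the unit can be checked with:

:~# systemctl show telegraf --property=TimeoutStopUSec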

After downgrading to 1.25.3 again, the problem is gone.

Best regards, Patrick

About this issue

  • State: closed
  • Created a year ago
  • Comments: 25 (13 by maintainers)

Most upvoted comments

Released the fixes with 1.26.1. Will close this for now; @Foxeronie, if you are still having trouble, please reopen!

Hello,

I’ve been running the artifact from https://github.com/influxdata/telegraf/pull/13002 since Friday. I can confirm the issue seems to be fixed; I have not had any gaps in my polling since then.

@Foxeronie this is not the correct version… It should read 1.27.0~947190c8 or something like this.

I was writing my post while you had already asked to test the other artifact. 😃 That’s why I deleted my comment.

:~# telegraf version
Telegraf 1.27.0-947190c8 (git: pull/13002@947190c8)

Tests already started. 👍

Hi @srebhan !

The artifact from https://github.com/influxdata/telegraf/pull/12919 produces the issue too.

If you need more information to debug just ask.

Best regards, Patrick

@Foxeronie and @N3v3R3nD, sorry for causing trouble in your installations! Can you guys please try the artifact in PR #12919 and see if this improves the situation?

Alright, both external dependencies were the same in v1.25.0 and v1.26.0:

  • github.com/go-sql-driver/mysql was v1.6.0
  • github.com/gofrs/uuid/v5 was v5.0.0

I have found 3 commits related to the mysql plugin that are in master but not in the v1.25.0 release branch. Here is what I would ask: first, try the artifacts in https://github.com/influxdata/telegraf/pull/12929.
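
A sketch of how such plugin-specific commits can be listed from a checkout of the telegraf repository (the tag names and the plugin path follow that repository's conventions):

git log --oneline v1.25.0..v1.26.0 -- plugins/inputs/mysql/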

Thanks!