do-agent process constant high CPU usage

Describe the problem

The agent process uses 250-300% CPU the entire time it’s running. That can’t be normal.

Steps to reproduce

Run do-agent on a droplet.
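
To observe the usage, something like the following works (a sketch; it assumes do-agent runs as a single process under the do-agent systemd unit):

# Watch the do-agent process's CPU usage, refreshing every 5 seconds
top -b -d 5 -p "$(pgrep -o do-agent)"

# Or check the service status (and main PID) via systemd
systemctl status do-agent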

Expected behavior

The agent does not constantly consume 250-300% CPU.

System Information

Ubuntu 20.04.1

do-agent information:

/opt/digitalocean/bin/do-agent --version:

do-agent (DigitalOcean Agent)

Version:     3.7.1
Revision:    32704ad
Build Date:  Mon Oct  5 16:27:32 UTC 2020
Go Version:  go1.15.2
Website:     https://github.com/digitalocean/do-agent

Copyright (c) 2020 DigitalOcean, Inc. All rights reserved.

This work is licensed under the terms of the Apache 2.0 license.
For a copy, see <https://www.apache.org/licenses/LICENSE-2.0.html>.

Ubuntu, Debian

apt-cache policy do-agent:

do-agent:
  Installed: 3.7.1
  Candidate: 3.7.1
  Version table:
 *** 3.7.1 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
        100 /var/lib/dpkg/status
     3.6.0 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     3.5.6 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     3.5.5 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     3.5.4 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     3.5.2 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     3.5.1 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     3.3.1 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     3.2.1 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     3.0.5 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     2.2.4 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     2.2.3 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     2.2.1 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     2.2.0 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     2.1.3 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     2.0.2 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     2.0.1 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     2.0.0 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     1.1.3 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages

The systemd journal is being spammed constantly with the following:

-- Logs begin at Tue 2020-10-13 22:28:25 UTC, end at Wed 2020-10-14 18:24:57 UTC. --
Oct 14 18:24:57 k8s-cluster-stage--worker-3 /opt/digitalocean/bin/do-agent[901]: /home/do-agent/cmd/do-agent/run.go:60: failed to gather metrics: 2 error(s) occurred:
* collected metric "node_filesystem_size_bytes" { label:<name:"device" value:"/dev/vda15" > label:<name:"fstype" value:"vfat" > label:<name:"mountpoint" value:"/boot/efi" > gauge:<value:1.09422592e+08 > } was collected before with the same name and label values
* collected metric "node_filesystem_free_bytes" { label:<name:"device" value:"/dev/vda15" > label:<name:"fstype" value:"vfat" > label:<name:"mountpoint" value:"/boot/efi" > gauge:<value:9.9854336e+07 > } was collected before with the same name and label values
Oct 14 18:24:57 k8s-cluster-stage--worker-3 /opt/digitalocean/bin/do-agent[901]: /home/do-agent/cmd/do-agent/run.go:60: failed to gather metrics: 2 error(s) occurred:
* collected metric "node_filesystem_size_bytes" { label:<name:"device" value:"/dev/vda15" > label:<name:"fstype" value:"vfat" > label:<name:"mountpoint" value:"/boot/efi" > gauge:<value:1.09422592e+08 > } was collected before with the same name and label values
* collected metric "node_filesystem_free_bytes" { label:<name:"device" value:"/dev/vda15" > label:<name:"fstype" value:"vfat" > label:<name:"mountpoint" value:"/boot/efi" > gauge:<value:9.9854336e+07 > } was collected before with the same name and label values
Oct 14 18:24:57 k8s-cluster-stage--worker-3 /opt/digitalocean/bin/do-agent[901]: /home/do-agent/cmd/do-agent/run.go:60: failed to gather metrics: 2 error(s) occurred:
* collected metric "node_filesystem_size_bytes" { label:<name:"device" value:"/dev/vda15" > label:<name:"fstype" value:"vfat" > label:<name:"mountpoint" value:"/boot/efi" > gauge:<value:1.09422592e+08 > } was collected before with the same name and label values
* collected metric "node_filesystem_free_bytes" { label:<name:"device" value:"/dev/vda15" > label:<name:"fstype" value:"vfat" > label:<name:"mountpoint" value:"/boot/efi" > gauge:<value:9.9854336e+07 > } was collected before with the same name and label values
Oct 14 18:24:57 k8s-cluster-stage--worker-3 /opt/digitalocean/bin/do-agent[901]: /home/do-agent/cmd/do-agent/run.go:60: failed to gather metrics: 2 error(s) occurred:
* collected metric "node_filesystem_size_bytes" { label:<name:"device" value:"/dev/vda15" > label:<name:"fstype" value:"vfat" > label:<name:"mountpoint" value:"/boot/efi" > gauge:<value:1.09422592e+08 > } was collected before with the same name and label values
* collected metric "node_filesystem_free_bytes" { label:<name:"device" value:"/dev/vda15" > label:<name:"fstype" value:"vfat" > label:<name:"mountpoint" value:"/boot/efi" > gauge:<value:9.9854336e+07 > } was collected before with the same name and label values
Oct 14 18:24:57 k8s-cluster-stage--worker-3 /opt/digitalocean/bin/do-agent[901]: /home/do-agent/cmd/do-agent/run.go:60: failed to gather metrics: 2 error(s) occurred:
* collected metric "node_filesystem_size_bytes" { label:<name:"device" value:"/dev/vda15" > label:<name:"fstype" value:"vfat" > label:<name:"mountpoint" value:"/boot/efi" > gauge:<value:1.09422592e+08 > } was collected before with the same name and label values
* collected metric "node_filesystem_free_bytes" { label:<name:"device" value:"/dev/vda15" > label:<name:"fstype" value:"vfat" > label:<name:"mountpoint" value:"/boot/efi" > gauge:<value:9.9854336e+07 > } was collected before with the same name and label values
Oct 14 18:24:57 k8s-cluster-stage--worker-3 /opt/digitalocean/bin/do-agent[901]: /home/do-agent/cmd/do-agent/run.go:60: failed to gather metrics: 2 error(s) occurred:
* collected metric "node_filesystem_size_bytes" { label:<name:"device" value:"/dev/vda15" > label:<name:"fstype" value:"vfat" > label:<name:"mountpoint" value:"/boot/efi" > gauge:<value:1.09422592e+08 > } was collected before with the same name and label values
* collected metric "node_filesystem_free_bytes" { label:<name:"device" value:"/dev/vda15" > label:<name:"fstype" value:"vfat" > label:<name:"mountpoint" value:"/boot/efi" > gauge:<value:9.9854336e+07 > } was collected before with the same name and label values
Oct 14 18:24:57 k8s-cluster-stage--worker-3 /opt/digitalocean/bin/do-agent[901]: /home/do-agent/cmd/do-agent/run.go:60: failed to gather metrics: 2 error(s) occurred:
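
To gauge the rate of the spam, the journal can be tailed or counted per boot (a sketch; it assumes the agent logs under its systemd unit name, do-agent):

# Follow do-agent's journal entries as they arrive
journalctl -u do-agent -f

# Count the "failed to gather metrics" entries since the current boot
journalctl -u do-agent -b | grep -c 'failed to gather metrics'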

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 22 (11 by maintainers)

Most upvoted comments

3.8.0 is officially released. I am going to close this. Please open a new issue if you see anything similar in the future. Thanks!

Sorry for the delayed response. We had a hurricane here and I was without power for a few days. I’m glad to see that y’all were able to identify the problem and implement a fix! Thanks!!

24 hours later, and it’s still OK. Note: if you’ve been affected by this issue you might want to clean out your systemd journal. I just got rid of 3.5 GB of spam from mine using “/bin/journalctl --vacuum-size=500M”. Your mileage may vary: there may be more subtle ways to remove the logs from just do-agent, although I haven’t found them.
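
Worth noting: journalctl’s vacuum options operate on whole journal files rather than per unit, which is likely why a per-service cleanup is hard to find. A persistent size cap is one alternative (a sketch; SystemMaxUse is a standard journald.conf setting):

# Cap total journal disk usage by editing /etc/systemd/journald.conf:
#   [Journal]
#   SystemMaxUse=500M
# Then apply the new limit:
sudo systemctl restart systemd-journald

# To inspect only do-agent's entries rather than the whole journal:
journalctl -u do-agent --since today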

@bsnyder788 The 3.8.0 pre-release you have just made seems to have fixed the issue. At least it is no longer spamming those two error messages and taking a whole lot of CPU resources.

I couldn’t reproduce on a myriad of 20.04 droplets either, but I went ahead and made a new beta release that disables the collection of /boot mountpoints. If some of you would give it a try to see if it now works on your specific droplets, that would be fantastic. You can install it with: curl -SsL https://repos.insights.digitalocean.com/install.sh | sudo BETA=1 bash. Please let me know if that fixes your issues, and see the sanity check below. cc @UnKnoWn-Consortium @lots0logs @plutocrat
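
For those testing, a quick sanity check after running the install script (a sketch; the exact pre-release version string may differ):

# Confirm the installed agent version
/opt/digitalocean/bin/do-agent --version

# Confirm the error spam has stopped since the upgrade (0 means clean)
journalctl -u do-agent --since "10 min ago" | grep -c 'failed to gather metrics'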

Is your 20.04 image the stock DO image or a custom 20.04 image?

(Quoting plutocrat’s email of Fri, Oct 30, 2020: “Confirmed here. Running 4 Ubuntu 20.04 droplets. On all 4, do-agent CPU is around 95% all the time. Tried the --web.listen instruction, but apparently nothing is listening on that port when I do. Installed version: 3.7.1”)

Both affected droplets were built from the stock DO Ubuntu 20.04 LTS image with the “Monitoring” option checked in the DO dashboard https://cloud.digitalocean.com/droplets/new.

Thanks. I will try to reproduce from that.

@lots0logs I was not able to reproduce on a k8s cluster either. Can you try adding --web.listen in the systemd unit file (in the ExecStart line), e.g. ExecStart=/opt/digitalocean/bin/do-agent --web.listen --syslog? After doing a systemctl daemon-reload and a systemctl restart do-agent, you should be able to do a curl localhost:9100 and get the raw metrics that are being scraped (see the sketch below). I would be curious to see if that somehow has duplicate entries for the metrics from your original log.
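
One way to apply that without editing the packaged unit file in place is a standard systemd drop-in (a sketch; the flags are taken verbatim from the comment above):

# Create a drop-in override for the do-agent unit
sudo systemctl edit do-agent
# In the editor, add (the empty ExecStart= clears the packaged one first):
#   [Service]
#   ExecStart=
#   ExecStart=/opt/digitalocean/bin/do-agent --web.listen --syslog

sudo systemctl daemon-reload
sudo systemctl restart do-agent

# Then look for duplicated filesystem series among the scraped metrics
curl -s localhost:9100 | grep '^node_filesystem' | sort | uniq -d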

Thanks for the extra info @lots0logs . I’ll see if I can reproduce it on a k8s cluster and get to the bottom of why these errors are popping up for you.