# do-agent: do-agent process constant high CPU usage
## Describe the problem
The agent process uses 250-300% CPU the entire time it’s running. That can’t be normal.
## Steps to reproduce
Run do-agent on a droplet.
## Expected behavior

The agent does not constantly consume 250-300% CPU.
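For anyone trying to reproduce, here is one quick way to sample a process's CPU usage (a sketch, not part of the original report; the `do-agent` process name is assumed from the package install):

```shell
#!/bin/sh
# Sample the %CPU of a process by PID. ps's pcpu is the ratio of CPU
# time used to elapsed run time, expressed as a percentage.
cpu_of() {
    ps -o pcpu= -p "$1"
}

# For the agent (process name assumed from the install path):
#   cpu_of "$(pgrep -o -f do-agent)"
# Demo on the current shell so the snippet runs anywhere:
cpu_of "$$"
```

A sustained reading near 250-300% means the agent is keeping roughly three cores busy.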
## System Information

Ubuntu 20.04.1

do-agent information (entire output of `/opt/digitalocean/bin/do-agent --version`):

```
do-agent (DigitalOcean Agent)
Version:     3.7.1
Revision:    32704ad
Build Date:  Mon Oct 5 16:27:32 UTC 2020
Go Version:  go1.15.2
Website:     https://github.com/digitalocean/do-agent
Copyright (c) 2020 DigitalOcean, Inc. All rights reserved.
This work is licensed under the terms of the Apache 2.0 license.
For a copy, see <https://www.apache.org/licenses/LICENSE-2.0.html>.
```
Ubuntu/Debian: output of `apt-cache policy do-agent`:

```
do-agent:
  Installed: 3.7.1
  Candidate: 3.7.1
  Version table:
 *** 3.7.1 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
        100 /var/lib/dpkg/status
     3.6.0 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     3.5.6 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     3.5.5 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     3.5.4 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     3.5.2 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     3.5.1 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     3.3.1 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     3.2.1 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     3.0.5 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     2.2.4 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     2.2.3 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     2.2.1 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     2.2.0 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     2.1.3 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     2.0.2 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     2.0.1 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     2.0.0 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
     1.1.3 500
        500 https://repos.insights.digitalocean.com/apt/do-agent main/main amd64 Packages
```
The systemd journal is being spammed constantly with the following (the same pair of errors repeats over and over; six copies appear within the single second covered by this excerpt):

```
-- Logs begin at Tue 2020-10-13 22:28:25 UTC, end at Wed 2020-10-14 18:24:57 UTC. --
Oct 14 18:24:57 k8s-cluster-stage--worker-3 /opt/digitalocean/bin/do-agent[901]: /home/do-agent/cmd/do-agent/run.go:60: failed to gather metrics: 2 error(s) occurred:
* collected metric "node_filesystem_size_bytes" { label:<name:"device" value:"/dev/vda15" > label:<name:"fstype" value:"vfat" > label:<name:"mountpoint" value:"/boot/efi" > gauge:<value:1.09422592e+08 > } was collected before with the same name and label values
* collected metric "node_filesystem_free_bytes" { label:<name:"device" value:"/dev/vda15" > label:<name:"fstype" value:"vfat" > label:<name:"mountpoint" value:"/boot/efi" > gauge:<value:9.9854336e+07 > } was collected before with the same name and label values
```
## About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 1
- Comments: 22 (11 by maintainers)
## Comments

3.8.0 is officially released. I am going to close this. Please open a new issue if you see anything similar in the future. Thanks!
Sorry for the delayed response. We had a hurricane here and I was without power for a few days. I’m glad to see that y’all were able to identify the problem and implement a fix! Thanks!!
24 hours later, and it's still OK. Note: if you've been affected by this issue you might want to clean out your systemd journal logs. I just got rid of 3.5 GB of spam from mine using `/bin/journalctl --vacuum-size=500M`. Your mileage may vary: there may be more subtle ways to remove the logs from just do-agent, although I haven't found them.
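Before vacuuming, you can estimate how much of the journal is this particular spam; a sketch using a fabricated three-line sample (on a real droplet you would pipe `journalctl` output into the grep instead):

```shell
#!/bin/sh
# Count journal lines produced by do-agent's metric-gathering failure.
# The three sample lines below are fabricated to mimic the excerpt in
# the issue; in practice, pipe `journalctl` output into the grep.
printf '%s\n' \
  'Oct 14 18:24:57 worker-3 /opt/digitalocean/bin/do-agent[901]: failed to gather metrics: 2 error(s) occurred:' \
  'Oct 14 18:24:57 worker-3 kernel: eth0: link becomes ready' \
  'Oct 14 18:24:57 worker-3 /opt/digitalocean/bin/do-agent[901]: failed to gather metrics: 2 error(s) occurred:' \
  | grep -c 'do-agent\[[0-9]*\]: .*failed to gather metrics'
# -> 2
```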
@bsnyder788 The 3.8.0 pre-release you have just made seems to have fixed the issue. At least it is no longer spamming those two error messages and taking a whole lot of CPU resources.
I couldn't reproduce on a myriad of 20.04 droplets either, but I went ahead and made a new beta release that disables the collection of `/boot` mountpoints. If some of you would give it a try to see if it now works on your specific droplets, that would be fantastic. You can install it via `curl -SsL https://repos.insights.digitalocean.com/install.sh | sudo BETA=1 bash`. Please let me know if that fixes your issues. cc @UnKnoWn-Consortium @lots0logs @plutocrat

Thanks. I will try to reproduce from that.
@lots0logs I was not able to reproduce it on a k8s cluster either. Can you try adding `--web.listen` in the systemd unit file (in the `ExecStart` line)? e.g. `ExecStart=/opt/digitalocean/bin/do-agent --web.listen --syslog`. After doing a `systemctl daemon-reload` and a `systemctl restart do-agent`, you should be able to do a `curl localhost:9100` and get the raw metrics that are being scraped. I would be curious to see if that is somehow having duplicate entries for the metrics in your original log.

Thanks for the extra info @lots0logs. I'll see if I can reproduce it on a k8s cluster and get to the bottom of why these errors are popping up for you.
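As a follow-up to the `curl localhost:9100` suggestion, duplicate series can be spotted mechanically with `sort | uniq -d`. A sketch with a fabricated three-line sample (the real input would be the curl output):

```shell
#!/bin/sh
# Find metric lines that appear more than once in a Prometheus-style
# scrape. The sample lines mimic the duplicated node_filesystem_*
# series from the journal above; in practice, pipe
# `curl -s localhost:9100` in instead.
printf '%s\n' \
  'node_filesystem_size_bytes{device="/dev/vda15",mountpoint="/boot/efi"} 1.09422592e+08' \
  'node_filesystem_free_bytes{device="/dev/vda15",mountpoint="/boot/efi"} 9.9854336e+07' \
  'node_filesystem_size_bytes{device="/dev/vda15",mountpoint="/boot/efi"} 1.09422592e+08' \
  | sort | uniq -d
# -> prints the duplicated node_filesystem_size_bytes line, once
```

Any output at all from `uniq -d` on a real scrape would confirm the duplicate-registration theory behind the journal errors.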