troubleshoot: Slow performance generating a host collector support bundle
Bug Description
Performed the following procedure to generate a host collector support bundle:
- Installed the support bundle binary:
  curl -L https://github.com/replicatedhq/troubleshoot/releases/latest/download/support-bundle_linux_amd64.tar.gz | tar xzvf -
- Ran the binary against the default host collector spec:
  sudo ./support-bundle --interactive=false https://raw.githubusercontent.com/replicatedhq/troubleshoot-specs/main/host/default.yaml
- Waited at least 5 minutes for the support bundle to generate, then cancelled.
I originally thought the support bundle generation was hanging because it took so long. However, the process was not hanging, just extremely slow; I confirmed this by letting it run to completion. The process typically took 6 minutes (+/- 15 seconds) to complete, though I experienced wait times of up to ~11-12 minutes.
Environment:
- 3-node kURL-installed Kubernetes cluster on Linode shared-CPU instances
Expected Behavior
I expected the host collector support bundle generation to complete in under 90-120 seconds, which is what I had always experienced when generating a host collector support bundle previously.
Note: I experienced the same behavior on all three nodes.
Steps To Reproduce
See the bug description above.
Additional Context
- Replicated Troubleshoot v0.62.1-26-g97efe83
- Go version: go version go1.20.3 linux/amd64
- Operating system: Ubuntu
- Operating system version: 22.04.2 LTS (Jammy Jellyfish)
- Other details that might be helpful in diagnosing the problem: further investigation showed that the redactor appears to be the cause of the long processing time. I will detail how I came to suspect the redactor in my follow-up comments on this issue.
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 20 (20 by maintainers)
https://github.com/adamancini/troubleshoot/blob/ada/redact-faster/pkg/collect/redact.go#L109
After adding some traces to each call of Redact and ReplaceResult:
the rest are sub-50 ms
Complete traces from redaction
via https://github.com/replicatedhq/troubleshoot/issues/1151#issuecomment-1544754521
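The per-call tracing mentioned above can be sketched with a small timing wrapper. This is a minimal, self-contained illustration: `redact` here is a placeholder stand-in for troubleshoot's real `Redact` call, not the actual API.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// redact is a stand-in for troubleshoot's redaction step; it just
// masks a token so the timing harness has something to measure.
func redact(input string) string {
	return strings.ReplaceAll(input, "secret", "***HIDDEN***")
}

// timedRedact wraps a redaction call and reports its wall time,
// mirroring the kind of per-file traces added while diagnosing
// this issue.
func timedRedact(name, input string) string {
	start := time.Now()
	out := redact(input)
	fmt.Printf("redacted %s in %s\n", name, time.Since(start))
	return out
}

func main() {
	out := timedRedact("k8s-audit.log", "user=admin secret token")
	fmt.Println(out)
}
```

Instrumenting each file's redaction this way is what surfaced that a handful of large files dominate the total wall time.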
Nathan Sullivan [May 10 2023, 1:20 am] For anyone picking this up: focus your attention on https://github.com/replicatedhq/troubleshoot/blob/main/pkg/collect/redact.go#L19-L112
Nathan Sullivan [May 10 2023, 1:26 am] If I were to skip redacting some files, the first candidates I'd check (to see whether they are safe to leave unredacted) are:
host-collectors/run-host/kubeadm-kustomize-patches.txt
host-collectors/apiserver-audit-logs/k8s-audit.log
Both of these appear to account for a large share of the redaction wall time…
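One way to act on that suggestion would be an allowlist check before invoking the redactor. The `skipPatterns` list and `skipRedaction` helper below are hypothetical sketches (using the two files named above), not part of troubleshoot's actual code:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// skipPatterns lists collected files that might safely bypass the
// redactor. These are the candidates named in the comment above;
// whether they are actually safe to skip would need to be verified.
var skipPatterns = []string{
	"host-collectors/run-host/kubeadm-kustomize-patches.txt",
	"host-collectors/apiserver-audit-logs/*.log",
}

// skipRedaction reports whether a collected file matches a pattern
// in the allowlist and can bypass redaction.
func skipRedaction(path string) bool {
	for _, p := range skipPatterns {
		if ok, _ := filepath.Match(p, path); ok {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(skipRedaction("host-collectors/apiserver-audit-logs/k8s-audit.log")) // matches the *.log pattern
	fmt.Println(skipRedaction("host-collectors/system/hostname.txt"))                // no pattern matches
}
```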
Nathan Sullivan [May 10 2023, 1:33 am] This operation is fairly expensive given the volume of logs we're shuffling around during redaction:
https://github.com/replicatedhq/troubleshoot/blob/main/pkg/collect/redact.go#L106
Which in turn ends up sitting here for a while:
https://github.com/replicatedhq/troubleshoot/blob/main/pkg/collect/result.go#L125
Relevant: https://app.shortcut.com/replicated/story/51656/redaction-is-inefficient