trivy-operator: Faulty scan jobs blocking further scans from being executed
What steps did you take and what happened:
Due to the error reported in https://github.com/aquasecurity/trivy-operator/issues/206, scan jobs get stuck.
When this happens, other pods are no longer scanned: once OPERATOR_CONCURRENT_SCAN_JOBS_LIMIT
is reached, no new scan pods are spawned, because trivy-operator is still waiting for the stuck ones to finish.
Example (due to the error in https://github.com/aquasecurity/trivy-operator/issues/206):
scan-vulnerabilityreport-5759f44647--1-qf7sh 0/1 Completed 0 7m49s
scan-vulnerabilityreport-7d57cffd5f--1-47vds 0/1 Completed 0 2m58s
scan-vulnerabilityreport-849fffd5c7--1-p9fdt 0/1 Completed 0 6m58s
scan-vulnerabilityreport-dc5fb6cf--1-xq5kw 0/1 Completed 0 7m28s
scan-vulnerabilityreport-f49679dcc--1-cvd8x 0/1 Completed 0 118s
What did you expect to happen:
Even if jobs get stuck due to an unforeseen error, they should be released after some time so that scanning continues with other repositories/registries. Otherwise, no further scans take place.
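To make the expectation concrete, here is a minimal sketch of the slot accounting as I understand it from the outside. It is not trivy-operator's actual code: the `ScanJob` type, the `free_slots` function, the limit of 5, and the 300-second timeout are all hypothetical and only serve to illustrate the behaviour. The ages are taken from the pod listing above.

```python
from dataclasses import dataclass


@dataclass
class ScanJob:
    name: str
    age_seconds: int
    finished: bool  # True once the operator has processed the job's result


def free_slots(jobs, limit, stuck_timeout_seconds=None):
    """Return how many new scan jobs could be started (hypothetical model).

    Without a timeout, every unfinished job holds a slot forever.
    With a timeout, unfinished jobs older than the timeout stop counting.
    """
    holding = 0
    for job in jobs:
        if job.finished:
            continue
        if stuck_timeout_seconds is not None and job.age_seconds > stuck_timeout_seconds:
            continue  # treated as stuck: its slot is released
        holding += 1
    return max(limit - holding, 0)


# The five stuck pods from the listing above, with AGE converted to seconds.
jobs = [
    ScanJob("scan-vulnerabilityreport-5759f44647", 469, False),
    ScanJob("scan-vulnerabilityreport-7d57cffd5f", 178, False),
    ScanJob("scan-vulnerabilityreport-849fffd5c7", 418, False),
    ScanJob("scan-vulnerabilityreport-dc5fb6cf", 448, False),
    ScanJob("scan-vulnerabilityreport-f49679dcc", 118, False),
]

print(free_slots(jobs, limit=5))                             # current behaviour
print(free_slots(jobs, limit=5, stuck_timeout_seconds=300))  # expected behaviour
```

With an assumed limit of 5, the first call leaves no room for new scans, while the second frees the slots held by the three jobs older than five minutes.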
Anything else you would like to add:
If the Job/Pod is deleted manually, trivy-operator will likely pick up another remaining deployment to scan, and scanning continues.
However, when it comes back to the deployment that triggers the error, the pod gets stuck again.
So, to get all deployments scanned, you need to set OPERATOR_CONCURRENT_SCAN_JOBS_LIMIT to a high value
and frequently delete all hung jobs/pods, so that trivy-operator is free to spawn new scans.
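The manual cleanup described above could be scripted roughly as follows. This is only a sketch under a few assumptions: the pod listing is pasted in as text (in the same column layout as the example above), the 300-second threshold is arbitrary, and the helper names `age_to_seconds` and `stale_scan_pods` are made up for illustration. It only prints the `kubectl delete pod` commands rather than running them, and you would probably need to add the namespace the scan jobs run in.

```python
import re

AGE_PART = re.compile(r"(\d+)([dhms])")
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def age_to_seconds(age):
    """Convert a kubectl AGE value such as '7m49s' or '118s' to seconds."""
    return sum(int(n) * UNIT_SECONDS[u] for n, u in AGE_PART.findall(age))


def stale_scan_pods(listing, max_age_seconds, prefix="scan-vulnerabilityreport-"):
    """Return names of scan pods in the listing that are older than max_age_seconds."""
    stale = []
    for line in listing.splitlines():
        fields = line.split()
        # Expect NAME READY STATUS RESTARTS AGE; skip anything else.
        if len(fields) < 5 or not fields[0].startswith(prefix):
            continue
        name, age = fields[0], fields[-1]
        if age_to_seconds(age) > max_age_seconds:
            stale.append(name)
    return stale


listing = """\
scan-vulnerabilityreport-5759f44647--1-qf7sh 0/1 Completed 0 7m49s
scan-vulnerabilityreport-7d57cffd5f--1-47vds 0/1 Completed 0 2m58s
scan-vulnerabilityreport-849fffd5c7--1-p9fdt 0/1 Completed 0 6m58s
scan-vulnerabilityreport-dc5fb6cf--1-xq5kw 0/1 Completed 0 7m28s
scan-vulnerabilityreport-f49679dcc--1-cvd8x 0/1 Completed 0 118s
"""

# Print the delete commands for review instead of executing them.
for name in stale_scan_pods(listing, max_age_seconds=300):
    print(f"kubectl delete pod {name}")
```

This is still just a workaround: a pod for the faulty deployment will get stuck again on the next attempt, so the cleanup has to be repeated.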
Environment:
- Trivy-Operator version: 0.1.0
- Kubernetes version: 1.22
About this issue
- State: open
- Created 2 years ago
- Reactions: 4
- Comments: 18 (13 by maintainers)
@VF-mbrauer this issue is under investigation, I will update you once we have a solid solution.
I think you mean completed ones in
?