trivy-operator: OOMKilled in vulnerability scan job
What steps did you take and what happened:
Since we migrated from starboard-operator to trivy-operator, we now see many scan jobs terminate with OOMKilled in the trivy-operator log:
{"level":"error","ts":1658230025.0035832,"logger":"reconciler.vulnerabilityreport","msg":"Scan job container","job":"security/scan-vulnerabilityreport-5d76c6d6d8","container":"teleport","status.reason":"OOMKilled","status.message":"","stacktrace":"github.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport.(*WorkloadController).reconcileJobs.func1\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller.go:363\nsigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/reconcile/reconcile.go:102\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:234"}
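The operator logs each failed scan container as one structured JSON line, and the useful fields are "job", "container" and "status.reason". The following is a minimal sketch, assuming the log format shown above, for picking out OOMKilled entries from a batch of log lines. The sample line is the one above with the long "stacktrace" field trimmed. The helper names summarize_scan_failure and oom_killed are hypothetical and are not part of trivy-operator.

```python
import json
from datetime import datetime, timezone

# Abbreviated copy of the log line above; the long "stacktrace" field is trimmed.
LOG_LINE = (
    '{"level":"error","ts":1658230025.0035832,'
    '"logger":"reconciler.vulnerabilityreport","msg":"Scan job container",'
    '"job":"security/scan-vulnerabilityreport-5d76c6d6d8","container":"teleport",'
    '"status.reason":"OOMKilled","status.message":""}'
)


def summarize_scan_failure(line: str) -> dict:
    """Extract the fields that identify which scan job container failed and why."""
    entry = json.loads(line)
    return {
        # "ts" is a Unix timestamp in seconds, with a fractional part.
        "time": datetime.fromtimestamp(entry["ts"], tz=timezone.utc),
        "job": entry.get("job"),
        "container": entry.get("container"),
        # "status.reason" is a literal key containing a dot, not a nested object.
        "reason": entry.get("status.reason"),
    }


def oom_killed(lines):
    """Yield summaries only for scan job containers terminated with OOMKilled."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip anything that is not a JSON log entry
        summary = summarize_scan_failure(line)
        if summary["reason"] == "OOMKilled":
            yield summary


if __name__ == "__main__":
    for s in oom_killed([LOG_LINE]):
        print(s["time"].isoformat(), s["job"], s["container"], s["reason"])
```

To check real logs, pass the captured lines of the operator's log output to oom_killed() instead of the single sample line.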
We also increased the CPU and memory limits to:
resources:
  limits:
    memory: 2Gi
    cpu: 2
But it did not help.
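The memory limits in this thread mix binary suffixes (2Gi, powers of 1024) with decimal ones (500M, 1500M, 1.5G, powers of 1000), which makes them easy to misread when comparing. Below is a minimal sketch that converts these values to bytes. It handles only the listed suffixes and plain integers and is not a full implementation of the Kubernetes quantity format: exponent notation and milli-units are not supported. The function name memory_to_bytes is hypothetical.

```python
from decimal import Decimal

# Subset of Kubernetes memory suffixes: binary (powers of 1024) and decimal (powers of 1000).
# Simplified sketch: no exponent notation such as "1e9" and no milli-units such as "100m".
_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '2Gi', '1500M' or '1.5G' to bytes."""
    q = quantity.strip()
    # Try two-letter binary suffixes before one-letter decimal ones.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            number = Decimal(q[: -len(suffix)])
            return int(number * _SUFFIXES[suffix])
    return int(Decimal(q))  # no suffix: a plain number of bytes


if __name__ == "__main__":
    for value in ["500M", "1500M", "1.5G", "2Gi"]:
        print(f"{value:>6} = {memory_to_bytes(value):,} bytes")
```

For example, 1500M and 1.5G are the same 1,500,000,000 bytes (about 1.40 GiB), while 2Gi is 2,147,483,648 bytes.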
We run Trivy in ClientServer mode.
What did you expect to happen:
That there is no OOM error in the job.
Environment:
- Trivy-Operator version (use "trivy-operator version"): 0.1.3
- Kubernetes version (use "kubectl version"): 1.20
About this issue
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 31 (6 by maintainers)
Thanks for sharing this info. We are investigating the scan job OOM issue (during the scanning process), and I'll post an update once we have completed the investigation.
Hey everybody, I applied all your tips (using a dedicated Trivy server instance, setting a 1.5G memory limit and reducing parallel scan jobs to 2), and now it works!
PS: I sometimes see more than 2 scan jobs running in parallel, but I no longer get OOMKilled, so it's great.
Thanks a lot 😃
I also encountered the same problem: the memory usage of the scan job exceeded the limit set in the K8s resource quotas.
Thanks for the update, it's still on our radar.
I just checked the latest trivy-operator 0.3.0 on a fresh minikube cluster (3 CPUs, 6 GB RAM) with only sealed-secrets and trivy-server installed. Scan jobs still get OOMKilled with the default 500M memory limit, even with OPERATOR_CONCURRENT_SCAN_JOBS_LIMIT set to 1. After increasing the memory limit to 1500M, the OOMKilled errors go away. There is probably a short spike in memory consumption.
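A short spike is enough to trigger OOMKilled because the memory limit is enforced against current usage, not against an average. The sketch below uses invented samples (hypothetical numbers, not measurements from this issue) whose average stays below 500M while one brief peak goes well above it. It also shows why periodically sampled metrics can make a job look safe when it is not. The helper names are hypothetical.

```python
# Hypothetical memory samples (in MB) for one scan job container, one per second.
# Invented for illustration only; these are NOT measurements from this issue.
SAMPLES_MB = [180, 210, 240, 950, 1100, 300, 220]


def first_violation(samples_mb, limit_mb):
    """Return the index of the first sample above the limit, or None if all fit."""
    for i, used in enumerate(samples_mb):
        if used > limit_mb:
            return i
    return None


def report(samples_mb, limit_mb):
    """Print average and peak usage and whether the limit would have been exceeded."""
    average = sum(samples_mb) / len(samples_mb)
    peak = max(samples_mb)
    hit = first_violation(samples_mb, limit_mb)
    outcome = f"exceeds limit at sample {hit}" if hit is not None else "stays within limit"
    print(f"limit={limit_mb}M avg={average:.0f}M peak={peak}M -> {outcome}")


if __name__ == "__main__":
    report(SAMPLES_MB, 500)   # default limit reported in this thread
    report(SAMPLES_MB, 1500)  # raised limit reported in this thread
```

With these samples the average is about 457M, under the 500M limit, yet the peak of 1100M would exceed it; only the 1500M limit covers the spike.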
@SergeyBear I have upgraded the Trivy version to 0.31.3; it will be released with the next trivy-operator version.