trivy-operator: OOMKilled in vulnerability scan job

What steps did you take and what happened:

Since we migrated from starboard-operator to trivy-operator, many scan jobs terminate with OOMKilled, as shown in the trivy-operator log:

{"level":"error","ts":1658230025.0035832,"logger":"reconciler.vulnerabilityreport","msg":"Scan job container","job":"security/scan-vulnerabilityreport-5d76c6d6d8","container":"teleport","status.reason":"OOMKilled","status.message":"","stacktrace":"github.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport.(*WorkloadController).reconcileJobs.func1\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller.go:363\nsigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/reconcile/reconcile.go:102\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:234"}

We also increased the CPU and memory limits to:

resources:
  limits:
    memory: 2Gi
    cpu: 2

But that did not help.

We run Trivy in ClientServer mode.

What did you expect to happen:

That there is no OOM error in the job.

Environment:

  • Trivy-Operator version (use trivy-operator version): 0.1.3
  • Kubernetes version (use kubectl version): 1.20

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 31 (6 by maintainers)

Most upvoted comments

@chen-keinan sorry, I forgot to mention: it happens even with a single scan job. After increasing the Trivy limits, the OOM is gone and the reports complete with no errors. P.S. It is strange that downloading could cause an OOM, but I checked more than ten times, and every time it happened while downloading…

Thanks for posting this info. We are investigating the scan job OOM issue (during the scanning process). I’ll update shortly once we have completed the investigation.

Hey everybody, I applied all your tips (using a dedicated Trivy server instance, setting a 1.5G memory limit, and reducing parallel scan jobs to 2) and now it works!

PS: I sometimes see more than 2 scan jobs running in parallel, but I no longer get OOMKilled, so it’s great.

Thanks a lot 😃
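
For readers who want to try the same settings, here is a minimal sketch of what they might look like. It assumes the scan job memory limit is read from the `trivy.resources.limits.memory` key of a ConfigMap named `trivy-operator-trivy-config`, and that the concurrency limit is passed to the operator container through the `OPERATOR_CONCURRENT_SCAN_JOBS_LIMIT` environment variable mentioned in this thread. The ConfigMap name, the key, and the namespace are assumptions and may differ between trivy-operator versions and install methods, so check them against your own deployment first.

```yaml
# Sketch only: ConfigMap name and key are assumed, not confirmed by this thread.
apiVersion: v1
kind: ConfigMap
metadata:
  name: trivy-operator-trivy-config
  namespace: security        # assumed from the job name in the log; adjust as needed
data:
  # Memory limit applied to scan job containers (1.5G as reported above).
  trivy.resources.limits.memory: "1500M"
---
# Fragment of the operator Deployment's container spec (not a complete manifest).
env:
  - name: OPERATOR_CONCURRENT_SCAN_JOBS_LIMIT
    value: "2"               # at most two scan jobs in parallel
```

Note that this targets the scan job containers rather than the operator pod itself; raising only the operator's own limits would not necessarily change the limit that the scan jobs run under.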

I also encountered the same problem: the memory usage of the scan job did exceed the limit set in the K8s resource quotas.

Just checked the latest trivy-operator 0.3.0 on a fresh minikube cluster (3 CPUs, 6 GB RAM) with only sealed-secrets and trivy-server installed: scan jobs still get OOMKilled with the default 500M memory limit and OPERATOR_CONCURRENT_SCAN_JOBS_LIMIT set to 1. After increasing the memory limit to 1500M, the OOMKilled goes away. There is probably a short spike in memory consumption.

Thanks for the update, it’s still on our radar.

@SergeyBear I have upgraded the Trivy version to 0.31.3, to be released with the next trivy-operator version.