gatekeeper: OOMKilled - gatekeeper:v3.4.0

gatekeeper-audit seems to be consuming a lot of memory. Initially, we observed that the pod was crashlooping because it was being OOMKilled. We have bumped the limits a couple of times now, but it still ends up using whatever limit we set.

We have attached a VerticalPodAutoscaler to the gatekeeper-audit deployment to get insight into the memory consumption and what the target memory should be. We have adjusted the resources a few times now, but the memory consumption keeps growing. As of now, the configuration looks like this:

    resources:
      limits:
        cpu: "1"
        memory: 850Mi
      requests:
        cpu: 100m
        memory: 850Mi
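
For reference, a recommendation-only VerticalPodAutoscaler for this deployment looks roughly like the sketch below; the namespace and the "Off" update mode are illustrative rather than an exact copy of our manifest.

    # Sketch of a recommendation-only VPA for gatekeeper-audit.
    # Namespace and update mode are illustrative.
    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: gatekeeper-audit
      namespace: gatekeeper-system
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: gatekeeper-audit
      updatePolicy:
        updateMode: "Off"   # publish recommendations only; do not evict or resize pods

The recommended target can then be read with kubectl describe vpa gatekeeper-audit -n gatekeeper-system.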

I am a bit curious how this actually works. We are deploying this on a shared multi-tenant cluster, so many more API resources will be added as we onboard new tenants. As of now, we only have a single basic K8sRequiredLabels constraint as a POC.

It seems that gatekeeper-audit essentially loads all resources into memory and audits them against the defined constraints.
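
Given that behaviour, one way to bound what audit lists is to give every constraint an explicit match.kinds and run audit with --audit-match-kind-only=true (discussed in the comments below). A sketch of a K8sRequiredLabels constraint scoped that way follows; the name, target kind, and parameters are illustrative and depend on how the ConstraintTemplate defines its schema.

    apiVersion: constraints.gatekeeper.sh/v1beta1
    kind: K8sRequiredLabels
    metadata:
      name: ns-must-have-owner          # illustrative name
    spec:
      match:
        kinds:
          # With --audit-match-kind-only=true, audit only lists the kinds
          # named here instead of every API resource in the cluster.
          - apiGroups: [""]
            kinds: ["Namespace"]
      parameters:
        labels: ["owner"]               # illustrative; schema depends on the template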

Are there any recommendations on what we should be doing on our end to improve memory utilization? I have also reviewed the related issues below and followed the recommendations there, but no luck:

  1. https://github.com/open-policy-agent/gatekeeper/issues/339
  2. https://github.com/open-policy-agent/gatekeeper/issues/780

Kubernetes version:

    kubectl version --short=true
    Client Version: v1.15.0
    Server Version: v1.17.17-gke.3000

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 22 (12 by maintainers)

Most upvoted comments

@abhinav454 that’s right, looks like it got added in #1245 and it’s in the staging chart. It’ll be available in the Helm repo when we cut the next minor version release.
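
Once that release is cut, the chunk size should be settable through chart values; presumably something like the snippet below, though the exact key name (auditChunkSize here) is an assumption and should be checked against the chart version that ships it.

    # values.yaml sketch -- key name assumed from the staging chart referenced above
    auditChunkSize: 500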

I ran into this same issue and implemented the additional arg --audit-match-kind-only=true. That didn’t stop the OOMKilled pods. I then added --audit-chunk-size=500 and that did the trick.
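
For anyone setting this outside the Helm chart, both flags go on the gatekeeper-audit container’s args. A minimal excerpt of that part of the Deployment is sketched below; the container name and image tag are illustrative.

    # Excerpt of the gatekeeper-audit Deployment pod spec (illustrative).
    containers:
      - name: manager
        image: openpolicyagent/gatekeeper:v3.4.0
        args:
          - --operation=audit
          - --audit-match-kind-only=true   # only list kinds referenced by constraints' match.kinds
          - --audit-chunk-size=500         # page List requests so results are processed in chunks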