opa: `http.send` possible memory leak?
Short description
Our policy sends many cached http.send requests with a low max-age of 10s. The OPA server is configured with a 100 MB max cache size:
caching:
  inter_query_builtin_cache:
    max_size_bytes: 100000000
I noticed that the memory usage reported by OPA's go_memstats_alloc_bytes metric keeps growing over time (graph not included).
This matches what k8s reports to us, and the GC reports from GODEBUG show the same trend.
The max cache size does not appear to have any effect, and eventually the OPA pod crashes with an OOM. We do not see any errors and everything works as expected, except that memory keeps growing until the pod gets killed.
The http.send call looks like this:
res = http.send({
    "method": "get",
    "url": url,
    "timeout": "10s",
    "force_json_decode": true,
    "cache": true,
    "raise_error": false,
    "headers": {"Authorization": sprintf("Bearer %s", [input.token])}
})
Version:
FROM openpolicyagent/opa:0.45.0
Expected behavior
Memory usage should stay consistent around 100M-150M
Any ideas what could be wrong here? Thanks!
Possibly related: https://github.com/open-policy-agent/opa/issues/5320
About this issue
- State: closed
- Created 2 years ago
- Comments: 18 (9 by maintainers)
I’ve created this issue for including the cache key in the cache size calculation.
There are two parts to the problem:
- Elements added to `c.l` are only ever removed in one place, which only ever happens when the cache is full
- `c.l` allows for duplicate elements
This means `c.l` will grow and grow until the cache becomes full or OPA reaches its memory limit and becomes oomkilled.
It is easier than one might think to be affected by this issue. Consider the following setup:
- `http.send` has a 10 second cache lifetime
- `http.send` uses bearer authentication, the size of the JWT is approx. 1KB, and the JWT is renewed every 5 minutes
- `http.send` fetches data for individual users, so each user has a unique `http.send` request that is cached
- the response from `http.send` is approx. 2KB
If OPA in this scenario processes 100 active users, OPA would insert 100KB worth of keys into `c.l` every 10 seconds, while the actual cache would insert 200KB of data every 5 minutes. 100KB every 10 seconds means the size of `c.l` increases by 36 MB every hour! Meanwhile `c.usage` increases by only 2.4MB per hour. In this case the cache should peak after approx. 4 hours at a `c.l` size of 144 MB where one expected the cache to use only 10 MB.
It would be great if OPA considered the size of cache keys 😄
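The numbers above can be reproduced with a small simulation. This is only a toy model written for illustration, not OPA's actual cache code: the `toyCache` type, its fields, and the `makeKey` and `simulate` helpers are all hypothetical, and the sizes (1000-byte keys, 2000-byte responses, a 10 MB limit) are taken from the scenario described in this comment. The model appends the key to a list on every insert, removes keys from the list only when evicting, and counts only value sizes against the limit.

```go
package main

import (
	"container/list"
	"fmt"
	"strings"
)

// toyCache is a hypothetical, simplified model of the behaviour described
// in this thread. It is NOT OPA's implementation.
type toyCache struct {
	items map[string]int // key -> size of the cached value in bytes
	l     *list.List     // keys in insertion order; duplicates are allowed
	usage int            // bytes counted against the limit (values only)
	limit int            // stands in for max_size_bytes
}

func newToyCache(limit int) *toyCache {
	return &toyCache{items: map[string]int{}, l: list.New(), limit: limit}
}

// insert stores a value of valueSize bytes under key. The key is appended to
// l on every call, and keys leave l only while evicting to make room.
func (c *toyCache) insert(key string, valueSize int) {
	for c.usage+valueSize > c.limit && c.l.Len() > 0 {
		front := c.l.Front()
		k := front.Value.(string)
		c.l.Remove(front)
		if sz, ok := c.items[k]; ok {
			c.usage -= sz
			delete(c.items, k)
		}
	}
	if old, ok := c.items[key]; ok {
		c.usage -= old // re-inserting an existing key replaces its value
	}
	c.items[key] = valueSize
	c.usage += valueSize
	c.l.PushBack(key) // becomes a duplicate if key is already in l
}

// listBytes sums the length of every key held in l, duplicates included.
func (c *toyCache) listBytes() int {
	n := 0
	for e := c.l.Front(); e != nil; e = e.Next() {
		n += len(e.Value.(string))
	}
	return n
}

// makeKey builds a key of exactly size bytes that is unique per user and
// per JWT generation (the JWT dominates the key size in this scenario).
func makeKey(user, gen, size int) string {
	prefix := fmt.Sprintf("u%d/t%d/", user, gen)
	return prefix + strings.Repeat("x", size-len(prefix))
}

// simulate runs the scenario for the given number of 10-second steps and
// returns the bytes held by the key list and the usage the cache accounts for.
func simulate(steps int) (listBytes, usage int) {
	const (
		users      = 100
		keySize    = 1000       // ~1KB JWT in the request
		valueSize  = 2000       // ~2KB response
		renewEvery = 30         // JWT renewed every 5 min = 30 steps
		limit      = 10_000_000 // assumed 10 MB cache limit
	)
	c := newToyCache(limit)
	for step := 0; step < steps; step++ {
		gen := step / renewEvery
		for u := 0; u < users; u++ {
			// Each cached entry has expired after 10s and is inserted again.
			c.insert(makeKey(u, gen, keySize), valueSize)
		}
	}
	return c.listBytes(), c.usage
}

func main() {
	lb, us := simulate(360) // 360 steps of 10s = 1 hour
	fmt.Printf("after 1h: key list %d bytes, counted usage %d bytes\n", lb, us)
	// prints: after 1h: key list 36000000 bytes, counted usage 2400000 bytes
}
```

In this model the key list grows 15 times faster than the usage the limit is checked against, so eviction is not triggered until the uncounted keys already dominate memory.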