falco: OOM on physical servers
Describe the bug
On 0.34.x releases we experience a memory leak on physical instances, while the same setup on AWS is fine. It could be due to node workload, but it is still clearly a memory leak.
The root cause has not been identified yet:
- looking for help with memory profiling or otherwise debugging the issue
- has anyone seen similar behaviour?
How to reproduce it
This is a somewhat customised deployment (not Helm, etc.).
This is the config Falco is given (we do use more rules, but the problem also happens with only the upstream ones, i.e. the rules from the rules repo):
data:
  falco.yaml: |
    rules_file:
      - /etc/falco-upstream/falco_rules.yaml
      - /etc/falco/rules.d
    plugins:
      - name: json
        library_path: libjson.so
        init_config: ""
    load_plugins: []
    watch_config_files: true
    time_format_iso_8601: false
    json_include_output_property: true
    json_include_tags_property: true
    json_output: true
    log_stderr: true
    log_syslog: false
    # "alert", "critical", "error", "warning", "notice", "info", "debug".
    log_level: error
    libs_logger:
      enabled: false
      severity: debug # "info", "debug", "trace".
    priority: warning
    buffered_outputs: false
    syscall_buf_size_preset: 4
    syscall_event_drops:
      threshold: 0.1
      actions:
        - log
      rate: 0.03333
      max_burst: 1
      simulate_drops: false
    syscall_event_timeouts:
      max_consecutives: 1000
    webserver:
      enabled: true
      k8s_healthz_endpoint: /healthz
      listen_port: 64765
      ssl_enabled: false
      ssl_certificate: /volterra/secrets/identity/server.crt
      threadiness: 0
      #k8s_audit_endpoint: /k8s-audit
    output_timeout: 2000
    outputs:
      rate: 1
      max_burst: 1000
    syslog_output:
      enabled: false
    file_output:
      enabled: false
      keep_alive: false
      filename: ./events.txt
    stdout_output:
      enabled: true
    program_output:
      enabled: false
      keep_alive: false
      program: "jq '{text: .output}' | curl -d @- -X POST https://hooks.slack.com/services/XXX"
    http_output:
      enabled: true
      url: "http://falco-sidekick.monitoring.svc.cluster.local:64801/"
      user_agent: falcosecurity/falco
    grpc:
      enabled: false
      bind_address: unix:///run/falco/falco.sock
      threadiness: 0
    grpc_output:
      enabled: false
    metadata_download:
      max_mb: 100
      chunk_wait_us: 1000
      watch_freq_sec: 1
    modern_bpf:
      cpus_for_each_syscall_buffer: 2
Expected behaviour
Memory should be released at regular intervals instead of growing until OOM.
Screenshots
Cloud instances of Falco on AWS (OK behaviour; the screenshot is, I believe, from a 0.33.x version):

Instances on physical servers (OOM, on 0.34.1; the nodes in the cluster are identical, though only 2 of 4 are affected by the memory increase, which could be due to specific workloads). Surprisingly, the same metric does not match the pattern of the AWS/GCP nodes above:

Environment
K8s, Falco in a container, on a physical server under load
- Falco version:
{"default_driver_version":"4.0.0+driver","driver_api_version":"3.0.0","driver_schema_version":"2.0.0","engine_version":"16","falco_version":"0.34.1","libs_version":"0.10.4","plugin_api_version":"2.0.0"}
- System info:
{
"machine": "x86_64",
"nodename": "master-1",
"release": "4.18.0-240.10.1.ves1.el7.x86_64",
"sysname": "Linux",
"version": "#1 SMP Tue Mar 30 15:02:49 UTC 2021"
}
- Cloud provider or hardware configuration:
- OS:
/etc/os-release is not relevant; it's basically CentOS but customised
- Kernel:
root@master-1:/# uname -a
Linux master-1 4.18.0-240.10.1.ves1.el7.x86_64 #1 SMP Tue Mar 30 15:02:49 UTC 2021 x86_64 GNU/Linux
- Installation method: K8s, custom manifests; described in an older issue here: https://github.com/falcosecurity/falco/issues/1909#issuecomment-1195153675
About this issue
- State: open
- Created a year ago
- Comments: 34 (26 by maintainers)
Comments
Simulated a noisy Falco config on my developer Linux box; enabling most supported syscalls was sufficient to reproduce the memory issues. Profiled with the valgrind massif heap profiler:
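For reference, a minimal sketch of how such a massif run might be invoked against a Falco binary; the binary and config paths below are assumptions and will differ per deployment:

# run Falco under the massif heap profiler for a while, then stop it with Ctrl-C
valgrind --tool=massif --massif-out-file=massif.out.falco \
  /usr/bin/falco -c /etc/falco/falco.yaml

# render the recorded heap snapshots as a text report
ms_print massif.out.falco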
Reading the tbb API docs (https://oneapi-src.github.io/oneTBB/main/tbb_userguide/Concurrent_Queue_Classes.html), we use the following variant:
"... By default, a concurrent_bounded_queue is unbounded. It may hold any number of values, until memory runs out. ..."
and currently we do not set a safety capacity, or better, expose it as a parameter. Here is a staging branch to correct this: https://github.com/incertum/falco/tree/queue-capacity-outputs. What do you all think?
However, the root cause is rather that the entire event flow is too slow; basically we don't get to pop from the queue in time in these extreme cases, because we are seeing timeouts and also noticed heavy kernel-side drops. The pipe is just not holding up when trying to monitor so many syscalls, even on a more or less idle laptop. I would suggest we re-audit the entire Falco processing and outputs engine and look for improvement areas, because when I did the same profiling with the libs sinsp-example binary, memory and output logs were pretty stable over time.

Yes, upgraded to 0.36.0 last week. The Falco container is still getting OOMKilled by kubernetes/cgroups (Last state: Terminated with 137: OOMKilled) with the default queue capacity config. Unfortunately I have a hard time exposing the stats/metrics to our TSDB.
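For context, a bounded outputs queue later became configurable in falco.yaml; the sketch below is the shape I believe it takes in newer releases (the key name and default are from memory, so verify against the reference falco.yaml of your version):

outputs_queue:
  capacity: 0   # 0 keeps the historical unbounded behaviour; a positive value caps the number of queued output events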
Hi @emilgelman, thanks, this is great news that you have cgroups v2. By the way, we now also have the base_syscalls config in falco.yaml for radical syscalls monitoring control; check it out.

However, I think we need to investigate in different places more drastically (meaning going back to the drawing board), as this has also been reported for plugins only. In that case we merely do event filtering in libsinsp, so most of the libsinsp complexity does not apply, which narrows down the search space.
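Regarding the base_syscalls knob mentioned above, a hedged sketch of how it might be used in falco.yaml (the custom_set values are purely illustrative and the key names are from memory, so check the reference falco.yaml of your release):

base_syscalls:
  # restrict the drivers to an explicit, minimal set of syscalls
  custom_set: [clone, clone3, fork, vfork, execve, execveat, close]
  # set to true to let Falco add back syscalls it needs for internal state
  repair: false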
I am going to prioritize 👀 into it; it will likely take some time.
In addition, in case you are curious to learn more about the underlying libs and kernel drivers with respect to memory: syscall_buf_size_preset and modern_bpf.cpus_for_each_syscall_buffer can help. Again, this is just some extra insight, somewhat unrelated to the subtle drifts over time we are investigating in this issue. I am also still hoping to one day meet someone who knows all the answers regarding Linux kernel memory management and accounting; often it's not even clear what the right metric is and whether the metric accounts for memory in a meaningful way.
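The two knobs referenced here are already present in the reporter's config above; a short reminder of their shape, with semantics paraphrased from the falco.yaml comments (treat as approximate):

syscall_buf_size_preset: 4          # index into a set of ring-buffer size presets; larger buffers use more memory but tend to drop less
modern_bpf:
  cpus_for_each_syscall_buffer: 2   # modern eBPF only: how many CPUs share a single ring buffer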
@incertum the host is running cgroups v2.

I am experimenting with the effect of the rules configuration on this. It seems that disabling all rules does not reproduce the issue, so I am trying to understand whether I can isolate it to specific rules.
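A hypothetical way to structure that bisection, reusing the rules_file paths from the config above (which rules to re-enable first is of course guesswork):

rules_file:
  - /etc/falco-upstream/falco_rules.yaml   # step 1: upstream rules only
  # - /etc/falco/rules.d                   # step 2: re-enable local rules one batch at a time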