fluent-plugin-google-cloud: High memory consumption
This issue is closely related to Google Cloud plugin usage in Kubernetes and to the following problems:
- https://github.com/kubernetes/kubernetes/issues/32762
- https://github.com/kubernetes/kubernetes/issues/29411
- https://github.com/kubernetes/kubernetes/issues/23782
Basically, the problem is that when fluentd with the Google Cloud plugin starts to experience some load, memory consumption starts to grow, and after some time (minutes, sometimes seconds) it exceeds its limit (currently 200 MB) and fluentd crashes. This can result in data loss and sometimes even prevents logs from being delivered at all, when fluentd enters an infinite crash loop.
Kubernetes fluentd configuration
How to reproduce
We can reproduce it within a Kubernetes cluster by starting a pod with this spec:
apiVersion: v1
kind: Pod
metadata:
  name: logs-generator-pod
spec:
  containers:
  - name: logs-generator-cont
    image: gcr.io/google_containers/logs-generator
    env:
    - name: LOGS_GENERATOR_LINES_TOTAL
      value: "600000"
    - name: LOGS_GENERATOR_DURATION
      value: 1m
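A quick way to run this reproduction and watch the result (the file name and the fluentd pod name below are placeholders, not taken from the original issue):

# Create the log generator pod from the spec above, saved here as logs-generator-pod.yaml:
kubectl create -f logs-generator-pod.yaml
# Find the fluentd pod running on the same node and follow its log for errors and restarts:
kubectl get pods --namespace=kube-system -o wide | grep fluentd
kubectl logs --namespace=kube-system --follow <fluentd-pod-name>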
After some time, you can see lines like this in the fluentd log:
2000-01-01 00:00:00 +0000 [error]: fluentd main process died unexpectedly. restarting.
Here’s the memory consumption plot
Without this plugin (type null in the match clauses of the configuration), memory consumption does not exceed 60 MB, so the problem is not in the rest of the fluentd pipeline.
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Comments: 35 (6 by maintainers)
Is anyone still working on the memory usage of fluentd on GKE? On my cluster it’s about 120 MB RSS, which is lower than the numbers reported here but still more memory than any of my other pods, including the Prometheus server and my Postgres database. Even 120 MB is a lot for a small cluster, and it seems to me like quite a lot for shipping logs (unless it’s still caching things in memory).
@discordianfish could you please provide an estimate of your logging volume?
More findings with the following test-case:
I’ve found that when running with the default memory allocator on Linux, Fluentd was claiming memory but the allocator kept it reserved even when it was no longer required (maybe some fragmentation or caching in libc?).
In contrast, when running the same test with Jemalloc, memory usage hit 200 MB once, but averaged around 150 MB.
To perform this test I installed Jemalloc under /opt/ and added it to LD_PRELOAD before starting Fluentd; in my case:
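The exact command is not preserved in this excerpt; an illustrative invocation, with the jemalloc library path and the fluentd config path as assumptions, would look like this:

# Illustration only -- not the commenter's exact command; adjust paths to your installation.
export LD_PRELOAD=/opt/jemalloc/lib/libjemalloc.so
fluentd -c /etc/fluent/fluent.conf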
@Crassirostris Would you please give it a try with Jemalloc?
@repeatedly @tagomoris
I was able to reproduce the problem locally, steps below:
The configuration above generates 10000 records per second.
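The configuration itself is not included in this excerpt. Purely as an illustration, a comparable load can be generated with fluentd's bundled dummy input plugin (plugin name and rate parameter assume a reasonably recent fluentd release):

# Illustration only -- this is not the configuration used in the original comment.
# The google_cloud output needs Google Cloud credentials to be configured.
cat > /tmp/memory-test.conf <<'EOF'
<source>
  @type dummy
  tag test.memory
  rate 10000
  dummy {"message": "0123456789abcdefghijklmnopqrstuvwxyz"}
</source>

<match test.**>
  @type google_cloud
</match>
EOF
fluentd -c /tmp/memory-test.conf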
Findings
Based on my findings, I see the following behavior: