fluent-plugin-google-cloud: High memory consumption

@piosz

This issue is strongly connected to the usage of the google cloud plugin in Kubernetes and to the following problems:

Basically, the problem is that when fluentd with the google cloud plugin starts to experience some load, memory consumption starts to grow, and after some time (minutes, sometimes seconds) it exceeds its limit (currently 200 MB) and fluentd crashes. This can result in data loss and sometimes even prevents logs from being delivered at all, when fluentd enters an infinite crash loop.
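
For context, the 200 MB cap comes from the memory limit set on the fluentd container in its pod spec. The exact manifest is not quoted in this issue; the snippet below is only an illustrative sketch (container name and image are placeholders, not the real addon manifest):

apiVersion: v1
kind: Pod
metadata:
  name: fluentd-cloud-logging
spec:
  containers:
  - name: fluentd-cloud-logging
    image: gcr.io/google_containers/fluentd-gcp   # placeholder image reference
    resources:
      limits:
        memory: 200Mi   # once RSS exceeds this, the container is OOM-killed and restarted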

Kubernetes fluentd configuration

How to reproduce

We can reproduce it within a Kubernetes cluster by starting a pod with this spec:

apiVersion: v1
kind: Pod
metadata:
  name: logs-generator-pod
spec:
  containers:
  - name: logs-generator-cont
    image: gcr.io/google_containers/logs-generator
    env:
    - name: LOGS_GENERATOR_LINES_TOTAL
      value: "600000"
    - name: LOGS_GENERATOR_DURATION
      value: 1m
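
Assuming the spec above is saved as logs-generator-pod.yaml, starting the pod and checking that it is emitting logs looks roughly like this:

# Create the log generator pod and confirm it is running and producing output
$ kubectl create -f logs-generator-pod.yaml
$ kubectl get pod logs-generator-pod
$ kubectl logs logs-generator-pod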

After some time, you can see lines like this in the fluentd log:

2000-01-01 00:00:00 +0000 [error]: fluentd main process died unexpectedly. restarting.
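
One way to look for these lines (the fluentd pods normally run in the kube-system namespace; the exact pod name is cluster-specific and is a placeholder below):

$ kubectl get pods --namespace=kube-system | grep fluentd
$ kubectl logs --namespace=kube-system <fluentd-pod-name>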

Here’s the memory consumption plot

[Screenshot: fluentd memory consumption plot, 2016-10-12 14:38:52]

Without this plugin (type null in the match clauses of the configuration), memory consumption does not exceed 60 MB, so the problem is not in the fluentd pipeline itself.
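
Here "type null" means replacing the output plugin in the match sections with fluentd's built-in null output, which simply discards records; a minimal sketch of such an override:

<match **>
  type null
</match>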

[Screenshot: fluentd memory consumption plot without the plugin, 2016-10-12 15:19:25]

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Comments: 35 (6 by maintainers)

Most upvoted comments

Is anyone still working on the memory usage of fluentd on GKE? On my cluster it’s about 120MB RSS, which is lower than the numbers reported here but still more memory than any of my other pods, including the Prometheus server and my Postgres database. Even 120MB is a lot for a small cluster, and it seems to me like quite a lot for shipping logs (unless it’s still caching things in memory).

@discordianfish could you please provide an estimate of your logging volume?

More findings with the following test-case:

  • Generate a log file with 10000 new messages per second
  • The log generator runs for a fixed 5 minutes.
  • Test different memory allocators
Memory Allocator        Initial Mem Usage    Mem usage after 5 minutes
System malloc (libc)    22MB                 488MB
Jemalloc                71MB                 155MB

I’ve found that when running with the default memory allocator on Linux, Fluentd was claiming memory but the allocator kept it reserved even when it was no longer required (some fragmentation or caching in libc, maybe?).

Instead, when running the same test with Jemalloc, memory usage peaked once at 200MB but averaged around 150MB.

To perform this test I installed Jemalloc under /opt/ and added it to LD_PRELOAD before starting Fluentd; in my case:

$ LD_PRELOAD=/opt/jemalloc-4.2.1/lib/libjemalloc.so.2 /home/edsiper/coding/fluentd/bin/fluentd -c fluentd.conf -vv
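
To double-check that the preloaded allocator is actually in use, you can look for it in the running process’ memory maps (assuming the jemalloc path above):

# Show jemalloc mappings for the first fluentd process found
$ grep jemalloc /proc/$(pgrep -f fluentd | head -n 1)/maps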

@Crassirostris Would you please give it a try with Jemalloc?

@repeatedly @tagomoris

I was able to reproduce the problem locally, steps below:

  1. Create a Fluentd configuration file:
<source>
  type tail
  path ./dummy_es.log
  tag perf_test
  format none
  read_from_head true
</source>
<match **>
   type copy
   <store>
      type elasticsearch
      host localhost
      port 9200
      include_tag_key true
      flush_interval 1s
      buffer_chunk_limit 2M
      buffer_queue_limit 8
      num_threads 8
      index_name fluentd
      type_name fluentd
   </store>
</match>
  2. Make sure you have an Elasticsearch service up and running.
  3. Start Fluentd (make sure the dummy_es.log file does not exist yet).
  4. With the Dummer tool, start generating a dummy log file. Use the following configuration (a sketch of the command to run Dummer appears after the config block):
configure 'sample' do
  output "dummy_es.log"
  rate 10000
  delimiter " "
  labeled true
  field :id, type: :integer, countup: true, format: "%04d"
  field :time, type: :datetime, format: "[%Y-%m-%d %H:%M:%S]", random: false
  field :level, type: :string, any: %w[DEBUG INFO WARN ERROR]
  field :method, type: :string, any: %w[GET POST PUT]
  field :uri, type: :string, any: %w[/api/v1/people /api/v1/textdata /api/v1/messages]
  field :reqtime, type: :float, range: 0.1..5.0
  field :foobar, type: :string, length: 8
end 

The configuration above generates 10000 records per second.
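
For completeness, installing and running Dummer against that configuration usually looks roughly like this (the file name dummer.conf is arbitrary):

$ gem install dummer
$ dummer -c dummer.conf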

  5. Track memory usage of Fluentd (one way to watch it is sketched after this list). After 1-2 minutes it can go up to 500MB.
  6. Stop the Dummer tool.
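
One simple way to watch the memory usage from step 5 (ps reports RSS in kilobytes; this assumes fluentd is the only ruby process of interest on the machine):

# Poll the PID, resident set size and command line of ruby processes every 5 seconds
$ watch -n 5 'ps -o pid,rss,cmd -C ruby'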

Findings

In my findings I see the following behavior:

  • Fluentd starts at around 55MB of memory consumption
  • Fluentd starts consuming the log file; on my computer it uses no more than 10% CPU and usually ingests the 10000 messages per second
  • Memory usage goes up after one or two minutes
  • After stopping Dummer, I verified that the number of lines in the dummy_es.log file exactly matches the number of records in Elasticsearch.
  • Fluentd consumed the log file and ingested the data properly, but memory usage does not go down.
  • The generated dummy_es.log file is around 60MB