fluentd: Memory leak

Describe the bug

I noticed a memory issue that looks to me like a leak. I had a more complex configuration, but I boiled it down to the simple config below to rule out anything specific to my setup. Even with this config, I observe the memory leak.

[Graph: fluentd_container_memory_usage]

I run Fluentd 1.12.3 in a container on a Kubernetes cluster, deployed via Helm.

To Reproduce

Deploy Fluentd 1.12.3 on a Kubernetes cluster using Helm with the provided config (a values sketch follows the configuration below). Let it run for some time and observe the memory metrics.

Expected behavior

No memory leak.

Your Environment

Fluentd 1.12.3 on Kubernetes 1.17.7

Your Configuration

    <system>
      log_level info
      log_event_verbose true
      root_dir /tmp/fluentd-buffers/
    </system>

    <label @OUTPUT>
      <match **>
        @type file
        path /var/log/fluentd-output
        compress gzip
        <buffer>
          timekey 1d
          timekey_use_utc true
          timekey_wait 10m
        </buffer>
      </match>
    </label>

    ###
    # systemd/journald
    ###
    <source>
      @type systemd
      tag systemd
      path /var/log/journal
      read_from_head true

      <entry>
        fields_strip_underscores true
        fields_lowercase true
      </entry>

      @label @OUTPUT
    </source>
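
For reference, reproducing this with the fluent/fluentd Helm chart roughly means embedding the configuration above in the chart's values and installing with helm install -f values.yaml. A minimal sketch, assuming the chart exposes image.tag and fileConfigs values (key names and the exact image tag may differ between chart versions, and the image must include fluent-plugin-systemd for the @type systemd source):

    image:
      tag: v1.12.3-debian-1.0   # assumed tag; use whichever image provides Fluentd 1.12.3

    fileConfigs:
      01_sources.conf: |-
        # the <source> block from the configuration above
      04_outputs.conf: |-
        # the <label @OUTPUT> block from the configuration above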

Additional context

With my more complex configuration I observe the same memory growth, and there seems to be a correlation between the growth and the number of logs ingested. In other words, with more inputs (in my case, container logs), the (assumed) memory leak grows faster.

About this issue

  • State: open
  • Created 3 years ago
  • Reactions: 7
  • Comments: 31 (4 by maintainers)

Most upvoted comments

@ashie any update on this?

    # https://www.speedshop.co/2017/12/04/malloc-doubles-ruby-memory.html
    - name: MALLOC_ARENA_MAX
      value: "2"

When you use jemalloc, MALLOC_ARENA_MAX has no effect, so you can remove it.
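
To make the distinction concrete, here is a sketch of the two env variants (pick one; the jemalloc library path is image-specific, and official Fluentd images typically preload jemalloc already):

    # (a) glibc malloc: cap the number of malloc arenas to limit fragmentation.
    - name: MALLOC_ARENA_MAX
      value: "2"

    # (b) jemalloc: preload it instead (library path varies by image);
    #     MALLOC_ARENA_MAX is a glibc tunable and is ignored by jemalloc.
    # - name: LD_PRELOAD
    #   value: /usr/lib/x86_64-linux-gnu/libjemalloc.so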

After upgrading fluentd from v1.7.4 to v1.14.6, memory usage spiked from 400~500Mi to about 4Gi, and it keeps growing slowly but without bound day by day, even though there were no configuration or workload changes. The following graph covers several hours.

[Graph: memory usage over several hours]

Plugin list in use:

Are there any changes between these two versions that could affect memory usage?

For me, the fix was setting

    prefer_oj_serializer true
    http_backend typhoeus
    buffer_type memory

in the Elasticsearch match section.
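
For context, a minimal sketch of where these sit in a fluent-plugin-elasticsearch match block (the match pattern, host and port are placeholders; the modern form of buffer_type memory is a <buffer> section with @type memory):

    <match app.**>
      @type elasticsearch
      # host/port are placeholders
      host elasticsearch.example.local
      port 9200
      logstash_format true
      # serialize requests with Oj instead of the default JSON library
      prefer_oj_serializer true
      # use the typhoeus HTTP backend instead of the default excon
      http_backend typhoeus
      # in-memory buffering, the v1-config form of buffer_type memory
      <buffer>
        @type memory
      </buffer>
    </match>

Note that http_backend typhoeus requires the typhoeus gem in the image, and a memory buffer trades durability for lower overhead: anything still buffered is lost if the pod restarts.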

The aggregator pod keeps using all the RAM you give it, right up to the limit, without releasing it. I also added:

    extraEnvVars:
    # https://brandonhilkert.com/blog/reducing-sidekiq-memory-usage-with-jemalloc/?utm_source=reddit&utm_medium=social&utm_campaign=jemalloc
    - name: LD_PRELOAD
      value: /usr/lib/x86_64-linux-gnu/libjemalloc.so
    # https://www.speedshop.co/2017/12/04/malloc-doubles-ruby-memory.html
    - name: MALLOC_ARENA_MAX
      value: "2"

This helped make it usable, but I still have no idea why it is so greedy with RAM.

(By the way, the forwarders behave normally.)

Of course, I also had to manually set the Elasticsearch version to 7.17.

I wasted a lot of time on this, and I'll be happy if any of these hints turn out to be useful.

The issue is still here. Please un-stale it.

It seems related to #3342.

FYI: fluent-plugin-systemd 1.0.5 was released recently.

It fixes “Plug a memory leaks on every reload” https://github.com/fluent-plugin-systemd/fluent-plugin-systemd/pull/91

Is it still reproducible with fluent-plugin-systemd 1.0.5?