fluentd: memory usage seems to be growing indefinitely with simple forward configuration

Describe the bug

Memory usage grows indefinitely with a very simple @type forward configuration. I'm shipping about 10-30 GB of logs daily, mostly during evening hours. At startup td-agent uses about 100 MB of RES memory, but after about 12 hours it is at 500 MB, even though for half of that time it is mostly idle because few logs arrive during the night. Memory is never freed. When I switch to any 'local' output such as file or stdout, memory usage is very stable. I've seen the same behavior with the elasticsearch output, so I guess it is something connected with network outputs … or just my stupidity 😃

There are no errors in the log file. I've tried setting RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=0.9, but it didn't fix the problem; memory usage still grows. I have two workers in my configuration, and the problem only occurs on the one that handles a high volume of logs during day hours (worker 0). That worker is reading about 50 files at once. It may be relevant that I get a lot of "pattern not matched" warnings - I'm in the middle of standardizing the log format for all apps.
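For reference, a minimal sketch of the 'local output' test described above - swapping the forward output for stdout (the tag pattern mirrors the match blocks below), under which memory stays flat:

<match *.**>
  # Diagnostic output for isolating the issue: writes events to
  # td-agent's own log instead of forwarding over the network.
  @type stdout
</match>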

To Reproduce

Run with a high volume of imperfectly formatted logs.

Expected behavior

Stable memory usage.

Your Environment

  • Fluentd or td-agent version: td-agent 1.9.2
  • Operating system: NAME="Amazon Linux" VERSION="2" ID="amzn" ID_LIKE="centos rhel fedora" VERSION_ID="2" PRETTY_NAME="Amazon Linux 2" ANSI_COLOR="0;33" CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2" HOME_URL="https://amazonlinux.com/"
  • Kernel version: 4.14.171-136.231.amzn2.x86_64

Your Configuration

<system>
  workers 2
  log_level warn
</system>

<worker 0>
  <source>
    @type tail
    path "/somepath/*/*.log"
    read_from_head true
    pos_file /var/log/td-agent/mpservice-es-pos-file
    tag mpservice-raw.*
    enable_stat_watcher false
    <parse>
      @type multiline
      time_key time
      time_format %Y-%m-%d %H:%M:%S,%L
      timeout 0.5
      format_firstline /^\d{4}-\d{2}-\d{2}/
      format1 /^(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \| (?<level>.*) \| (?<class>.*) \| (?<thread>.*) \| (?<message>.*)/
    </parse>
  </source>

  <match *.**>
    @type forward
    <server>
      host ***
      port 24224
    </server>
    <buffer>
      @type memory
      flush_interval 2s
      flush_thread_count 2
    </buffer>
  </match>

</worker>

# webapps ES
<worker 1>
  <source>
    @type tail
    path "/somepath/*/*[a-z].log"
    read_from_head true
    pos_file /var/log/td-agent/webapps-es-pos-file
    tag webapps-raw.*
    enable_stat_watcher false
    <parse>
      @type multiline
      format_firstline /^\d{4}-\d{2}-\d{2}/
      format1 /^(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \| (?<level>.*) \| (?<class>.*) \| (?<thread>.*) \| (?<message>.*)/
    </parse>
  </source>

  <match *.**>
    @type forward
    <server>
      host ***
      port 24224
    </server>
    <buffer>
      @type memory
      flush_interval 2s
      flush_thread_count 2
    </buffer>
  </match>
</worker>

Your Error Log

No errors in the logs.

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 4
  • Comments: 19 (11 by maintainers)

Most upvoted comments

@4ndr4s Does this happen with the elasticsearch and file buffer combo? Your graph shows it is not a memory leak. Both the issue author and you use the memory buffer, so if the incoming speed is faster than the output speed, memory usage grows.
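A minimal sketch of the file buffer alternative the comment points toward - buffering chunks on disk so backpressure accumulates there instead of in RAM. The buffer path and the size limits below are illustrative values, not taken from the thread:

<match *.**>
  @type forward
  <server>
    host ***
    port 24224
  </server>
  <buffer>
    @type file                               # spill chunks to disk instead of RAM
    path /var/log/td-agent/buffer/forward    # illustrative buffer location
    flush_interval 2s
    flush_thread_count 2
    chunk_limit_size 8MB                     # illustrative per-chunk cap
    total_limit_size 512MB                   # illustrative cap on total buffered data
    overflow_action block                    # stall input rather than grow unbounded
  </buffer>
</match>

With overflow_action block, the tail input pauses when the buffer is full, so a slow or unreachable forward target shows up as ingestion lag rather than as ever-growing resident memory.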