fluentd: Slow memory leak of Fluentd v0.14 compared to v0.12
Fluentd version: 0.14.25
Environment: running inside a debian:stretch-20180312 based container.
Dockerfile: here
We noticed a slow memory leak that built up over a month or so.
The same setup running Fluentd 0.12.41 had stable memory usage over the same period of time.
We are still investigating and trying to narrow down the versions, but wanted to create a ticket to track this.
Config:
<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/k8s-gcp-containers.log.pos
  tag reform.*
  read_from_head true
  format multi_format
  <pattern>
    format json
    time_key time
    time_format %Y-%m-%dT%H:%M:%S.%NZ
  </pattern>
  <pattern>
    format /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
    time_format %Y-%m-%dT%H:%M:%S.%N%:z
  </pattern>
</source>
<filter reform.**>
  @type parser
  format /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<log>.*)/
  reserve_data true
  suppress_parse_error_log true
  emit_invalid_record_to_error false
  key_name log
</filter>
<match reform.**>
  @type record_reformer
  enable_ruby true
  <record>
    # Extract local_resource_id from tag for 'k8s_container' monitored
    # resource. The format is:
    # 'k8s_container.<namespace_name>.<pod_name>.<container_name>'.
    "logging.googleapis.com/local_resource_id" ${"k8s_container.#{tag_suffix[4].rpartition('.')[0].split('_')[1]}.#{tag_suffix[4].rpartition('.')[0].split('_')[0]}.#{tag_suffix[4].rpartition('.')[0].split('_')[2].rpartition('-')[0]}"}
    # Rename the field 'log' to a more generic field 'message'. This way the
    # fluent-plugin-google-cloud knows to flatten the field as textPayload
    # instead of jsonPayload after extracting 'time', 'severity' and
    # 'stream' from the record.
    message ${record['log']}
  </record>
  tag ${if record['stream'] == 'stderr' then 'stderr' else 'stdout' end}
  remove_keys stream,log
</match>
<match fluent.**>
  @type null
</match>
# This section is exclusive for k8s_container logs. These logs come with
# 'stderr'/'stdout' tags.
# We use a separate output stanza for 'k8s_node' logs with a smaller buffer
# because node logs are less important than user's container logs.
<match {stderr,stdout}>
  @type google_cloud
  # Try to detect JSON formatted log entries.
  detect_json true
  # Collect metrics in Prometheus registry about plugin activity.
  enable_monitoring true
  monitoring_type prometheus
  # Allow log entries from multiple containers to be sent in the same request.
  split_logs_by_tag false
  # Set the buffer type to file to improve reliability and reduce memory consumption.
  buffer_type file
  buffer_path /var/log/k8s-fluentd-buffers/kubernetes.containers.buffer
  # Set queue_full action to block because we want to pause gracefully
  # in case of excessive load instead of throwing an exception.
  buffer_queue_full_action block
  # Set the chunk limit conservatively to avoid exceeding the recommended
  # chunk size of 5MB per write request.
  buffer_chunk_limit 1M
  # Cap the combined memory usage of this buffer and the one below to
  # 1MiB/chunk * (6 + 2) chunks = 8 MiB.
  buffer_queue_limit 6
  # Never wait more than 5 seconds before flushing logs in the non-error case.
  flush_interval 5s
  # Never wait longer than 30 seconds between retries.
  max_retry_wait 30
  # Disable the limit on the number of retries (retry forever).
  disable_retry_limit
  # Use multiple threads for processing.
  num_threads 2
  use_grpc false
  # Use Metadata Agent to get monitored resource.
  enable_metadata_agent true
</match>
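While investigating the growth, it can help to scrape the metrics that enable_monitoring/monitoring_type prometheus register. Below is a minimal sketch of how those metrics could be exposed over HTTP, assuming the fluent-plugin-prometheus gem is installed; the bind address, port, and path shown are illustrative defaults, not part of the setup above.
<source>
  @type prometheus
  # Expose the Prometheus registry (including the google_cloud plugin metrics
  # enabled above) on an HTTP endpoint for scraping.
  bind 0.0.0.0
  port 24231
  metrics_path /metrics
</source>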
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Comments: 48 (15 by maintainers)
Commits related to this issue
- in_tail: Fix rotation related resource leak. fix #1941 Signed-off-by: Masahiro Nakagawa <repeatedly@gmail.com> — committed to fluent/fluentd by repeatedly 6 years ago
- Merge pull request #2105 from fluent/fix-in_tail-resource-leak in_tail: Fix rotation related resource leak. fix #1941 — committed to fluent/fluentd by repeatedly 6 years ago
Released v1.2.5. Thanks for testing.
I released v1.2.5.rc1 for testing. You can install this version with the --pre option of gem install (for example, gem install fluentd --pre).
Just a thought: could log rotation contribute to the issue? Thinking about the difference between the two setups (k8s vs. no k8s), this is the first thing that crossed my mind.
Current GKE log rotation happens when the log file exceeds 10 MB. At a load of 100 KB/s, the log file is therefore rotated roughly every 10 * 1024 / 100 ≈ 102 seconds.
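If rotation does turn out to be the trigger, the relevant knob on the tailing side is in_tail's rotate_wait, which controls how long a rotated file keeps being read before its watcher is released. A minimal sketch with that parameter spelled out explicitly; the value shown is just the documented default, not a suggested change, and the parser settings from the config above are omitted for brevity.
<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/k8s-gcp-containers.log.pos
  tag reform.*
  # Keep reading a rotated file for this long before releasing its watcher.
  rotate_wait 5s
  read_from_head true
  # (parser configuration omitted for brevity)
  format json
</source>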
I am experiencing the same problem; memory usage keeps growing.
Environment: Amazon Linux 2
Fluentd version: starting fluentd-1.2.2 pid=1 ruby="2.4.4"