fluent-bit: Fluent-Bit fails to (re)start if too many buffers have been accumulated on the filesystem storage

Bug Report

Describe the bug When using filesystem storage and letting Fluent-Bit accumulate a lot of files (i.e. with the output down) until thousands of files have been created, Fluent-Bit is unable to start/restart correctly and constantly loops with errors. This can make the issue worse, because a feedback loop of writing errors to the journal and reading them back from it drives up CPU usage for both the td-agent-bit and journal services.

This seems related to the number of file handles Fluent-Bit has open:

lsof -p `pidof td-agent-bit` | wc -l
2041

Looking at the code, it seems the function cb_queue_chunks in plugins/in_storage_backlog/sb.c tries to load all of the chunks present on disk into memory. Some of the issues I found when Fluent-Bit tries to load buffers from the filesystem (see the sketch after this list):

  • “ctx->mem_limit” is set to FLB_STORAGE_BL_MEM_LIMIT when “storage.backlog.mem_limit” is not configured, which results in a 100MB default, while the documentation (https://docs.fluentbit.io/manual/administration/buffering-and-storage) indicates 5MB.
  • The inner loop in cb_queue_chunks never stops when “total” goes over the ctx->mem_limit threshold.
  • After modifying the loop to break once “total >= ctx->mem_limit” is reached, cb_queue_chunks is still called every second, but “flb_input_chunk_total_size(in)” always returns “0” even when the previously loaded chunks have not been processed yet, so another batch of up to 100MB of buffers is loaded on every call. With enough accumulated buffers this can exhaust either the RAM or the process's file handle limit.
  • Calling “cio_chunk_down(chunk_instance->chunk);” before “sb_remove_chunk_from_segregated_backlogs(chunk_instance->chunk, ctx);” (Code) to close the handles makes it possible to go through all the buffers, but I noticed that a lot of timers were created by “_mk_event_timeout_create” until that function also started returning errors because too many had been created. Fluent-Bit did eventually recover, with a lot of errors, but I did not inspect that part enough to understand what it does.
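
For illustration, here is a minimal, self-contained sketch of the loop behaviour I would expect instead. Every type and function name in it (backlog_chunk, backlog_ctx, queue_chunks, and so on) is a hypothetical stand-in used to model the idea, not Fluent-Bit's actual API, and all sizes are made up:

/* Minimal, self-contained model of the queueing loop discussed above.
 * All types and names here (backlog_chunk, backlog_ctx, queue_chunks)
 * are hypothetical stand-ins for illustration, NOT Fluent-Bit's API. */
#include <stdio.h>
#include <stddef.h>

struct backlog_chunk {
    size_t size;        /* bytes the chunk would occupy once mapped */
    int    fd_open;     /* 1 while the chunk's file handle is open  */
};

struct backlog_ctx {
    size_t mem_limit;   /* models ctx->mem_limit                    */
};

/* One timer tick: queue chunks until the memory limit is reached,
 * instead of walking the entire backlog in a single pass. */
static size_t queue_chunks(struct backlog_ctx *ctx,
                           struct backlog_chunk *chunks,
                           size_t count, size_t start)
{
    size_t total = 0;
    size_t i;

    for (i = start; i < count; i++) {
        /* 1) stop once the configured limit is reached
         *    (the missing break described above)        */
        if (total >= ctx->mem_limit) {
            break;
        }
        total += chunks[i].size;
        /* ...hand chunks[i] over to the input plugin here... */

        /* 2) release the file handle once the chunk has been handed
         *    over, so thousands of pending chunks do not exhaust the
         *    process fd limit (analogous to bringing the chunk down
         *    before removing it from the segregated backlog)         */
        chunks[i].fd_open = 0;
    }
    return i;   /* the next tick resumes from this index */
}

int main(void)
{
    struct backlog_ctx ctx = { .mem_limit = 5 * 1024 * 1024 };  /* 5MB */
    struct backlog_chunk chunks[2000];
    size_t n = sizeof(chunks) / sizeof(chunks[0]);
    size_t next = 0, ticks = 0;

    for (size_t i = 0; i < n; i++) {
        chunks[i].size = 1024 * 1024;   /* pretend each chunk is 1MB */
        chunks[i].fd_open = 1;
    }

    /* each call models one invocation of the collector callback */
    while (next < n) {
        next = queue_chunks(&ctx, chunks, n, next);
        ticks++;
    }
    printf("queued %zu chunks over %zu timer ticks\n", n, ticks);
    return 0;
}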

To Reproduce Using a configuration that writes buffers to the filesystem, force Fluent-Bit to accumulate a lot of buffers by taking the output down or pointing it at an invalid address.

Example of a bash script used to write to the journal. It is started several times as background processes, each with a different parameter, to simulate logs from different applications; the INPUT splits these into different buffers, which quickly increases the number of buffers created:

#!/bin/bash
# Started several times in the background, each instance with a different
# tag argument, to simulate several applications logging at once.
while true
do
    echo "$1 - Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua." | systemd-cat -t "$1" -p warning
done

Once a lot of buffers have accumulated (i.e. ~2000 in my case), restart the td-agent-bit service; during start-up it registers/queues all the buffers until it fails to load any more and then keeps looping indefinitely:

[...]
[error] [storage] cannot open/create /var/log/fluent/buf//systemd.0/1131418-1647664248.725660271.flb
[error] [storage] [cio file] cannot open chunk: systemd.0/1131418-1647664248.725660271.flb
[error] [storage] cannot open/create /var/log/fluent/buf//systemd.0/1131418-1647664248.725660271.flb
[error] [storage] [cio file] cannot open chunk: systemd.0/1131418-1647664248.725660271.flb
[lib/chunkio/src/cio_file.c:432 errno=24] Too many open files
[lib/chunkio/src/cio_file.c:432 errno=24] Too many open files
[error] [storage] cannot open/create /var/log/fluent/buf//systemd.0/1131418-1647664248.725660271.flb
[error] [storage] [cio file] cannot open chunk: systemd.0/1131418-1647664248.725660271.flb
[error] [storage] cannot open/create /var/log/fluent/buf//systemd.0/1131418-1647664248.725660271.flb
[...]

Expected behavior Fluent-Bit should only load as many buffers as the configured storage.max_chunks_up or memory constraints allow, then load more when it is possible to do so.

Your Environment

  • Version used: td-agent-bit 1.8.12
  • Configuration:
[SERVICE]
    # Flush records to destinations every 5s
    Flush        5

    # Run in foreground mode
    Daemon       Off

    # Use 'info' verbosity for Fluent Bit logs
    Log_Level    info

    # Standard parsers & plugins
    Parsers_File parsers.conf
    Plugins_File plugins.conf

    # Enable built-in HTTP server for metrics
    # Prometheus metrics: <host>:24231/api/v1/metrics/prometheus
    HTTP_Server  On
    HTTP_Listen  192.168.128.2
    HTTP_Port    24231

    # Persistent storage path for buffering
    storage.path /var/log/fluent/buf/
    storage.max_chunks_up 128

[INPUT]
    Name systemd
    Tag  system.journal.*
    Path /var/log/journal
    DB   /var/log/fluent/journald-cursor.db
    storage.type  filesystem
    mem_buf_limit 64M

# BEGIN Elasticsearch output
[OUTPUT] # elasticsearch destination 1
    Name  es
    Match *
    Retry_Limit False

    Host 192.168.128.200
    Port 9200
    Index filebeat-7.2.0
    tls Off
    tls.verify On
    # elasticsearch output HTTP_User placeholder
    # elasticsearch output HTTP_Passwd placeholder

    Type  _doc
    Generate_ID true

# END Elasticsearch output
  • Environment name and version (e.g. Kubernetes? What version?): td-agent-bit service 1.8.12
  • Operating System and version: AlmaLinux 8

Additional context This could happen when we update/restart systems, or if a crash occurs while a lot of buffers have accumulated.

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 22 (10 by maintainers)

Most upvoted comments

Hi @lecaros, this was tested with both 1.8 and 1.9.

The issue is not just how many chunks were already present on disk when Fluent-Bit started with filesystem storage, but also the total size of those chunks.

What I saw when debugging with gdb is that the function loading these chunks from disk was checking whether it was still under a memory limit, but the calculation of the memory used so far always returned zero. So it loaded all chunks into memory, which becomes an issue when you have hundreds of megabytes to gigabytes of accumulated buffers.
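
To make the effect of that zero return value concrete, here is a tiny self-contained model of what the guard ends up doing. loaded_bytes_reported() is a hypothetical stand-in for the always-zero accounting I observed (it is not Fluent-Bit code) and the sizes are made up; because the reported usage never grows, the limit check never trips and every pending chunk is loaded:

/* Self-contained model of the behaviour seen in gdb; names and sizes
 * are illustrative stand-ins, not Fluent-Bit code. */
#include <stdio.h>
#include <stddef.h>

/* models a "memory used so far" check that always reports zero */
static size_t loaded_bytes_reported(void) { return 0; }

int main(void)
{
    const size_t mem_limit      = 100 * 1024 * 1024;  /* 100MB limit   */
    const size_t chunk_size     = 2 * 1024 * 1024;    /* pretend 2MB   */
    const size_t chunks_on_disk = 2000;
    size_t actually_loaded = 0;

    for (size_t i = 0; i < chunks_on_disk; i++) {
        if (loaded_bytes_reported() >= mem_limit) {
            break;              /* never reached while the report is 0 */
        }
        actually_loaded += chunk_size;
    }

    printf("loaded %zu MB despite a %zu MB limit\n",
           actually_loaded / (1024 * 1024), mem_limit / (1024 * 1024));
    return 0;
}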