fluent-bit: File storage corruption "no available chunk" when using rewrite_tag plugin

Bug Report

I noticed this issue when using the rewrite_tag plugin; it can be triggered by a slow forward output destination. If there are down chunks in the fs storage when the destination recovers, the number of down chunks slowly drops (as expected). However, when the fs storage size drops to near 0, I start seeing the following errors and Fluent Bit stops tailing files (total input records = 0):

[2021/04/27 23:42:30] [error] [input chunk] no available chunk
[2021/04/27 23:42:31] [error] [input chunk] no enough space in filesystem to buffer chunk 6-1619566951.203912627.flb in plugin forward.0

To Reproduce

  • Example log message if applicable:
[2021/05/03 02:11:51] [debug] [storage] tail.main:6-1620007911.678294807.flb adjusting size OK
[2021/05/03 02:11:51] [debug] [storage] tail.main:6-1620007911.678294807.flb mapped OK
[2021/05/03 02:11:51] [debug] [storage] [cio file] synced at: tail.main/6-1620007911.678294807.flb
[2021/05/03 02:11:51] [debug] [input chunk] chunk 6-1620007911.678294807.flb required 39088 bytes and 15630971470 bytes left in plugin forward.0
[2021/05/03 02:11:51] [debug] [storage] tail.main:6-1620007911.678294807.flb mapped OK
[2021/05/03 02:11:51] [debug] [storage] [cio file] alloc_size from 4096 to 69632
[2021/05/03 02:11:51] [debug] [input chunk] remove chunk 6-1620007911.678294807.flb with 202 bytes from plugin forward.0, the updated fs_chunks_size is 369028328 bytes
[2021/05/03 02:11:51] [debug] [storage] [cio file] synced at: tail.main/6-1620007911.678294807.flb
  • Steps to reproduce the problem:
  1. Use the config below and generate some load.
  2. Starve the destination and let the filesystem storage grow. I let it grow until there was a significant number of “down” chunks.
  3. Recover the destination and watch the fs storage size.
  4. When the fs storage size drops to near 0, you will see the following errors and Fluent Bit stops tailing files.
[2021/04/27 23:42:30] [error] [input chunk] no available chunk
[2021/04/27 23:42:31] [error] [input chunk] no enough space in filesystem to buffer chunk 6-1619566951.203912627.flb in plugin forward.0
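Step 3 asks you to watch the fs storage size. Since the config enables HTTP_Server and storage.metrics, one way to do that is to poll the storage metrics endpoint. The sketch below is only an illustration: the URL (default port 2020, path /api/v1/storage) and the JSON field names are assumptions based on the Fluent Bit monitoring documentation and should be checked against the version in use. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: the built-in HTTP server listens on the default port 2020
# and serves storage metrics at /api/v1/storage.
STORAGE_URL = "http://127.0.0.1:2020/api/v1/storage"


def fetch_storage(url: str = STORAGE_URL, timeout: float = 2.0) -> dict:
    """Fetch and decode the storage metrics JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_storage(payload: dict) -> dict:
    """Pull the chunk counts out of a storage metrics payload.

    Field names are assumed from the documented response shape.
    Missing fields are reported as None instead of raising.
    """
    layer = payload.get("storage_layer", {}).get("chunks", {})
    down_by_input = {
        name: info.get("chunks", {}).get("down")
        for name, info in payload.get("input_chunks", {}).items()
    }
    return {
        "fs_chunks": layer.get("fs_chunks"),
        "fs_chunks_up": layer.get("fs_chunks_up"),
        "fs_chunks_down": layer.get("fs_chunks_down"),
        "down_by_input": down_by_input,
    }
```

Calling summarize_storage(fetch_storage()) every few seconds while the destination recovers should show the down chunk count falling toward 0, which is the point where the errors below start to appear.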

Expected behavior

Fluent Bit will recover gracefully.

Screenshots

Your Environment

  • Version used: 1.7.4
  • Configuration:
[SERVICE]
    Flush                     1
    Log_Level                 info
    Parsers_File              /fluent-bit/etc/parsers.conf
    Parsers_File              /forwarder/etc/parsers_custom.conf
    Plugins_File              /fluent-bit/etc/plugins.conf
    HTTP_Server               On
    storage.path              /var/log/flb-storage/
    storage.max_chunks_up     128
    storage.backlog.mem_limit 256M
    storage.metrics           on
[INPUT]
    Name              tail
    Tag               kubernetes.*
    Path              /var/log/containers/*.log
    Exclude_Path      /var/log/containers/*fluent-bit*
    Parser            cri
    DB                /var/log/flb-tail.db
    DB.sync           normal
    Refresh_Interval  15
    Read_from_Head    On
    Buffer_Chunk_Size 128K
    Buffer_Max_Size   128K
    Skip_Long_Lines   On
    Mem_Buf_Limit     256M
    storage.type      filesystem
[FILTER]
    Name                kubernetes
    Match               kubernetes.var.log.containers.*
    Kube_Tag_Prefix     kubernetes.var.log.containers.
    Annotations         Off
    K8S-Logging.Exclude On
[FILTER]
    Name                  rewrite_tag
    Match                 kubernetes.var.log.containers.*
    Rule                  $kubernetes['namespace_name'] ^(.*)$ kubernetes.$1 false
    Emitter_Name          re_emitted.main
    Emitter_Storage.type  filesystem
    Emitter_Mem_Buf_Limit 128M
[OUTPUT]
    Name                       forward
    Match                      kubernetes.*
    Host                       aggregator
    Port                       24224
    Retry_Limit                False
    Require_ack_response       True
    storage.total_limit_size   16G
    net.keepalive              on
    net.keepalive_max_recycle  300
  • Environment name and version (e.g. Kubernetes? What version?): 1.19.x
  • Server type and version:
  • Operating System and version:
  • Filters and plugins: rewrite_tag

Additional context

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 6
  • Comments: 21 (10 by maintainers)

Most upvoted comments

I’m using the environment below, and I get a similar error.

  • fluent bit 1.8.7

  • [FILTER]
     Name                rewrite_tag
     Match               kube.var.log.containers.*
     # the `false` at the end of rule drops the original `kube.*` event.
     Rule                $kubernetes['namespace_name'] ^(.*)$ nm.$1 false
    
  • log:
[2021/10/12 12:35:18] [debug] [storage] tail.0:1-1634042118.929765611.flb adjusting size OK
[2021/10/12 12:35:18] [debug] [storage] tail.0:1-1634042118.929765611.flb mapped OK
[2021/10/12 12:35:18] [debug] [input chunk] chunk 1-1634042118.929765611.flb required 1163 bytes and 50171932 bytes left in plugin s3.0
[2021/10/12 12:35:18] [error] [input chunk] no enough space in filesystem to buffer chunk 1-1634042118.929765611.flb in plugin s3.0
[2021/10/12 12:35:18] [debug] [storage] [cio file] synced at: tail.0/1-1634042118.929765611.flb
[2021/10/12 12:35:18] [error] [input chunk] no available chunk
[2021/10/12 12:35:18] [debug] [input:tail:tail.0] inode=15820935 events: IN_MODIFY
[2021/10/12 12:35:18] [debug] [storage] tail.0:1-1634042118.930761050.flb adjusting size OK
[2021/10/12 12:35:18] [debug] [storage] tail.0:1-1634042118.930761050.flb mapped OK
[2021/10/12 12:35:18] [debug] [input chunk] chunk 1-1634042118.930761050.flb required 1011 bytes and 50171932 bytes left in plugin s3.0
[2021/10/12 12:35:18] [error] [input chunk] no enough space in filesystem to buffer chunk 1-1634042118.930761050.flb in plugin s3.0
[2021/10/12 12:35:18] [debug] [storage] [cio file] synced at: tail.0/1-1634042118.930761050.flb
[2021/10/12 12:35:18] [error] [input chunk] no available chunk
[2021/10/12 12:35:18] [debug] [input:tail:tail.0] inode=15820935 events: IN_MODIFY
[2021/10/12 12:35:18] [debug] [storage] tail.0:1-1634042118.931942445.flb adjusting size OK
[2021/10/12 12:35:18] [debug] [storage] tail.0:1-1634042118.931942445.flb mapped OK
[2021/10/12 12:35:18] [debug] [input chunk] chunk 1-1634042118.931942445.flb required 1162 bytes and 50171932 bytes left in plugin s3.0
[2021/10/12 12:35:18] [error] [input chunk] no enough space in filesystem to buffer chunk 1-1634042118.931942445.flb in plugin s3.0
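The log above shows the telltale pattern: a chunk needs about 1 KB, the debug line reports roughly 50 MB left, and the very next line still says there is not enough space. To help spot this in a longer debug log, here is a small sketch that pairs the two messages per chunk and flags the cases where the reported free space was larger than the required size. The regular expressions are written against the message wording quoted in this thread; the function name is hypothetical, and other Fluent Bit versions may word the messages differently.

```python
import re

# Patterns follow the log lines quoted in this thread.
SPACE_RE = re.compile(
    r"chunk (?P<chunk>\S+) required (?P<required>\d+) bytes "
    r"and (?P<left>\d+) bytes left in plugin (?P<plugin>\S+)"
)
NO_SPACE_RE = re.compile(
    r"no enough space in filesystem to buffer chunk (?P<chunk>\S+) "
    r"in plugin (?P<plugin>\S+)"
)


def find_contradictions(lines):
    """Return (chunk, plugin, required, left) for each chunk that was
    rejected although the reported free space covered its size."""
    reported = {}
    hits = []
    for line in lines:
        m = SPACE_RE.search(line)
        if m:
            key = (m["chunk"], m["plugin"])
            reported[key] = (int(m["required"]), int(m["left"]))
            continue
        m = NO_SPACE_RE.search(line)
        if m:
            key = (m["chunk"], m["plugin"])
            if key in reported:
                required, left = reported[key]
                if required <= left:
                    hits.append((key[0], key[1], required, left))
    return hits
```

Feeding it the lines of a debug log (for example `find_contradictions(open("fluent-bit.log"))`, with a file name of your choosing) returns one tuple per suspicious rejection; an empty list means every rejection was backed by a genuinely low "bytes left" value.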

@edsiper Sounds like anyone who uses the rewrite_tag plugin together with disk based buffering could run into this issue.

@aapodoll I believe the fluent-bit v1.8.12 release includes a change to how the input chunk counter is calculated. Could you try v1.8.12?

ref: https://fluentbit.io/announcements/v1.8.12/

Thanks @panaji for creating this issue. The root cause is that some chunks should never enter the plugin's chunk queue, because the rewrite_tag rule drops the original log and re-emits a new one. However, those dropped logs are still counted by Fluent Bit and decrease the buffer chunk counter fs_chunks_size, which results in a negative fs_chunks_size.
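To make the accounting problem concrete, here is a toy model of the counter. It is not Fluent Bit's code: the class and method names are hypothetical, and the guarded release is just one possible way to avoid the drift, not necessarily the fix that was shipped. Only fs_chunks_size and the total limit come from this thread. The debug log in the original report is consistent with "bytes left = limit - fs_chunks_size" if 16G is read as 16,000,000,000 bytes (an inference from the numbers, not a documented fact): 16,000,000,000 - 15,630,971,470 = 369,028,530, and removing the 202-byte chunk gives the logged 369,028,328.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    size: int
    counted: bool = False  # True once the chunk was charged to the output


class OutputCounter:
    """Toy model of one output's filesystem buffer accounting."""

    def __init__(self, total_limit_size: int):
        self.total_limit_size = total_limit_size
        self.fs_chunks_size = 0

    def charge(self, chunk: Chunk) -> None:
        # A chunk routed to this output counts against its limit.
        self.fs_chunks_size += chunk.size
        chunk.counted = True

    def release_buggy(self, chunk: Chunk) -> None:
        # Subtracts unconditionally, even for chunks never charged.
        self.fs_chunks_size -= chunk.size

    def release_guarded(self, chunk: Chunk) -> None:
        # Only undo what was actually charged.
        if chunk.counted:
            self.fs_chunks_size -= chunk.size
            chunk.counted = False

    def bytes_left(self) -> int:
        return self.total_limit_size - self.fs_chunks_size


def simulate(release):
    """Run one rewrite_tag round trip and return (fs_chunks_size, bytes_left)."""
    out = OutputCounter(total_limit_size=1000)
    emitted = Chunk(size=300)   # re-emitted record, routed to the output
    original = Chunk(size=300)  # original record, dropped by the rule
    out.charge(emitted)
    release(out, emitted)
    release(out, original)      # never charged, but released anyway
    return out.fs_chunks_size, out.bytes_left()
```

With `simulate(OutputCounter.release_buggy)` the counter ends at -300 and the model reports 1300 bytes left against a limit of 1000, while `simulate(OutputCounter.release_guarded)` ends at 0 with 1000 bytes left. In this model, a "bytes left" value larger than the configured limit is a sign that the counter has gone negative.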