fluent-bit: File storage corruption "no available chunk" when using rewrite_tag plugin
Bug Report
I noticed this issue when using the rewrite_tag plugin; it can be triggered by a slow forward output destination. If there are down chunks in the fs storage, then when the destination recovers, the number of down chunks slowly drops (as expected). However, when the fs storage size drops to near 0, I start seeing the following errors and Fluent Bit stops tailing files (total input records = 0):
[2021/04/27 23:42:30] [error] [input chunk] no available chunk
[2021/04/27 23:42:31] [error] [input chunk] no enough space in filesystem to buffer chunk 6-1619566951.203912627.flb in plugin **forward.0**
To Reproduce
- Example log message if applicable:
[2021/05/03 02:11:51] [debug] [storage] tail.main:6-1620007911.678294807.flb adjusting size OK
[2021/05/03 02:11:51] [debug] [storage] tail.main:6-1620007911.678294807.flb mapped OK
[2021/05/03 02:11:51] [debug] [storage] [cio file] synced at: tail.main/6-1620007911.678294807.flb
[2021/05/03 02:11:51] [debug] [input chunk] chunk 6-1620007911.678294807.flb required 39088 bytes and 15630971470 bytes left in plugin forward.0
[2021/05/03 02:11:51] [debug] [storage] tail.main:6-1620007911.678294807.flb mapped OK
[2021/05/03 02:11:51] [debug] [storage] [cio file] alloc_size from 4096 to 69632
[2021/05/03 02:11:51] [debug] [input chunk] remove chunk 6-1620007911.678294807.flb with 202 bytes from plugin forward.0, the updated fs_chunks_size is 369028328 bytes
[2021/05/03 02:11:51] [debug] [storage] [cio file] synced at: tail.main/6-1620007911.678294807.flb
- Steps to reproduce the problem:
- Use the config below and generate some load.
- Starve the destination and let the filesystem storage grow. I let it grow until there was a significant number of "down" chunks.
- Recover the destination and watch the fs storage size.
- When the fs storage size drops to near 0, you will notice the following errors and Fluent Bit stops tailing files.
[2021/04/27 23:42:30] [error] [input chunk] no available chunk
[2021/04/27 23:42:31] [error] [input chunk] no enough space in filesystem to buffer chunk 6-1619566951.203912627.flb in plugin **forward.0**
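For the "watch the fs storage size" step, the chunk counts can be polled from Fluent Bit's built-in HTTP server, since the configuration enables `HTTP_Server` and `storage.metrics`. The sketch below is only a rough helper under stated assumptions: the server is assumed to listen on the default port 2020, the `/api/v1/storage` response is assumed to have the `storage_layer` / `input_chunks` shape used in `SAMPLE`, and the numbers in `SAMPLE` are made up so the example runs offline.

```python
import json
import urllib.request

# Assumption: the HTTP server listens on Fluent Bit's default port (2020).
STORAGE_URL = "http://127.0.0.1:2020/api/v1/storage"


def fetch_storage_metrics(url=STORAGE_URL, timeout=5.0):
    """Fetch the storage metrics JSON document from a running Fluent Bit."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize(metrics):
    """Return (fs_chunks_up, fs_chunks_down, {input name: down chunks})."""
    layer = metrics.get("storage_layer", {}).get("chunks", {})
    per_input = {
        name: info.get("chunks", {}).get("down", 0)
        for name, info in metrics.get("input_chunks", {}).items()
    }
    return layer.get("fs_chunks_up", 0), layer.get("fs_chunks_down", 0), per_input


# Illustrative payload with made-up numbers (not taken from the issue).
SAMPLE = {
    "storage_layer": {
        "chunks": {
            "total_chunks": 15,
            "mem_chunks": 0,
            "fs_chunks": 15,
            "fs_chunks_up": 3,
            "fs_chunks_down": 12,
        }
    },
    "input_chunks": {
        "tail.0": {"chunks": {"total": 12, "up": 2, "down": 10}},
        "re_emitted.main": {"chunks": {"total": 3, "up": 1, "down": 2}},
    },
}

if __name__ == "__main__":
    up, down, per_input = summarize(SAMPLE)
    print(f"fs chunks up={up} down={down} per input={per_input}")
```

Against a live instance, `summarize(fetch_storage_metrics())` can be called in a loop while the destination recovers, to see when the down chunk count approaches 0.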
Expected behavior
Fluent Bit recovers gracefully.
Screenshots
Your Environment
- Version used: 1.7.4
- Configuration:
[SERVICE]
Flush 1
Log_Level info
Parsers_File /fluent-bit/etc/parsers.conf
Parsers_File /forwarder/etc/parsers_custom.conf
Plugins_File /fluent-bit/etc/plugins.conf
HTTP_Server On
storage.path /var/log/flb-storage/
storage.max_chunks_up 128
storage.backlog.mem_limit 256M
storage.metrics on
[INPUT]
Name tail
Tag kubernetes.*
Path /var/log/containers/*.log
Exclude_Path /var/log/containers/*fluent-bit*
Parser cri
DB /var/log/flb-tail.db
DB.sync normal
Refresh_Interval 15
Read_from_Head On
Buffer_Chunk_Size 128K
Buffer_Max_Size 128K
Skip_Long_Lines On
Mem_Buf_Limit 256M
storage.type filesystem
[FILTER]
Name kubernetes
Match kubernetes.var.log.containers.*
Kube_Tag_Prefix kubernetes.var.log.containers.
Annotations Off
K8S-Logging.Exclude On
[FILTER]
Name rewrite_tag
Match kubernetes.var.log.containers.*
Rule $kubernetes['namespace_name'] ^(.*)$ kubernetes.$1 false
Emitter_Name re_emitted.main
Emitter_Storage.type filesystem
Emitter_Mem_Buf_Limit 128M
[OUTPUT]
Name forward
Match kubernetes.*
Host aggregator
Port 24224
Retry_Limit False
Require_ack_response True
storage.total_limit_size 16G
net.keepalive on
net.keepalive_max_recycle 300
- Environment name and version (e.g. Kubernetes? What version?): 1.19.x
- Server type and version:
- Operating System and version:
- Filters and plugins: rewrite_tag
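To make the rewrite_tag rule in the configuration above easier to follow, here is a small Python model of a rule of the form `KEY REGEX NEW_TAG KEEP`. It is a simplified sketch, not the plugin's implementation: `apply_rule` and `get_nested` are hypothetical helpers, and Python's `re` module stands in for the regex engine Fluent Bit actually uses. The point it illustrates is that with `KEEP` set to `false`, the record is re-emitted under the new tag and the original is dropped.

```python
import re


def get_nested(record, keys):
    """Walk a nested dict; return None if any key is missing."""
    cur = record
    for key in keys:
        if not isinstance(cur, dict) or key not in cur:
            return None
        cur = cur[key]
    return cur


def apply_rule(record, keys, pattern, new_tag, keep):
    """Model one rule. Returns (emitted_tag, original_kept).

    emitted_tag is None when the rule does not match; in that case the
    original record simply continues through the pipeline.
    """
    value = get_nested(record, keys)
    if not isinstance(value, str):
        return None, True
    match = re.search(pattern, value)
    if match is None:
        return None, True
    # Replace $1, $2, ... in the tag template with the capture groups.
    tag = re.sub(
        r"\$(\d+)",
        lambda g: match.group(int(g.group(1))) or "",
        new_tag,
    )
    return tag, keep


# Hypothetical record; the rule mirrors the one in the configuration:
#   Rule $kubernetes['namespace_name'] ^(.*)$ kubernetes.$1 false
record = {"kubernetes": {"namespace_name": "default"}, "log": "hello"}
result = apply_rule(
    record, ["kubernetes", "namespace_name"], r"^(.*)$", "kubernetes.$1", False
)

if __name__ == "__main__":
    print(result)  # ('kubernetes.default', False)
```

Because the last field of the rule is `false`, every matching record from the tail input is dropped after being re-emitted through the `re_emitted.main` emitter.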
Additional context
About this issue
- State: closed
- Created 3 years ago
- Reactions: 6
- Comments: 21 (10 by maintainers)
I’m using the environment below, and I get a similar error.
fluent bit 1.8.7
log:
[2021/10/12 12:35:18] [debug] [storage] tail.0:1-1634042118.929765611.flb adjusting size OK
[2021/10/12 12:35:18] [debug] [storage] tail.0:1-1634042118.929765611.flb mapped OK
[2021/10/12 12:35:18] [debug] [input chunk] chunk 1-1634042118.929765611.flb required 1163 bytes and 50171932 bytes left in plugin s3.0
[2021/10/12 12:35:18] [error] [input chunk] no enough space in filesystem to buffer chunk 1-1634042118.929765611.flb in plugin s3.0
[2021/10/12 12:35:18] [debug] [storage] [cio file] synced at: tail.0/1-1634042118.929765611.flb
[2021/10/12 12:35:18] [error] [input chunk] no available chunk
[2021/10/12 12:35:18] [debug] [input:tail:tail.0] inode=15820935 events: IN_MODIFY
[2021/10/12 12:35:18] [debug] [storage] tail.0:1-1634042118.930761050.flb adjusting size OK
[2021/10/12 12:35:18] [debug] [storage] tail.0:1-1634042118.930761050.flb mapped OK
[2021/10/12 12:35:18] [debug] [input chunk] chunk 1-1634042118.930761050.flb required 1011 bytes and 50171932 bytes left in plugin s3.0
[2021/10/12 12:35:18] [error] [input chunk] no enough space in filesystem to buffer chunk 1-1634042118.930761050.flb in plugin s3.0
[2021/10/12 12:35:18] [debug] [storage] [cio file] synced at: tail.0/1-1634042118.930761050.flb
[2021/10/12 12:35:18] [error] [input chunk] no available chunk
[2021/10/12 12:35:18] [debug] [input:tail:tail.0] inode=15820935 events: IN_MODIFY
[2021/10/12 12:35:18] [debug] [storage] tail.0:1-1634042118.931942445.flb adjusting size OK
[2021/10/12 12:35:18] [debug] [storage] tail.0:1-1634042118.931942445.flb mapped OK
[2021/10/12 12:35:18] [debug] [input chunk] chunk 1-1634042118.931942445.flb required 1162 bytes and 50171932 bytes left in plugin s3.0
[2021/10/12 12:35:18] [error] [input chunk] no enough space in filesystem to buffer chunk 1-1634042118.931942445.flb in plugin s3.0
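In the log above, each rejected chunk is reported as needing about 1 KB while roughly 50 MB are reported as left. A small script can pull such cases out of a larger debug log. The sketch below is a rough diagnostic aid: the regular expressions are written against the message wording seen in this thread and may not match other Fluent Bit versions, and `find_inconsistent_rejections` is a hypothetical helper name.

```python
import re

# Patterns follow the log wording quoted in this thread.
SPACE_RE = re.compile(
    r"chunk (?P<chunk>\S+) required (?P<required>\d+) bytes "
    r"and (?P<left>\d+) bytes left in plugin (?P<plugin>\S+)"
)
NO_SPACE_RE = re.compile(
    r"no enough space in filesystem to buffer chunk (?P<chunk>\S+) "
    r"in plugin (?P<plugin>\S+)"
)


def find_inconsistent_rejections(lines):
    """Return (chunk, plugin, required, left) for chunks that were rejected
    although the earlier debug line reported enough bytes left."""
    reported = {}  # (chunk, plugin) -> (required, left)
    suspicious = []
    for line in lines:
        m = SPACE_RE.search(line)
        if m:
            key = (m["chunk"], m["plugin"])
            reported[key] = (int(m["required"]), int(m["left"]))
            continue
        m = NO_SPACE_RE.search(line)
        if m:
            key = (m["chunk"], m["plugin"])
            if key in reported and reported[key][0] <= reported[key][1]:
                suspicious.append((key[0], key[1], *reported[key]))
    return suspicious


# Two lines copied from the log in this comment.
LOG = [
    "[2021/10/12 12:35:18] [debug] [input chunk] chunk 1-1634042118.929765611.flb "
    "required 1163 bytes and 50171932 bytes left in plugin s3.0",
    "[2021/10/12 12:35:18] [error] [input chunk] no enough space in filesystem "
    "to buffer chunk 1-1634042118.929765611.flb in plugin s3.0",
]

if __name__ == "__main__":
    for chunk, plugin, required, left in find_inconsistent_rejections(LOG):
        print(f"{plugin}: {chunk} rejected, required={required} left={left}")
```

The script only flags the mismatch between the two messages; it does not explain why the space check failed.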
@edsiper Sounds like anyone who uses the rewrite_tag plugin together with disk based buffering could run into this issue.
@aapodoll I believe there is a change related to counter calculation of input chunk in fluent-bit v1.8.12 release. Could you try with v1.8.12?
ref: https://fluentbit.io/announcements/v1.8.12/
Thanks @panaji for creating this issue. The root cause of this issue is that there are some chunks that shouldn't go into the chunk queue of the plugin, because the rewrite_tag plugin has a rule to drop the original log and re-emit a new one. However, those dropped logs are still counted by Fluent Bit and decrease the buffer chunk counter fs_chunks_size, which results in a negative fs_chunks_size.
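The accounting problem described above can be illustrated with a toy model. This is not Fluent Bit's code: `OutputQuota` and `GuardedQuota` are hypothetical classes, only the counter name `fs_chunks_size` comes from the thread, and the byte sizes are borrowed from the debug log in the report purely as example values. The first class subtracts on every removal, so removing a chunk that was never counted pushes the counter below zero. The second shows one possible guard (subtract only what was added), which is an assumption for illustration and not necessarily how the issue was fixed upstream.

```python
class OutputQuota:
    """Toy per-output accounting that subtracts unconditionally."""

    def __init__(self, total_limit_size):
        self.total_limit_size = total_limit_size
        self.fs_chunks_size = 0

    def add_chunk(self, name, size):
        self.fs_chunks_size += size

    def remove_chunk(self, name, size):
        # Subtracts even when the chunk was never added to this output.
        self.fs_chunks_size -= size


class GuardedQuota(OutputQuota):
    """Hypothetical guard: only subtract chunks that were counted."""

    def __init__(self, total_limit_size):
        super().__init__(total_limit_size)
        self._counted = {}  # chunk name -> size that was added

    def add_chunk(self, name, size):
        self._counted[name] = size
        self.fs_chunks_size += size

    def remove_chunk(self, name, size):
        # Chunks unknown to this output contribute 0 to the subtraction.
        self.fs_chunks_size -= self._counted.pop(name, 0)


def simulate(quota):
    # The re-emitted chunk is routed to the output, counted, then flushed.
    quota.add_chunk("re-emitted", 39088)
    quota.remove_chunk("re-emitted", 39088)
    # The original chunk was dropped by rewrite_tag and never counted,
    # but its removal is still reported to the output.
    quota.remove_chunk("original", 202)
    return quota.fs_chunks_size


naive = simulate(OutputQuota(total_limit_size=16 * 1024**3))
guarded = simulate(GuardedQuota(total_limit_size=16 * 1024**3))

if __name__ == "__main__":
    print(naive)    # -202
    print(guarded)  # 0
```

In the naive model every dropped original chunk lowers the counter a little further, which matches the description that the drift shows up once the real backlog has drained and the counter is near 0.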