fluent-bit: in_tail: plugin does not pickup rotated logs under heavy load
Bug Report
Describe the bug tail_fs_event receives IN_Q_OVERFLOW inotify events from time to time, thus missing IN_MOVE_SELF events.
To Reproduce tail a lot of files by pattern with heavy writing to them. The setup I have reads around 30 nginx access log files by pattern.
example cfg:
[SERVICE]
Flush 1
Daemon off
Log_Level error
Parsers_File vertis_parsers.conf
HTTP_Server On
HTTP_Listen 127.0.0.1
HTTP_Port 2020
[INPUT]
Name tail
Tag tail_nginx
Path /var/log/nginx/*/access.log
DB /var/lib/fluent-bit/nginx.sqlite
Parser nginx_vertis
Buffer_Chunk_Size 256kb
Buffer_Max_Size 256kb
Mem_Buf_Limit 64mb
[FILTER]
Name record_modifier
Match *
Record _level INFO
Record _service nginx
Record _container_name ${HOSTNAME}
[OUTPUT]
Name http
Match *
Port 10223
URI /fluent-bit
Format json
Retry_Limit False
Example fluent-bit log with traces (had to add some extra output to tackle the problem)
[2019/02/10 08:55:30] [trace] inotify_mask: 2
[2019/02/10 08:55:30] [debug] [in_tail] file=/var/log/nginx/d1/access.log event
[2019/02/10 08:55:30] [debug] [in_tail] file=/var/log/nginx/d1/access.log read=9428 lines=8
[2019/02/10 08:55:30] [trace] inotify_mask: 2
[2019/02/10 08:55:30] [debug] [in_tail] file=/var/log/nginx/d2/access.log event
[2019/02/10 08:55:30] [trace] inotify_mask: 2
[2019/02/10 08:55:30] [debug] [in_tail] file=/var/log/nginx/d4/access.log event
[2019/02/10 08:55:30] [debug] [in_tail] file=/var/log/nginx/d4/access.log read=1449 lines=2
[2019/02/10 08:55:30] [trace] inotify_mask: 2
[2019/02/10 08:55:30] [debug] [in_tail] file=/var/log/nginx/d3/access.log event
[2019/02/10 08:55:30] [debug] [in_tail] file=/var/log/nginx/d3/access.log read=2277 lines=8
[2019/02/10 08:55:30] [trace] inotify_mask: 2
[2019/02/10 08:55:30] [debug] [in_tail] file=/var/log/nginx/d1/access.log event
[2019/02/10 08:55:30] [debug] [in_tail] file=/var/log/nginx/d1/access.log read=1037 lines=1
[2019/02/10 08:55:30] [trace] inotify_mask: 2
[2019/02/10 08:55:30] [debug] [in_tail] file=/var/log/nginx/d3/access.log event
[2019/02/10 08:55:30] [debug] [in_tail] file=/var/log/nginx/d3/access.log read=761 lines=3
[2019/02/10 08:55:30] [trace] inotify_mask: 2
[2019/02/10 08:55:30] [debug] [in_tail] file=/var/log/nginx/d4/access.log event
[2019/02/10 08:55:30] [debug] [in_tail] file=/var/log/nginx/d4/access.log read=9603 lines=12
[2019/02/10 08:55:30] [trace] inotify_mask: 4000
[2019/02/10 08:55:30] [trace] inotify_mask: 2
[2019/02/10 08:55:30] [debug] [in_tail] file=/var/log/nginx/d3/access.log event
[2019/02/10 08:55:30] [debug] [in_tail] file=/var/log/nginx/d3/access.log read=894 lines=3
[2019/02/10 08:55:30] [trace] inotify_mask: 2
[2019/02/10 08:55:30] [debug] [in_tail] file=/var/log/nginx/d2/access.log event
[2019/02/10 08:55:30] [debug] [in_tail] file=/var/log/nginx/d2/access.log read=438 lines=1
[2019/02/10 08:55:30] [trace] inotify_mask: 2
sysctl params related to inotify:
fs.inotify.max_queued_events = 16384
fs.inotify.max_user_instances = 128
fs.inotify.max_user_watches = 524288
Expected behavior reload files after log-rotate
Your Environment
- Version used: 1.0.3
- Operating System and version: ubuntu 16.04
- Filters and plugins: in_tail
Additional context We try to collect access logs on several nginx machines. Every morning after log-rotate tail plugin fails to pick up most of rename events (more often all of them). Basically, looks like it happens because files are tailed synchronously with receiving inotify events (reading, filling up buffer, processing content) while other events are piling up in the queue.
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 12
- Comments: 16 (7 by maintainers)
We had the same issue running fluent-bit on EKS. Logs came at a rate of 50 k req/s. After updating fluent-bit to v1.6.8, bumping up the resources, and using a combination of memory and filesystem for buffering messages, we were able to avoid the log loss. We also modified the log rotation on EKS to rotate files only after reaching 10 GB. Here is the fluent-bit config we have:
@l2dy, no it did not because there’s inotify version of this code which is running in fluent-bit https://github.com/fluent/fluent-bit/blob/f6bb10a07a630243b12f3d5db16e996f30b10786/plugins/in_tail/tail_fs_inotify.c#L159-L160