fluent-bit: in_tail plugin randomly fails with "too many open files" and errno=24 - unless switched from inotify to stat tail
Bug Report
Describe the bug in_tail plugin randomly fails with “too many open files” and errno=24 - unless switched from inotify to stat tail
To Reproduce
- Example log message if applicable:
[TIMESTAMP] [ info] [storage] initializing...
[TIMESTAMP] [ info] [storage] in-memory
[TIMESTAMP] [ info] [storage] normal synchronization mode, checksum disabled
[TIMESTAMP] [ info] [engine] started (pid=14)
[TIMESTAMP] [error] [plugins/in_tail/tail_fs.c:168 errno=24] Too many open files
[TIMESTAMP] [error] Failed initialize input tail.2
[TIMESTAMP] [error] [plugins/in_tail/tail_fs.c:168 errno=24] Too many open files
[TIMESTAMP] [error] Failed initialize input tail.3
[TIMESTAMP] [error] [plugins/in_tail/tail_fs.c:168 errno=24] Too many open files
[TIMESTAMP] [error] Failed initialize input tail.4
[TIMESTAMP] [error] [plugins/in_tail/tail_fs.c:168 errno=24] Too many open files
[TIMESTAMP] [error] Failed initialize input tail.5
[TIMESTAMP] [error] [plugins/in_tail/tail_fs.c:168 errno=24] Too many open files
[TIMESTAMP] [error] Failed initialize input tail.6
[TIMESTAMP] [error] [plugins/in_tail/tail_fs.c:168 errno=24] Too many open files
[TIMESTAMP] [error] Failed initialize input tail.7
[TIMESTAMP] [error] [plugins/in_tail/tail_fs.c:168 errno=24] Too many open files
[TIMESTAMP] [error] Failed initialize input tail.8
[TIMESTAMP] [error] [plugins/in_tail/tail_fs.c:168 errno=24] Too many open files
[TIMESTAMP] [error] Failed initialize input tail.9
[TIMESTAMP] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
- Steps to reproduce the problem: Happens randomly; only some pods in the k8s cluster based on the Fluent-Bit Docker image are affected. ulimit inside the fluent-bit container:
/ # ulimit -S -n
1048576
/ # ulimit -H -n
1048576
and system max:
/ # cat /proc/sys/fs/file-max
13181250
seem fine. But looking at https://github.com/fluent/fluent-bit/blob/v1.3.2/plugins/in_tail/tail_fs.c, then at https://github.com/fluent/fluent-bit/blob/master/plugins/in_tail/tail_fs_inotify.c#L171 and https://linux.die.net/man/2/inotify_init1 - since errno=24 in that context means EMFILE - this suggests an inotify-related issue rather than an ordinary file-descriptor limit (a quick way to verify this is sketched below). Forcing cmake to use the -DFLB_INOTIFY=Off flag, i.e. disabling inotify and using the stat-based tail instead, is a working workaround for now. It would also be beneficial to have an "inotify-disabled flavour" of the Fluent-Bit Docker images available.
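Not part of the original report - a hedged diagnostic sketch that compares the per-user inotify limits with the number of inotify descriptors currently open on the node:
# Per-user inotify limits (128 instances is a common default)
cat /proc/sys/fs/inotify/max_user_instances
cat /proc/sys/fs/inotify/max_user_watches
# Rough count of inotify instances currently in use across all processes;
# every "anon_inode:inotify" symlink under /proc/*/fd is one instance
ls -l /proc/*/fd 2>/dev/null | grep -c inotify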
Expected behavior No errors when using the in_tail plugin.
Screenshots n/a
Your Environment
- Version used: 1.0.4-1.3.3
- Configuration: Docker image, both the normal and the -debug variant. Appears both when tailing an exact file path and when using a regexp-based path (files inside a certain directory).
Additional context Logs are not tailed, so no logs are processed by Fluent-Bit at all while this is happening.
About this issue
- Original URL
- State: open
- Created 5 years ago
- Reactions: 10
- Comments: 33 (3 by maintainers)
Commits related to this issue
- adding inotify-disabled flavour for Docker images (#1777) — committed to krystiannowak/fluent-bit by krystiannowak 5 years ago
Great hint! In my case /proc/sys/fs/inotify/max_user_instances was only 128.
The problem seems to be solved by setting the sysctl fs.inotify.max_user_instances to 1500.
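For reference, a minimal sketch of how such a change can be applied on the node (the value 1500 comes from the comment above; pick whatever fits your workload):
# Apply immediately (lost on reboot)
sudo sysctl -w fs.inotify.max_user_instances=1500
# Persist across reboots
echo 'fs.inotify.max_user_instances=1500' | sudo tee /etc/sysctl.d/99-inotify.conf
sudo sysctl --system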
When do you expect this to be released? I see the workaround PR (the Dockerfile one) is closed, but everyone who tries to run 1.4 will lose time digging into this issue.
I am facing this issue with v1.5.x as well, and the latest documentation (presumably for 1.6.x) does not mention an option to use stat instead of inotify for in_tail, so I would like to bump this issue.
@aderuwe It might be more related to the inotify mechanism - by default (e.g. when using the default Docker image) this mechanism is used for tailing files in the in_tail plugin: https://linux.die.net/man/2/inotify_init1
Instead of looking at ulimit and the number of available file descriptors, you might also check the more specific inotify-related settings, such as fs.inotify.max_user_instances, fs.inotify.max_user_watches and fs.inotify.max_queued_events - maybe that works for you.
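For example (generic commands to read the current values, not taken from this thread):
# Current inotify limits on the host / node
sysctl fs.inotify.max_user_instances
sysctl fs.inotify.max_user_watches
sysctl fs.inotify.max_queued_events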
In my case, in the end I rebuilt the Docker image with the -DFLB_INOTIFY=Off option, so that instead of the more performant inotify mechanism the plugin uses the more old-school stat mechanism for tailing files - this works for me as a workaround for now - see https://github.com/fluent/fluent-bit/pull/1778 - although it probably has problems when used with symlinks.
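A rough sketch of such a build (the branch and paths are illustrative; the actual Dockerfile change is in the PR linked above):
# Build Fluent Bit from source with inotify disabled, so in_tail
# falls back to the stat-based file watcher
git clone --branch v1.3.3 https://github.com/fluent/fluent-bit.git
cd fluent-bit/build
cmake -DFLB_INOTIFY=Off ..
make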
The final solution is planned by @edsiper in this ticket to have a configuration option available in the upcoming https://github.com/fluent/fluent-bit/milestone/7 release.
Is there any ETA for this fix? It seems it is still not in 1.4.5 …
I'm facing this issue on a k3d (k3s) cluster with rancher/mirrored-fluent-fluent-bit:1.8.8.
https://stackoverflow.com/questions/57220658/change-inotify-max-user-instances-limit-in-docker-container helped as well.
Then launch the Docker image.
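The exact commands were not preserved in this thread; as a hedged sketch, the usual approach is to raise the limit on the Docker host first (the fs.inotify limits live in the host kernel and are shared by containers), then start the container:
# On the Docker host (or the machine running k3d); 512 is only an example value
sudo sysctl -w fs.inotify.max_user_instances=512
# Then launch the image (tag as mentioned in the comment above)
docker run --rm rancher/mirrored-fluent-fluent-bit:1.8.8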
On the input side you can do this:
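(The original snippet was not preserved here. The sketch below is an assumption about what was suggested: a single wildcard tail input instead of many per-file inputs, with the source file name kept in the record via Path_Key - which also keeps the number of inotify instances down. The path and key names are illustrative.)
# Hypothetical example: one wildcard in_tail input, with the file name
# stored in each record under the key "filename"
cat <<'EOF' > /fluent-bit/etc/in_tail.conf
[INPUT]
    Name        tail
    Path        /var/log/app/*.log
    Path_Key    filename
    Tag         app.logs
EOF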
On the output side I'm not sure, because it's more dependent on what you want to achieve, and even if you were able to get the filename from the record, that SOMETHING.DEST part doesn't seem like something that could be done easily (maybe with Lua), but the input-side improvement would probably be enough to get you out of trouble.
I think that's enough to get started, I'll take a look at it - thanks a lot.
I am facing this issue on a Raspberry Pi 3A+ with td-agent-bit 1.8.6 from https://packages.fluentbit.io/raspbian/buster buster/main armhf Packages.
Restarting the service seems to have resolved the issue (temporarily?).
Please advise whether any recent releases have resolved this.
We deployed fluent-bit as a DaemonSet in our Kubernetes cluster and hit the problem after two days of running. Not only did fluent-bit report "too many open files"; all pods on that node reported the same error, and even system processes outside the containers failed. After killing fluent-bit on the node (using a nodeSelector to exclude it from running on the affected node), the node came back to a normal state and the error disappeared. For now we are afraid to run it, because this error can stop the whole node.
More details added: we have a file limit of 5242880, and even that value was exhausted over two days.
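When a node degrades like this, a generic way (not from the original comment) to see how close the kernel is to its file-handle limit and which processes hold the descriptors:
# Allocated file handles vs. the kernel-wide maximum
cat /proc/sys/fs/file-nr
cat /proc/sys/fs/file-max
# Top processes by number of open file descriptors (illustrative one-liner)
for p in /proc/[0-9]*; do printf '%s %s\n' "$(ls "$p/fd" 2>/dev/null | wc -l)" "$p"; done | sort -rn | head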
@edsiper If it is possible to have it just in the Fluent-Bit config file, that would be even better!
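For completeness: later Fluent Bit releases document an Inotify_Watcher option on the tail input that switches to the stat watcher without rebuilding the image. If your version supports it, the configuration would look roughly like this (hedged sketch; the path is illustrative):
# Disable inotify per input via Inotify_Watcher (verify the option exists
# in your Fluent Bit version before relying on it)
cat <<'EOF' > /fluent-bit/etc/in_tail.conf
[INPUT]
    Name             tail
    Path             /var/log/app/*.log
    Inotify_Watcher  false
EOF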