mtail: fsnotify queue overflow
OS version
Linux 4.15.0-21 #22~16.04.1+2
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial
mtail version
mtail version v3.0.0-rc16
git revision 77eeba903db0ffd90e29c1ee778fdefb0112bfab
go version go1.11
what happened
After running for some time, mtail logs the following errors:
Log file created at: 2018/11/23 13:44:28
Running on machine: mesos-lb-sg-test-1
Binary: Built with gc go1.11 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E1123 13:44:28.621495 22659 log_watcher.go:85] fsnotify error: fsnotify queue overflow
E1123 14:08:29.405245 22659 log_watcher.go:85] fsnotify error: fsnotify queue overflow
E1123 14:16:56.198800 22659 log_watcher.go:85] fsnotify error: fsnotify queue overflow
After that, the counters no longer increment.
About this issue
- State: closed
- Created 6 years ago
- Comments: 16 (8 by maintainers)
Commits related to this issue
- Merge branch 'fallback-to-poll' Handle a failure to start an fsnotify watcher by falling back to polling mode correctly. mtail still relies on fsnotify for live-reload and new log file addition at t... — committed to google/mtail by jaqx0r 6 years ago
- Let the user disable fsnotify. When the update rate of log files is sufficiently high, the fsnotify event queue can overrun because mtail is not processing the queue fast enough. (It's probable that... — committed to google/mtail by jaqx0r 6 years ago
I am also experiencing the same issue (logfile full of fsnotify queue overflow events), with similar or bigger logfiles / streams. In my case, rsyslogd writes a UDP stream into files which are constantly watched by mtail. I wasn’t able to find weird file descriptors as pointed out above, though.
Tuning the inotify-related kernel queue parameters hasn’t worked for me either so far: it only partially reduces the problem. Something, somewhere, must be generating tons of garbage over time, so enlarging the queue only delays the issue.
I noticed the problem because the exporter endpoint suddenly went inactive (so Prometheus marks mtail’s job as down). Only a restart makes the problem go away, and only for some time.