fluent-bit: Fluent-bit in K8s stops sending logs to elastic after a few hours

Fluent Bit stops sending logs to Elasticsearch after several hours without telling me why. The fluent-bit Docker container is not crashing; it just stops doing what it should do. What could be the reason? I can’t figure it out, and it happens in each of the 3 clusters.

Situation:

  • 3 clusters, each with 14 Kubernetes nodes (including 3 masters)
  • fluent-bit version: 0.13.3
  • Kubernetes version: 1.8.4
  • Node OS: Red Hat 7.4 (Maipo)

Resource limits in Kubernetes:

  "resources": {
    "limits":   { "cpu": "100m", "memory": "1Gi" },
    "requests": { "cpu": "100m", "memory": "512Mi" }
  }
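With only a 100m CPU limit and a 1Gi memory limit, one quick first check is whether the fluent-bit pods are being CPU-throttled or OOM-killed rather than silently stalling. A hypothetical sketch (the namespace and label depend on how the DaemonSet is deployed):

# Show restart counts and the last terminated state (e.g. OOMKilled) of the fluent-bit pods.
kubectl -n kube-system describe pod -l k8s-app=fluent-bit | grep -E 'Restart Count|Last State|Reason'

# If metrics are available (heapster/metrics-server), compare actual usage against the 100m / 1Gi limits.
kubectl -n kube-system top pod -l k8s-app=fluent-bit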

Configuration:

[SERVICE]
    Flush        1
    Daemon       Off
    Log_Level    debug

[INPUT]
    Name               tail
    Tag                kube.services.*
    Path               /var/log/containers/*.log
    Exclude_Path       *kube-system*.log
    Parser             json
    DB                 /var/log/containers/fluent-bit.db
    Buffer_Chunk_Size  1MB
    Buffer_Max_Size    25MB
    Mem_Buf_Limit      25MB
    Refresh_Interval   5
    Skip_Long_Lines    On

[FILTER]
    Name                 kubernetes
    Match                kube.*
    Kube_URL             https://${KUBERNETES_SERVICE_HOST}:443
    Merge_JSON_Log       On
    K8S-Logging.Parser   On
    K8S-Logging.exclude  True
    tls.verify           Off

[PARSER]
    Name             json
    Format           json
    Time_Key         time
    Time_Format      %Y-%m-%dT%H:%M:%S %z
    Decode_Field_As  escaped log

[OUTPUT]
    Name                 es
    Match                kube.services.*
    Host                 xxx.xxx.xxx.xxx
    Port                 9200
    Logstash_Format      On
    Logstash_Prefix      k8s-services-tst
    Logstash_DateFormat  %G.%V
    Include_Tag_Key      true
    Retry_Limit          False

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 1
  • Comments: 22 (4 by maintainers)

Most upvoted comments

I got the same issue on the latest version.

We have a similar issue where fluent-bit stops tracking a few applications after a few hours. We use fluent-bit in EKS to push logs to Stackdriver. On further investigation, this is what I found out.

Our setup (fluent-bit configs attached):

  • fluent-bit is set up to tail all files matching /var/log/containers/*.log. Kubernetes sets up symlinks in /var/log/containers/ that point (through /var/log/pods/) to the actual log files in /var/lib/docker/containers/ (the sketch after this list follows one such chain)
  • by default, AWS EKS configures Docker with the log-opts setting {"max-size":"10m","max-file":"10"} (in /etc/docker/daemon.json), i.e. up to ten rotated 10 MB files per container
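To see the indirection involved, the symlink chain can be followed on a node (an illustrative check only; the actual file names depend on the pods running there):

# Each entry in /var/log/containers/ is a symlink; resolve where it ultimately points
# (normally a <container-id>-json.log file under /var/lib/docker/containers/, via /var/log/pods/).
ls -l /var/log/containers/ | head
readlink -f /var/log/containers/*.log | head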

Issues:

  1. Gaps in container logs: when logs for a container rotate too fast (either for fluent-bit to keep up or for Kubernetes to update the symlinks), fluent-bit is still tailing one of the rotated files (*.{1…9}). By the time fluent-bit gets around to using the current log file symlink for that container in /var/log/containers/, some of the rotated files in between are never tailed (because /var/log/containers/ only has symlinks pointing to the current log file of each container). I think this can also be confirmed by examining the sqlite DB, where fluent-bit keeps a rotated flag for every tracked file (see the query sketch after the lsof output below); in cases where the rotated file is .2 or higher, I think the logs in between were missed.
  2. Container logs stop after a few hours: when I inspected the file handles held by fluent-bit for a container it had stopped tailing after a few hours, I noticed that it was holding on to a file handle for the last rotated file (.9) of that container. I suspect Docker’s logging system deleted the .9 file before fluent-bit was able to finish tailing it, and somehow fluent-bit is unable to recover from this for that particular container (I don’t know if it’s a corner case caused by inode reuse).
# sudo lsof -nP -a -p $(pgrep -f fluent-bi[t]) | grep -F b6a4c1b82014
fluent-bi 16157 root   54r      REG  259,1 10000149  814921332 /var/lib/docker/containers/b6a4c1b82014891feb521f973c2823f3d291595b5a28c04d437093425e61ba7a/b6a4c1b82014891feb521f973c2823f3d291595b5a28c04d437093425e61ba7a-json.log.9 (deleted)
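Two related checks, both hypothetical sketches: the first generalises the lsof output above to every container, and the second inspects the rotated flag mentioned in issue 1, assuming the DB path from the original config above (the in_tail_files table and column names match recent fluent-bit versions and may differ in 0.13):

# List every deleted file fluent-bit still holds open, i.e. rotated-away logs it never finished reading.
sudo lsof -nP -a -p $(pgrep -f fluent-bi[t]) | grep -F '(deleted)'

# Inspect the tail plugin's position database; rotated = 1 marks files fluent-bit knows were rotated.
sudo sqlite3 /var/log/containers/fluent-bit.db 'SELECT name, offset, rotated FROM in_tail_files WHERE rotated = 1;'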

Workaround (planned):

  • change the EKS Docker log-opts to {"max-size":"100m","max-file":"2"} to reduce the log-file rotation rate (sketched below).
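A minimal sketch of applying this on a worker node, assuming /etc/docker/daemon.json holds nothing else that must be preserved (on EKS this would normally be set through the node bootstrap/user data rather than by hand):

# Write the planned rotation settings and restart Docker; they take effect for newly created containers.
sudo tee /etc/docker/daemon.json <<'EOF'
{
  "log-driver": "json-file",
  "log-opts": { "max-size": "100m", "max-file": "2" }
}
EOF
sudo systemctl restart docker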

Attachments: fluent-bit-configmap.txt, fluent-bit-ds.txt