fluent-bit: Fluent-bit input plugin tail doesn't process all logs: scan_blog add(): dismissed:
Bug Report
Describe the bug
Fluent Bit is not processing all logs located in /var/log/containers/.
To Reproduce
The following messages are displayed:
[2021/10/01 14:40:05] [debug] [input:tail:tail.0] scanning path /var/log/containers/*.log
[2021/10/01 14:40:05] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/log/containers/activator-85cd6f6f9-nrncf_knative-serving_activator-3b631f27f6667599ae940f94afe6a65a4d1d488e7979fced513fa910082a5ae1.log, inode 404768
[2021/10/01 14:40:05] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/log/containers/activator-85cd6f6f9-nrncf_knative-serving_activator-ca32320178170fe4198ce1b0bd57d8ea031c7c886a7b0e3d66bb1b78b67613b8.log, inode 921337
[2021/10/01 14:40:05] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/log/containers/antrea-agent-gql5r_kube-system_antrea-agent-63659cdc8e5ddba3eaf729be280661b45fd198e6d2c7195965be85cdca81f41a.log, inode 536837
[2021/10/01 14:40:05] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/log/containers/antrea-agent-gql5r_kube-system_antrea-agent-8726abf73577f597e15716176cfcdce442b159d00ec12f59e439719d824a9585.log, inode 1190181
[2021/10/01 14:40:05] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/log/containers/antrea-agent-gql5r_kube-system_antrea-ovs-08045b767f2f8ee421b3b4d8d5b646b49b4e12199ae957cad178dd3d11670ec6.log, inode 663855
- Steps to reproduce the problem: see the configuration details below.
ServiceAccount:
rules:
- apiGroups:
- ""
resources:
- namespaces
- pods
verbs:
- get
- list
- watch
ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
name: fluent-bit-config
namespace: vmware-system
labels:
k8s-app: fluent-bit
data:
filter-kubernetes.conf: |
[FILTER]
Name kubernetes
Match kube.*
Kube_URL https://kubernetes.default.svc:443
Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
Kube_Tag_Prefix kube.var.log.containers.
Merge_Log On
Merge_Log_Key log_processed
K8S-Logging.Parser On
K8S-Logging.Exclude Off
[FILTER]
Name modify
Match kube.*
Copy kubernetes k8s
[FILTER]
Name nest
Match kube.*
Operation lift
Nested_Under kubernetes
filter-record.conf: |
[FILTER]
Name record_modifier
Match *
Record tkg_cluster veba-demo.jarvis.tanzu
Record tkg_instance veba-demo.jarvis.tanzu
[FILTER]
Name nest
Match kube.*
Operation nest
Wildcard tkg_instance*
Nest_Under tkg
[FILTER]
Name nest
Match kube_systemd.*
Operation nest
Wildcard SYSTEMD*
Nest_Under systemd
fluent-bit.conf: |
[SERVICE]
Flush 1
Log_Level debug
Daemon off
Parsers_File parsers.conf
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_Port 2020
@INCLUDE input-kubernetes.conf
@INCLUDE input-systemd.conf
@INCLUDE input-kube-apiserver.conf
@INCLUDE input-auditd.conf
@INCLUDE filter-kubernetes.conf
@INCLUDE filter-record.conf
@INCLUDE output-syslog.conf
input-auditd.conf: |
[INPUT]
Name tail
Tag audit.*
Path /var/log/audit/audit.log
Parser logfmt
DB /var/log/flb_system_audit.db
Mem_Buf_Limit 50MB
Refresh_Interval 10
Skip_Long_Lines On
input-kube-apiserver.conf: |
[INPUT]
Name tail
Tag apiserver_audit.*
Path /var/log/kubernetes/audit.log
Parser json
DB /var/log/flb_kube_audit.db
Mem_Buf_Limit 50MB
Refresh_Interval 10
Skip_Long_Lines On
input-kubernetes.conf: |
[INPUT]
Name tail
Tag kube.*
Path /var/log/containers/*.log
Parser docker
DB /var/log/flb_kube.db
Mem_Buf_Limit 50MB
Skip_Long_Lines On
Refresh_Interval 10
input-systemd.conf: |
[INPUT]
Name systemd
Tag kube_systemd.*
Path /var/log/journal
DB /var/log/flb_kube_systemd.db
Systemd_Filter _SYSTEMD_UNIT=kubelet.service
Systemd_Filter _SYSTEMD_UNIT=containerd.service
Read_From_Tail On
Strip_Underscores On
output-syslog.conf: |
[OUTPUT]
Name syslog
Match kube.*
Host 10.197.79.57
Port 514
Mode tcp
Syslog_Format rfc5424
Syslog_Hostname_key tkg_cluster
Syslog_Appname_key pod_name
Syslog_Procid_key container_name
Syslog_Message_key message
Syslog_SD_key k8s
Syslog_SD_key labels
Syslog_SD_key annotations
Syslog_SD_key tkg
[OUTPUT]
Name syslog
Match kube_systemd.*
Host 10.197.79.57
Port 514
Mode tcp
Syslog_Format rfc5424
Syslog_Hostname_key tkg_cluster
Syslog_Appname_key tkg_instance
Syslog_Message_key MESSAGE
Syslog_SD_key systemd
parsers.conf: |
[PARSER]
Name apache
Format regex
Regex ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
Time_Key time
Time_Format %d/%b/%Y:%H:%M:%S %z
[PARSER]
Name apache2
Format regex
Regex ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
Time_Key time
Time_Format %d/%b/%Y:%H:%M:%S %z
[PARSER]
Name apache_error
Format regex
Regex ^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])?( \[client (?<client>[^\]]*)\])? (?<message>.*)$
[PARSER]
Name nginx
Format regex
Regex ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
Time_Key time
Time_Format %d/%b/%Y:%H:%M:%S %z
[PARSER]
Name json
Format json
Time_Key time
Time_Format %d/%b/%Y:%H:%M:%S %z
[PARSER]
Name docker
Format json
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L
Time_Keep On
[PARSER]
Name docker-daemon
Format regex
Regex time="(?<time>[^ ]*)" level=(?<level>[^ ]*) msg="(?<msg>[^ ].*)"
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L
Time_Keep On
[PARSER]
# http://rubular.com/r/tjUt3Awgg4
Name cri
Format regex
Regex ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<message>.*)$
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L%z
[PARSER]
Name logfmt
Format logfmt
[PARSER]
Name syslog-rfc5424
Format regex
Regex ^\<(?<pri>[0-9]{1,5})\>1 (?<time>[^ ]+) (?<host>[^ ]+) (?<ident>[^ ]+) (?<pid>[-0-9]+) (?<msgid>[^ ]+) (?<extradata>(\[(.*)\]|-)) (?<message>.+)$
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L
Time_Keep On
[PARSER]
Name syslog-rfc3164-local
Format regex
Regex ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
Time_Key time
Time_Format %b %d %H:%M:%S
Time_Keep On
[PARSER]
Name syslog-rfc3164
Format regex
Regex /^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$/
Time_Key time
Time_Format %b %d %H:%M:%S
Time_Format %Y-%m-%dT%H:%M:%S.%L
Time_Keep On
[PARSER]
Name kube-custom
Format regex
Regex (?<tag>[^.]+)?\.?(?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$
DaemonSet:
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: fluent-bit
namespace: vmware-system
labels:
k8s-app: fluent-bit
spec:
revisionHistoryLimit: 10
selector:
matchLabels:
k8s-app: fluent-bit
template:
metadata:
labels:
k8s-app: fluent-bit
spec:
containers:
- image: projects.registry.vmware.com/tkg/fluent-bit:v1.6.9_vmware.1
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /
port: 2020
scheme: HTTP
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
name: fluent-bit
ports:
- containerPort: 2020
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /api/v1/metrics/prometheus
port: 2020
scheme: HTTP
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
resources:
limits:
cpu: 80m
memory: 200Mi
requests:
cpu: 50m
memory: 100Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/log
name: var-log
- mountPath: /var/log/pods
name: var-log-pods
- mountPath: /var/log/containers
name: var-log-containers
- mountPath: /var/lib/docker/containers
name: var-lib-docker-containers
readOnly: true
- mountPath: /fluent-bit/etc/
name: fluent-bit-config
- mountPath: /run/log
name: systemd-log
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: fluent-bit
serviceAccountName: fluent-bit
terminationGracePeriodSeconds: 10
tolerations:
- effect: NoSchedule
key: node-role.kubernetes.io/master
operator: Exists
- effect: NoExecute
operator: Exists
- effect: NoSchedule
operator: Exists
volumes:
- hostPath:
path: /var/log
type: ""
name: var-log
- hostPath:
path: /var/log/pods
type: ""
name: var-log-pods
- hostPath:
path: /var/log/containers
type: ""
name: var-log-containers
- hostPath:
path: /var/lib/docker/containers
type: ""
name: var-lib-docker-containers
- hostPath:
path: /run/log
type: ""
name: systemd-log
- configMap:
defaultMode: 420
name: fluent-bit-config
name: fluent-bit-config
updateStrategy:
rollingUpdate:
maxUnavailable: 1
type: RollingUpdate
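With a config of this size it can also help to let Fluent Bit parse the rendered files without starting the pipeline. A rough sketch, assuming the image ships the binary at /fluent-bit/bin/fluent-bit and that this build supports the dry-run flag; the pod is selected via the k8s-app=fluent-bit label from the DaemonSet above:

# pick one fluent-bit pod and ask it to validate its own config
POD=$(kubectl -n vmware-system get pod -l k8s-app=fluent-bit -o name | head -n 1)
kubectl -n vmware-system exec -it "$POD" -- \
  /fluent-bit/bin/fluent-bit -c /fluent-bit/etc/fluent-bit.conf --dry-run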
Expected behavior
All logs in /var/log/containers/ should be processed.
Your Environment
- Version used: 1.6.9, 1.7.9 and 1.8.9
- Configuration: See above
- Environment name and version: Kubernetes 1.20.2
- Server type and version: single K8s instance; see https://github.com/vmware-samples/vcenter-event-broker-appliance
- Operating System and version: VMware PhotonOS v4
- Filters and plugins: @INCLUDE input-kubernetes.conf @INCLUDE input-systemd.conf @INCLUDE input-kube-apiserver.conf @INCLUDE input-auditd.conf @INCLUDE filter-kubernetes.conf @INCLUDE filter-record.conf @INCLUDE output-syslog.conf
Additional context
Running tail -f manually from within the system on a specific pod log (a container writing to stdout) works:
{"log":"10/03/2021 14:47:13 - Handler Processing Completed ...\n","stream":"stdout","time":"2021-10-03T14:47:13.829672574Z"}
{"log":"\n","stream":"stdout","time":"2021-10-03T14:47:13.829772103Z"}
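Since the [SERVICE] section above enables the HTTP server on port 2020, another way to see what the tail input is actually ingesting is to scrape the built-in metrics endpoint (the same path used by the readiness probe). A rough sketch; the grep pattern is illustrative:

# forward the fluent-bit HTTP server locally and look at per-input counters
POD=$(kubectl -n vmware-system get pod -l k8s-app=fluent-bit -o name | head -n 1)
kubectl -n vmware-system port-forward "$POD" 2020:2020 &
curl -s http://127.0.0.1:2020/api/v1/metrics/prometheus | grep -E 'input_(records|bytes)'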
Examples of logs that are not processed:
root@veba-kn [ /var/log/containers ]# ls -rtl
total 376
lrwxrwxrwx 1 root root 100 Sep 13 21:31 antrea-agent-gql5r_kube-system_antrea-agent-8726abf73577f597e15716176cfcdce442b159d00ec12f59e439719d824a9585.log -> /var/log/pods/kube-system_antrea-agent-gql5r_31aa406a-286c-495b-9dcf-e4036c2a4756/antrea-agent/3.log
lrwxrwxrwx 1 root root 98 Sep 13 21:31 antrea-agent-gql5r_kube-system_antrea-ovs-3f300f1d7b28c069df1f34cf37ff89be95d69fc3dc4ea0f269b5bd07ce5d56c1.log -> /var/log/pods/kube-system_antrea-agent-gql5r_31aa406a-286c-495b-9dcf-e4036c2a4756/antrea-ovs/3.log
lrwxrwxrwx 1 root root 102 Sep 13 21:31 envoy-89vct_contour-external_shutdown-manager-c8ed97927c25d465f31cce5ab8bd91d02742504f8cf73ad53e493738d0a17f74.log -> /var/log/pods/contour-external_envoy-89vct_1c947a55-2b86-48bd-b442-c6c51ec2dd3a/shutdown-manager/3.log
lrwxrwxrwx 1 root root 91 Sep 13 21:31 envoy-89vct_contour-external_envoy-0ea7a33d12105058f74eae9653dd0266ac99ef2ba7f6cb3a3b04a8ec3bc02525.log -> /var/log/pods/contour-external_envoy-89vct_1c947a55-2b86-48bd-b442-c6c51ec2dd3a/envoy/3.log
lrwxrwxrwx 1 root root 104 Sep 13 21:31 contour-5869594b-7jm89_contour-external_contour-803e6591f657fae9539b64ae4f86fa44cce99b409c5f92979c6045cf4b98b52c.log -> /var/log/pods/contour-external_contour-5869594b-7jm89_cc6cf243-7d3f-4839-91e8-741ab87f6488/contour/3.log
lrwxrwxrwx 1 root root 106 Sep 13 21:31 contour-5d47766fd8-n24mz_contour-internal_contour-ae34a8ae0b8398da294c5061ec5c0ef1e9be8cb2979f07077e5e9df12f2bab67.log -> /var/log/pods/contour-internal_contour-5d47766fd8-n24mz_a87131ad-d73a-4371-a47b-dcc410f3b6e4/contour/3.log
lrwxrwxrwx 1 root root 100 Sep 13 21:31 coredns-74ff55c5b-mjdlr_kube-system_coredns-60bd5f49def85a0ddc929e2c2da5c793a3c6de00cd6a81bdcfdb21f3d4f45129.log -> /var/log/pods/kube-system_coredns-74ff55c5b-mjdlr_7ef260c1-308e-4162-8a84-231d560f8023/coredns/3.log
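Because everything under /var/log/containers/ is a symlink into /var/log/pods/, one thing worth checking is whether the targets and inodes resolve the same way from inside the Fluent Bit pod as on the node. A rough sketch, assuming the image ships a shell; the pod is selected via the k8s-app=fluent-bit label from the DaemonSet above:

# compare symlink targets and inodes as seen from inside the pod
POD=$(kubectl -n vmware-system get pod -l k8s-app=fluent-bit -o name | head -n 1)
kubectl -n vmware-system exec -it "$POD" -- sh -c \
  'for f in /var/log/containers/*.log; do printf "%s -> %s (inode %s)\n" "$f" "$(readlink -f "$f")" "$(stat -Lc %i "$f")"; done'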
I’ve also tried running the DaemonSet with:
securityContext:
privileged: true
Similar issues I found, but which don’t provide a solution for this one: #3857 #4014
Your help would be much appreciated. Thanks
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 12
- Comments: 93 (16 by maintainers)
Not stale, the issue is still here and is still the reason most of us do not use fluent bit anymore to this day
I also lost my respect for this product because of this bug.
I’m using version 1.9.0 and I’m getting dismissed logs. This doesn’t really seem fixed.
I will try to do so, but tbh I dropped fluentbit out of my stack last July given that no one was helping on the matter.
Most versions of Fluent-bit are affected by this bug. I have tested versions 1.2.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0, 1.7.0, 1.8.0, 1.9.0, 2.0.6, 2.0.8, and 2.0.10, and they all exhibited the same issue. The only versions that worked for me were 1.2.x and 1.3.x. @edsiper @patrick-stephens Can you please review the code for the tail plugin in version 1.3.7? You might find the bug.
Hello all,
I am facing the same issue here using v2.0.8, running fluent-bit on k3s on Debian 11. All files are on a native SATA SSD with an EXT4 filesystem.
I was able to identify the root cause as this readlink call: https://github.com/fluent/fluent-bit/blob/v2.0.8/plugins/in_tail/tail_file.c#L1543. I wrote a small C program to prove the behaviour, as follows.
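The program and its output were not preserved in this copy of the thread; the following is only a rough sketch of the kind of check being described, assuming the readlink in question recovers a tailed file's current name from /proc/<pid>/fd/<fd>. File names are illustrative.

/* readlink-check.c: open a file, then ask /proc what name the open fd resolves to.
 * For a symlinked /var/log/containers/*.log this prints the /var/log/pods/... target. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    char proc_path[64];
    char resolved[4096];
    ssize_t n;
    int fd;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <logfile>\n", argv[0]);
        return 1;
    }
    fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    /* same idea as the tail plugin: resolve the name of an already-open fd */
    snprintf(proc_path, sizeof(proc_path), "/proc/%d/fd/%d", getpid(), fd);
    n = readlink(proc_path, resolved, sizeof(resolved) - 1);
    if (n < 0) {
        perror("readlink");
        close(fd);
        return 1;
    }
    resolved[n] = '\0';
    printf("opened:   %s\nresolved: %s\n", argv[1], resolved);
    close(fd);
    return 0;
}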
Here’s the result:
Investigating the issue further, I found that removing the /proc mount allows the readlink function to work properly. But the issue is that some other functions need /proc to be mounted, e.g. https://github.com/fluent/fluent-bit/blob/v2.0.8/plugins/in_cpu/cpu.c#L95.
Results:
So in order to allow the tail plugin to work properly, we can’t have other plugins that need /proc to be mounted. It would be nice if we could mount /proc at a different path like /host_proc to avoid this issue. Maybe the temporary workaround is to have separate fluent-bit pods to collect different metrics?
Hard to imagine how such an important bug could exist for so long. The main feature of the project just doesn’t work.
I don’t know anything about CephFS, but it doesn’t surprise me that disabling inotify helps. I’ll check the error message on Monday to see if it’s related to it, but I’m curious about the other users too; I wonder if they are using a similar setup.
I’m having this issue as well on 1.9.6. The problem is that I NEED the long lines to be processed.
Has this been resolved? This seems to be related to the inode not being removed during the file removal process, i.e. the inode entry not being removed from files_static / files_event (https://github.com/fluent/fluent-bit/blob/master/plugins/in_tail/tail_config.h#L129, https://github.com/fluent/fluent-bit/blob/master/plugins/in_tail/tail_file.c#1108).
I have the same issue: https://github.com/fluent/helm-charts/issues/415
CephFS is a network-mounted filesystem; afaik inotify will not work because the kernel is not aware of filesystem changes in a directory, so it can’t inform any watching process. Maybe that’s the issue. If other users are using a network filesystem, this could be the case.
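For anyone who wants to test the inotify theory, the tail input has an Inotify_Watcher option that switches it to the stat-based file watcher. A minimal sketch of the relevant input section (verify the option against the docs for the version in use):

[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    DB                /var/log/flb_kube.db
    # Assumption: turning the inotify watcher off makes tail poll via stat instead,
    # which is what the comments above suggest trying on filesystems like CephFS
    Inotify_Watcher   Off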
@kc-dot-io I guess this would do the trick for you, but I really don’t like this idea; with a stable, fully functioning tool you would never have to do that. It was my understanding that Fluent Bit was a stable tool, but I realise that it’s not the case. The logs are sooo verbose, yet the buffers keep filling up with no warning, lines get dismissed with no explanation whatsoever… The only answer in this issue from a staff member was on March 18th…
I think I’m going to have to get rid of fluent bit
fluent-bit version: 1.9.3. Problem: when collecting logs with fluent-bit’s tail plugin, fluent-bit stops collecting logs after running for a while. Root cause: when logs are merged with multiline.parser, cont_state can merge many log lines into a single record, which causes tail to stop collecting. Approach: when multiline.parser merges logs, it first uses start_state to decide whether a line is a start line and then uses cont_state to decide whether the following lines should be merged, so start_state and cont_state should use mutually exclusive logic.
Correct example:
rule "start_state" "/^(\d{4})(.*)/" "cont"
rule "cont" "/^(?!\d{4})(.*)/" "cont"
Incorrect example:
rule "start_state" "/^(\d{4})(.*)/" "cont"
rule "cont" "/(.*)/" "cont"
So far I have not run into the tail-stops-collecting problem again.
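For reference, the "correct" variant above, written as a standalone multiline parser definition (the parser name and flush_timeout are illustrative; the point is that the cont rule must not also match start lines):

[MULTILINE_PARSER]
    name          app_multiline
    type          regex
    flush_timeout 1000
    # start_state: a line beginning with a 4-digit year starts a new record
    rule      "start_state"   "/^(\d{4})(.*)/"     "cont"
    # cont: only lines NOT beginning with a 4-digit year are appended,
    # so start_state and cont are mutually exclusive
    rule      "cont"          "/^(?!\d{4})(.*)/"   "cont"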
Hi all, it finally worked for our project after replacing Docker with containerd and by applying the following config: https://github.com/vmware-samples/vcenter-event-broker-appliance/tree/development/files/configs/fluentbit/templates