fluent-bit: Potential memory leak in v1.8.7 debug
Bug Report
Describe the bug Memory usage of fluent/fluent-bit:1.8.7-debug@sha256:024748e4aa934d5b53a713341608b7ba801d41a170f9870fdf67f4032a20146f grows steadily until the container is OOM-killed.
To Reproduce
- Rubular link if applicable:
- Example log message if applicable:
stream of "OOMKilling" warnings
- Steps to reproduce the problem: Deploy fluent/fluent-bit:1.8.7-debug@sha256:024748e4aa934d5b53a713341608b7ba801d41a170f9870fdf67f4032a20146f and wait 10-15 minutes. The container's memory usage climbs until it is OOM-killed.
Expected behavior When deployed with a fixed memory limit, fluent/fluent-bit:1.8.7-debug@sha256:024748e4aa934d5b53a713341608b7ba801d41a170f9870fdf67f4032a20146f should run with stable memory usage instead of growing continuously until it is OOM-killed.
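To put numbers on "constantly increase", one option is to sample the process's resident set size over the 10-15 minute window. The following is a minimal sketch, not part of the original report: it assumes a Linux host where the Fluent Bit process's /proc entry is readable, and the function names and the PID are hypothetical.

```python
import re
import time


def parse_vmrss_kib(status_text):
    """Return the VmRSS value (in kB, as reported by the kernel) or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def sample_rss(pid, count=5, interval_s=60.0):
    """Collect `count` VmRSS samples for `pid`, `interval_s` seconds apart."""
    samples = []
    for i in range(count):
        with open(f"/proc/{pid}/status") as fh:
            samples.append(parse_vmrss_kib(fh.read()))
        if i < count - 1:
            time.sleep(interval_s)
    return samples


def growth_kib(samples):
    """Last sample minus first sample, skipping samples that failed to parse."""
    values = [s for s in samples if s is not None]
    return values[-1] - values[0] if len(values) >= 2 else 0
```

Calling `growth_kib(sample_rss(pid))` with the PID of the Fluent Bit process (hypothetical here) gives a rough growth figure that can be attached to the report; a value that keeps rising across repeated runs would be consistent with the behavior described above.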
Screenshots
Your Environment
- Version used: fluent/fluent-bit:1.8.7-debug@sha256:024748e4aa934d5b53a713341608b7ba801d41a170f9870fdf67f4032a20146f
- Configuration:
fluent-bit.conf: |-
[SERVICE]
Flush 5
Grace 120
Log_Level debug
Daemon off
Parsers_File parsers.conf
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_PORT 3020
@INCLUDE containers.input.conf
@INCLUDE system.input.conf
@INCLUDE filter.conf
@INCLUDE output.conf
containers.input.conf: |-
[INPUT]
Name tail
Alias k8s_container
Tag k8s_container.<namespace_name>.<pod_name>.<container_name>
Tag_Regex (?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-
Path /var/log/containers/*.log
DB /var/run/google-fluentbit/pos-files/flb_kube.db
Buffer_Max_Size 1MB
Mem_Buf_Limit 50MB
Skip_Long_Lines On
Refresh_Interval 5
Read_from_Head True
system.input.conf: |-
# Example:
# Dec 21 23:17:22 gke-foo-1-1-4b5cbd14-node-4eoj startupscript: Finished running startup script /var/run/google.startup.script
[INPUT]
Name tail
Alias syslog
Parser syslog
Path /var/log/startupscript.log
DB /var/log/startupscript.db
Alias startupscript
Tag startupscript
[INPUT]
Name tail
Alias docker
Path /var/log/docker.log
Tag docker
Parser docker
Mem_Buf_Limit 1MB
Skip_Long_Lines On
Refresh_Interval 1
[INPUT]
Name tail
Alias etcd
Path /var/log/etcd.log
Tag etcd
Mem_Buf_Limit 1MB
Skip_Long_Lines On
Refresh_Interval 1
[INPUT]
Name tail
Alias kubelet
Path /var/log/kubelet.log
Tag kubelet
Multiline off
Parser_Firstline firstline
Parser_1 format1
Mem_Buf_Limit 1MB
Skip_Long_Lines On
Refresh_Interval 1
# Example:
# I1118 21:26:53.975789 6 proxier.go:1096] Port "nodePort for kube-system/default-http-backend:http" (:31429/tcp) was open before and is still needed
[INPUT]
Name tail
Alias kube-proxy
Tag kube-proxy
Path /var/log/kube-proxy.log
DB /var/log/kube-proxy.db
Buffer_Max_Size 1MB
Mem_Buf_Limit 1MB
Refresh_Interval 1
Parser glog
[INPUT]
Name tail
Alias kube-apiserver
Path /var/log/kube-apiserver.log
Tag kube-apiserver
Multiline off
Parser_Firstline firstline
Parser_1 format1
Mem_Buf_Limit 1MB
Skip_Long_Lines On
Refresh_Interval 1
[INPUT]
Name tail
Alias kube-controller-manager
Path /var/log/kube-controller-manager.log
Tag kube-controller-manager
Multiline off
Parser_Firstline firstline
Parser_1 format1
Mem_Buf_Limit 1MB
Skip_Long_Lines On
Refresh_Interval 1
[INPUT]
Name tail
Alias kube-scheduler
Path /var/log/kube-scheduler.log
Tag kube-scheduler
Multiline off
Parser_Firstline firstline
Parser_1 format1
Mem_Buf_Limit 1MB
Skip_Long_Lines On
Refresh_Interval 1
[INPUT]
Name tail
Alias rescheduler
Path /var/log/rescheduler.log
Tag rescheduler
Multiline off
Parser_Firstline firstline
Parser_1 format1
Mem_Buf_Limit 1MB
Skip_Long_Lines On
Refresh_Interval 1
[INPUT]
Name tail
Alias glbc
Path /var/log/glbc.log
Tag glbc
Multiline off
Parser_Firstline firstline
Parser_1 format1
Mem_Buf_Limit 1MB
Skip_Long_Lines On
Refresh_Interval 1
[INPUT]
Name tail
Alias cluster-autoscaler
Path /var/log/cluster-autoscaler.log
Tag cluster-autoscaler
Multiline off
Parser_Firstline firstline
Parser_1 format1
Mem_Buf_Limit 1MB
Skip_Long_Lines On
Refresh_Interval 1
# Logs from systemd-journal for interesting services.
[INPUT]
Name systemd
Alias sysd-docker
Tag docker
Systemd_Filter _SYSTEMD_UNIT=docker.service
Path /var/log/journal
DB /var/log/gcp-journald-docker.db
Read_from_head true
Buffer_Max_Size 1MB
Mem_Buf_Limit 1MB
Refresh_Interval 1
[INPUT]
Name systemd
Alias sysd-container-runtime
Tag container-runtime
Systemd_Filter _SYSTEMD_UNIT=containerd.service
Path /var/log/journal
DB /var/log/gcp-journald-container-runtime.db
Read_from_head true
Buffer_Max_Size 1MB
Mem_Buf_Limit 1MB
Refresh_Interval 1
[INPUT]
Name systemd
Alias sysd-kubelet
Tag kubelet
Systemd_Filter _SYSTEMD_UNIT=kubelet.service
Path /var/log/journal
DB /var/log/gcp-journald-kubelet.db
Read_from_head true
Buffer_Max_Size 1MB
Mem_Buf_Limit 1MB
Refresh_Interval 1
[INPUT]
Name systemd
Alias sysd-node-problem-detector
Tag node-problem-detector
Systemd_Filter _SYSTEMD_UNIT=node-problem-detector.service
Path /var/log/journal
DB /var/log/gcp-journald-node-problem-detector.db
Read_from_head true
Buffer_Max_Size 1MB
Mem_Buf_Limit 1MB
Refresh_Interval 1
filter.conf: |-
[FILTER]
Name parser
Match k8s_container.*
Key_Name log
Reserve_Data True
Parser docker
Parser containerd
[FILTER]
Name modify
Match *
Hard_rename log message
[FILTER]
Name parser
Match k8s_container.*
Key_Name message
Reserve_Data True
Parser glog
Parser json
# level is a common synonym for severity,
# the default field name in libraries such as GoLang's zap.
# populate severity with level, if severity does not exist.
[FILTER]
Name modify
Match k8s_container.*
Copy level severity
output.conf: |-
# handle namespaces in droplist first
{% for namespace in log_droplist %}
[OUTPUT]
Name null
Alias null-{{namespace}}
Match k8s_container.{{namespace}}.*
{% endfor %}
# Single output for all logs, project log routing handled by sinks in host project
[OUTPUT]
Name http
Alias http-export-all
Match *
Host 127.0.0.1
Port 3021
URI /logs
header_tag FLUENT-TAG
Format msgpack
Retry_Limit 2
parsers.conf: |-
[PARSER]
Name docker
Format json
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L%z
[PARSER]
Name containerd
Format regex
Regex ^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L%z
[PARSER]
Name json
Format json
[PARSER]
Name glog
Format regex
Regex ^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source_file>[^ \]]+)\:(?<source_line>\d+)\]\s(?<message>.*)$
Time_Key time
Time_Format %m%d %H:%M:%S.%L
[PARSER]
Name syslog
Format regex
Regex ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
Time_Key time
Time_Format %b %d %H:%M:%S
[PARSER]
Name firstline
Format regex
Regex /^\w\d{4}/
- Environment name and version (e.g. Kubernetes? What version?): Kubernetes
- Server type and version:
- Operating System and version: “Debian GNU/Linux 10 (buster)”
- Filters and plugins: See config above
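Since the [SERVICE] section above enables HTTP_Server on port 3020, the built-in monitoring endpoint can help show which inputs are moving the most data while memory grows. This is a hedged sketch rather than something from the original report: it assumes the /api/v1/metrics endpoint returns JSON with an "input" object keyed by alias, each entry carrying a "bytes" counter; verify the shape against your version. The function names are hypothetical.

```python
import json
import urllib.request


def fetch_metrics(host="127.0.0.1", port=3020, timeout_s=5.0):
    """Fetch the JSON metrics document from the Fluent Bit HTTP server."""
    url = f"http://{host}:{port}/api/v1/metrics"
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return json.load(resp)


def input_bytes_by_name(metrics):
    """Return (input_name, bytes) pairs, largest byte count first."""
    inputs = metrics.get("input", {})
    pairs = [(name, stats.get("bytes", 0)) for name, stats in inputs.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

Running `input_bytes_by_name(fetch_metrics())` from inside the pod at intervals, alongside memory readings, may indicate whether the growth tracks a particular input such as k8s_container.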
Additional context
About this issue
- State: closed
- Created 3 years ago
- Reactions: 9
- Comments: 25 (4 by maintainers)
@edsiper our Mem_Buf_Limit values are 500 MB and the OP’s are 1 MB. If this were just a configuration issue, it would happen in both versions. When we rolled back to 1.5.2, memory use dropped right back to about 4 MB per pod, versus the 20 MB to 3 GB that the 1.8.8 pods used. In 1.8.8, one pod out of three would consistently climb to 3 GB within hours, while the others would rise slowly and level off around 20 MB.
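To compare configurations like the two mentioned above, it can help to list the Mem_Buf_Limit each input actually sets. The sketch below is an illustration only: it parses classic-mode config text with a simplified reader, treats size suffixes as decimal multiples (an assumption of this sketch), and uses hypothetical function names. Inputs with no Mem_Buf_Limit key are reported as None.

```python
import re

# Assumption: suffixes are treated as decimal multiples of bytes.
UNIT_BYTES = {"": 1, "k": 1000, "m": 1000**2, "g": 1000**3}


def size_to_bytes(text):
    """Convert strings such as '50MB', '1mb' or '4096' to a byte count."""
    match = re.fullmatch(r"(\d+)\s*([kmg]?)b?", text.strip().lower())
    if not match:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(match.group(1)) * UNIT_BYTES[match.group(2)]


def input_buffer_limits(config_text):
    """Return (alias_or_name, limit_bytes_or_None) for each [INPUT] section."""
    sections = []
    current = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "@")):
            continue  # skip blanks, comments and @INCLUDE directives
        if line.startswith("["):
            # Only [INPUT] sections are tracked; others reset the cursor.
            current = {} if line.upper() == "[INPUT]" else None
            if current is not None:
                sections.append(current)
            continue
        if current is not None:
            key, _, value = line.partition(" ")
            current[key.lower()] = value.strip()
    return [
        (
            s.get("alias", s.get("name", "?")),
            size_to_bytes(s["mem_buf_limit"]) if "mem_buf_limit" in s else None,
        )
        for s in sections
    ]
```

Summing the non-None values gives the configured ceiling for input buffers only. It says nothing about memory held elsewhere in the process, so a resident size far above that sum would point away from a pure configuration explanation.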
I have this same issue and I only use the tail input.