fluent-bit: Potential memory leak in v1.8.7 debug

Bug Report

Describe the bug

Memory usage of fluent/fluent-bit:1.8.7-debug@sha256:024748e4aa934d5b53a713341608b7ba801d41a170f9870fdf67f4032a20146f grows steadily until the container is OOM-killed.

To Reproduce

Example log message if applicable:

stream of "OOMKilling" warnings

Steps to reproduce the problem: Deploy fluent/fluent-bit:1.8.7-debug@sha256:024748e4aa934d5b53a713341608b7ba801d41a170f9870fdf67f4032a20146f and wait 10-15 minutes. The container will be OOM-killed.
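To confirm steady growth before the OOM kill happens, the resident set size of the fluent-bit process can be sampled from /proc/<pid>/status over the 10-15 minute window. The following is a minimal sketch, not part of the original report: the helper names and the 1024 kB/min threshold are assumptions chosen for illustration, and the heuristic only checks that RSS never drops and grows faster than the threshold.

```python
import re


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from the text of /proc/<pid>/status, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def growth_kb_per_min(samples):
    """Average RSS growth between the first and last (seconds, kB) samples."""
    (t0, rss0), (t1, rss1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (rss1 - rss0) * 60.0 / (t1 - t0)


def looks_like_leak(samples, min_growth_kb_per_min=1024):
    """Heuristic: RSS never drops between samples and grows faster than the threshold.

    The default threshold is an arbitrary illustration value, not a Fluent Bit figure.
    """
    values = [rss for _, rss in samples]
    never_drops = all(b >= a for a, b in zip(values, values[1:]))
    return never_drops and growth_kb_per_min(samples) >= min_growth_kb_per_min
```

Each sample is a (seconds, kB) pair, for example one reading per minute taken with parse_vmrss_kb on the contents of /proc/<pid>/status inside the container.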

Expected behavior

A container running fluent/fluent-bit:1.8.7-debug@sha256:024748e4aa934d5b53a713341608b7ba801d41a170f9870fdf67f4032a20146f with a fixed memory limit should keep its memory usage stable instead of growing until it is OOM-killed.

Your Environment

Version used: fluent/fluent-bit:1.8.7-debug@sha256:024748e4aa934d5b53a713341608b7ba801d41a170f9870fdf67f4032a20146f
Configuration:

  fluent-bit.conf: |-
    [SERVICE]
        Flush         5
        Grace         120
        Log_Level     debug
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_PORT     3020

    @INCLUDE containers.input.conf
    @INCLUDE system.input.conf
    @INCLUDE filter.conf
    @INCLUDE output.conf

  containers.input.conf: |-
    [INPUT]
        Name             tail
        Alias            k8s_container
        Tag              k8s_container.<namespace_name>.<pod_name>.<container_name>
        Tag_Regex        (?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-
        Path             /var/log/containers/*.log
        DB               /var/run/google-fluentbit/pos-files/flb_kube.db
        Buffer_Max_Size  1MB
        Mem_Buf_Limit    50MB
        Skip_Long_Lines  On
        Refresh_Interval 5
        Read_from_Head   True

  system.input.conf: |-
    # Example:
    # Dec 21 23:17:22 gke-foo-1-1-4b5cbd14-node-4eoj startupscript: Finished running startup script /var/run/google.startup.script
    [INPUT]
        Name   tail
        Alias  syslog
        Parser syslog
        Path   /var/log/startupscript.log
        DB     /var/log/startupscript.db
        Alias  startupscript
        Tag    startupscript

    [INPUT]
        Name    tail
        Alias   docker
        Path    /var/log/docker.log
        Tag     docker
        Parser  docker
        Mem_Buf_Limit    1MB
        Skip_Long_Lines  On
        Refresh_Interval 1

    [INPUT]
        Name  tail
        Alias etcd
        Path  /var/log/etcd.log
        Tag   etcd
        Mem_Buf_Limit    1MB
        Skip_Long_Lines  On
        Refresh_Interval 1

    [INPUT]
        Name             tail
        Alias            kubelet
        Path             /var/log/kubelet.log
        Tag              kubelet
        Multiline        off
        Parser_Firstline firstline
        Parser_1         format1
        Mem_Buf_Limit    1MB
        Skip_Long_Lines  On
        Refresh_Interval 1

    # Example:
    # I1118 21:26:53.975789       6 proxier.go:1096] Port "nodePort for kube-system/default-http-backend:http" (:31429/tcp) was open before and is still needed
    [INPUT]
        Name            tail
        Alias           kube-proxy
        Tag             kube-proxy
        Path            /var/log/kube-proxy.log
        DB              /var/log/kube-proxy.db
        Buffer_Max_Size 1MB
        Mem_Buf_Limit   1MB
        Refresh_Interval 1
        Parser          glog

    [INPUT]
        Name             tail
        Alias            kube-apiserver
        Path             /var/log/kube-apiserver.log
        Tag              kube-apiserver
        Multiline        off
        Parser_Firstline firstline
        Parser_1         format1
        Mem_Buf_Limit    1MB
        Skip_Long_Lines  On
        Refresh_Interval 1

    [INPUT]
        Name             tail
        Alias            kube-controller-manager
        Path             /var/log/kube-controller-manager.log
        Tag              kube-controller-manager
        Multiline        off
        Parser_Firstline firstline
        Parser_1         format1
        Mem_Buf_Limit    1MB
        Skip_Long_Lines  On
        Refresh_Interval 1

    [INPUT]
        Name             tail
        Alias            kube-scheduler
        Path             /var/log/kube-scheduler.log
        Tag              kube-scheduler
        Multiline        off
        Parser_Firstline firstline
        Parser_1         format1
        Mem_Buf_Limit    1MB
        Skip_Long_Lines  On
        Refresh_Interval 1

    [INPUT]
        Name             tail
        Alias            rescheduler
        Path             /var/log/rescheduler.log
        Tag              rescheduler
        Multiline        off
        Parser_Firstline firstline
        Parser_1         format1
        Mem_Buf_Limit    1MB
        Skip_Long_Lines  On
        Refresh_Interval 1

    [INPUT]
        Name             tail
        Alias            glbc
        Path             /var/log/glbc.log
        Tag              glbc
        Multiline        off
        Parser_Firstline firstline
        Parser_1         format1
        Mem_Buf_Limit    1MB
        Skip_Long_Lines  On
        Refresh_Interval 1

    [INPUT]
        Name             tail
        Alias            cluster-autoscaler
        Path             /var/log/cluster-autoscaler.log
        Tag              cluster-autoscaler
        Multiline        off
        Parser_Firstline firstline
        Parser_1         format1
        Mem_Buf_Limit    1MB
        Skip_Long_Lines  On
        Refresh_Interval 1

    # Logs from systemd-journal for interesting services.
    [INPUT]
        Name           systemd
        Alias          sysd-docker
        Tag            docker
        Systemd_Filter _SYSTEMD_UNIT=docker.service
        Path           /var/log/journal
        DB             /var/log/gcp-journald-docker.db
        Read_from_head  true
        Buffer_Max_Size 1MB
        Mem_Buf_Limit   1MB
        Refresh_Interval 1

    [INPUT]
        Name           systemd
        Alias          sysd-container-runtime
        Tag            container-runtime
        Systemd_Filter _SYSTEMD_UNIT=containerd.service
        Path           /var/log/journal
        DB             /var/log/gcp-journald-container-runtime.db
        Read_from_head true
        Buffer_Max_Size 1MB
        Mem_Buf_Limit   1MB
        Refresh_Interval 1

    [INPUT]
        Name            systemd
        Alias           sysd-kubelet
        Tag             kubelet
        Systemd_Filter  _SYSTEMD_UNIT=kubelet.service
        Path            /var/log/journal
        DB              /var/log/gcp-journald-kubelet.db
        Read_from_head  true
        Buffer_Max_Size 1MB
        Mem_Buf_Limit   1MB
        Refresh_Interval 1

    [INPUT]
        Name           systemd
        Alias          sysd-node-problem-detector
        Tag            node-problem-detector
        Systemd_Filter _SYSTEMD_UNIT=node-problem-detector.service
        Path           /var/log/journal
        DB             /var/log/gcp-journald-node-problem-detector.db
        Read_from_head  true
        Buffer_Max_Size 1MB
        Mem_Buf_Limit   1MB
        Refresh_Interval 1

  filter.conf: |-

    [FILTER]
        Name         parser
        Match        k8s_container.*
        Key_Name     log
        Reserve_Data True
        Parser       docker
        Parser       containerd

    [FILTER]
        Name        modify
        Match       *
        Hard_rename log message

    [FILTER]
        Name         parser
        Match        k8s_container.*
        Key_Name     message
        Reserve_Data True
        Parser       glog
        Parser       json

    # level is a common synonym for severity,
    # the default field name in libraries such as GoLang's zap.
    # populate severity with level, if severity does not exist.
    [FILTER]
        Name        modify
        Match       k8s_container.*
        Copy        level severity

  output.conf: |-

    # handle namespaces in droplist first
    {% for namespace in log_droplist %}
    [OUTPUT]
        Name  null
        Alias null-{{namespace}}
        Match k8s_container.{{namespace}}.*
    {% endfor %}

    # Single output for all logs, project log routing handled by sinks in host project
    [OUTPUT]
        Name                       http
        Alias                      http-export-all
        Match                      *
        Host                       127.0.0.1
        Port                       3021
        URI                        /logs
        header_tag                 FLUENT-TAG
        Format                     msgpack
        Retry_Limit                2

  parsers.conf: |-
    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L%z

    [PARSER]
        Name        containerd
        Format      regex
        Regex       ^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L%z

    [PARSER]
        Name        json
        Format      json

    [PARSER]
        Name        glog
        Format      regex
        Regex       ^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source_file>[^ \]]+)\:(?<source_line>\d+)\]\s(?<message>.*)$
        Time_Key    time
        Time_Format %m%d %H:%M:%S.%L

    [PARSER]
        Name        syslog
        Format      regex
        Regex       ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
        Time_Key    time
        Time_Format %b %d %H:%M:%S

    [PARSER]
        Name firstline
        Format regex
        Regex  /^\w\d{4}/

Environment name and version (e.g. Kubernetes? What version?): Kubernetes
Operating System and version: “Debian GNU/Linux 10 (buster)”
Filters and plugins: See config above
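For readers checking the k8s_container input above, the sketch below shows how the Tag_Regex capture groups would be substituted into the Tag template. It is an approximation under stated assumptions: Fluent Bit uses the Onigmo regex engine, so the named groups are rewritten from (?<name>...) to Python's (?P<name>...), the regex is applied with an unanchored search, and the file name used in the usage note is hypothetical rather than taken from the report.

```python
import re

# Tag_Regex and Tag copied from containers.input.conf above; only the group
# syntax is changed from (?<name>...) to Python's (?P<name>...).
TAG_REGEX = re.compile(
    r"(?P<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)"
    r"_(?P<namespace_name>[^_]+)_(?P<container_name>.+)-"
)
TAG_TEMPLATE = "k8s_container.<namespace_name>.<pod_name>.<container_name>"


def expand_tag(file_name):
    """Return the tag built from file_name, or None if the regex does not match."""
    match = TAG_REGEX.search(file_name)
    if match is None:
        return None
    tag = TAG_TEMPLATE
    # groupdict() holds only the three named groups; each replaces its <placeholder>.
    for name, value in match.groupdict().items():
        tag = tag.replace(f"<{name}>", value)
    return tag
```

With a hypothetical file name such as web-0_default_nginx-0123456789abcdef.log, expand_tag returns k8s_container.default.web-0.nginx, which is the form the k8s_container.* filters and the per-namespace null outputs match against.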

About this issue

State: closed
Created 3 years ago
Reactions: 9
Comments: 25 (4 by maintainers)

Most upvoted comments

@edsiper our Mem_Buf_Limit values are 500MB and the OP’s are 1MB. If this were just a configuration issue, it would happen in both versions. When we rolled back to 1.5.2, memory use dropped right back to about 4MB per pod, versus the 20MB-3GB that the 1.8.8 pods used. In 1.8.8, one pod out of three would consistently climb to 3GB within hours, while the others would rise slowly and settle at around 20MB.

ggallagher0 on Oct 27, 2021

@ggallagher0 can you try reproducing the problem with the systemd input disabled? Can you help isolate the plugin that triggers the problem?

I have this same issue and I only use the tail input.

[FILTER]
    Name              aws
    Match             *
    imds_version      v1
    az                true
    ec2_instance_id   true
    ec2_instance_type true
    private_ip        true
    ami_id            true
    account_id        true
    hostname          true
    vpc_id            true

[FILTER]
    Name                kubernetes
    Match               ingress-nginx.*
    Kube_URL            https://kubernetes.default.svc:443
    Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
    Kube_Tag_Prefix     ingress-nginx.
    Use_Kubelet         true
    Buffer_Size         0
    Merge_Log           On
    Keep_Log            False

[SERVICE]
    Flush             5
    Grace             120
    Log_Level         error
    Daemon            off
    Parsers_File      parsers.conf
    HTTP_Server       On
    HTTP_Listen       0.0.0.0
    HTTP_Port         2020
    storage.metrics   On
    storage.path      /var/log/flb-storage/

@INCLUDE input-kubernetes.conf
@INCLUDE filter-kubernetes.conf
@INCLUDE filter-aws.conf
@INCLUDE output-elasticsearch.conf
@INCLUDE output-s3.conf

[INPUT]
    Name              tail
    Alias             ingress_nginx_appdat-system
    Tag               ingress_<namespace_name>_<pod_name>_<container_name>
    Tag_Regex         (?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-
    Path              /var/log/containers/ingress-nginx-controller*.log
    Parser            docker
    DB                /var/log/flb_ingress.db
    storage.type      filesystem
    Docker_Mode       On
    Skip_Long_Lines   On
    Refresh_Interval  5
    Buffer_Max_Size   1MB
    Mem_Buf_Limt      5MB

[OUTPUT]
    Name                      es
    Match                     *
    Host                      ${ELASTICSEARCH_HOST}
    Port                      ${ELASTICSEARCH_PORT}
    AWS_Auth                  ${ELASTICSEARCH_AWS_AUTH}
    AWS_Region                ${ELASTICSEARCH_AWS_REGION}
    TLS                       On
    Generate_ID               On
    Logstash_Prefix           access-logs
    Logstash_Format           On
    Replace_Dots              On
    Buffer_Size               False
    Retry_Limit               False
    storage.total_limit_size  2048M

[OUTPUT]
    Name                          s3
    Match                         *
    bucket                        ${S3_BUCKET_NAME}
    region                        ${S3_BUCKET_REGION}
    store_dir                     /var/log/flb-storage
    s3_key_format                 ${S3_BUCKET_KEY_FORMAT}
    s3_key_format_tag_delimiters  .-
    upload_timeout                5m
    Retry_Limit                   False
    storage.total_limit_size      2048M

[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   On

NeckBeardPrince on Oct 27, 2021
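The isolation step requested earlier in the thread (re-running with the systemd input disabled) can be applied to a classic-mode config by dropping every [INPUT] section that uses a given plugin. This is a minimal sketch with a hypothetical helper, drop_inputs, which is not part of Fluent Bit: it works on a single config text, does not follow @INCLUDE lines, and treats any line starting with "[" as the start of a new section.

```python
def drop_inputs(config_text, plugin_name):
    """Return config_text without the [INPUT] sections whose Name equals plugin_name."""
    sections, current = [], []
    for line in config_text.splitlines():
        # A line such as [INPUT] or [FILTER] starts a new section.
        if line.strip().startswith("[") and current:
            sections.append(current)
            current = []
        current.append(line)
    if current:
        sections.append(current)

    def is_target(section):
        if section[0].strip().upper() != "[INPUT]":
            return False
        for line in section[1:]:
            parts = line.split()
            # The first "Name <plugin>" entry decides which plugin the section uses.
            if len(parts) >= 2 and parts[0].lower() == "name":
                return parts[1].lower() == plugin_name.lower()
        return False

    kept = [line for section in sections if not is_target(section) for line in section]
    return "\n".join(kept)
```

Running the reduced config for the same 10-15 minutes and comparing memory use, then repeating with a different plugin name such as tail, narrows down which input is involved. Comment lines that sit directly above a removed section stay in the output, since they are grouped with the preceding section.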