fluent-bit: Output ES fails to flush chunk with status 0
Bug Report
Describe the bug
Fluent Bit frequently (but not always) fails to flush chunks to ES in Kubernetes, logging HTTP status=0. Debug/Trace logs don't show anything specific, even with Trace_Error On.
[fluent-bit-5gqwz] [2021/03/29 21:53:42] [ warn] [engine] failed to flush chunk '1-1617054818.967282961.flb', retry in 9 seconds: task_id=0, input=tail.0 > output=es.0 (out_id=0) 
[fluent-bit-mw7sw] [2021/03/29 21:54:03] [ warn] [engine] failed to flush chunk '1-1617054839.820871907.flb', retry in 7 seconds: task_id=0, input=tail.0 > output=es.0 (out_id=0) 
[fluent-bit-5gqwz] [2021/03/29 21:55:32] [error] [output:es:es.0] HTTP status=0 URI=/_bulk 
[fluent-bit-5gqwz] [2021/03/29 21:55:32] [ warn] [engine] failed to flush chunk '1-1617054927.935218441.flb', retry in 10 seconds: task_id=0, input=tail.0 > output=es.0 (out_id=0) 
[fluent-bit-2z84q] [2021/03/29 21:55:47] [error] [output:es:es.0] HTTP status=0 URI=/_bulk 
[fluent-bit-2z84q] [2021/03/29 21:55:47] [ warn] [engine] failed to flush chunk '1-1617054942.558870892.flb', retry in 10 seconds: task_id=0, input=tail.0 > output=es.0 (out_id=0) 
[fluent-bit-mw7sw] [2021/03/29 21:56:23] [error] [output:es:es.0] HTTP status=0 URI=/_bulk 
[fluent-bit-mw7sw] [2021/03/29 21:56:23] [ warn] [engine] failed to flush chunk '1-1617054978.505299097.flb', retry in 11 seconds: task_id=0, input=tail.0 > output=es.0 (out_id=0) 
Your Environment
- Version used: 1.7.0 and 1.7.1
- Configuration:
Tried to remove everything and leave only in_tail and out_es. Results are the same.
  config:
    service: |
      [SERVICE]
          Flush         5
          Grace         120
          Log_Level     info
          Daemon        off
          Parsers_File  parsers.conf
          Parsers_File  custom_parsers.conf
          HTTP_Server   On
          HTTP_Listen   0.0.0.0
          HTTP_PORT     2020
    customParsers: |
        [PARSER]
            Name        CRI
            Format      regex
            Regex       ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<log>.*)$
            Time_Key    time
            Time_Format %Y-%m-%dT%H:%M:%S.%L%z
    filters: |
        # -- Applied to All --
        [FILTER]
            Name                kubernetes
            Match               kube.*
            Merge_Log           Off
            Keep_Log            Off
            K8S-Logging.Parser  On
            K8S-Logging.Exclude On
        [FILTER]
            Name                nest
            Match               kube.*
            Operation           lift
            Nested_Under        kubernetes
        # -- Ingress Nginx --
        [FILTER]
            Name                rewrite_tag
            Match               kube*ingress-nginx*
            Rule                $stream ^(stderr)$  ingress-nginx.error false
            Rule                $stream ^(stdout)$  ingress-nginx.access false
        [FILTER]
            Name                parser
            Match               ingress-nginx.access
            Parser              json
            Key_Name            log
    inputs: |
        [INPUT]
            Name                tail
            Tag                 kube.*
            Path                /var/log/containers/*.log
            DB                  /var/log/flb_kube_containers.db
            Parser              CRI
            Mem_Buf_Limit       5MB
            Buffer_Max_Size     1MB
            Refresh_Interval    10
            Skip_Long_Lines     On
        # -- SystemD --
        [INPUT]
            Name                systemd
            Alias               systemd-kubelet
            Tag                 systemd-kubelet
            Systemd_Filter      _SYSTEMD_UNIT=kubelet.service
            Path                /var/log/journal
            DB                  /var/log/kubelet.db
            Mem_Buf_Limit       5MB
            Buffer_Max_Size     1MB
        [INPUT]
            Name                systemd
            Alias               systemd-kube-node-installation
            Tag                 systemd-kube-node-installation
            Systemd_Filter      _SYSTEMD_UNIT=kube-node-installation.service
            Path                /var/log/journal
            DB                  /var/log/kube-node-installation.db
            Mem_Buf_Limit       5MB
            Buffer_Max_Size     1MB
        [INPUT]
            Name                systemd
            Alias               systemd-kube-node-configuration
            Tag                 systemd-kube-node-configuration
            Systemd_Filter      _SYSTEMD_UNIT=kube-node-configuration.service
            Path                /var/log/journal
            DB                  /var/log/kube-node-configuration.db
            Mem_Buf_Limit       5MB
            Buffer_Max_Size     1MB
        [INPUT]
            Name                systemd
            Alias               systemd-kube-logrotate
            Tag                 systemd-kube-logrotate
            Systemd_Filter      _SYSTEMD_UNIT=kube-logrotate.service
            Path                /var/log/journal
            DB                  /var/log/kube-logrotate.db
            Mem_Buf_Limit       5MB
            Buffer_Max_Size     1MB
        [INPUT]
            Name                systemd
            Alias               systemd-node-problem-detector
            Tag                 systemd-node-problem-detector
            Systemd_Filter      _SYSTEMD_UNIT=node-problem-detector.service
            Path                /var/log/journal
            DB                  /var/log/node-problem-detector.db
            Mem_Buf_Limit       5MB
            Buffer_Max_Size     1MB
        [INPUT]
            Name                systemd
            Alias               systemd-kube-container-runtime-monitor
            Tag                 systemd-kube-container-runtime-monitor
            Systemd_Filter      _SYSTEMD_UNIT=kube-container-runtime-monitor.service
            Path                /var/log/journal
            DB                  /var/log/kube-container-runtime-monitor.db
            Mem_Buf_Limit       5MB
            Buffer_Max_Size     1MB
        [INPUT]
            Name                systemd
            Alias               systemd-kubelet-monitor
            Tag                 systemd-kubelet-monitor
            Systemd_Filter      _SYSTEMD_UNIT=kubelet-monitor.service
            Path                /var/log/journal
            DB                  /var/log/kubelet-monitor.db
            Mem_Buf_Limit       5MB
            Buffer_Max_Size     1MB
    outputs: |
        [OUTPUT]
            Name                   gelf
            Match                  systemd*
            Gelf_Short_Message_Key MESSAGE
            Gelf_Host_Key          _HOSTNAME
            Host                   ${GRAYLOG_HOST}
            PORT                   ${GRAYLOG_PORT}
            Mode                   udp
        [OUTPUT]
            Name                   gelf
            Match                  kube*
            Gelf_Short_Message_Key log
            Gelf_Host_Key          host
            Host                   ${GRAYLOG_HOST}
            PORT                   ${GRAYLOG_PORT}
            Mode                   udp
        # -- Ingress Nginx --
        [OUTPUT]
            Name                   gelf
            Match                  ingress-nginx.error
            Gelf_Short_Message_Key log
            Gelf_Host_Key          host
            Host                   ${GRAYLOG_HOST}
            PORT                   ${GRAYLOG_PORT}
            Mode                   udp
        [OUTPUT]
            Name                   es
            Match                  ingress-nginx.access
            Index                  nginx
            Logstash_Format        On
            Host                   ${ES_HOST}
            Http_User              ${ES_USERNAME}
            Http_Passwd            ${ES_PASSWORD}
            Logstash_Prefix        nginx
            Logstash_DateFormat    %Y%m%d
            Retry_Limit            5
            Type                   _doc
            Port                   443
            Tls                    On
            Buffer_Size            50M
About this issue
- State: closed
- Created 3 years ago
- Comments: 31 (6 by maintainers)
Commits related to this issue
- http_client: fail early when ingesting an invalid HTTP status Currently process_data() will generate an error when various header fields are invalid, but not if the HTTP status itself is. This commit... — committed to atheriel/fluent-bit by atheriel 3 years ago
- http_client: fail early when ingesting an invalid HTTP status (#3344) Currently process_data() will generate an error when various header fields are invalid, but not if the HTTP status itself is. Th... — committed to fluent/fluent-bit by atheriel 3 years ago
- http_client: fail early when ingesting an invalid HTTP status (#3344) Currently process_data() will generate an error when various header fields are invalid, but not if the HTTP status itself is. Th... — committed to DrewZhang13/fluent-bit by atheriel 3 years ago
Still an issue, should be re-opened. Seeing this with fluent-bit 1.8.12 from the https://fluent.github.io/helm-charts Helm chart version 0.19.19 writing to Elasticsearch.
By adding Trace_Error On to the es OUTPUT, I saw that the problem was a mapping conflict in ES. After resolving the conflict, the error went away completely.
This issue was closed because it has been stalled for 5 days with no activity.
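For reference, the Trace_Error suggestion above might look like the following in the reporter's es output. This is a minimal sketch that reuses the Match, Host, Port and Tls values from the configuration in the report; only the Trace_Error line is new. Note the assumption that Elasticsearch actually returns a response body: the original report says nothing specific was logged with Trace_Error On when the status was 0.

```
[OUTPUT]
    Name         es
    Match        ingress-nginx.access
    Host         ${ES_HOST}
    Port         443
    Tls          On
    # Print the Elasticsearch API response when a bulk request reports an error
    # (for example a mapping conflict, as described in the comment above).
    Trace_Error  On
```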
I turned off net.keepalive and I have not seen any errors for a while now.
The HTTP status=0 and failed to flush chunk errors seem to have disappeared. Haven't seen them for hours now.
Edit: It definitely has something to do with net.keepalive. No errors anymore after setting it to false!
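A sketch of the keepalive workaround described above, applied to the es output from the report. Everything except the net.keepalive line is copied from the reporter's configuration; whether this helps appears to depend on the setup, since other commenters in this thread still saw the error with keepalive disabled.

```
[OUTPUT]
    Name           es
    Match          ingress-nginx.access
    Host           ${ES_HOST}
    Port           443
    Tls            On
    # Open a new connection for each flush instead of reusing an idle one.
    net.keepalive  off
```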
I saw the same issue with fluent-bit v1.9.3 and AWS OpenSearch version 1.2. Even though I turned off net.keepalive and also set Buffer_Size False, I cannot solve the issue. Just like @lifeofmoo mentioned, initially everything went well in OpenSearch, and then the "failed to flush chunk" errors started. I think the issue should be re-opened!
This seems similar to the problem that I am having here: https://github.com/fluent/fluent-bit/issues/3299
I seem to hit the max request size configured in AWS for ES installation. Can you enable this option in ElasticSearch output plugin and see if you see the same error?
Also seeing this with fluent-bit v1.9.3 and AWS OpenSearch version 1.1: the initial set of logs gets sent to the correct index in OpenSearch, and then I just see lines of flush chunk with status 0.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.