fluent-bit: Output ES fails to flush chunk with status 0
Bug Report
Describe the bug
Fluent Bit frequently (but not always) fails to flush chunks to Elasticsearch with HTTP status=0 when running in Kubernetes. Debug/trace logs don't show anything specific, even with Trace_Error On.
[fluent-bit-5gqwz] [2021/03/29 21:53:42] [ warn] [engine] failed to flush chunk '1-1617054818.967282961.flb', retry in 9 seconds: task_id=0, input=tail.0 > output=es.0 (out_id=0)
[fluent-bit-mw7sw] [2021/03/29 21:54:03] [ warn] [engine] failed to flush chunk '1-1617054839.820871907.flb', retry in 7 seconds: task_id=0, input=tail.0 > output=es.0 (out_id=0)
[fluent-bit-5gqwz] [2021/03/29 21:55:32] [error] [output:es:es.0] HTTP status=0 URI=/_bulk
[fluent-bit-5gqwz] [2021/03/29 21:55:32] [ warn] [engine] failed to flush chunk '1-1617054927.935218441.flb', retry in 10 seconds: task_id=0, input=tail.0 > output=es.0 (out_id=0)
[fluent-bit-2z84q] [2021/03/29 21:55:47] [error] [output:es:es.0] HTTP status=0 URI=/_bulk
[fluent-bit-2z84q] [2021/03/29 21:55:47] [ warn] [engine] failed to flush chunk '1-1617054942.558870892.flb', retry in 10 seconds: task_id=0, input=tail.0 > output=es.0 (out_id=0)
[fluent-bit-mw7sw] [2021/03/29 21:56:23] [error] [output:es:es.0] HTTP status=0 URI=/_bulk
[fluent-bit-mw7sw] [2021/03/29 21:56:23] [ warn] [engine] failed to flush chunk '1-1617054978.505299097.flb', retry in 11 seconds: task_id=0, input=tail.0 > output=es.0 (out_id=0)
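For reference, the Trace_Error mentioned above is a key on the es output plugin; a minimal sketch of where it sits (only the keys relevant to tracing are shown, the full output block is in the configuration below):
[OUTPUT]
# print the Elasticsearch error response for failed bulk requests
Name es
Match ingress-nginx.access
Trace_Error On
# Trace_Output On would additionally print the outgoing bulk payloads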
Your Environment
- Version used: 1.7.0 and 1.7.1
- Configuration:
Tried stripping the configuration down to only in_tail and out_es; the results are the same. (A minimal sketch of such a setup follows the full config below.)
config:
service: |
[SERVICE]
Flush 5
Grace 120
Log_Level info
Daemon off
Parsers_File parsers.conf
Parsers_File custom_parsers.conf
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_PORT 2020
customParsers: |
[PARSER]
Name CRI
Format regex
Regex ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<log>.*)$
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L%z
filters: |
# -- Applied to All --
[FILTER]
Name kubernetes
Match kube.*
Merge_Log Off
Keep_Log Off
K8S-Logging.Parser On
K8S-Logging.Exclude On
[FILTER]
Name nest
Match kube.*
Operation lift
Nested_Under kubernetes
# -- Ingress Nginx --
[FILTER]
Name rewrite_tag
Match kube*ingress-nginx*
Rule $stream ^(stderr)$ ingress-nginx.error false
Rule $stream ^(stdout)$ ingress-nginx.access false
[FILTER]
Name parser
Match ingress-nginx.access
Parser json
Key_Name log
inputs: |
[INPUT]
Name tail
Tag kube.*
Path /var/log/containers/*.log
DB /var/log/flb_kube_containers.db
Parser CRI
Mem_Buf_Limit 5MB
Buffer_Max_Size 1MB
Refresh_Interval 10
Skip_Long_Lines On
# -- SystemD --
[INPUT]
Name systemd
Alias systemd-kubelet
Tag systemd-kubelet
Systemd_Filter _SYSTEMD_UNIT=kubelet.service
Path /var/log/journal
DB /var/log/kubelet.db
Mem_Buf_Limit 5MB
Buffer_Max_Size 1MB
[INPUT]
Name systemd
Alias systemd-kube-node-installation
Tag systemd-kube-node-installation
Systemd_Filter _SYSTEMD_UNIT=kube-node-installation.service
Path /var/log/journal
DB /var/log/kube-node-installation.db
Mem_Buf_Limit 5MB
Buffer_Max_Size 1MB
[INPUT]
Name systemd
Alias systemd-kube-node-configuration
Tag systemd-kube-node-configuration
Systemd_Filter _SYSTEMD_UNIT=kube-node-configuration.service
Path /var/log/journal
DB /var/log/kube-node-configuration.db
Mem_Buf_Limit 5MB
Buffer_Max_Size 1MB
[INPUT]
Name systemd
Alias systemd-kube-logrotate
Tag systemd-kube-logrotate
Systemd_Filter _SYSTEMD_UNIT=kube-logrotate.service
Path /var/log/journal
DB /var/log/kube-logrotate.db
Mem_Buf_Limit 5MB
Buffer_Max_Size 1MB
[INPUT]
Name systemd
Alias systemd-node-problem-detector
Tag systemd-node-problem-detector
Systemd_Filter _SYSTEMD_UNIT=node-problem-detector.service
Path /var/log/journal
DB /var/log/node-problem-detector.db
Mem_Buf_Limit 5MB
Buffer_Max_Size 1MB
[INPUT]
Name systemd
Alias systemd-kube-container-runtime-monitor
Tag systemd-kube-container-runtime-monitor
Systemd_Filter _SYSTEMD_UNIT=kube-container-runtime-monitor.service
Path /var/log/journal
DB /var/log/kube-container-runtime-monitor.db
Mem_Buf_Limit 5MB
Buffer_Max_Size 1MB
[INPUT]
Name systemd
Alias systemd-kubelet-monitor
Tag systemd-kubelet-monitor
Systemd_Filter _SYSTEMD_UNIT=kubelet-monitor.service
Path /var/log/journal
DB /var/log/kubelet-monitor.db
Mem_Buf_Limit 5MB
Buffer_Max_Size 1MB
outputs: |
[OUTPUT]
Name gelf
Match systemd*
Gelf_Short_Message_Key MESSAGE
Gelf_Host_Key _HOSTNAME
Host ${GRAYLOG_HOST}
PORT ${GRAYLOG_PORT}
Mode udp
[OUTPUT]
Name gelf
Match kube*
Gelf_Short_Message_Key log
Gelf_Host_Key host
Host ${GRAYLOG_HOST}
PORT ${GRAYLOG_PORT}
Mode udp
# -- Ingress Nginx --
[OUTPUT]
Name gelf
Match ingress-nginx.error
Gelf_Short_Message_Key log
Gelf_Host_Key host
Host ${GRAYLOG_HOST}
PORT ${GRAYLOG_PORT}
Mode udp
[OUTPUT]
Name es
Match ingress-nginx.access
Index nginx
Logstash_Format On
Host ${ES_HOST}
Http_User ${ES_USERNAME}
Http_Passwd ${ES_PASSWORD}
Logstash_Prefix nginx
Logstash_DateFormat %Y%m%d
Retry_Limit 5
Type _doc
Port 443
Tls On
Buffer_Size 50M
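The stripped-down test mentioned above (only in_tail and out_es) would look roughly like this; a sketch assembled from keys already present in the config above, not the exact file that was used:
[SERVICE]
Flush 5
Log_Level debug
[INPUT]
Name tail
Tag kube.*
Path /var/log/containers/*.log
Parser CRI
Mem_Buf_Limit 5MB
[OUTPUT]
Name es
Match kube.*
Host ${ES_HOST}
Port 443
Tls On
Http_User ${ES_USERNAME}
Http_Passwd ${ES_PASSWORD}
Logstash_Format On
Logstash_Prefix nginx
Retry_Limit 5
Trace_Error On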
About this issue
- State: closed
- Created 3 years ago
- Comments: 31 (6 by maintainers)
Commits related to this issue
- http_client: fail early when ingesting an invalid HTTP status Currently process_data() will generate an error when various header fields are invalid, but not if the HTTP status itself is. This commit... — committed to atheriel/fluent-bit by atheriel 3 years ago
- http_client: fail early when ingesting an invalid HTTP status (#3344) Currently process_data() will generate an error when various header fields are invalid, but not if the HTTP status itself is. Th... — committed to fluent/fluent-bit by atheriel 3 years ago
- http_client: fail early when ingesting an invalid HTTP status (#3344) Currently process_data() will generate an error when various header fields are invalid, but not if the HTTP status itself is. Th... — committed to DrewZhang13/fluent-bit by atheriel 3 years ago
Still an issue, should be re-opened. Seeing this with fluent-bit 1.8.12 from the https://fluent.github.io/helm-charts Helm chart version 0.19.19 writing to Elasticsearch.
By adding Trace_Error On to the es OUTPUT, I saw the problem was a mapping conflict in Elasticsearch. After resolving that, the errors stopped completely.
This issue was closed because it has been stalled for 5 days with no activity.
I turned off net.keepalive and have not seen any errors for a while now.
It seems the HTTP status=0 and failed-to-flush-chunk errors have disappeared; I haven't seen them for hours.
Edit: It definitely has something to do with net.keepalive. No errors anymore after setting it to false!
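For anyone who wants to try the same workaround: net.keepalive is a generic per-output network option in Fluent Bit, so disabling it on the es output would look roughly like this (a sketch; the remaining keys stay as in the config above):
[OUTPUT]
Name es
Match ingress-nginx.access
Host ${ES_HOST}
Port 443
Tls On
# disable connection reuse so every flush opens a fresh connection to ES
net.keepalive Off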
I saw the same issue with fluent-bit v1.9.3 and AWS OpenSearch 1.2 … even though I turned off net.keepalive and also set Buffer_Size False, it did not solve the issue. Just like @lifeofmoo mentioned, everything initially went well in OpenSearch and then the "failed to flush chunk" errors appeared… I think the issue should be re-opened!
This seems similar to the problem that I am having here: https://github.com/fluent/fluent-bit/issues/3299
I seem to hit the max request size configured in AWS for the ES installation. Can you enable this option in the Elasticsearch output plugin and see if you get the same error?
Also seeing this with fluent-bit v1.9.3 and AWS OpenSearch 1.1 - the initial set of logs gets sent to the correct index in OpenSearch, and then I just see failed-to-flush-chunk lines with status 0.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.