fluent-bit: invalid stream_id x, could not append content to multiline context
Bug Report
**Describe the bug**
Since upgrading to 1.8.8 using the fluent/fluent-bit Helm chart, I see these errors in the logs of essentially every DaemonSet pod.
**To Reproduce**
I'm not actually sure what's causing this. Is there a way to make Fluent Bit show me more information about what exactly is causing it? I would also be OK with a way to disable multiline handling completely while still parsing structured logs from our services. (We log single JSON lines for the most part, and multiline isn't necessary.)
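For reference, one way to skip the multiline engine while still getting structured logs is the tail input's classic `Parser` option instead of `multiline.parser`. This is a hedged sketch, not a confirmed fix for this issue: the right built-in parser (`docker` vs `cri`) depends on the container runtime, and it assumes the standard `parsers.conf` shipped with Fluent Bit:

```
[INPUT]
    Name    tail
    Path    /var/log/containers/*.log
    # 'docker' parses the JSON-per-line format written by the Docker
    # runtime; CRI runtimes would need the 'cri' parser instead.
    Parser  docker
    Tag     kube.*
```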
**Expected behavior**
Fluent Bit doesn't log tons of errors.
**Your Environment**
- Version used: 1.8.8
- Configuration:
```yaml
serviceMonitor:
  enabled: true
dashboards:
  enabled: true
  labelKey: grafana_dashboard
  annotations: {}
config:
  inputs: |
    [INPUT]
        Name              tail
        Path              /var/log/containers/*.log
        multiline.parser  docker, cri
        Tag               kube.*
        Mem_Buf_Limit     50MB
        Skip_Long_Lines   Off
        Skip_Empty_Lines  On
    [INPUT]
        Name              systemd
        Tag               host.*
        Systemd_Filter    _SYSTEMD_UNIT=kubelet.service
        Read_From_Tail    On
  outputs: |
    [OUTPUT]
        Name                 es
        Match                kube.*
        Host                 x
        Port                 443
        Logstash_Format      On
        Retry_Limit          False
        Time_Key             @timestamp
        Replace_Dots         On
        Logstash_Prefix      kubernetes_cluster
        Logstash_DateFormat  %y.%m.%d.%H
        tls                  on
        tls.verify           on
        tls.debug            1
        Trace_Error          On
        Suppress_Type_Name   On
```
- Environment name and version (e.g. Kubernetes? What version?): Kubernetes v1.21.4
**Additional context**
I'm trying to discover whether there's something wrong on our end emitting logs, or a problem with Fluent Bit. Thanks in advance!
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 1
- Comments: 16
Commits related to this issue
- in_tail: create stream_id by file inode(#4190) If stream_id is created by filename, rotated file id will be same. It causes releasing new multiline instance after file rotation. Signed-off-by: Takah... — committed to nokute78/fluent-bit by nokute78 3 years ago
- in_tail: create stream_id by file inode(#4190) If stream_id is created by filename, rotated file id will be same. It causes releasing new multiline instance after file rotation. Signed-off-by: Takah... — committed to fluent/fluent-bit by nokute78 3 years ago
- in_tail: create stream_id by file inode(#4190) If stream_id is created by filename, rotated file id will be same. It causes releasing new multiline instance after file rotation. Signed-off-by: Takah... — committed to 0Delta/fluent-bit by nokute78 3 years ago
We too are seeing this issue on our production server after starting to use `multiline.parser cri`. We are using fluent-bit version 1.8.8 on Kubernetes 1.21.4, running fluent-bit as a DaemonSet, installed through the official Helm chart (0.19.1). The errors always seem to start appearing on log rotation. This is how the typical log looks:
I set `Rotate_Wait` to 15 seconds. That is why the `inotify_fs_remove()` entry lags 15 seconds behind the log rotation entries. Immediately after the `inotify_fs_remove()`, the errors start appearing. The same for another one:
After the errors start happening, no more logs are being processed for the pod the log files were rotated for until the fluent-bit daemon is restarted
I found the root cause: stream_id is calculated from the filename, so in_tail creates the same stream_id after rotation. When in_tail then tries to delete the old stream_id (which equals the new id), it ends up deleting the new stream_id's instance.
I added the diff below to print the old/new stream_id.
The log then indicates that these stream_ids are the same.
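The collision described above can be sketched in a few lines. This is an illustrative Python sketch, not fluent-bit's actual C implementation, and the helper names are hypothetical; it only shows why a filename-derived id is reused across rotation while an inode-derived id is not:

```python
import hashlib

def stream_id_from_name(path: str) -> int:
    # Hypothetical pre-fix behaviour: derive the stream id by hashing
    # the file path, as described in the issue.
    return int(hashlib.sha1(path.encode()).hexdigest()[:8], 16)

def stream_id_from_inode(inode: int) -> int:
    # Hypothetical post-fix behaviour: the inode changes on rotation
    # (a new file is created), so the id changes too.
    return inode & 0xFFFFFFFF

# Before rotation: /var/log/containers/app.log has inode 1111.
# After rotation, a NEW file with the same path but inode 2222 exists.
old = stream_id_from_name("/var/log/containers/app.log")
new = stream_id_from_name("/var/log/containers/app.log")
print(old == new)  # True: same path, same id, so tearing down the
                   # old stream destroys the new stream's context

print(stream_id_from_inode(1111) == stream_id_from_inode(2222))  # False
```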
Confirming this looks to be fixed, thanks!
I am able to reproduce the issue (both with the 1.8.8 build and on master) using the following config
Configuration
~/fluent-bit.conf:
~/fluent-bit.logrotate:
The logrotate config needs to create a new file (new inode) on rotation to match the kubelet log rotation, so no `copytruncate`.
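A hedged sketch of a logrotate stanza matching that requirement (the path, size, and rotation count are illustrative, not the reporter's actual file):

```
/tmp/flb-test/app.log {
    size 1k
    rotate 5
    missingok
    # No 'copytruncate': logrotate renames the old file and creates a
    # fresh one (new inode), matching kubelet's rotation behaviour.
    create 0644
}
```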
Steps to reproduce the error:
After `Rotate_Wait` seconds it shows:
Notes
Added some log statements to the `flb_ml_stream_get` method mentioned by @RalfWenzel and to the `flb_ml_stream_id_destroy_all` method. This is what I found:
After `Rotate_wait` seconds it shows:
The changes to the two methods mentioned are (just added the `fprintf` statements):