fluent-bit: invalid stream_id x, could not append content to multiline context
Bug Report
Describe the bug
Since upgrading to 1.8.8 using the fluent/fluent-bit Helm chart, I see these errors in the logs of essentially every daemonset pod.
To Reproduce
I’m not actually sure what’s causing this. Is there a way to make FluentBit show me some information about what’s causing this exactly? I would also be ok with a way to disable multiline completely while still parsing structured logs from our services. (We log single JSON lines for the most part, and multiline isn’t necessary.)
Expected behavior
FluentBit doesn’t log tons of errors.
Screenshots

Your Environment
- Version used: 1.8.8
- Configuration:
serviceMonitor:
  enabled: true
dashboards:
  enabled: true
  labelKey: grafana_dashboard
  annotations: {}
config:
  inputs: |
    [INPUT]
        Name             tail
        Path             /var/log/containers/*.log
        multiline.parser docker, cri
        Tag              kube.*
        Mem_Buf_Limit    50MB
        Skip_Long_Lines  Off
        Skip_Empty_Lines On
    [INPUT]
        Name systemd
        Tag host.*
        Systemd_Filter _SYSTEMD_UNIT=kubelet.service
        Read_From_Tail On
  outputs: |
    [OUTPUT]
        Name                es
        Match               kube.*
        Host                x
        Port                443
        Logstash_Format     On
        Retry_Limit         False
        Time_Key            @timestamp
        Replace_Dots        On
        Logstash_Prefix     kubernetes_cluster
        Logstash_DateFormat %y.%m.%d.%H
        tls                 on
        tls.verify          on
        tls.debug           1
        Trace_Error         On
        Suppress_Type_Name  On
- Environment name and version (e.g. Kubernetes? What version?): Kubernetes v1.21.4
Additional context
I’m trying to discover if there’s something wrong on our end emitting logs or a problem with FluentBit. Thanks in advance!
About this issue
- State: closed
- Created 3 years ago
- Reactions: 1
- Comments: 16
Commits related to this issue
- in_tail: create stream_id by file inode(#4190) If stream_id is created by filename, rotated file id will be same. It causes releasing new multiline instance after file rotation. Signed-off-by: Takah... — committed to nokute78/fluent-bit by nokute78 3 years ago
- in_tail: create stream_id by file inode(#4190) If stream_id is created by filename, rotated file id will be same. It causes releasing new multiline instance after file rotation. Signed-off-by: Takah... — committed to fluent/fluent-bit by nokute78 3 years ago
- in_tail: create stream_id by file inode(#4190) If stream_id is created by filename, rotated file id will be same. It causes releasing new multiline instance after file rotation. Signed-off-by: Takah... — committed to 0Delta/fluent-bit by nokute78 3 years ago
We too are seeing this issue on our production server after starting to use multiline.parser cri. We are using fluent-bit version 1.8.8 on Kubernetes 1.21.4, running fluent-bit as a daemonset installed through the official Helm chart (0.19.1).
The errors always seem to start appearing on log rotation. This is how the typical log looks:
I set Rotate_Wait to 15 seconds. That is why the inotify_fs_remove() entry lags by 15 seconds from the log rotation entries. Immediately after the inotify_fs_remove() the errors start appearing.
The same for another one:
After the errors start happening, no more logs are being processed for the pod the log files were rotated for until the fluent-bit daemon is restarted.
I found the root cause. stream_id is calculated from the filename, so in_tail creates the same stream_id for the new file after rotation. When in_tail then deletes the old stream_id (which equals the new id), it deletes the new stream_id instance.
I added the diff below to print the old and new stream_id.
The log then shows that these stream_ids are the same.
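The collision described above can be illustrated with a small toy model. This is only a sketch in Python, not fluent-bit's C code: StreamRegistry, id_from_name, id_from_name_and_inode, the use of CRC32 as the hash, and the inode numbers are all hypothetical names and values chosen for illustration. It assumes only what the comment and the related commit message state: an id derived from the filename is identical before and after rotation, and an id that includes the inode is not.

```python
import zlib


class StreamRegistry:
    """Toy stand-in for a multiline context that tracks streams by numeric id."""

    def __init__(self):
        self.streams = {}

    def create(self, stream_id, owner):
        # Registering an id that already exists just overwrites the entry.
        self.streams[stream_id] = owner

    def destroy(self, stream_id):
        self.streams.pop(stream_id, None)

    def append(self, stream_id, line):
        if stream_id not in self.streams:
            return (f"invalid stream_id {stream_id}, "
                    "could not append content to multiline context")
        return "ok"


def id_from_name(path, inode):
    # Hypothetical "before" scheme: the id depends on the filename only.
    return zlib.crc32(path.encode())


def id_from_name_and_inode(path, inode):
    # Hypothetical "after" scheme: the inode is mixed into the id.
    return zlib.crc32(f"{path}:{inode}".encode())


def simulate_rotation(make_id):
    registry = StreamRegistry()
    path = "/var/log/containers/app.log"
    old_inode, new_inode = 1001, 1002  # made-up inode numbers

    old_id = make_id(path, old_inode)
    registry.create(old_id, "file before rotation")

    # Rotation: a new file (new inode) appears under the same path.
    new_id = make_id(path, new_inode)
    registry.create(new_id, "file after rotation")

    # After the rotate wait, the old file is dropped and its stream destroyed.
    registry.destroy(old_id)

    # The new file keeps producing lines.
    return registry.append(new_id, "next log line")


print(simulate_rotation(id_from_name))
print(simulate_rotation(id_from_name_and_inode))
```

With the name-only scheme, destroying the old file's stream also removes the entry the new file relies on, so every later append fails. With the inode mixed in, the two ids differ and the new file's stream survives the cleanup.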
Confirming this looks to be fixed, thanks!
I am able to reproduce the issue (both with the 1.8.8 build and on master) using the following config.
Configuration
~/fluent-bit.conf:
~/fluent-bit.logrotate:
The logrotate needs to create a new file (inode) on rotation to match the kubelet log rotation, so no copytruncate.
Steps to reproduce the error:
Rotate_Wait seconds it shows
Notes
Added some log statements to the flb_ml_stream_get method mentioned by @RalfWenzel and to the flb_ml_stream_id_destroy_all method. This is what I found:
Rotate_wait seconds it shows
The changes to the 2 methods mentioned are (just added the fprintf statements):