fluent-bit: Docker multiline parser errors

Bug Report

Describe the bug

When using the docker multiline parser, we get a lot of errors in the following format:

[2021/07/29 08:27:45] [error] [multiline] invalid stream_id 1817450727403209240, could not append content to multiline context

To Reproduce

Our Fluent Bit config, which runs via the official Helm chart on an EKS cluster:

[SERVICE]
  Daemon Off
  Flush 1
  Log_Level info
  HTTP_Server On
  HTTP_Listen 0.0.0.0
  HTTP_Port 2020
  storage.path /var/log/flb-storage/
  storage.sync normal
  storage.checksum off
  storage.max_chunks_up 128
  storage.backlog.mem_limit 16M
  storage.metrics on
  Parsers_File parsers.conf
  Parsers_File custom_parsers.conf

[INPUT]
  Name tail
  Path /var/log/containers/*.log
  multiline.parser docker, cri
  Tag kube.*
  Skip_Long_Lines On
  Buffer_Chunk_Size 32k
  Buffer_Max_Size 256k
  DB /var/log/flb-storage/tail.db
  DB.Sync normal
  storage.type filesystem

[FILTER]
  Name kubernetes
  Match kube.*
  Merge_Log On
  Merge_Log_Key log_processed
  Merge_Log_Trim On
  Keep_Log Off
  K8S-Logging.Parser On
  K8S-Logging.Exclude On

[OUTPUT]
  Name forward
  Match *
  Host fluentd.logging.svc
  Port 24224
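Since this config enables HTTP_Server and storage.metrics, Fluent Bit's built-in monitoring API can be used to watch buffer and chunk behavior while the errors occur. A minimal sketch, assuming the listener configured above is reachable locally on the node:

# query the monitoring endpoints exposed by HTTP_Server on port 2020
curl -s http://127.0.0.1:2020/api/v1/metrics
curl -s http://127.0.0.1:2020/api/v1/storage   # populated when storage.metrics is on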

Expected behavior

Logs should be parsed and forwarded without multiline parser errors.

Your Environment

  • Version used: v1.8.2
  • Configuration: See above
  • Environment name and version (e.g. Kubernetes? What version?): EKS v1.21
  • Server type and version: n/a
  • Operating System and version: n/a
  • Filters and plugins: See above

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 29
  • Comments: 52 (11 by maintainers)

Most upvoted comments

FYI: we are taking a look at this

Fixed by #4197; it has been merged into the v1.8 branch for the next release.

Just encountered the same issue as well when running a performance test (app container generating a burst of logs; Fluent Bit tailing that container log file).

Specifically, here is what we observed:

* The error is triggered by a log rotation of the input container log file (a quick way to confirm this correlation is sketched after the config below)

* Fluent Bit generates hundreds of error messages like this:
  `[2021/09/03 18:47:40] [error] [multiline] invalid stream_id 4395029823782301522, could not append content to multiline context`

* More importantly, **Fluent Bit seems to drop the input logs!**
  i.e. Shortly after we see the above multiline error messages in Fluent Bit, the logs stop showing up in our output destination -- in fact, it looks like the logs are no longer even making it to our output plugin.

* Interestingly, even though this is a multiline parser issue, none of the logs in our performance test were multiline -- the max length of any of our logs was 1055 characters (~1KB), which is too small to be split into multiple lines by containerd.

* A few people mentioned that they only saw the issue with Docker-formatted logs, but we experienced this issue with CRI (specifically, containerd) formatted logs.

* Also, we tried both of the following multiline parser settings in our _tail_ input plugin, but it made no difference -- we got the same error in both cases:
  
  * `multiline.parser  cri, docker`
  * `multiline.parser  cri`

* We experienced the same error for the following versions of Fluent Bit:
  
  * `1.8.5`
  * `1.8.4`

* Fluent Bit config:
[INPUT]
  name              tail
  tag               kube.*
  path              /var/log/containers/*.log
  path_key          log_file
  read_from_head    false
  refresh_interval  2
  rotate_wait       5
  db                /var/log/fluentbit-container-logs.db
  db.sync           normal
  db.locking        true
  buffer_chunk_size 128KB
  buffer_max_size   1MB
  skip_long_lines   on
  mem_buf_limit     199800KB
  multiline.parser  cri, docker
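A quick way to confirm the rotation correlation mentioned above is to grep the Fluent Bit log for rotation events alongside the multiline errors. A minimal sketch, assuming Fluent Bit runs in a pod; the pod name and namespace below are hypothetical:

# pod name and namespace are hypothetical -- adjust for your deployment
kubectl logs -n logging fluent-bit-xxxxx \
  | grep -E 'handle rotation|inotify_fs_(add|remove)|invalid stream_id'

If the invalid stream_id lines consistently appear a few seconds after a handle rotation() / inotify_fs_remove() pair for the same inode, the same trigger applies to your setup.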

We’re experiencing the same issue under the same conditions. As soon as the error first occurs, logs are dropped from the output. None of our config changes made it work as expected.

We are encountering the same issue on 1.8.3; I can gladly attach more details if needed.

I’ve manually pinned the image version to 1.8.3 and can unfortunately say the error is still happening. A restart seems to help temporarily. I managed to spot the moment when this error first appears, before the buffer gets flooded with error lines:

[2021/08/03 08:44:16] [ info] [input:tail:tail.0] inode=1557613 handle rotation(): /var/log/containers/kafka-4_dev_kafka-f97e589f47fc9ab06afc29de75be59626703adf2a86dbc268cdc6d582e2a3edb.log => /var/lib/docker/containers/f97e589f47fc9ab06afc29de75be59626703adf2a86dbc268cdc6d582e2a3edb/f97e589f47fc9ab06afc29de75be59626703adf2a86dbc268cdc6d582e2a3edb-json.log.1
[2021/08/03 08:44:16] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=1557613 watch_fd=16
[2021/08/03 08:44:16] [ info] [input:tail:tail.0] inotify_fs_add(): inode=1557613 watch_fd=31 name=/var/lib/docker/containers/f97e589f47fc9ab06afc29de75be59626703adf2a86dbc268cdc6d582e2a3edb/f97e589f47fc9ab06afc29de75be59626703adf2a86dbc268cdc6d582e2a3edb-json.log.1
[2021/08/03 08:44:18] [ info] [input:tail:tail.0] inotify_fs_add(): inode=1557743 watch_fd=32 name=/var/log/containers/kafka-4_dev_kafka-f97e589f47fc9ab06afc29de75be59626703adf2a86dbc268cdc6d582e2a3edb.log
[2021/08/03 08:44:23] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=1557613 watch_fd=31
[2021/08/03 08:44:23] [error] [multiline] invalid stream_id 9275124730216209964, could not append content to multiline context

Seeing that the error starts happening just after the tail input detects a log rotation, I think the two must be related.

EDIT 1: Could be linked to https://github.com/fluent/fluent-bit/issues/3883?

EDIT 2: Also, I’ve just seen another log rotation happen, but it didn’t trigger the issue, so the problem might not be that easy to reproduce.

EDIT 3: Another error started happening just after log rotation:

[fluent-bit-jm7sq] [2021/08/03 09:50:23] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=3072067 watch_fd=6
[fluent-bit-jm7sq] [2021/08/03 09:50:23] [ info] [input:tail:tail.0] inotify_fs_add(): inode=3072067 watch_fd=8 name=/var/lib/docker/containers/489e96776a48bce950c0c760a966299f966143303490004b0b3b1eb0fe05b73f/489e96776a48bce950c0c760a966299f966143303490004b0b3b1eb0fe05b73f-json.log.1
[fluent-bit-jm7sq] [2021/08/03 09:50:23] [ info] [input:tail:tail.0] inotify_fs_add(): inode=3072065 watch_fd=9 name=/var/log/containers/cluster-autoscaler-9bbbb844b-qg8vk_kube-system_cluster-autoscaler-489e96776a48bce950c0c760a966299f966143303490004b0b3b1eb0fe05b73f.log
[fluent-bit-jm7sq] [2021/08/03 09:50:28] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=3072067 watch_fd=8
[fluent-bit-jm7sq] [2021/08/03 09:50:31] [error] [multiline] invalid stream_id 1629899175750051748, could not append content to multiline context

I have had 1.8.9 deployed for a few days now. The error hasn’t occurred again, whereas before it would happen multiple times per day. Looks like it’s fixed. Thanks @nokute78

@PettitWesley (or anyone else), have you tested that this actually fixes the issue?

I’ve also opened fluent/helm-charts#175 to update the Helm chart to use this version.

My test rig used td-agent-bit from the command line (without Kubernetes or Docker), pushing large files through the tail input interspersed with manual rotations. On 1.8.5 this would fail very fast and very consistently -- within 2-3 rotations of large files. On 1.8.9, I have yet to encounter the “invalid stream_id” error after twenty rotations. That may not sound like much, but it was more than enough to see multiple failures before.
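For anyone who wants to replicate that rig, here is a minimal sketch of the burst-and-rotate loop. It assumes a Linux shell with GNU date (for %N nanoseconds) and a local fluent-bit or td-agent-bit instance tailing /tmp/flb-repro/*.log with multiline.parser cri; the paths and counts are hypothetical, not from the original comment:

#!/bin/sh
# Hypothetical reproduction sketch: write CRI-formatted log bursts, then
# rotate the file the way a container runtime does (rename, then recreate).
LOG=/tmp/flb-repro/app.log
mkdir -p /tmp/flb-repro
: > "$LOG"

for i in $(seq 1 20); do
  for j in $(seq 1 10000); do
    # CRI log format: <timestamp> <stream> <P|F> <message>
    echo "$(date -u +%Y-%m-%dT%H:%M:%S.%NZ) stdout F burst $i line $j" >> "$LOG"
  done
  mv "$LOG" "$LOG.1"   # rotate: rename the file being tailed
  : > "$LOG"           # recreate an empty file at the original path
  sleep 2              # give the tail input time to notice the rotation
done

On affected versions, the invalid stream_id errors should start appearing on Fluent Bit’s stderr within a few rotations.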

I just deployed 1.8.9 in our cluster and am now monitoring throughout the day to see if it’s fixed.

Do we have an ETA for this one?

We’re having the same issue with FluentBit v1.8.3.

[2021/08/23 15:40:48] [error] [multiline] invalid stream_id 6842456611372491590, could not append content to multiline context
[2021/08/23 15:40:48] [error] [multiline] invalid stream_id 4350310108622390529, could not append content to multiline context
[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    multiline.parser  docker, cri
    Exclude_Path      *_kube-system_*.log,*_kube-public*.log,*_kube-node-lease*.log,*_octo-canary*.log,*_stsrelay*.log
    DB                /run/fluent-bit/flb_kube.db
    DB.locking        true
    DB.journal_mode   WAL
    storage.type      filesystem
    Mem_Buf_Limit     4MB
    Skip_Long_Lines   On
    Refresh_Interval  30
    Rotate_Wait       30
    Buffer_Max_Size   2MB
    Buffer_Chunk_Size 32K
    Read_from_Head    true
    Ignore_Older      24h

Kubernetes versions

  • v1.19.7
  • v1.20.8

The errors come up gradually, and eventually no logs are collected by Fluent Bit at all. We had to restart the Fluent Bit Pods.
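As a stopgap until a fixed version is rolled out, restarting the Fluent Bit Pods clears the stale multiline state. A sketch, assuming Fluent Bit is deployed as a DaemonSet; the DaemonSet and namespace names below are hypothetical:

# DaemonSet and namespace names are hypothetical -- adjust for your deployment
kubectl -n logging rollout restart daemonset/fluent-bit
kubectl -n logging rollout status daemonset/fluent-bit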

Pod spec

apiVersion: v1
kind: Pod
metadata:
  name: multiline-demo
  labels:
    app: web
spec:
  containers:
    - name: test
      image: busybox
      args: [/bin/sh, -c, 'while true; do echo "START$(head -c 170000 /dev/zero | tr "\0" "+")END"; sleep 1; done']
      ports:
        - containerPort: 88
      resources:
        limits:
          cpu: 100m
          memory: 2Gi
        requests:
          cpu: 100m
          memory: 2Gi
    - name: test-2
      image: busybox
      args: [/bin/sh, -c, 'while true; do echo "START$(head -c 170000 /dev/zero | tr "\0" "+")END"; sleep 1; done']
      ports:
        - containerPort: 88
      resources:
        limits:
          cpu: 100m
          memory: 2Gi
        requests:
          cpu: 100m
          memory: 2Gi

Fluent-bit config

[INPUT]
    Name tail
    Tag kube.*
    DB /var/log/flb_kube.db
    Mem_Buf_Limit 10MB
    multiline.parser cri
    Path /var/log/containers/*.log
    Refresh_Interval 10
    Rotate_Wait 30
    Skip_Long_Lines On

[OUTPUT]
    Name stdout
    Match *

The tail plugin works fine until log rotation. Once the log rotates, we see `could not append content to multiline context`:

2021-08-02T14:07:51.231-07:00  {"log":"[2021/08/02 21:07:51] [ info] [input:tail:tail.0] inotify_fs_add(): inode=1727284 watch_fd=2 name=/var/log/containers/multiline-demo_default_test-2-a22fdaa97fbcad64fd66ad10c390198c083d38d993d86f9df891941d759f7b65.log"}
2021-08-02T14:07:55.968-07:00  {"log":"time=\"2021-08-02T21:07:55Z\" level=info msg=\"[cloudwatch 0] Created log stream from-fluent-bit-kube.var.log.containers.multiline-demo_default_test-2-a22fdaa97fbcad64fd66ad10c390198c083d38d993d86f9df891941d759f7b65.log in group filter-cm-test\""}
2021-08-02T17:32:10.275-07:00  {"log":"[2021/08/03 00:32:10] [ info] [input:tail:tail.0] inode=1727102 handle rotation(): /var/log/containers/multiline-demo_default_test-531a293af516b7148c09b5316c5c99161e2ce31a1a7ad0636b44b8bf8ddd9127.log =\u003e /var/log/pods/default_multiline-demo_1ea2d294-9b72-4e55-b41b-0949c55faf4d/test/0.log.20210803-003210"}
2021-08-02T17:32:10.275-07:00  {"log":"[2021/08/03 00:32:10] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=1727102 watch_fd=1"}
2021-08-02T17:32:10.276-07:00  {"log":"[2021/08/03 00:32:10] [ info] [input:tail:tail.0] inotify_fs_add(): inode=1727102 watch_fd=3 name=/var/log/pods/default_multiline-demo_1ea2d294-9b72-4e55-b41b-0949c55faf4d/test/0.log.20210803-003210"}
2021-08-02T17:32:10.941-07:00  {"log":"[2021/08/03 00:32:10] [ info] [input:tail:tail.0] inotify_fs_add(): inode=1727311 watch_fd=4 name=/var/log/containers/multiline-demo_default_test-531a293af516b7148c09b5316c5c99161e2ce31a1a7ad0636b44b8bf8ddd9127.log"}
2021-08-02T17:32:40.897-07:00  {"log":"[2021/08/03 00:32:40] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=1727102 watch_fd=3"}
2021-08-02T17:32:42.854-07:00  {"log":"[2021/08/03 00:32:42] [error] [multiline] invalid stream_id 8649363796039376046, could not append content to multiline context"}
2021-08-02T17:32:42.854-07:00  {"log":"[2021/08/03 00:32:42] [error] [multiline] invalid stream_id 8649363796039376046, could not append content to multiline context"}

To reproduce the issue, apply a pod spec that writes a bunch of characters to stdout; kubelet will rotate the log file once the size limit (10MB) is reached. I was able to reproduce the issue consistently in under 2 minutes.
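To run that reproduction end to end, a sketch of the commands, assuming the pod spec above is saved as multiline-demo.yaml and the Fluent Bit pod's stdout is readable via kubectl (the file and pod names are hypothetical):

# apply the log-generating pod; kubelet rotates its log at the 10MB default
kubectl apply -f multiline-demo.yaml

# watch the Fluent Bit pod's output for the multiline error
kubectl logs -f fluent-bit-xxxxx | grep 'invalid stream_id'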