fluent-bit: FluentBit spam itself with "error registering chunk with tag"

Bug Report

I see these errors in the aggregator a few seconds after it starts. Usually, I see this error after the Emitter_Mem_Buf_Limit is reached.

Our forwarder also tails the Fluent Bit logs, which exacerbates the problem: these errors get tailed and re-forwarded to the aggregator, which then generates the same error again, and the cycle continues. I don’t have a reliable way to reproduce this, but is there any feature to suppress error messages, for example to not generate the same error more than x times per minute?
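As a stop-gap on our side, something like the following could break the self-tailing loop; this is only a sketch, and the tag and paths are placeholders for whatever the forwarder actually tails:

[INPUT]
        Name          tail
        Tag           app.logs
        Path          /var/log/containers/*.log
        # Keep Fluent Bit's own log file out of the pipeline so its error
        # messages are not re-forwarded and re-emitted in a loop.
        Exclude_Path  /var/log/containers/*fluent-bit*.log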

To Reproduce

2022-07-20 18:43:21.7122020 | [2022/07/20 18:43:21] [ info] [fluent bit] version=1.9.4, commit=08de43e474, pid=1
2022-07-20 18:43:21.7122250 | [2022/07/20 18:43:21] [ info] [storage] version=1.2.0, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
2022-07-20 18:43:21.7122330 | [2022/07/20 18:43:21] [ info] [cmetrics] version=0.3.1
2022-07-20 18:43:21.7123510 | [2022/07/20 18:43:21] [ info] [input:forward:input.forward] listening on 0.0.0.0:24224
2022-07-20 18:43:21.7141520 | [2022/07/20 18:43:21] [ info] [output:forward:forward.mdsd] worker #0 started
2022-07-20 18:43:21.7357780 | [2022/07/20 18:43:21] [ info] [output:forward:forward.mdsd] worker #1 started
2022-07-20 18:43:21.7358770 | [2022/07/20 18:43:21] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
2022-07-20 18:43:21.7358880 | [2022/07/20 18:43:21] [ info] [sp] stream processor started
2022-07-20 18:43:56.7986370 | [2022/07/20 18:43:56] [error] [input:emitter:re_emitted.container_log] error registering chunk with tag: mdsd.container.log
2022-07-20 18:43:56.7986420 | [2022/07/20 18:43:56] [error] [input:emitter:re_emitted.container_log] error registering chunk with tag: mdsd.container.log
2022-07-20 18:43:56.7986500 | [2022/07/20 18:43:56] [error] [input:emitter:re_emitted.container_log] error registering chunk with tag: mdsd.container.log
2022-07-20 18:43:56.7986540 | [2022/07/20 18:43:56] [error] [input:emitter:re_emitted.container_log] error registering chunk with tag: mdsd.container.log
...
  • Steps to reproduce the problem:
  • N/A

Expected behavior: Fluent Bit should not generate the same error message more than n times per second or minute.

Screenshots

Your Environment

  • Version used: 1.9.4
  • Configuration:
  • Environment name and version (e.g. Kubernetes? What version?): Kubernetes 1.22.11
  • Server type and version:
  • Operating System and version:
  • Filters and plugins: tail, kubernetes, rewrite_tag, lua, forward

Additional context

About this issue

  • Original URL
  • State: open
  • Created 2 years ago
  • Reactions: 6
  • Comments: 48 (10 by maintainers)

Most upvoted comments

We ran into the same issue with systemd multiline parsing using 1.9.10:

[2023/01/17 15:07:45] [error] [input:emitter:emitter_for_systemd_linkerd_multiline] error registering chunk with tag: development.linkerd

This was perhaps caused by a large batch of unsent logs due to an output misconfiguration. It is extremely concerning that Fluent Bit fails in this manner.

Hey folks, an update here: we have merged a “Log Suppression feature” (https://github.com/fluent/fluent-bit/pull/6435), which should be released soon. This should help with errors that keep showing up.
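For reference, this is configured per output plugin; a minimal sketch, assuming the option ends up being named Log_Suppress_Interval (verify the exact key and release against the merged PR and the docs), with a placeholder host:

[OUTPUT]
        Name                   forward
        Match                  *
        Host                   aggregator.example.com
        # Assumed option from the PR above: suppress repeated, similar-looking
        # log messages emitted by this output within the given interval.
        Log_Suppress_Interval  10s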

I’ve been hitting this issue as well in k8s, where the fluent-bit pod frequently gets OOM-killed and crashes.

On a 30-node cluster, with fluent-bit deployed as a daemonset, there was only one fluent-bit pod repeatedly crashing. That pod was on the same node as a very log-spammy pod. To confirm it was the spammy pod, I excluded both it and the fluent-bit pod from the path in the tail config. That stopped the crashing, but that’s hardly a fix, as I want those logs as well.

I managed to replicate the errors in a testing environment, where I had to:

  1. Turn off the fluent-bit daemonset.
  2. Deploy log-spamming pods to build up several large log files.
  3. Wait 10 minutes for the log files to grow large.
  4. Turn on the fluent-bit daemonset again.

Then I got the error logs right away, in the tens of thousands:

[2022/11/24 16:45:29] [error] [input:emitter:emitter_for_rewrite_tag.1] error registering chunk with tag: kubernetes.log-spammer
[2022/11/24 16:45:29] [error] [input:emitter:emitter_for_rewrite_tag.1] error registering chunk with tag: kubernetes.log-spammer
[2022/11/24 16:45:29] [error] [input:emitter:emitter_for_rewrite_tag.1] error registering chunk with tag: kubernetes.fluent-bit-forwarder
[2022/11/24 16:45:29] [error] [input:emitter:emitter_for_rewrite_tag.1] error registering chunk with tag: kubernetes.fluent-bit-forwarder

Although I didn’t manage to crash it in the testing environment, I was able to reproduce the error messages.

The problem I had on the large k8s cluster seems to have fixed itself once the components were redeployed and there was no longer a massive backlog of logs for the fluent-bit pods to process.

Hope this can help anyone else.

@vwbusguy recognized the death spiral description as being caused by Fluent Bit both writing to the systemd journal and also reading from it… which happens to be my configuration. In practice, that configuration works most of the time, but once Fluent Bit starts spamming its own logs, at some point a threshold is crossed and Fluent Bit can’t keep up with processing as input the logs it is generating as output…

So I’ll update my configuration to break that loop as a workaround, while this issue stays focused on preventing Fluent Bit from generating the same log entries repeatedly.
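For anyone with the same setup, this is roughly the change I have in mind: restrict the systemd input to the units you actually care about, so Fluent Bit never reads back its own journal entries. A sketch only; the unit names are placeholders for your own:

[INPUT]
        Name            systemd
        Tag             host.journal
        # Only read the units we care about; by not listing Fluent Bit's own
        # unit, its log output is never fed back in as input.
        Systemd_Filter  _SYSTEMD_UNIT=docker.service
        Systemd_Filter  _SYSTEMD_UNIT=kubelet.service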

I built from master and deployed to a k8s cluster, and it did not seem to have any effect on the “error registering chunk with tag” errors.

How does one go about estimating the precise value for emitter_mem_buf_limit? For mem_buf_limit and chunk size there are corresponding Prometheus metrics that can help estimate those values. Is there something similar for the emitter? If not, is something like this possible?
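For context, this is how we expose the existing metrics today; a minimal sketch of the relevant service section, with our default ports. Since the emitter is registered as an input instance (e.g. emitter_for_rewrite_tag.1 in the logs above), I would expect it to show up in the per-input metrics as well, but that is an assumption on my part:

[SERVICE]
        HTTP_Server  On
        HTTP_Listen  0.0.0.0
        HTTP_Port    2020
        # Prometheus-format metrics are then available at
        # /api/v1/metrics/prometheus (e.g. fluentbit_input_records_total);
        # whether emitter instances expose usable sizing data there is
        # exactly the open question.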

Adjusting the emitter_mem_buf_limit worked in my case.

I had the same problem. The errors appeared when emitter_mem_buf_limit was exceeded. I changed:

[FILTER]
        buffer                 On
        emitter_mem_buf_limit 128MB

Also, add workers to the output:

[OUTPUT]
        Name     es
        Workers  2

I’ve seen this happen when the emitter_mem_buf_limit is exceeded.
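If it helps, this is roughly what raising that limit looks like on the rewrite_tag filter; a sketch only, with a placeholder rule and sizes taken from nothing in particular, and note that Emitter_Storage.type filesystem also requires storage.path to be set in [SERVICE]:

[FILTER]
        Name                   rewrite_tag
        Match                  kube.*
        # Placeholder rule: re-tag any record whose "log" field matches.
        Rule                   $log .* mdsd.container.log false
        Emitter_Name           re_emitted.container_log
        # Raise the in-memory limit for the emitter and spill to disk
        # instead of dropping chunks when the limit is reached.
        Emitter_Mem_Buf_Limit  64M
        Emitter_Storage.type   filesystem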

@agup006 correct me if I’m wrong, but the log suppression feature works only for output plugins. Unlike Fluentd, where rewrite_tag is an output plugin, the rewrite_tag filter in Fluent Bit is a filter plugin and as such can’t use the log suppression feature.

This should specifically be for the log files that Fluent Bit generates itself. Adding @lecaros @RicardoAAD, who might have a sample config we can leverage to set it up.

Hello @agup006, as @stebbib has mentioned, this option only acts on messages from output plugins that look similar within an interval of time.

We have created a public FR https://github.com/fluent/fluent-bit/issues/6873 to extend this functionality to input plugins, and other fluent-bit components such as storage, engine, etc.

@markstos Thanks for assisting in that issue, and understand we all have to keep our services up and running 😃. All the best and appreciate your contributions

I’m going to be evaluating Vector instead. This is a critical issue.

https://vector.dev/


It would be good to have a metric exposed when memory limits like Emitter_Mem_Buf_Limit are reached, so people can alert or act on that.
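Until a dedicated metric exists, a partial workaround might be the storage metrics the built-in HTTP server can already expose, which include per-input buffer usage and an overlimit status; a sketch, assuming this reporting also covers emitter inputs (not verified), with a placeholder storage path:

[SERVICE]
        HTTP_Server      On
        HTTP_Port        2020
        storage.path     /var/lib/fluent-bit/storage
        # Adds chunk/buffer information, including an "overlimit" status per
        # input instance, to the /api/v1/storage endpoint.
        storage.metrics  On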