fluent-bit: FluentBit spams itself with "error registering chunk with tag"
Bug Report
I see these errors in the aggregator a few seconds after it starts. Usually, I see this error after reaching Emitter_Mem_Buf_Limit.
Our forwarder also tails the Fluent Bit logs, which exacerbates the problem: these error lines get tailed and re-forwarded to the aggregator, which then generates the same error again, and the cycle continues. I don't see how to repro this, but is there any feature to suppress error messages, for example, don't generate this error more than x times per minute?
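To illustrate that feedback loop, a minimal, hypothetical forwarder sketch (paths, tag, and host are placeholders, not our actual configuration) in which the tail input's Path glob also matches Fluent Bit's own container log, so every error line Fluent Bit prints becomes a new input record and is shipped back to the aggregator:

[INPUT]
    Name           tail
    # This glob also matches the fluent-bit container's own log file,
    # so the [error] lines it prints are read back in as new records.
    Path           /var/log/containers/*.log
    Tag            kube.*
    Mem_Buf_Limit  50MB

[OUTPUT]
    Name   forward
    Match  kube.*
    # Aggregator address is a placeholder.
    Host   aggregator.example.com
    Port   24224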
To Reproduce
2022-07-20 18:43:21.7122020 | [2022/07/20 18:43:21] [ info] [fluent bit] version=1.9.4, commit=08de43e474, pid=1
2022-07-20 18:43:21.7122250 | [2022/07/20 18:43:21] [ info] [storage] version=1.2.0, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
2022-07-20 18:43:21.7122330 | [2022/07/20 18:43:21] [ info] [cmetrics] version=0.3.1
2022-07-20 18:43:21.7123510 | [2022/07/20 18:43:21] [ info] [input:forward:input.forward] listening on 0.0.0.0:24224
2022-07-20 18:43:21.7141520 | [2022/07/20 18:43:21] [ info] [output:forward:forward.mdsd] worker #0 started
2022-07-20 18:43:21.7357780 | [2022/07/20 18:43:21] [ info] [output:forward:forward.mdsd] worker #1 started
2022-07-20 18:43:21.7358770 | [2022/07/20 18:43:21] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
2022-07-20 18:43:21.7358880 | [2022/07/20 18:43:21] [ info] [sp] stream processor started
2022-07-20 18:43:56.7986370 | [2022/07/20 18:43:56] [error] [input:emitter:re_emitted.container_log] error registering chunk with tag: mdsd.container.log
2022-07-20 18:43:56.7986420 | [2022/07/20 18:43:56] [error] [input:emitter:re_emitted.container_log] error registering chunk with tag: mdsd.container.log
2022-07-20 18:43:56.7986500 | [2022/07/20 18:43:56] [error] [input:emitter:re_emitted.container_log] error registering chunk with tag: mdsd.container.log
2022-07-20 18:43:56.7986540 | [2022/07/20 18:43:56] [error] [input:emitter:re_emitted.container_log] error registering chunk with tag: mdsd.container.log
...
- Steps to reproduce the problem:
- N/A
Expected behavior: FluentBit should not generate the same error message more than n times per second/minute.
Screenshots
Your Environment
- Version used: 1.9.4
- Configuration:
- Environment name and version (e.g. Kubernetes? What version?): Kubernetes 1.22.11
- Server type and version:
- Operating System and version:
- Filters and plugins: tail, kubernetes, rewrite_tag, lua, forward (a rough pipeline sketch follows below)
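For context, a rough sketch of the pipeline implied by the plugin list above (tail → kubernetes → rewrite_tag → lua → forward). Only the plugin names and the emitter name visible in the log output (re_emitted.container_log) come from this report; the rule, tags, script name, and limits are illustrative guesses:

[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    Tag               kube.*
    multiline.parser  docker, cri
    Mem_Buf_Limit     50MB

[FILTER]
    Name       kubernetes
    Match      kube.*
    Merge_Log  On
    Keep_Log   Off

[FILTER]
    Name                   rewrite_tag
    Match                  kube.*
    # The rule is a guess; the emitter name matches the
    # [input:emitter:re_emitted.container_log] instance seen in the logs.
    Rule                   $log ^.+$ mdsd.container.log false
    Emitter_Name           re_emitted.container_log
    Emitter_Mem_Buf_Limit  10M

[FILTER]
    Name    lua
    Match   mdsd.*
    # Hypothetical script and function names.
    script  transform.lua
    call    transform_record

[OUTPUT]
    Name   forward
    Match  mdsd.*
    Host   aggregator.example.com
    Port   24224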
Additional context
About this issue
- Original URL
- State: open
- Created 2 years ago
- Reactions: 6
- Comments: 48 (10 by maintainers)
We ran into the same issue with systemd multiline parsing using 1.9.10:
It was perhaps caused by a large batch of unsent logs due to an output misconfiguration. It is extremely concerning that Fluent Bit fails in this manner.
Hey folks, an update here: we have merged a "Log Suppression" feature (https://github.com/fluent/fluent-bit/pull/6435), which should be released soon. This should help with errors that keep showing up.
I’ve been hitting this issue as well in k8s where the fluent-bit pod frequently becomes OOM and crashes.
On a 30 node cluster, with fluent-bit deployed as a daemonset, there was only one fluent-bit pod repeatably crashing. That pod was on the same node as a very log spammy pod. To test it was the spammy pod, I excluded it and the fluent-bit pod from the path in the tail config. That stopped the crashing, but that’s hardly a fix, as I want those logs as well.
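A minimal sketch of that exclusion, with hypothetical pod name patterns (Exclude_Path takes a comma-separated list of globs):

[INPUT]
    Name          tail
    Path          /var/log/containers/*.log
    # Exclude both the log-spammy workload and fluent-bit itself
    # (pod name patterns here are placeholders).
    Exclude_Path  /var/log/containers/spammy-app-*.log,/var/log/containers/fluent-bit-*.log
    Tag           kube.*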
I managed to replicate the errors in a testing environment, where I had to
Then I got the error logs right away, by the tens of thousands.
Although I didn't manage to crash it in the testing environment, I was able to reproduce the error messages.
The problem I had on the large k8s cluster seems to have fixed itself once the components got redeployed and there wasn't a massive backlog of logs for the fluent-bit pods to process.
Hope this can help anyone else.
@vwbusguy recognized the death-spiral description as being caused by Fluent Bit both writing its own logs to the systemd journal and reading back from it… which happens to be my configuration. In practice, that configuration works most of the time, but once Fluent Bit starts spamming its own logs, a threshold is eventually crossed and it can no longer keep up with processing, as input, the logs it is generating as output…
So I’ll update my configuration to break that loop as a workaround, while this issue can stay focused on preventing Fluent Bit from generating the same log entries repeatedly.
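As a sketch of that workaround (the option names are real, but the log path and the journal field value are assumptions): either write Fluent Bit's own log to a file so it never reaches the journal, or drop Fluent Bit's own journal entries before they re-enter the pipeline.

[SERVICE]
    # Keep Fluent Bit's own log out of journald entirely.
    Log_File  /var/log/fluent-bit.log

[INPUT]
    Name  systemd
    Tag   host.journal

[FILTER]
    # Alternative: drop Fluent Bit's own entries when reading the journal back.
    # The SYSLOG_IDENTIFIER value is an assumption; check your journal fields.
    Name     grep
    Match    host.journal
    Exclude  SYSLOG_IDENTIFIER fluent-bit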
I built from master and deployed it to a k8s cluster, and it did not seem to have any effect on the "error registering chunk with tag" errors.
How does one go about estimating the precise value for emitter_mem_buf_limit? For mem_buf_limit and chunk size there are corresponding Prometheus metrics that can help estimate those values. Is there something similar for the emitter? If not, is something like this possible?
Adjusting the emitter_mem_buf_limit worked in my case.
@srikanth-burra I have an estimation calculation here: https://github.com/aws-samples/amazon-ecs-firelens-examples/tree/mainline/examples/fluent-bit/oomkill-prevention
I had the same problem; errors were issued when emitter_mem_buf_limit was exceeded. I've changed it,
and also added workers to the output.
I've seen this happen when the emitter_mem_buf_limit is exceeded.
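A sketch of the two adjustments mentioned above, applied to a rewrite_tag filter and a forward output. The option names are real, but the rule, limit, and worker count are arbitrary examples, not recommended values:

[FILTER]
    Name                   rewrite_tag
    Match                  kube.*
    Rule                   $log ^.+$ mdsd.container.log false
    Emitter_Name           re_emitted.container_log
    # Raise the emitter's memory buffer limit (value is an example).
    Emitter_Mem_Buf_Limit  100M
    # Or buffer the emitter on disk instead (requires storage.path in [SERVICE]).
    Emitter_Storage.type   filesystem

[OUTPUT]
    Name     forward
    Match    mdsd.*
    Host     aggregator.example.com
    Port     24224
    # Multiple workers help drain a backlog faster.
    Workers  2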
Hello @agup006 as @stebbib has mentioned, this option only acts on messages from output plugins that look similar within an interval of time.
We have created a public FR https://github.com/fluent/fluent-bit/issues/6873 to extend this functionality to input plugins, and other fluent-bit components such as storage, engine, etc.
@markstos Thanks for assisting in that issue, and understand we all have to keep our services up and running 😃. All the best and appreciate your contributions
I’m going to be evaluating Vector instead. This is a critical issue.
https://vector.dev/
It would be good to have a metric exposed when memory limits like Emitter_Mem_Buf_Limit are reached, so people can alert or act on that.
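In the meantime, the built-in HTTP server (already visible in the startup log above on port 2020) exposes metrics in Prometheus format at /api/v1/metrics/prometheus:

[SERVICE]
    # Expose built-in metrics; Prometheus format at /api/v1/metrics/prometheus.
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_Port    2020

Since the emitter shows up as an input instance (the logs above report input:emitter:re_emitted.container_log), its fluentbit_input_records_total and fluentbit_input_bytes_total series may already serve as a rough proxy, but a metric that reports when Emitter_Mem_Buf_Limit itself is hit would still be needed.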