fluentd: buffer space has too many data errors on k8s cluster
- fluentd version 1.3
- Your configuration
<match service>
  @type file
  path /fluentd/log/service.%Y-%m-%d.%H.log
  append true
  format single_value
  message_key log
  add_newline false
  <buffer>
    @type file
    path /fluentd/buffer/ps
    timekey 1h
    timekey_use_utc true
    timekey_wait 1m
    # chunk_limit_records 16777216
    # chunk_limit_size 256Mb
    flush_mode interval
    flush_interval 30s
  </buffer>
  compress gzip
  symlink_path /fluentd/log/service-latest.log
</match>
- Problem description: We run a DaemonSet of fluentd on our Kubernetes cluster. We frequently see errors such as:
fluentd-nrdqd fluentd 2019-05-12 13:40:30 +0000 [warn]: #0 emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/fluentd/vendor/bundle/ruby/2.3.0/gems/fluentd-1.3.0/lib/fluent/plugin/buffer.rb:269:in `write'" tag="kubernetes.var.log.containers.service-94596bc68-5hdj5_default_service-f5539e3f3956a646dcee49bcd57aeb1036a82baf0f210e1ad31f64d7cdd6471f.log"
The fluentd server itself runs on a dedicated system outside of the Kubernetes cluster. We do see a few warnings on it from time to time:
#0 buffer flush took longer time than slow_flush_log_threshold: elapsed_time=24.349883934482932 slow_flush_log_threshold=20.0 plugin_id="object:3f9090f98e80"
We’ve tried adjusting every chunk_limit and flush setting to get rid of this error, but it doesn’t seem to go away. Is there an obvious error in our configuration that we’re missing?
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 5
- Comments: 16 (1 by maintainers)
BufferOverflowError happens when the output speed is slower than the incoming traffic, so there are several approaches:
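As an illustration only (the values below are assumptions, not recommendations from this thread), the knobs that usually matter are the buffer's total capacity, how aggressively and with how many threads it flushes, and what should happen once the buffer is full:

<buffer>
  @type file
  path /fluentd/buffer/ps
  # Example values - tune them to the actual log volume and available disk.
  total_limit_size 4GB                # BufferOverflowError is raised once this total is exceeded
  chunk_limit_size 32MB               # smaller chunks are handed to the output sooner
  flush_mode interval
  flush_interval 10s
  flush_thread_count 4                # parallel flushes help when the destination is slow
  overflow_action drop_oldest_chunk   # or `block`; the default `throw_exception` raises the error above
</buffer>

If the destination simply cannot keep up with the incoming traffic, no buffer setting fixes that by itself; the buffer only decides how much data is held and what is sacrificed when it fills.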
I have the same issue; surprisingly, restarting fluentd works for a while.
I would appreciate guidance as well. It’s not clear which buffer is overflowing, or how to set the sizes (buffer/chunk/queue limits) properly. In my case, fluentbit forwards to a fluentd which forwards to another fluentd, and the buffer overflow errors show up mostly in the last fluentd in the chain:
[328] kube.var.log.containers.fluentd-79cc4cffbd-d9cdg_sre_fluentd-dccc4f286753b75a53c464446af44ffcbeba5ad3a21c9a947a11e94f4c6892b2.log: [1560431258.193260514, {"log"=>"2019-06-13 13:07:38 +0000 [warn]: #0 emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/usr/lib/ruby/gems/2.5.0/gems/fluentd-1.2.6/lib/fluent/plugin/buffer.rb:269:in `write'" tag="raw.kube.app.obelix"
[330] kube.var.log.containers.fluentd-79cc4cffbd-d9cdg_sre_fluentd-dccc4f286753b75a53c464446af44ffcbeba5ad3a21c9a947a11e94f4c6892b2.log: [1560431258.193283014, {"log"=>"2019-06-13 13:07:38 +0000 [warn]: #0 emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/usr/lib/ruby/gems/2.5.0/gems/fluentd-1.2.6/lib/fluent/plugin/buffer.rb:269:in `write'" tag="kube.var.log.containers.obelix-j6h2n_ves-system_obelix-74bc7f7ecbcb9981c5f39eab9d85b855c5145f299d71d68ad4bef8f223653327.log"
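One option sometimes suggested for a relay like this (a sketch with an assumed tag pattern, placeholder host/port, and assumed sizes, not the poster's actual configuration) is to give the intermediate fluentd's forward output an explicit cap and overflow_action block, so that instead of raising BufferOverflowError it applies backpressure toward fluentbit:

<match kube.**>
  @type forward
  <server>
    # placeholder address of the next fluentd in the chain
    host fluentd-aggregator.example.local
    port 24224
  </server>
  <buffer>
    @type file
    path /fluentd/buffer/forward
    total_limit_size 4GB     # assumed cap; size it to the node's available disk
    flush_interval 5s
    flush_thread_count 4
    overflow_action block    # stall emits instead of failing them with BufferOverflowError
  </buffer>
</match>

Blocking keeps the data, but it can back the delay up all the way to the log sources, so it is a trade-off rather than a fix.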
This solution worked for me.
Is there any solution for this? If you continuously lose data, this can’t be used in production.
Hoping to get some guidance on our setup. I am using Elasticsearch for the logs. Initially the fluentd pod was throwing the following error:
Worker 0 finished unexpectedly with signal SIGKILL
which was resolved after increasing the memory limit to 2Gi. Then we started getting a different fluentd error:
[_cluster-elasticsearch_cluster-elasticsearch_elasticsearch] failed to write data into buffer by buffer overflow action=:throw_exception
We attempted to resolve it by tweaking the buffer settings, which now look like this:
buffer:
  timekey: 1m
  timekey_wait: 30s
  timekey_use_utc: true
  chunk_limit_size: 16MB
  flush_mode: interval
  flush_interval: 5s
  flush_thread_count: 8
But I can still see that the buffer size on fluentd is 5.3G (not increasing for the last two days), and every so often I see the following error:
[_cluster-elasticsearch_cluster-elasticsearch_elasticsearch] failed to write data into buffer by buffer overflow action=:throw_exception
The buffer size seems to suggest that there are still logs waiting to be pushed to Elasticsearch, and that fluentd is struggling to cope with the logs coming from fluentbit. Note that I do see some recent logs in Elasticsearch, but not all of them. Appreciate any suggestions.
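For reference, here is a hedged sketch of what the rendered fluentd match for that output could look like with an explicit total cap and a non-default overflow_action; the host, buffer path, and sizes are placeholders rather than values from the setup above, and drop_oldest_chunk trades old events for new ones instead of raising the buffer-overflow error:

<match **>
  @type elasticsearch
  # placeholder connection details
  host elasticsearch.example.local
  port 9200
  logstash_format true
  <buffer>
    @type file
    path /fluentd/buffer/elasticsearch
    chunk_limit_size 16MB
    total_limit_size 4GB                # "failed to write data into buffer by buffer overflow" fires once this fills
    flush_mode interval
    flush_interval 5s
    flush_thread_count 8
    overflow_action drop_oldest_chunk   # the default throw_exception produces the error quoted above
  </buffer>
</match>

A buffer directory that sits at 5.3G without shrinking usually means chunks are still waiting to flush or are being retried, so checking the Elasticsearch side (indexing rate, rejections) matters as much as the fluentd settings.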