kubernetes: BufferQueueLimitError in the fluentd container of the master

I have a cluster in GCE (Major:"1", Minor:"2+", GitVersion:"v1.2.0-alpha.3") with logging to GCP enabled. It was upgraded from a 1.1 version back in November.

Yesterday the fluentd container on the master started emitting these errors, approximately twice per second:

[warn]: emit transaction failed: error_class=Fluent::BufferQueueLimitError error="queue size exceeds limit" tag="fluent.warn"
[warn]: emit transaction failed: error_class=Fluent::BufferQueueLimitError error="queue size exceeds limit" tag="fluent.warn"
[warn]: emit transaction failed: error_class=Fluent::BufferQueueLimitError error="queue size exceeds limit" tag="kubelet"
[warn]: suppressed same stacktrace

It stayed that way until I manually restarted the Docker container today; it didn't send any logs to the console in the meantime.

Some logged errors had more details:

[error]: failed to emit fluentd's log event tag="fluent.warn" event={"error_class"=>"Fluent::BufferQueueLimitError", "error"=>"queue size exceeds limit", "tag"=>"kubelet", "message"=>"emit transaction failed: error_class=Fluent::BufferQueueLimitError error=\"queue size exceeds limit\" tag=\"kubelet\""} error_class=Fluent::BufferQueueLimitError error=#<Fluent::BufferQueueLimitError: queue size exceeds limit>

In every error I have seen, the tag is always kubelet, kube-apiserver, or fluent.warn.

There were some Faraday::Timeout and Faraday::SSLError errors a few hours before, but no other sign of malfunction. The fluentd pods on the minions have been stuck in a permanent Terminating status since November, but their containers are still alive and working; I believe that is covered by this other issue: https://github.com/kubernetes/kubernetes/issues/17929

If the real cause of this error can't be found, then at the very least the fluentd container on the master should be able to restart itself when it can no longer work correctly.
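
One way to approximate that self-healing behaviour, sketched below, is a liveness probe on the fluentd pod that fails when buffered chunks stop draining, so the kubelet restarts the container. This is only an illustration: the buffer directory /var/log/fluentd-buffers, the 10-minute staleness threshold, and the probe timings are assumptions that would have to match the actual fluentd configuration on the master.

livenessProbe:
  # Assumption: fluentd writes its file buffer under /var/log/fluentd-buffers.
  # If chunk files exist but none has been modified in the last 10 minutes,
  # treat the output as stuck and fail the probe so the kubelet restarts
  # the container.
  initialDelaySeconds: 300
  periodSeconds: 60
  exec:
    command:
    - /bin/sh
    - -c
    - >
      if [ -d /var/log/fluentd-buffers ] &&
      [ -n "$(find /var/log/fluentd-buffers -type f)" ] &&
      [ -z "$(find /var/log/fluentd-buffers -type f -mmin -10)" ];
      then exit 1; fi

Restarting on a stuck buffer is a blunt stopgap rather than a fix for the underlying backpressure, but it would keep master logging from silently stopping for a whole day as described above.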

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 30 (12 by maintainers)

Most upvoted comments

For anyone who comes across this issue like me: just truncate the Docker logs to size 0.

sudo salt '*' cmd.run 'truncate -s 0 /var/lib/docker/containers/*/*-json.log'
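
The salt invocation assumes a Salt-managed cluster; on a single node the same cleanup can be done directly, wrapping the glob in a root shell so it expands with the right permissions. Note that this discards any log lines fluentd has not shipped yet.

sudo sh -c 'truncate -s 0 /var/lib/docker/containers/*/*-json.log'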

The reason for this situation in my test environment was that the Elasticsearch StatefulSet had been down for some days; when the ES pods came back online, the fluentd pods were overwhelmed trying to send all of the stashed logs.
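
For that failure mode, giving the fluentd output a bounded file buffer and patient retries helps it ride out a long Elasticsearch outage instead of raising BufferQueueLimitError for every incoming event. The snippet below is only a sketch against a fluentd v0.12-era elasticsearch output; the host, paths, limits, and intervals are illustrative assumptions, not the values shipped with any particular addon.

<match **>
  type elasticsearch
  host elasticsearch-logging
  port 9200
  logstash_format true
  # Buffer to disk so a backlog survives restarts, and cap the queue size
  # so fluentd degrades predictably when the backend is down.
  buffer_type file
  buffer_path /var/log/fluentd-buffers/kubernetes.buffer
  buffer_chunk_limit 2M
  buffer_queue_limit 32
  flush_interval 5s
  # Back off while Elasticsearch is unreachable, but never give up on a chunk.
  retry_wait 10s
  max_retry_wait 30s
  disable_retry_limit true
  num_threads 2
</match>

Newer fluentd releases also have buffer_queue_full_action (block or drop_oldest_chunk) to control what happens when the queue does fill up; check whether the bundled fluentd supports it before relying on it.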