kubernetes: fluentd-gcp crashing because of `JournalError: Bad message`
This happened in a 1.6 test: https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gke-serial-release-1-6/929
fluentd crashed and restarted ~40 times during the test because it could not handle a bad message in the journal.
2017-11-14 15:40:52 +0000 [error]: unexpected error error_class=Systemd::JournalError error=#<Systemd::JournalError: Bad message>
2017-11-14 15:40:52 +0000 [error]: /var/lib/gems/2.1.0/gems/systemd-journal-1.2.3/lib/systemd/journal.rb:284:in `enumerate_helper'
2017-11-14 15:40:52 +0000 [error]: /var/lib/gems/2.1.0/gems/systemd-journal-1.2.3/lib/systemd/journal.rb:106:in `current_entry'
2017-11-14 15:40:52 +0000 [error]: /var/lib/gems/2.1.0/gems/fluent-plugin-systemd-0.0.8/lib/fluent/plugin/in_systemd.rb:88:in `watch'
2017-11-14 15:40:52 +0000 [error]: /var/lib/gems/2.1.0/gems/fluent-plugin-systemd-0.0.8/lib/fluent/plugin/in_systemd.rb:70:in `run'
2017-11-14 15:40:52 +0000 [info]: shutting down fluentd
This issue is fixed in fluentd 0.14.x according to https://github.com/reevoo/fluent-plugin-systemd/issues/16. Not sure whether the latest fluentd-gcp image contains the fix.
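One rough way to check from a running cluster (a sketch; it assumes the agent runs as the usual fluentd-gcp DaemonSet in kube-system with the k8s-app=fluentd-gcp label, and that you substitute a real pod name):
# List the fluentd-gcp pods.
kubectl -n kube-system get pods -l k8s-app=fluentd-gcp
# Print the fluentd version bundled in one of them.
kubectl -n kube-system exec <fluentd-gcp-pod-name> -- fluentd --version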
/cc @crassirostris
Had this problem today on 3 of my 6 nodes.
Node version: 1.8.4-gke.0, fluentd image: gcr.io/google-containers/fluentd-gcp:2.0.9
I fixed the problem by removing the bad messages, doing the following on each node that had errors.
First I identified the bad messages.
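A sketch of one way to do this, using journalctl's built-in verification (corrupted journal files are reported as failing the check):
# Check all journal files for corruption; bad files are flagged in the output.
sudo journalctl --verify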
Then I moved the offending files (I tend to mv before deleting).
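For example (the corrupted file name and the backup directory below are placeholders; use whatever the verify step flagged on your node):
# Move the corrupted journal file out of the way instead of deleting it outright.
sudo mkdir -p /var/log/journal-corrupt
sudo mv /var/log/journal/<machine-id>/<corrupted-file>.journal /var/log/journal-corrupt/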
And then finally I restarted the systemd-journald service:
sudo systemctl restart systemd-journald

Seeing the same thing.
Node version: 1.8.6-gke.0, fluentd image: gcr.io/google-containers/fluentd-gcp:2.0.9