moby: Container hangs when using fluentd logger
Description: Docker container hangs when using the fluentd logger with fluentd-async-connect=true and an unreachable (turned off) fluentd server.
Steps to reproduce the issue:
- Create a docker-compose.yml with any container and its logging section set to:

      logging:
        driver: fluentd
        options:
          fluentd-address: 10.147.49.79:24224
          fluentd-async-connect: 'true'
          fluentd-retry-wait: '1s'
          fluentd-max-retries: '30'
          tag: your_app

- Run docker-compose up --build -d
Describe the results you received:
Docker container hangs and does not respond to docker-compose down -f, docker stop, or docker kill.
Describe the results you expected: The container exits with an error message.
Additional information you deem important (e.g. issue happens only occasionally): Issue happens every time the remote fluentd server is down or unreachable.
Output of docker version:
Client: Docker Engine - Community
Version: 19.03.2
API version: 1.40
Go version: go1.12.8
Git commit: 6a30dfc
Built: Thu Aug 29 05:28:55 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.2
API version: 1.40 (minimum version 1.12)
Go version: go1.12.8
Git commit: 6a30dfc
Built: Thu Aug 29 05:27:34 2019
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.6
GitCommit: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc:
Version: 1.0.0-rc8
GitCommit: 425e105d5a03fabd737a126ad93d62a9eeede87f
docker-init:
Version: 0.18.0
GitCommit: fec3683
Output of docker info:
Client:
Debug Mode: false
Server:
Containers: 30
Running: 29
Paused: 0
Stopped: 1
Images: 47
Server Version: 19.03.2
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-957.27.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 64
Total Memory: 125.3GiB
Name: server55
ID: SHMQ:YUKH:MZOS:EEYR:WZI6:JYVH:NPPJ:CEEQ:ILXV:6JNU:BVBG:2NTE
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
**Additional environment details (AWS, VirtualBox, physical, etc.):**
physical
About this issue
- State: closed
- Created 5 years ago
- Reactions: 30
- Comments: 35 (11 by maintainers)
Commits related to this issue
- Possible fix for #40063. Refer https://github.com/fluent/fluent-logger-golang/pull/77 Signed-off-by: JamesJJ <jj@fcg.fyi> — committed to JamesJJ/moby by JamesJJ 5 years ago
- Try to asynchronously connect to Fluentd before writing PR #77 introduced a new parameter named ForceStopAsyncSend. It can be used to tell the logger to not try to send all the log messages in its bu... — committed to akerouanton/fluent-logger-golang by deleted user 4 years ago
- Properly stop logger during (re)connect failure PR #77 introduced a new parameter named ForceStopAsyncSend. It can be used to tell the logger to not try to send all the log messages in its buffer bef... — committed to akerouanton/fluent-logger-golang by deleted user 4 years ago
- Properly stop logger during (re)connect failure PR #77 introduced a new parameter named ForceStopAsyncSend. It can be used to tell the logger to not try to send all the log messages in its buffer bef... — committed to akerouanton/fluent-logger-golang by akerouanton 4 years ago
- feat(logging): add fluentd based logging - all containers (except `fluentd` and `chaosgenius-init`) use the new fluentd logger - fluentd itself is a service in the docker compose - logs are stored... — committed to chaos-genius/chaos_genius by Samyak2 3 years ago
- fluentd: add force-stop-async-send option This new flag tells fluent logger lib to discard any queued events when closing (in async mode). It defaults to true when async mode is enabled This new fla... — committed to akerouanton/docker by akerouanton 3 years ago
- feat(logging): add fluentd based logging - a new docker-compose.fluentd.yml - all containers (except `fluentd` and `chaosgenius-init`) use the new fluentd logger - fluentd itself is a service in t... — committed to chaos-genius/chaos_genius by Samyak2 3 years ago
+1 on this issue
I have the same problem too
Anything new on this? Thanks
+1 When will this be resolved? It looks like the PR has not been merged yet.
Is there a plan to bump the fluent-logger-golang version to 1.5.0 so that this issue is resolved in Docker 19.03? Or is there any workaround to make this work in Docker 19.03?
I was going to open a new issue to post my full bug report, but I think we are experiencing the same issue, so I'm going to post it here.
Some weeks ago, a teammate experienced docker freezes when using the fluentd logger while fluentd was down when the containers were created, despite the `mode=non-blocking` and `fluentd-async-connect=true` options, with docker v19.03. To reproduce the bug you have to:

After generating and analysing the goroutine stacktraces, it appears that there are some goroutines blocked in `semacquire` state, in the `fluent-logger-golang` package:

After a more thorough analysis, I found out that:
- `client.processEvent()` -> (beginning of the goroutine)
- `eventQ.Append()` (`*Daemon.ProcessEvent()`)
- `libcontainerdtypes.EventExit()`
- `container.Reset(false)` (cf. https://github.com/docker/docker-ce/blob/v19.03.4/components/engine/daemon/monitor.go#L61)
- `container.Reset(lock bool)`: "Reset puts a container into a state where it can be restarted again."
- `LogDriver.Close()` -> `(*Fluent).Close()`
In 688e67e1d3, @thaJeztah bumped fluent/fluent-logger-golang from v1.3.0 to v1.4.0. By looking at the diff between v1.3.0 and v1.4.0, it appears that the `(*Fluent).Close()` method changed: there was no call to `wg.Wait()` in previous versions.

Since we run with the async option, the wait group is incremented at this line. The HTTP call shown above triggers that select, and even if the channel `f.pending` is closed there, since the first log triggered the select, it is still blocked on that call to `f.write()`, which then retries to connect to the fluentd server, which stays down for a long period of time (actually using an exponential back-off retry strategy). And this is the reason for the freezes we experience.

Finally, we had to downgrade to v18.09 to get a working combination of Docker and the fluentd logger.
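To make that failure mode concrete, here is a minimal, self-contained Go sketch of the pattern described above. It is not the fluent-logger-golang source and all names (asyncLogger, pending, write, etc.) are illustrative: a background sender retries the connection with exponential back-off while Close() waits on a WaitGroup, so Close() never returns when the server stays unreachable, which is what the daemon hits through container.Reset() -> LogDriver.Close().

```go
// Minimal sketch of the hang described above; illustrative names only,
// not the actual fluent-logger-golang implementation.
package main

import (
	"fmt"
	"net"
	"sync"
	"time"
)

type asyncLogger struct {
	addr    string
	pending chan []byte
	wg      sync.WaitGroup
}

func newAsyncLogger(addr string) *asyncLogger {
	l := &asyncLogger{addr: addr, pending: make(chan []byte, 1024)}
	l.wg.Add(1) // the sender goroutine is tracked by the wait group
	go l.run()
	return l
}

func (l *asyncLogger) run() {
	defer l.wg.Done()
	for msg := range l.pending {
		l.write(msg) // blocks indefinitely while the server is unreachable
	}
}

// write lazily (re)connects with exponential back-off and never gives up,
// mirroring the behaviour described in the comment above.
func (l *asyncLogger) write(msg []byte) {
	backoff := time.Second
	for {
		conn, err := net.DialTimeout("tcp", l.addr, 3*time.Second)
		if err == nil {
			conn.Write(msg)
			conn.Close()
			return
		}
		fmt.Println("retrying:", err)
		time.Sleep(backoff)
		if backoff < time.Minute {
			backoff *= 2
		}
	}
}

func (l *asyncLogger) Log(msg string) { l.pending <- []byte(msg) }

// Close waits for the sender goroutine to drain the queue. With an
// unreachable server the drain never finishes, so Close never returns,
// and neither does the daemon code path that called it.
func (l *asyncLogger) Close() {
	close(l.pending)
	l.wg.Wait()
}

func main() {
	l := newAsyncLogger("10.147.49.79:24224") // the unreachable address from the report
	l.Log("container stdout line")
	time.Sleep(time.Second)
	l.Close() // hangs here, just like `docker stop` does
}
```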
EDIT: Actually it looks like this PR on fluent-logger-golang (and PR #40293 on this repo) are trying to solve this exact issue.
When can we expect this issue to be fixed? It is a blocker for me.
Agreed that it will not be backward compatible. However, there is currently no solution or workaround in sight. Even simple commands like stopping containers lead to a hung state. Please consider prioritising this for the major release.
How do you create a plugin with the latest fluentd for Docker 19.03? Will this mean that we don't need a new Docker release?
The only solution that works for me in this case is:
Then start the containers as usual.
Any updates or workarounds, guys? I don't think I can wait until 19.04, and neither can my penguin 🐧
Thanks for the update @akerouanton, I can see that the tests are now passing thanks to the changes you made to CI.
PR under review: https://github.com/fluent/fluent-logger-golang/pull/82
@NiR How do I swap the broken fluentd logging driver currently packaged in Docker with your v0.1 of akerouanton/fluentd-async-logger? I'm hoping Docker will fix this soon, but until then I'm happy to use your fix.
I added some test cases (`TestCloseOnFailingAsyncConnect` && `TestCloseOnFailingAsyncReconnect`) to fluent-logger-golang. My two PRs are still in review, but I published a v0.1 of `akerouanton/fluentd-async-logger` based on my fixes. Feel free to use it if you face the issue described here.

I'm currently working on a logging driver based on the fixed version of fluent-logger-golang, and I'm wondering whether `ForceStopAsyncSend` should be set to true whenever `RequestAck` is `false` (as this already somehow encodes delivery expectations). I think the issue described here is quite expectable in the case of `RequestAck=true` + `Async=true` whenever the fluentd server is down. On the other hand, with `RequestAck=false` (the default) and `Async=true`, people actually expect the logger to stop as soon as possible.

Moreover, it means no additional option has to be exposed to Docker users; thus, I guess, this could be backported to v19.03. If these assumptions seem reasonable to Docker maintainers, I'm willing to send a patch.
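If that defaulting rule were adopted, the wiring could look roughly like the sketch below. It assumes the `Async`, `RequestAck` and `ForceStopAsyncSend` fields discussed in this thread are available in the vendored fluent-logger-golang `Config` (ForceStopAsyncSend only exists in releases after v1.4.0); treat it as an illustration of the proposal, not as the driver's actual code.

```go
package main

import (
	"log"

	"github.com/fluent/fluent-logger-golang/fluent"
)

func main() {
	// Sketch only: field availability depends on the fluent-logger-golang
	// version being vendored.
	cfg := fluent.Config{
		FluentHost: "10.147.49.79",
		FluentPort: 24224,
		Async:      true,
		RequestAck: false, // the default
	}

	// No ack requested means the caller never asked for delivery guarantees,
	// so Close() may drop queued events instead of retrying forever.
	cfg.ForceStopAsyncSend = cfg.Async && !cfg.RequestAck

	f, err := fluent.New(cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// f.Post(...) would be used to emit events here.
	_ = f
}
```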
EDIT: Actually, `ForceStopAsyncSend` does not work as it's used by a select run after the select that tries to write the log (and the write method is actually the one using exponential back-off retry to lazily and asynchronously connect to fluentd). But I'm going to send a patch to `fluent-logger-golang` to get it working properly.

I've commented on that issue. Sorry for missing it.
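For completeness, here is a rough sketch of the fix direction the EDIT above points at; the names are illustrative and this is not the library's actual code. The idea is that the (re)connect retry loop also selects on a stop channel, so Close() can interrupt an exponential back-off that would otherwise run forever against an unreachable fluentd.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// logger is a stand-in for the async sender; names are illustrative.
type logger struct {
	addr string
	stop chan struct{} // closed by Close() to abort pending retries
}

var errStopped = errors.New("logger stopped while (re)connecting")

// connectWithRetry keeps the exponential back-off, but gives Close() a way
// to interrupt it instead of letting it retry forever.
func (l *logger) connectWithRetry() (net.Conn, error) {
	backoff := time.Second
	for {
		conn, err := net.DialTimeout("tcp", l.addr, 3*time.Second)
		if err == nil {
			return conn, nil
		}
		select {
		case <-l.stop: // Close() was called: give up immediately
			return nil, errStopped
		case <-time.After(backoff):
			if backoff < time.Minute {
				backoff *= 2
			}
		}
	}
}

// Close signals every in-flight retry loop to abort.
func (l *logger) Close() {
	close(l.stop)
}

func main() {
	l := &logger{addr: "10.147.49.79:24224", stop: make(chan struct{})}

	// Simulate the daemon asking the log driver to shut down after 2s.
	go func() {
		time.Sleep(2 * time.Second)
		l.Close()
	}()

	if _, err := l.connectWithRetry(); err != nil {
		fmt.Println(err) // prints "logger stopped while (re)connecting"
	}
}
```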