moby: docker events 'hangs' / stops working on unix socket (fifo)
[UPDATE] This problem occurs reading from fifo mounted inside docker container. For some reason unix sockets are not working. Works with a txt file / log file.
Calling the docker events
command --> while read …
loop. Via a FIFO (unix socket). It is meant to prints to stdout a new line of infos, whenever a container starts.
My code is here:
https://github.com/dreamcat4/docker-images/blob/master/pipework/entrypoint.sh#L368
To begin with, it works as it should. If I start 1 container, the while loop is triggered 1 time. Soon after that, I may restart the container (to generate a 2nd start event). And docker events
will print out another line, triggering my while read …
loop correctly once again. That is great. Well done.
However if I go away and leave the docker containers alone for “a while” (such overnight, or just 1 hour, or perhaps less < 1hour). Then I come back and start a container. There is nothing received in the unix pipe. And it seems almost like docker events is hung, or not flush anymore it’s stdout. Or else not printed a newline \n
for to trigger the next read line
.
If I kill $PID
of docker events
command’s process id. On a regular kill (which is a SIGTERM
). Then sometimes doing that will prompt the hung docker events
process to finally spit out whatever missing line(s) e.g. like a flush. Presumably during it’s cleanup, and then quit. Else sometimes it may not print anything and just dies / go away. Sorry that sounds very vague. But it is difficult to understand why this inconsistent behaviour.
My main point is that I really need some help here. Because I am not sure how to progress from this point, and debug further the problem going forwards. It is an important bug for me to solve, one way or another (and without dropping the functionality!).
Is there any new tests I can make to catch it red-handed? What else can I do? Perhaps write some test example script (not my ‘real’ code), with a loop, to print out ‘SUCCESS’ / ‘FAIL’ record. And a constantly re-starting container in another loop? E.g. with some date / time elapsed recorded?
How else to get more infos about the bug? What about strace
command on docker events
? Or how about if we try to attach gdb and get a live stacktrace when it stops responding? What can be honed in on first of all?
$ docker info
Containers: 4
Images: 70
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 78
Execution Driver: native-0.2
Kernel Version: 3.16.0-33-generic
Operating System: Ubuntu 14.10
CPUs: 2
Total Memory: 3.792 GiB
About this issue
- Original URL
- State: closed
- Created 9 years ago
- Comments: 25 (8 by maintainers)
I am guessing in the docker source code it is at here:
https://github.com/docker/docker/blob/v1.6.0-rc5/api/client/commands.go#L1877
Sorry I don’t understand that code where it goes into the function
cli.stream
. It is under the surface. Probably relying upon whatever standard GOlang I/O libraries to achieve that.In addition, when I add a unix pipe on command line (to the end of that chain). Then somehow we end up in trouble and something breaks. As per the
strace
output. Where thefutex(0x134e1b8, FUTEX_WAIT, 0, NULL
. Does not finish printing thatstrace
message.I guess this is the relevant part of (new) strace output: