moby: Docker logging blocks and never recovers

Any client that reads too slowly from a container's log socket can permanently block a production container. Docker eventually blocks the container process, and it never recovers: the container stays blocked on writes to stdout/stderr unless it handles output asynchronously and copes with output overruns on its own. This is new behavior introduced in November that fundamentally changes how logging in Docker behaves, and it is not even reflected in the changelogs.

Output of docker version:

Client:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   20f81dd
 Built:        Thu Mar 10 15:54:52 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   20f81dd
 Built:        Thu Mar 10 15:54:52 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 20
 Running: 20
 Paused: 0
 Stopped: 0
Images: 26
Server Version: 1.10.3
Storage Driver: devicemapper
 Pool Name: docker-202:16-4407302-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: ext4
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 9.992 GB
 Data Space Total: 107.4 GB
 Data Space Available: 97.38 GB
 Metadata Space Used: 11.25 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.136 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /mnt/var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Either use `--storage-opt dm.thinpooldev` or use `--storage-opt dm.no_warn_on_loop_devices=true` to suppress this warning.
 Metadata loop file: /mnt/var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.77 (2012-10-15)
Execution Driver: native-0.2
Logging Driver: json-file
Plugins:
 Volume: local
 Network: null host bridge
Kernel Version: 3.13.0-83-generic
Operating System: Ubuntu 14.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 29.4 GiB
Name: ip-10-3-13-191

Additional environment details (AWS, VirtualBox, physical, etc.):

Running on AWS, repeatable on VMware.

Steps to reproduce the issue:

  1. Simulate a high-throughput production app with heavy timestamp logging: docker run -d busybox sh -c "while true; do date; done". Note the container ID for later.
  2. Fetch blockingreader.go from here. It connects to a container, sets up some pipes, and then never reads from them, letting Docker buffer internally (a rough sketch of the idea follows these steps).
  3. go get github.com/fsouza/go-dockerclient
  4. docker logs --tail=2 -f <your_container_id> in another terminal
  5. go run blockingreader.go <your_container_id>
  6. Watch the log output for a while: once the buffer fills, the app stops logging and blocks. You can't tell from busybox alone, but the app is not merely silent, it is completely blocked; hook GDB up to it if you want to confirm. We took stack dumps of our own app to prove it. This is bad because you have now blocked a production app with nothing more than a client connection to its logs.
  7. Ctrl-C or kill your blockingreader process, which releases the Docker client socket; you would expect Docker to unblock the container. In extensive testing, it does not. Even when the blocked reader is gone, Docker continues to block the container.
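
For reference, blockingreader.go boils down to something like the sketch below (the real file linked above may differ in detail): it uses go-dockerclient to hit the /attach endpoint, hands Docker the write end of an io.Pipe, and never reads from the other end.

    package main

    import (
        "io"
        "log"
        "os"

        docker "github.com/fsouza/go-dockerclient"
    )

    // Usage: go run blockingreader.go <container_id>
    func main() {
        client, err := docker.NewClient("unix:///var/run/docker.sock")
        if err != nil {
            log.Fatal(err)
        }

        // Docker writes container output into pw; nothing ever reads from pr,
        // so once the pipe and the daemon's internal buffer fill, the daemon
        // stops consuming and the container blocks on stdout/stderr.
        pr, pw := io.Pipe()
        _ = pr

        err = client.AttachToContainer(docker.AttachToContainerOptions{
            Container:    os.Args[1],
            OutputStream: pw,
            ErrorStream:  pw,
            Stdout:       true,
            Stderr:       true,
            Stream:       true,
        })
        if err != nil {
            log.Fatal(err)
        }
    }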

Describe the results you received: Docker eventually blocks a container with high-volume logging if any client is slow and the internal logging buffer fills. This was not expected. It then also does not release the container when the socket is disconnected. This was really not expected.

Describe the results you expected: Docker would continue to accept logging output from containerized applications without blocking (or leaking GBs of RAM like it used to). I would have expected a ring buffer implementation or some kind of sliding window without blocking at all.
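
For illustration, this is roughly the kind of non-blocking ring buffer I mean (purely a sketch, not Docker code): writes never block, and when the buffer is full the oldest lines are overwritten, so a slow reader loses old data instead of stopping the writer.

    package main

    import "fmt"

    type lineRing struct {
        buf   []string
        next  int // index of the next write
        count int // number of valid lines, capped at len(buf)
    }

    func newLineRing(size int) *lineRing { return &lineRing{buf: make([]string, size)} }

    // Write never blocks; when the buffer is full it overwrites the oldest line.
    func (r *lineRing) Write(line string) {
        r.buf[r.next] = line
        r.next = (r.next + 1) % len(r.buf)
        if r.count < len(r.buf) {
            r.count++
        }
    }

    // Snapshot returns the buffered lines, oldest first.
    func (r *lineRing) Snapshot() []string {
        out := make([]string, 0, r.count)
        start := (r.next - r.count + len(r.buf)) % len(r.buf)
        for i := 0; i < r.count; i++ {
            out = append(out, r.buf[(start+i)%len(r.buf)])
        }
        return out
    }

    func main() {
        r := newLineRing(3)
        for i := 1; i <= 5; i++ {
            r.Write(fmt.Sprintf("line %d", i))
        }
        fmt.Println(r.Snapshot()) // [line 3 line 4 line 5]
    }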

Additional information you deem important (e.g. issue happens only occasionally): 100% repeatable. We tripped over this issue because we use Mozilla's Heka to log all of our container output. Heka connects to the Docker logs endpoint and sometimes becomes slow under heavy logging. In this way the permanent blocking behavior of the logs lets you take out an entire server's worth of Docker containers.

Most upvoted comments

correctness > speed indeed, but a production application that's blocked and unusable because a peripheral system is having a bad time is not correct - it's completely dead, in a way that might impact customers.

logging is peripheral to the primary function of the application.

This behaviour means that as soon as you enable debug-level logging on a Dockerized app for a while, and that debug output is voluminous, the system under debug becomes unusable - and impossible to debug. Operationally you'd much rather sacrifice logging integrity.

Systems where logging is so important as to be a critical function of the app will/should not be doing that logging to STD{OUT,ERR}, so sacrificing integrity of STD{OUT,ERR} logs seems reasonable to me over sacrificing the entire app.

Something consuming the data and getting the bogus input is potentially far more dangerous than the decrease in speed.

Someone able to cause a denial of service by using a slow log reader is even more dangerous, as “blocked” in this case is just another way to say DoS. In the field, one simply never assumes that syslogs are 100% complete because syslog, like ICMP, is one of the first things that is dropped to ensure the responsiveness of the service.

Some systems note “## syslog events were dropped” so that purposeful gaps can be differentiated from gaps caused by communication interruption. That may be helpful here to some extent.

+1 @ripienaar @relistan

I understand there are applications where it is absolutely critical that no log is lost. I would argue that they are far outweighed by applications that are more than happy to shed a few logs to survive a heavy load event.

If it were configurable, you could give people the option of 100% log capture, but I would say the default should be to throw logs on the floor for a consumer with a full buffer.

The fundamental misunderstanding here is that it was not clear to me (and, I think, also to the authors of Logspout, and of Heka until recently) that /attach and /logs do not hit the same buffers inside the daemon. It doesn't help that, even with the code in front of him, @aaronlehmann continued to refer to "logs" when he can see that we're using /attach:

I don’t think dropping data from the log stream would necessarily be the correct behavior. This could break the formatting of the logs and lead to unexpected gaps in the log data.

I was not aware of that, and I guess no one at Docker (at least none of you) knew that at least two of the major third-party logging systems were, until not very long ago, using the /attach endpoint to consume logs. I can assure you, because at New Relic we ran Heka in production for quite some time and at Nitro for a few months on v1.9, that this did not block containers until v1.10. It did leak memory like crazy when the reader got slow. At New Relic we ran a custom patched build that resolved the memory leak issue, which @didip and I patched here.

I am a major contributor to the Heka Docker plugin, and we had been running it at Nitro just fine until we upgraded to Docker v1.10 in production; the change took out our site not long after. We'll pull in Rob's recent changes to the Heka Docker plugin that switch it to the /logs endpoint, and we expect that to resolve this issue at least.
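
For anyone else in the same situation, the switch amounts to consuming the /logs endpoint instead of /attach; with go-dockerclient that is roughly the sketch below (illustrative only, not the actual plugin code). A slow /logs consumer falls behind on the json-file log on disk rather than back-pressuring the container's stdout/stderr.

    package main

    import (
        "log"
        "os"

        docker "github.com/fsouza/go-dockerclient"
    )

    // Usage: go run logsreader.go <container_id>
    // Roughly equivalent to `docker logs --tail=10 -f <container_id>`.
    func main() {
        client, err := docker.NewClient("unix:///var/run/docker.sock")
        if err != nil {
            log.Fatal(err)
        }

        err = client.Logs(docker.LogsOptions{
            Container:    os.Args[1],
            OutputStream: os.Stdout,
            ErrorStream:  os.Stderr,
            Follow:       true,
            Stdout:       true,
            Stderr:       true,
            Tail:         "10",
        })
        if err != nil {
            log.Fatal(err)
        }
    }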

Nonetheless, this functional change to the behavior of the /attach endpoint really needs to go into the release notes for v1.10. It certainly would have raised a red flag for me. And I suggest adding some clarity to the Remote API documentation for the /attach endpoint which in various places refers to pulling in logs.

I would suggest that a 1 MB buffer (not quite: it's 1e6 bytes) is also nowhere near big enough. With an average log line length of around 100-150 bytes, that is only about 6,700 to 10,000 lines of buffer (1e6 / 150 ≈ 6,666; 1e6 / 100 = 10,000). A high-throughput production app can blow through that very quickly, especially in debug logging mode.

Finally, an explicit warning that docker attach can block a container is a critical addition. It’s not mentioned in the docs. And anyone with experience prior to v1.10 will be reasonably certain that it doesn’t.

@tonistiigi we’re talking about logs here, right? Having worked on production systems for over 20 years I can’t remember a time when I would have preferred to have complete logs over having the application up and running. The only exception being audit logging, something generally not entrusted to generalized logging anyway. This is the reason many people don’t log to disk, and why traditional stream logging was done over UDP, for example.

I know you said you don't think it's about logging speed, but it is. The tool I provided above doesn't read anything at all because that's the fastest way to make this happen. In the production environment where this hit us, Heka was consuming the stream but had gotten backed up and slow. I know this because we replaced our log appender with an asynchronous one and attached metrics to the remaining buffer size of that appender. You can see it go up and down as Heka consumes messages. Of course, since we're on 1.10.3, it eventually sits at the low-water mark forever because the Docker engine stops consuming.
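
For completeness, the idea behind the asynchronous appender is sketched below (our real appender lives in the app's own codebase and exports the buffer metric mentioned above, so treat this as illustrative only): application threads hand lines to a bounded buffer and never block, a single goroutine does the actual write to stdout, and overruns are counted instead of back-pressuring the app.

    package main

    import (
        "fmt"
        "os"
        "sync/atomic"
        "time"
    )

    type asyncAppender struct {
        lines   chan string
        dropped int64
    }

    func newAsyncAppender(size int) *asyncAppender {
        a := &asyncAppender{lines: make(chan string, size)}
        go func() {
            for line := range a.lines {
                // If stdout is blocked by the daemon, only this goroutine
                // stalls; the application threads keep running.
                fmt.Fprintln(os.Stdout, line)
            }
        }()
        return a
    }

    // Log never blocks: if the buffer is full, the line is shed and counted,
    // so gaps can be told apart from quiet periods.
    func (a *asyncAppender) Log(line string) {
        select {
        case a.lines <- line:
        default:
            atomic.AddInt64(&a.dropped, 1)
        }
    }

    func main() {
        app := newAsyncAppender(10000)
        for i := 0; ; i++ {
            app.Log(fmt.Sprintf("tick %d dropped_so_far=%d", i, atomic.LoadInt64(&app.dropped)))
            time.Sleep(time.Millisecond)
        }
    }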

Making the buffer bigger would perhaps make this happen less frequently, but it is not a solution. Also, no one has addressed the fact that this was a fundamental change in the Docker engine and that it wasn't in the changelogs. I know this is hard to get right all the time, but it should be retroactively added to the appropriate changelog so that anyone who has not already upgraded can see it.

If there are actually people who prefer this behavior, can we have a CLI flag to switch it? If not, I’m not sure what logging solution I’ll recommend in production that leverages the Docker daemon, including in “Docker: Up and Running”. The only other way around it is to use a shim like we used to with the Spotify logger or some other supervisory process in the container that does the right thing.

Just FYI, for 1.14 we've added a new logging mode that you can pass through via --log-opt, for example --log-opt mode=non-blocking --log-opt max-buffer-size=4m. See #28762 for details.
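
For example, combining it with the busybox container from the reproduction steps above (a usage sketch; assumes a daemon and CLI new enough to carry these options):

    docker run -d \
      --log-opt mode=non-blocking \
      --log-opt max-buffer-size=4m \
      busybox sh -c "while true; do date; done"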