moby: If a remote TCP syslog server is down, docker does not start.

If you try to start a container with a remote TCP syslog endpoint that is currently unavailable, the container will fail to start.

This is especially problematic in a production environment where you are trying to ship logs to a central place. You could have network issues reaching your syslog server, and that would stop your production apps from running. I would say that having logs delayed is preferable to not being able to start your services at all.

If a node is unresponsive:

$ docker run --log-opt syslog-address=tcp://google.com:1212 --log-driver=syslog nginx
(wait about a minute)
docker: Error response from daemon: Failed to initialize logging driver: dial tcp 216.58.216.142:1212: getsockopt: connection timed out.

Or if it flat out rejects the connection:

$ docker run --log-opt syslog-address=tcp://localhost:1212 --log-driver=syslog nginx
docker: Error response from daemon: Failed to initialize logging driver: dial tcp 127.0.0.1:1212: getsockopt: connection refused.

In both of these cases, your container doesn’t launch. We could use UDP, which would avoid this problem, but UDP has its own drawback: with intermittent network issues you lose logs entirely.
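For comparison, a sketch of the UDP variant (hostname and port are placeholders); it does not block container start, at the cost of silently dropping logs when the network misbehaves:

$ docker run --log-driver=syslog --log-opt syslog-address=udp://logs.example.com:514 nginx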

I would suggest that Docker still start the container, cache the logs internally, and keep attempting to connect and flush them, instead of failing outright.

Output of docker version:

Client:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   20f81dd
 Built:        Thu Mar 10 21:49:11 2016
 OS/Arch:      darwin/amd64

Server:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   20f81dd
 Built:        Thu Mar 10 21:49:11 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 223
 Running: 15
 Paused: 0
 Stopped: 208
Images: 552
Server Version: 1.10.3
Storage Driver: aufs
 Root Dir: /mnt/sda1/var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 1037
 Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Plugins:
 Volume: local
 Network: null host bridge
Kernel Version: 4.1.19-boot2docker
Operating System: Boot2Docker 1.10.3 (TCL 6.4.1); master : 625117e - Thu Mar 10 22:09:02 UTC 2016
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 3.858 GiB
Name: default
ID: WIKN:KOFK:L2P5:I5VU:JGGX:UPHB:4PCB:L7FC:PX3N:F42A:XC4N:AEGA
Debug mode (server): true
 File Descriptors: 137
 Goroutines: 215
 System Time: 2016-04-12T20:46:33.827097672Z
 EventsListeners: 1
 Init SHA1:
 Init Path: /usr/local/bin/docker
 Docker Root Dir: /mnt/sda1/var/lib/docker
Username: tecnobrat
Registry: https://index.docker.io/v1/
Labels:
 provider=virtualbox

About this issue

  • State: open
  • Created 8 years ago
  • Reactions: 35
  • Comments: 43 (7 by maintainers)

Most upvoted comments

…cuz we’re at a point now where we are having to tell customers… “well, if your remote TCP endpoint for logging isn’t available at the time the container is started (for whatever reason), your darn container just won’t start”. People do want to use more sophisticated logging AND they shouldn’t have to rely on syslog over UDP (which isn’t that great for multiline logs), so they’re left with logging to the filesystem, or using some other logging API within their app… taking us back to 2005. We’ve got to get to a point where the container will always start if possible, so that the logging drivers are a reliable option.

You can use logspout as a workaround.
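A minimal sketch of that workaround, assuming the gliderlabs/logspout image and a placeholder syslog endpoint; containers keep the default json-file driver, and logspout reads their logs via the Docker API and forwards them, so an unreachable endpoint never blocks a container from starting:

docker run -d --name logspout \
  -v /var/run/docker.sock:/var/run/docker.sock \
  gliderlabs/logspout \
  syslog+tcp://logs.example.com:514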

@port22 try running it like this:

docker run -d --log-driver=fluentd --log-opt="fluentd-address=localhost:24224" --log-opt="fluentd-async-connect=true" nginx

worked for me
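A hedged daemon.json equivalent (typically /etc/docker/daemon.json; the address is a placeholder), so the async connect behaviour becomes the default for all containers:

{
  "log-driver": "fluentd",
  "log-opts": {
    "fluentd-address": "localhost:24224",
    "fluentd-async-connect": "true"
  }
}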

Having the same issue with containers not being able to start if, for example, the Splunk logging driver has a connection issue. Any news on when we can expect a fix for this? You would expect a soft fail in such situations; instead you get a really hard one.

btw, for the Splunk logging driver we have also implemented retry logic, so with --log-driver splunk --log-opt splunk-verify-connection=false you will be able to start your container, and if Splunk is not available we will keep trying to send logs. See https://github.com/docker/docker/blob/master/docs/admin/logging/splunk.md#advanced-options for details
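For reference, a sketch of that on the command line (the HEC token and URL are placeholders):

docker run -d --log-driver=splunk \
  --log-opt splunk-token=00000000-0000-0000-0000-000000000000 \
  --log-opt splunk-url=https://splunk.example.com:8088 \
  --log-opt splunk-verify-connection=false \
  nginx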

@LaurentDumont the PR was continued in https://github.com/docker/docker/pull/25786, which is on the 1.13 milestone

I like this discussion.

@michaelwilde if the link between Splunk and the host dies while the container is running, the container will keep running; all failures will go to the log configured for the Docker daemon as error messages similar to Failed to send log "{MESSAGE}".

We have seen that some of our customers have issues with the logging driver failing the container start when the driver cannot connect to the remote host.

In 99% of cases this issue is caused by a misconfiguration of the logging driver: a wrong URL or a wrong route to the Splunk host. We verify the connection intentionally, just to inform the user that the Splunk logging driver is configured incorrectly.

In case of a bigger failure, when the driver cannot connect to Splunk for some reason (let’s say the link between the Docker host and the Splunk cluster is down) but the customer needs to scale out right now, we still want to tell the user that logging does not work and that they need to take action. The easiest action is to switch to the json-file logging driver; later, when the link between the Docker host and Splunk is fixed, the customer can index the JSON logs as well to keep all the logs in one place.
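For illustration, the fallback he describes amounts to starting the affected containers with the default json-file driver (the rotation opts are optional):

docker run -d --log-driver=json-file \
  --log-opt max-size=10m \
  --log-opt max-file=3 \
  nginx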

@nickperry can this workflow be applied in your case? Do you see any issues with it, or a better way?

Btw, installing a Splunk forwarder locally next to the Docker daemon is still a good solution. It can give you a lot, including retries and all the other benefits.

Btw, I am working right now on some improvements for the Splunk logging driver, one of them being --log-opt splunk-verify-connection=true|false (which is the discard logic mentioned by @nickperry). If somebody is interested, they can ping me at DockerCon 2016 in Seattle (Splunk will have a booth and a small presentation) and I can show these improvements.

@ionutalexandruisac We’ve seen it. Logging is tricky: change something and you piss off half the user base, don’t change something and you piss off the other half.

Right now Docker tries to ensure you never lose logs; unfortunately, in this case that means you can’t even start your container… and perhaps worse, after your container is started, if the remote endpoint goes down for an extended period your container will be blocked on I/O.

We probably need an option that allows for lossy logs and works across all drivers, protocols, etc.
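Later releases did grow an option along these lines: a per-container mode=non-blocking log opt backed by a ring buffer (see the 18.03 report further down). A hedged daemon.json sketch, assuming a daemon new enough to support it and a placeholder syslog address; note that, per that report, it makes delivery non-blocking once the container is running but still does not skip the start-time connection check:

{
  "log-driver": "syslog",
  "log-opts": {
    "syslog-address": "tcp://logs.example.com:514",
    "mode": "non-blocking",
    "max-buffer-size": "4m"
  }
}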

@michaelwilde Docker does not support multiline logging anyway, so syslog UDP would be fine here.

It appears any logging driver (including Splunk) has an issue if a TCP connection is required and the remote node is not available. It would be preferable for the container to launch anyway, with some modicum of retry, so services are not affected. I’m not exactly sure what happens to the container if it starts and then the remote log receiver goes down… I’d hope the container doesn’t die.

Same issue, same requirement.

I was hoping the mode and max-buffer-size options could be helpful if the remote syslog server came back online after some time. But no such luck! 😢 😞

# docker --version
Docker version 18.03.0-ce, build 0520e24

# docker run -it --log-driver syslog --log-opt syslog-address=tcp://logs.mydomain.com:514 --log-opt mode=non-blocking --log-opt max-buffer-size=4m alpine ping 127.0.0.1
docker: Error response from daemon: failed to initialize logging driver: dial tcp aaa.bbb.ccc.ddd:514: getsockopt: connection refused.

Any more news on this? We are still experiencing this with 2017.15.0-CE.

Weirdly enough, I get unknown log opt 'splunk-verify-connection' for splunk log driver with Docker 1.12.1 and Docker-Compose.

Scratch that, just saw it was for the 1.13 milestone 😦

Or at least, it seems that the PR was never merged into 1.12 https://github.com/docker/docker/pull/24652

😢

I’ve just run into this with the Docker Splunk driver. If, for some reason, the remote Splunk server doesn’t answer, the container fails to start. I’m a bit surprised that there isn’t a fallback where it could default to some other logging method instead of just stopping.

One workaround for Splunk customers who want spooling would be to run a Splunk universal forwarder on the Docker host with a tcp:// input and point the Splunk Docker log driver at that.

EDIT - sorry, I was confused. We’d need a HEC input, not a TCP input, and we can’t run the HEC app on a universal forwarder.