logspout: Logspout stops posting logs from some containers

Hello there,

I use logspout on Docker Cloud, where I run multiple app containers on a single server. However, logspout sometimes stops sending logs to CloudWatch for some of them. I see no errors; the logs are just missing. Restarting the logspout container helps. This is my configuration:

logspout:
  autoredeploy: true
  command: 'cloudwatch://eu-west-1?DEBUG=1'
  deployment_strategy: every_node
  environment:
    - ALLOW_TTY=true
    - AWS_ACCESS_KEY_ID=*redacted*
    - AWS_SECRET_ACCESS_KEY=*redacted*
    - INACTIVITY_TIMEOUT=1m
    - LOGSPOUT=ignore
    - LOGSPOUT_GROUP=$DOCKERCLOUD_NODE_HOSTNAME
  image: 'mdsol/logspout:latest'
  restart: on-failure
  volumes:
    - '/var/run/docker.sock:/tmp/docker.sock'

About this issue

State: open
Created 6 years ago
Reactions: 1
Comments: 15

Most upvoted comments

I was able to solve this by re-creating the pumps when metrics detect that the log stream has halted unexpectedly.

This is a graph of time intervals, showing whether each interval contained at least one log message from each of our container instances. As you can see, prior to my changes the logs from certain containers would drop off over time.

When I get time (when I'm not at work), I can push a slightly cleaned-up version of my code to a fork for you guys.

(Edit: over longer intervals of time, my metrics-based alert and re-create system proved to be imperfect, usually on containers which log infrequently. It's possible for the alert to get confused about the difference between a container that logs infrequently and a container that died. When this happens, it does not restart the pump and things can still die permanently.)

forestjohnsonpeoplenet on Oct 2, 2019

Please recognize that this is not meant to be a perfect contribution ready for merge. I am just sharing what I did in hopes that someone else can learn from it, take and run with it or use this as a stepping stone to discover a better solution.

Note that I did not test this at all beyond running make, so it's possible I jacked something up when I was patching this in.

https://github.com/forestjohnsonpeoplenet/logspout/commit/d927709f80fbef1ada6ce1f07d94ca674f646432

My code will attempt to send InfluxDB formatted metrics to a configurable endpoint as well (telegraf or influxdb). Remove this yourself if you wish. Otherwise you can leave it on in UDP mode and it will just scream into the void maybe??

Note that I also changed the default value for the inactivity timeout to 10s. And I added this sleep to prevent accidental DOS attacks against the docker API when the Docker API is being buggy: https://github.com/forestjohnsonpeoplenet/logspout/commit/d927709f80fbef1ada6ce1f07d94ca674f646432#diff-c035a4a8cbe5cb4d97ac075d044b8b84R303

Also, note that my code will adjust sinceTime, pushing it into the past sufficiently to ensure that no log messages will be skipped when the pump is restarted. As a side effect of this, some log messages may be duplicated when this happens. Keep that in mind if you choose to use this code. 🤷‍♂️

Alternatively, you may be able to fix this issue by updating to the latest version of Docker. I say that because I was not able to reproduce the issue on my local machine when using the latest version of Docker CE.

forestjohnsonpeoplenet on Oct 2, 2019