compose: Logs stop consuming for container, blocking writes to stderr
Description of the issue
Log output of a postgres container stops after running for a while which is causing subsequent database queries to hang due to failing to write to stderr.
Context information (for bug reports)
Output of "docker-compose version"
$ docker-compose version
docker-compose version 1.21.2, build a133471`
$ python -c 'import docker; print docker.version'
3.3.0
Output of "docker version"
Client:
Version: 18.03.1-ce
API version: 1.37
Go version: go1.9.5
Git commit: 9ee9f40
Built: Thu Apr 26 07:17:20 2018
OS/Arch: linux/amd64
Experimental: false
Orchestrator: swarm
Server:
Engine:
Version: 18.03.1-ce
API version: 1.37 (minimum version 1.12)
Go version: go1.9.5
Git commit: 9ee9f40
Built: Thu Apr 26 07:15:30 2018
OS/Arch: linux/amd64
Experimental: false
Output of "docker-compose config"
Here’s a snippet of the relevant parts:
networks:
default:
ipam:
config:
- subnet: 192.168.42.0/24
services:
web:
build:
context: /usr/src/app
environment:
POSTGRES_1_PORT: tcp://postgres:5432
expose:
- '8000'
links:
- postgres
ports:
- 8000:8000/tcp
tmpfs:
- /tmp:rw,noexec,nosuid,nodev,size=1g,mode=777
volumes:
- /usr/src/app:/app:rw
- state:/data:rw
- runtime:/run/app:rw
postgres:
environment:
PGDATA: /data/postgres
image: postgres:9.3
volumes:
- state:/data:rw
- runtime:/run/postgresql:rw
version: '2.2'
volumes:
runtime: {}
state: {}
Steps to reproduce the issue
- Run
docker-compose up
- Run application test suite from the
web
container (this will connect to the postgres container, and many queries executed will cause postgres to write out to stderr)
Observed result
The application test suite will eventually hang. strace
on the postgres process running the query in the container shows that it’s blocked trying to write to stderr:
12:46:20 write(2, "ERROR: "..., 119
If I run sudo tail -F /proc/<pid>/fd/2
on the same process, the application test suite is no longer hung and resumes.
Expected result
The application test suite should complete without hanging.
Stacktrace / full error message
N/A
Additional information
OS version / distribution, docker-compose
install method, etc.
- Ubuntu 16.04 64-bit
pip install docker-compose
The last known good version combination where I cannot reproduce the issue is docker-compose 1.20.1 + docker-py 3.1.4. I start to see the issue if I upgrade docker-py to 3.2.0. A bisect points to this commit as the culprit: https://github.com/docker/docker-py/commit/e9f31e1e277a39eb275cdd3cc29a42adf0bc5094
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 10
- Comments: 27
Commits related to this issue
- Only attach services we'll read logs from in up When 'up' is run with explicit list of services, compose will start them together with their dependencies. It will attach to all started services, but ... — committed to asterite3/compose by asterite3 4 years ago
- Only attach services we'll read logs from in up When 'up' is run with explicit list of services, compose will start them together with their dependencies. It will attach to all started services, but ... — committed to docker/compose by asterite3 4 years ago
We also encountered a similar problem that happens both with python 2 and python 3. It happens when there is a dependency between services and
docker-compose up
is run with only dependent service explicitly specified. In this case, dependency will also be started, but it’s logs will not be printed to console - and, if that dependency logs a lot, it will be eventually stuck similarly to previous case. Test case:It should be run with
docker-compose up dependent
. Here,links
can be replaced withdepends_on
, effect will be the same. The problem won’t happen withdocker-compose up
ordocker-compose up dependent logger_dependency
. Again,yes
process can be seen stuck withhtop
/strace
ordocker logs --tail 100 -f
. This happens both on python 2 and python 3. Killingdocker-compose
process will, again, fix that, killingyes
won’t stop the container and shim will stay running.Neither of described cases reproduces with
docker-compose run
.Compose version:
(docker version is the same as for previous test)
Here is a minimal test case (courtesy of @bschlinker).
Program versions:
OS information:
Test environment:
docker-compose.yml
:testcontainer/Dockerfile
:testcontainer/entrypoint.sh
:Starting the container:
We will never see another log message print from the container here. However, we can see that the container continues to print via
docker-compose logs
:After messages stop printing from the initial call to
docker-compose up
, we can see a queue begin to form on the underlying unix socket:Capturing the PID of
docker-compose
:Recv-Q
shows 68:A few minutes later,
Recv-Q
shows 102:I’ve updated compose (and dependencies), and I’m still seeing the same problem and the same fix seems to work.
I’ve also developed a simple synthetic test I can share; I believe this triggers the same problem as my application. I’ve run it with
docker-compose up
, and it will run forever with the fix, but without the fix the python process will exit when stderr can no longer be written to without blocking. The container shim and compose will remain running so you have to watch the process list to see python exit; I don’t know if that may be indicative of a separate problem somewhere in the Docker ecosystem or not. The container will stop after compose exits, however.compose-stderr-test.zip