compose: Logs stop consuming for container, blocking writes to stderr

Description of the issue

Log output of a postgres container stops after running for a while which is causing subsequent database queries to hang due to failing to write to stderr.

Context information (for bug reports)

Output of "docker-compose version"
$ docker-compose version
docker-compose version 1.21.2, build a133471`

$ python -c 'import docker; print docker.version'
3.3.0
Output of "docker version"
Client:
 Version:      18.03.1-ce
 API version:  1.37
 Go version:   go1.9.5
 Git commit:   9ee9f40
 Built:        Thu Apr 26 07:17:20 2018
 OS/Arch:      linux/amd64
 Experimental: false
 Orchestrator: swarm

Server:
 Engine:
  Version:      18.03.1-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.5
  Git commit:   9ee9f40
  Built:        Thu Apr 26 07:15:30 2018
  OS/Arch:      linux/amd64
  Experimental: false
Output of "docker-compose config"

Here’s a snippet of the relevant parts:

networks:
  default:
    ipam:
      config:
      - subnet: 192.168.42.0/24
services:
  web:
    build:
      context: /usr/src/app
    environment:
      POSTGRES_1_PORT: tcp://postgres:5432
    expose:
    - '8000'
    links:
    - postgres
    ports:
    - 8000:8000/tcp
    tmpfs:
    - /tmp:rw,noexec,nosuid,nodev,size=1g,mode=777
    volumes:
    - /usr/src/app:/app:rw
    - state:/data:rw
    - runtime:/run/app:rw
  postgres:
    environment:
      PGDATA: /data/postgres
    image: postgres:9.3
    volumes:
    - state:/data:rw
    - runtime:/run/postgresql:rw
version: '2.2'
volumes:
  runtime: {}
  state: {}

Steps to reproduce the issue

  1. Run docker-compose up
  2. Run application test suite from the web container (this will connect to the postgres container, and many queries executed will cause postgres to write out to stderr)

Observed result

The application test suite will eventually hang. strace on the postgres process running the query in the container shows that it’s blocked trying to write to stderr:

12:46:20 write(2, "ERROR: "..., 119

If I run sudo tail -F /proc/<pid>/fd/2 on the same process, the application test suite is no longer hung and resumes.

Expected result

The application test suite should complete without hanging.

Stacktrace / full error message

N/A

Additional information

OS version / distribution, docker-compose install method, etc.

  • Ubuntu 16.04 64-bit
  • pip install docker-compose

The last known good version combination where I cannot reproduce the issue is docker-compose 1.20.1 + docker-py 3.1.4. I start to see the issue if I upgrade docker-py to 3.2.0. A bisect points to this commit as the culprit: https://github.com/docker/docker-py/commit/e9f31e1e277a39eb275cdd3cc29a42adf0bc5094

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 10
  • Comments: 27

Commits related to this issue

Most upvoted comments

We also encountered a similar problem that happens both with python 2 and python 3. It happens when there is a dependency between services and docker-compose up is run with only dependent service explicitly specified. In this case, dependency will also be started, but it’s logs will not be printed to console - and, if that dependency logs a lot, it will be eventually stuck similarly to previous case. Test case:

version: '3'
services:
  dependent:
    image: ubuntu
    command: bash -c 'while true; do date && echo just a note to show container is alive && sleep 1; done'
    links: [logger_dependency]
    stop_signal: SIGKILL
  logger_dependency:
    image: ubuntu
    command: 'yes'
    stop_signal: SIGKILL

It should be run with docker-compose up dependent. Here, links can be replaced with depends_on, effect will be the same. The problem won’t happen with docker-compose up or docker-compose up dependent logger_dependency. Again, yes process can be seen stuck with htop/strace or docker logs --tail 100 -f. This happens both on python 2 and python 3. Killing docker-compose process will, again, fix that, killing yes won’t stop the container and shim will stay running.

Neither of described cases reproduces with docker-compose run.

Compose version:

$ docker-compose version
docker-compose version 1.24.0, build 0aa5906
docker-py version: 3.7.2
CPython version: 3.6.7
OpenSSL version: OpenSSL 1.1.0g  2 Nov 2017

(docker version is the same as for previous test)

Here is a minimal test case (courtesy of @bschlinker).

Program versions:

bschlinker@[red]:~/docker-compose-broken-1.20/$ docker-compose -v
docker-compose version 1.20.0, build ca8d3c6
bschlinker@[red]:~/docker-compose-broken-1.20/$ docker -v
Docker version 17.05.0-ce, build 89658be

OS information:

bschlinker@[red]:~/docker-compose-broken-1.20/$ lsb_release -dri
Distributor ID:	Debian
Description:	Debian GNU/Linux 8.10 (jessie)
Release:	8.10

Test environment:

docker-compose.yml:

version: '2.1'

services:
  testcontainer:
    build: ./testcontainer

testcontainer/Dockerfile:

FROM debian:jessie
ADD entrypoint.sh .
RUN chmod 0755 entrypoint.sh
ENTRYPOINT ["./entrypoint.sh"]

testcontainer/entrypoint.sh:

#!/bin/bash
while true
do
    echo $(date +"%T")
    sleep 120
done

Starting the container:

bschlinker@[red]:~/docker-compose-broken-1.20$ /usr/bin/python /usr/local/bin/docker-compose -f docker-compose.yml up --no-deps --exit-code-from testcontainer testcontainer
WARNING: using --exit-code-from implies --abort-on-container-exit
Starting dockercomposebroken120_testcontainer_1 ... done
Attaching to dockercomposebroken120_testcontainer_1
testcontainer_1  | 17:55:15

We will never see another log message print from the container here. However, we can see that the container continues to print via docker-compose logs:

bschlinker@[red]:~/docker-compose-broken-1.20$ docker-compose logs
Attaching to dockercomposebroken120_testcontainer_1
testcontainer_1  | 17:55:15
testcontainer_1  | 17:57:15
testcontainer_1  | 17:59:15
testcontainer_1  | 18:01:15
testcontainer_1  | 18:03:15
testcontainer_1  | 18:05:15
testcontainer_1  | 18:07:16
testcontainer_1  | 18:09:16
testcontainer_1  | 18:11:16

After messages stop printing from the initial call to docker-compose up, we can see a queue begin to form on the underlying unix socket:

Capturing the PID of docker-compose:

bschlinker@[red]:~/docker-compose-broken-1.20/testcontainer$ sudo ps aux | grep testcontainer
bschlin+ 10537  0.5  1.1 232068 45616 pts/0    Sl+  17:55   0:03 /usr/bin/python /usr/local/bin/docker-compose -f docker-compose.yml up --no-deps --exit-code-from testcontainer testcontainer
...

Recv-Q shows 68:

bschlinker@[red]:~/docker-compose-broken-1.20/testcontainer$ sudo ss -xp | grep 10537
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port
u_str ESTAB 68 0 * 120919 * 119928  users:(("python",pid=10537,fd=10))

A few minutes later, Recv-Q shows 102:

bschlinker@[red]:~/docker-compose-broken-1.20/testcontainer$ sudo ss -xp | tr -s " " | grep 10537
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port
u_str ESTAB 102 0 * 120919 * 119928 users:(("python",pid=10537,fd=10))

I’ve updated compose (and dependencies), and I’m still seeing the same problem and the same fix seems to work.

$ docker-compose version
docker-compose version 1.23.2, build 1110ad0
docker-py version: 3.7.0
CPython version: 2.7.5
OpenSSL version: OpenSSL 1.0.2k-fips  26 Jan 2017

I’ve also developed a simple synthetic test I can share; I believe this triggers the same problem as my application. I’ve run it with docker-compose up, and it will run forever with the fix, but without the fix the python process will exit when stderr can no longer be written to without blocking. The container shim and compose will remain running so you have to watch the process list to see python exit; I don’t know if that may be indicative of a separate problem somewhere in the Docker ecosystem or not. The container will stop after compose exits, however.

$ docker-compose config
services:
  tester:
    command: python /tester.py
    image: python:2
    stop_signal: SIGINT
    volumes:
    - /PATH/TO/PROJECT/tester.py:/tester.py:ro
version: '3.0'

compose-stderr-test.zip