moby: Upgrade to 20.10 breaks log tailing: unexpected EOF errors

Description

Our E2E test suite uses testcontainers-java to run our microservices and their dependencies in containers for the duration of our tests. All of the tests were passing with 19.03.14 until we updated our build nodes to Docker 20.10.1, at which point we started repeatedly getting STDOUT: $Error grabbing logs: unexpected EOF errors, and our Wait.forLogMessage calls started timing out.

Note, however, that this issue is sporadic: about 1 out of 10 builds still passes. After we reverted to Docker 19.03.14 the issue went away. We’re using the latest version of Testcontainers (1.15.1 at the time of reporting this issue).

After contacting the developers I was told this is most likely a Docker issue.

Steps to reproduce the issue: Unfortunately, our projects are not OSS, so I cannot give you the exact containers used. I’ll try to describe a matching scenario; a rough sketch of what these steps boil down to at the Engine API level follows the list.

  1. Set up a project that uses the latest version of testcontainers-java (1.15.1).
  2. Set up a docker image that produces a lot of standard output.
  3. Start the image using testcontainers-java and wait for a log message using Wait.forLogMessage().
  4. A huge percentage of the time this results in the error message STDOUT: $Error grabbing logs: unexpected EOF and the wait policy does not trigger, even though the messages are present in the log.
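
Under the hood, Wait.forLogMessage simply follows the container’s log stream over the Engine API, so steps 2–4 boil down to roughly the sketch below. This is illustrative Go against the 20.10-era Engine API Go client, not our actual Java test code; the container name ("chatty") and the expected message are placeholders.

package main

import (
	"bufio"
	"context"
	"fmt"
	"io"
	"strings"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
	"github.com/docker/docker/pkg/stdcopy"
)

func main() {
	ctx := context.Background()
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		panic(err)
	}

	// Follow the log stream of an already running, very chatty container.
	// "chatty" and the expected message below are placeholders.
	rc, err := cli.ContainerLogs(ctx, "chatty", types.ContainerLogsOptions{
		ShowStdout: true,
		ShowStderr: true,
		Follow:     true,
	})
	if err != nil {
		panic(err)
	}
	defer rc.Close()

	// Non-TTY log streams are multiplexed with 8-byte frame headers;
	// demultiplex them into a pipe we can scan line by line.
	pr, pw := io.Pipe()
	go func() {
		_, copyErr := stdcopy.StdCopy(pw, pw, rc)
		pw.CloseWithError(copyErr)
	}()

	scanner := bufio.NewScanner(pr)
	for scanner.Scan() {
		if strings.Contains(scanner.Text(), "Started application") {
			fmt.Println("wait condition met")
			return
		}
	}
	// With the bug described here, the stream ends early with
	// "Error grabbing logs: unexpected EOF" instead of more log lines.
	fmt.Println("stream ended before the expected message:", scanner.Err())
}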

Describe the results you received: The error message STDOUT: $Error grabbing logs: unexpected EOF; the wait policy for the log message did not trigger even though the message was present in the log.

Describe the results you expected: No EOF error, wait policy for log message working.

Additional information you deem important (e.g. issue happens only occasionally): As mentioned in the description, the issue is sporadic, but it happens in more than 80% of our CI runs.

Output of docker version: Previous version where the tests were passing:

Client: Docker Engine - Community
 Version:           19.03.14
 API version:       1.40
 Go version:        go1.13.15
 Git commit:        5eb3275d40
 Built:             Tue Dec  1 19:20:26 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.14
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       5eb3275d40
  Built:            Tue Dec  1 19:18:53 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.3.9
  GitCommit:        ea765aba0d05254012b0b9e595e995c09186427f
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Current version where the issue is present:

Client:
 Version:           20.10.1
 API version:       1.41
 Go version:        go1.15.6
 Git commit:        831ebeae96
 Built:             Tue Dec 15 22:25:01 2020
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.1
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.15.6
  Git commit:       f0014860c1
  Built:            Tue Dec 15 22:24:28 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b.m
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Output of docker info:

Previous version where the tests were passing:

Client:
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 92
 Server Version: 19.03.14
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: ea765aba0d05254012b0b9e595e995c09186427f
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.4.0-58-generic
 Operating System: Ubuntu 20.04.1 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 3
 Total Memory: 15.64GiB
 Name: build1
 ID: QWYR:EE5C:DWTE:MY65:MQ7Y:FERG:F7J2:IDTT:7LVR:VHSE:REFY:6AQH
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

Current version where the issue is present:

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.5.1-tp-docker)

Server:
 Containers: 134
  Running: 8
  Paused: 0
  Stopped: 126
 Images: 1390
 Server Version: 20.10.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 269548fa27e0089a8b8278fc4fc781d7f65a939b.m
 runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.9.14-arch1-1
 Operating System: Arch Linux
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 31.24GiB
 Name: RE02WL03
 ID: YC3H:JCWA:APFN:VOQ7:CWPG:B4VO:5FYW:UPVL:2H5B:PO4H:SXO7:LSGH
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No blkio weight support
WARNING: No blkio weight_device support

The current version and info output was captured on an Arch box, while the previous version and info output was captured on an Ubuntu box, but we could also reproduce the issue on Ubuntu after upgrading to 20.10.

Additional environment details (AWS, VirtualBox, physical, etc.):

  • Could reproduce using both Ubuntu 20.10 and Arch Linux.

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 30
  • Comments: 57 (21 by maintainers)

Most upvoted comments

https://github.com/moby/moby/pull/42104 should take care of this, but will need to be backported after it is merged.

I’ve upgraded the Docker version on a test machine and still see the same warnings:

Aug 19 09:13:44 test-docker-0 dockerd[17267]: time="2021-08-19T09:13:44.285053897Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=0
Client: Docker Engine - Community
 Version:           v20.10.8
 API version:       1.41
 Go version:        go1.16.6
 Git commit:        aa949f2ad
 Built:             Thu Aug 12 15:10:34 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          v20.10.8
  API version:      1.42 (minimum version 1.12)
  Go version:       go1.16.6
  Git commit:       91dc595e96
  Built:            Thu Aug 12 15:10:34 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.9
  GitCommit:        e25210fe30a0a703442421b0f60afac609f950a3
 runc:
  Version:          1.0.1
  GitCommit:        v1.0.1-0-g4144b63
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Okay, I have come up with an easy way to reproduce it:

docker run -d --rm --name teh-logs ubuntu seq 10000000
docker logs -f teh-logs > logs.txt

In my case, I get the output:

% docker run -d --rm --name teh-logs ubuntu seq 10000000
14cabbd7f8515e5e0631a39c29535177bfaeba8c97e76269e5318f49c3b306e6
% docker logs -f teh-logs > logs.txt
error from daemon in stream: Error grabbing logs: unexpected EOF

I checked a few of my instances at random and they are running version 20.10.6, where the bug should be fixed, but I still see lots of warning messages when running journalctl -afu docker.service:

level=warning msg="got error while decoding json" error="unexpected EOF" retries=X

Maybe there’s something else causing it too?

@cpuguy83 did you run docker logs -f? Are you using the json-file logging driver?

We’ve hit the same bug in https://github.com/trinodb/trino while running our product tests (Testcontainers-based). Trino generates a lot of logs while starting up, and during startup about 20% of runs fail with the EOF error.

It’s pretty straightforward to reproduce it locally:

git clone https://github.com/trinodb/trino
cd trino
./mvnw clean install -DskipTests
./testing/trino-product-tests-launcher/bin/run-launcher test run --no-bind --environment  multinode-tls-kerberos -- -t io.trino.tests.cli.TestTrinoCli

I can’t see it in the 20.10.6 milestone, where I think a lot of us would hope to see it.

Thanks @tobiasstadler for the steps to reproduce. I’ve also tried your change and the log output no longer exits with the error 👍. The daemon log is still filled with errors, though this could be resolved by moving the logrus error call to the bottom of the function.

I’ve found that the issue doesn’t appear until the logs are queried for the running container. To do this, I needed to run multiple terminals.

The following will reproduce the error in development (docker-dev:dry-run-test): Terminal 1:

  • make run

Terminal 2 (nyancat):

  • docker exec -it $(docker ps -q --filter ancestor=docker-dev:dry-run-test) bash
  • docker run -it --rm --name nyancat 06kellyjac/nyancat:alpine

Terminal 3 (docker logs):

  • docker exec -it $(docker ps -q --filter ancestor=docker-dev:dry-run-test) bash
  • docker logs --tail 1 -f nyancat

Maybe

d.rdr = io.MultiReader(d.dec.Buffered(), d.rdr)
d.dec = json.NewDecoder(d.rdr)

should be

reader := io.MultiReader(d.dec.Buffered(), d.rdr)
d.dec = json.NewDecoder(reader)

in func (d *decoder) Decode() (msg *logger.Message, err error)?
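
For readers without the moby source at hand, the snippet refers to the json-file reader’s retry path when the writer has not yet finished the current JSON line (which surfaces as an unexpected EOF). Below is a self-contained sketch of that pattern with the proposed change applied; it uses simplified stand-in types and is reduced to a single retry, so it is not the actual moby code.

// Simplified stand-in for the json-file reader's retry-on-partial-write logic;
// not the moby source, and reduced to a single retry for brevity.
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

type message struct {
	Log string `json:"log"`
}

type decoder struct {
	rdr io.Reader     // underlying log file
	dec *json.Decoder // current JSON decoder
}

func (d *decoder) decode() (*message, error) {
	if d.dec == nil {
		d.dec = json.NewDecoder(d.rdr)
	}
	var m message
	err := d.dec.Decode(&m)
	if errors.Is(err, io.ErrUnexpectedEOF) {
		// The writer may not have finished the current line yet. Keep the
		// bytes the old decoder already buffered, but leave d.rdr alone --
		// this is the change proposed above.
		reader := io.MultiReader(d.dec.Buffered(), d.rdr)
		d.dec = json.NewDecoder(reader)
		err = d.dec.Decode(&m)
	}
	if err != nil {
		return nil, err
	}
	return &m, nil
}

func main() {
	input := "{\"log\":\"line 1\"}\n{\"log\":\"line 2\"}\n"
	d := &decoder{rdr: strings.NewReader(input)}
	for {
		m, err := d.decode()
		if err != nil {
			fmt.Println("decode stopped:", err)
			return
		}
		fmt.Println(m.Log)
	}
}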

@cpuguy83 I think it is related to the json-file logging driver. Running docker logs -f foo against a container started with docker run --log-driver=json-file -it --name foo ubuntu sh -c "apt-get update && apt-get install -y nyancat && nyancat" reproduces the error, while the same container started with docker run --log-driver=local -it --name foo ubuntu sh -c "apt-get update && apt-get install -y nyancat && nyancat" doesn’t.

Also, I see a lot of level=warning msg="got error while decoding json" error="unexpected EOF" retries=... entries in ~/Library/Containers/com.docker.docker/Data/log/vm/dockerd.log (on macOS).

It seems like it is not able to recover from a JSON parsing error.

I am still encountering these errors while running the latest Docker version, 20.10.6. Can somebody else confirm?

How to reproduce (tested on Ubuntu 20.04.2 LTS and macOS):

  • Terminal 1: docker run -it --name foo ubuntu sh -c "apt-get update && apt-get install -y nyancat && nyancat"
  • Terminal 2: docker logs -f foo (expect the same “You have nyaned for x seconds” screen as in terminal 1)
  • Terminal 3 (Ubuntu): journalctl -u docker -f
  • Terminal 3 (macOS): tail -f ~/Library/Containers/com.docker.docker/Data/log/vm/dockerd.log
  • Terminal 3 (other OS): see https://stackoverflow.com/a/30970134/9103163 for how to view the logs of the Docker daemon process on your OS

The error is showing up immediately.

+1 to @tobiasstadler’s last comment: the issue is only present if you run docker logs -f on an already running container.

Thanks, I’ll give the nyan reproducer a try tomorrow.

Thumbs down because it’s not just "output a lot of logs"; otherwise it would have been reproduced by now.

The thread is full of one-liners that people used to reproduce the issue; I’m not sure I can provide anything better. As multiple people have pointed out, just use a container that produces a lot of logs, then tail its logs.

This issue has been preventing us from updating Docker since early December, as it makes all of our tests flaky. Can we please get some feedback from the developer team on whether this has been successfully reproduced or is being looked into?

I wasn’t able to reproduce using @pete-woods’s approach either, but it always seems to happen in a few seconds if I run

docker run -it --name foo ubuntu sh -c "apt-get update && apt-get install -y nyancat && nyancat"

in one terminal window, and then

docker logs -f foo

in a separate terminal window.

For what it’s worth, it also happens if I tail the logs over Docker’s HTTP API (a minimal sketch of doing so follows below).

Running Docker 20.10.2 on macOS on a MacBook Pro.
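
For reference, tailing the logs over Docker’s HTTP API can be done with nothing more than the local unix socket. The sketch below is a hedged example: it assumes the default socket path, API version 1.41, and a running container named foo; when the daemon-side reader gives up, the Error grabbing logs: unexpected EOF message should show up at the end of the stream.

package main

import (
	"context"
	"io"
	"net"
	"net/http"
	"os"
)

func main() {
	// Talk to the local daemon over its unix socket; the host part of the
	// URL is ignored when a custom dialer is used.
	httpc := &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				return (&net.Dialer{}).DialContext(ctx, "unix", "/var/run/docker.sock")
			},
		},
	}

	// Follow stdout/stderr of a container named "foo" (placeholder).
	resp, err := httpc.Get("http://localhost/v1.41/containers/foo/logs?follow=1&stdout=1&stderr=1")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// For non-TTY containers the body is a multiplexed stream with 8-byte
	// frame headers; copying it verbatim is enough to observe the error frame.
	io.Copy(os.Stdout, resp.Body)
}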

Indeed, we do handle the case where there can be a race between reading and writing a JSON entry. It seems like we have messed up the byte stream somehow.

Working on getting more debugging in here (at least adding container ID to these error logs from the daemon).

Do you have log rotation or anything else enabled? I have not been able to reproduce this, having tried a few different ways.

I can also confirm this is happening. It worked fine before upgrading the Docker engine from 19 to 20.

I have multiple containers (about 7 running), 3 of which produce generous output.

Usually, the logging fails with the EOF error for one or two of the containers during integration testing.

docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.4.2-docker)
  scan: Docker Scan (Docker Inc., v0.5.0)

Server:
 Containers: 7
  Running: 7
  Paused: 0
  Stopped: 0
 Images: 20
 Server Version: 20.10.0
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.19.121-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 6
 Total Memory: 4.825GiB
 Name: docker-desktop
 ID: QDG2:L34D:MF6P:RWBK:ULT5:CTNT:GNSN:UCNB:K2EU:U4GP:ZVE6:VIA7
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: gateway.docker.internal:3128
 HTTPS Proxy: gateway.docker.internal:3129
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

The dockerd logs are filled with the following:

2021-01-04T17:58:55Z dockerd time="2021-01-04T17:58:55.781629156Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=6765
2021-01-04T17:58:55Z dockerd time="2021-01-04T17:58:55.781650830Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=6766
2021-01-04T17:58:55Z dockerd time="2021-01-04T17:58:55.781666688Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=6767
2021-01-04T17:58:55Z dockerd time="2021-01-04T17:58:55.782293790Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=6768
2021-01-04T17:58:55Z dockerd time="2021-01-04T17:58:55.782321359Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=6769
2021-01-04T17:58:55Z dockerd time="2021-01-04T17:58:55.782337211Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=6770
2021-01-04T17:58:55Z dockerd time="2021-01-04T17:58:55.782352359Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=6771
2021-01-04T17:58:55Z dockerd time="2021-01-04T17:58:55.782382550Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=6772

Best regards, Sorin