moby: Concurrently starting many containers from same checkpoint leads to error: "context deadline exceeded"
Description
I’m using checkpoint/restore to restore multiple containers from the same checkpoint. Sometimes these restores happen concurrently. In particular, as part of our bootstrap process, we restore 45 containers from the same checkpoint when a new EC2 machine is launched; these restores happen concurrently. What I’m finding is that doing many concurrent restores usually leads to a number of these restores failing. The failure mode is:
"Error response from daemon: shim error: containerd-shim: opening /var/run/docker/libcontainerd/f7bbb5a85645fb1d34b2ab4966b2893c4507c900d77e0104515451f440181d29/init-stdout failed: context deadline exceeded"
In the daemon logs, I always see several occurrences of the following error prior to any occurrences of the “context deadline exceeded” error:
level=error msg="stream copy error: reading from a closed fifo\ngithub.com/docker/docker/vendor/github.com/tonistiigi/fifo.(*fifo).Read\n\t/go/src/github.com/docker/docker/vendor/github.com/tonistiigi/fifo/fifo.go:142\nbufio.(*Reader).fill\n\t/usr/local/go/src/bufio/bufio.go:97\nbufio.(*Reader).WriteTo\n\t/usr/local/go/src/bufio/bufio.go:472\nio.copyBuffer\n\t/usr/local/go/src/io/io.go:380\nio.Copy\n\t/usr/local/go/src/io/io.go:360\ngithub.com/docker/docker/pkg/pools.Copy\n\t/go/src/github.com/docker/docker/pkg/pools/pools.go:60\ngithub.com/docker/docker/container/stream.(*Config).CopyToPipe.func1.1\n\t/go/src/github.com/docker/docker/container/stream/streams.go:119\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:2086"
The application I’m running in my container has a large memory footprint (~1.1GB when idle), so it’s not too surprising that a few dozen concurrent restores of this application lead to some kind of issue. In fact, I’ve confirmed that adding sleep calls in between the restores in our bootstrap prevents the issue from occurring. So resource overload/exhaustion is definitely at least one part of this. However, I was curious whether there’s anything to be done on the Docker/criu side to better support use cases like this.
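For concreteness, the staggering that works around the problem looks roughly like the sketch below; the image name, container names, checkpoint name, and the length of the sleep are illustrative rather than the exact values from our bootstrap:

```bash
#!/bin/bash
# Staggered restores: launching the 45 restores with a short pause between them
# avoids the "context deadline exceeded" failures, at the cost of a slower bootstrap.
# Assumes a checkpoint named app_cp was created with --checkpoint-dir and that
# "myimage" is the application image (both names are illustrative).
for i in $(seq 1 45); do
  docker create --security-opt=seccomp:unconfined --name "app_$i" myimage >/dev/null
  docker start --checkpoint-dir /var/lib/docker/checkpoints --checkpoint app_cp "app_$i" &
  sleep 2   # the pause between restores is what prevents the failures
done
wait
```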
I’ve been looking into how memory and I/O usage play into this. When I tried doing 45+ concurrent restores of a simple Ubuntu container running a bash loop, I was unable to reproduce the issue. I have, however, been able to reproduce it with a simple C program in a container that just allocates several hundred MB of memory. I’ve confirmed that the issue doesn’t seem to be exhaustion of the host’s memory: I measured memory usage on the host during the bootstrap of our actual application, and the “context deadline exceeded” error starts happening while there’s still 90GB of free memory on the host. Experiments with the toy container have also supported this.
As for I/O, this seems more likely to be where the problem is, and we’ve had issues before with being throttled due to doing too much I/O on our EC2 instance’s root device. However, I/O throttling doesn’t seem to be the issue here. I’ve tried setting the exec-root, docker root (“graph”), and checkpoint-dir all to locations on an instance’s ephemeral storage, which isn’t subject to throttling by AWS and which should achieve better performance, and the issue did not go away or improve. I also tried increasing the number of IOPS provisioned for the root device from 100 to 1000, and this also did not seem to really make a difference.
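For reference, moving everything onto the ephemeral storage amounted to roughly the following daemon flags and checkpoint location (the /mnt/ephemeral paths are illustrative):

```bash
# Daemon started with its exec root and image/container storage ("graph") on the
# instance-store volume; checkpoints written there as well. Paths are illustrative.
dockerd --experimental=true \
        --exec-root=/mnt/ephemeral/docker-exec \
        --graph=/mnt/ephemeral/docker

docker checkpoint create --checkpoint-dir /mnt/ephemeral/checkpoints cr test_cp
```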
The other thing I’ve noticed is that the number of concurrent restores is a factor, independent of e.g. memory consumption. I’ve tried playing around with the size of my toy C program and the number of concurrent restores, and I’ve found that the latter perhaps matters more than the former for reproducing the “context deadline exceeded” issue. With a 1.1GB application running in the container, doing 100 concurrent restores will rarely lead to any failures (total of 110GB of memory usage). However, doing 180 concurrent restores of a 550MB application will lead to at least 16 failed restores (total of 99GB of memory usage). Furthermore, doing 250 concurrent restores of a 300MB application will lead to at least 44 failed restores (total of just 75GB of memory usage). It seems like concurrent restores don’t quite scale as well as maybe they should.
Although in my experiments I’ve often done 100-200+ concurrent restores to reproduce the issue, keep in mind that with our actual application, the issue is seen when doing just 45 concurrent restores, which doesn’t seem to be an unreasonably high number.
One last thing is that while doing these experiments, in addition to the error provided above, I’ve also occasionally seen the following criu error: “Error (criu/mount.c:414): mnt: No root found for mountpoint 978 (@./sys/fs/pstore)”. The actual mountpoint flagged varies, but the message is the same.
I’m attaching the docker daemon logs from the bootstrapping of my actual application, as well as the sample C program, Dockerfile, and shell script I used to reproduce the issue more simply. docker_cr_concurrent_restores.zip
Steps to reproduce the issue:
1. `docker build -t test:latest .`
2. `docker run -d --security-opt=seccomp:unconfined --name cr test`
3. `docker checkpoint create --checkpoint-dir /var/lib/docker/checkpoints cr test_cp`
4. `./stress_cr.sh 250` (a sketch of this script is shown below; the exact version is in the attached zip)
5. Vary the number of concurrent restores (the argument to `stress_cr.sh`) and the application size in `lots_of_memory.c`.
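A minimal sketch of what `stress_cr.sh` does, assuming the `test` image and `test_cp` checkpoint from the steps above (the attached zip has the exact script):

```bash
#!/bin/bash
# stress_cr.sh (sketch): restore N containers concurrently from the same checkpoint.
# Assumes the "test" image and the "test_cp" checkpoint created in the steps above.
N="${1:-45}"

for i in $(seq 1 "$N"); do
  (
    docker create --security-opt=seccomp:unconfined --name "cr_$i" test >/dev/null
    docker start --checkpoint-dir /var/lib/docker/checkpoints --checkpoint test_cp "cr_$i" \
      || echo "restore of cr_$i failed"
  ) &
done
wait
```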
Describe the results you received: Many restores fail with “context deadline exceeded”
Describe the results you expected: Restores succeed
Output of `docker version`:
Client:
Version: 1.13.0-rc3
API version: 1.25
Go version: go1.7.3
Git commit: 4d92237
Built: Tue Dec 6 01:15:44 2016
OS/Arch: linux/amd64
Server:
Version: 1.13.0-rc3
API version: 1.25 (minimum version 1.12)
Go version: go1.7.3
Git commit: 4d92237
Built: Tue Dec 6 01:15:44 2016
OS/Arch: linux/amd64
Experimental: true
Output of `docker info`:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 35
Server Version: 1.13.0-rc3
Storage Driver: overlay
Backing Filesystem: extfs
Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 03e5862ec0d8d3b3f750e19fca3ee367e13c090e
runc version: 51371867a01c467f08af739783b8beafc154c4d7
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.26-040426-generic
Operating System: Ubuntu 14.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 120.1 GiB
Name: ip-10-97-5-247
ID: TEID:T5YB:6YAX:E4VG:4QTD:QQQ3:RWRS:6KQU:ALGA:SQTT:TPS6:GZ2I
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
Registry: https://index.docker.io/v1/
Experimental: true
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Additional environment details (AWS, VirtualBox, physical, etc.): AWS EC2. Using r3.4xlarge instance type. 20GB root device with 100 IOPS. Also tried using 1000 IOPS.
Have tried using criu version 2.8 and 2.9.
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Comments: 26 (23 by maintainers)
Commits related to this issue
- At most 10 `docker run` commands are allowed in parallel Docker < 1.13.1 has a known problem: if more than 10 containers are created (`docker run`) concurrently, there is a good chance that some of t... — committed to sven-lange-last/openwhisk by sven-lange-last 7 years ago
- At most 10 `docker run` commands are allowed in parallel. (#2995) Docker < 1.13.1 has a known problem: if more than 10 containers are created (`docker run`) concurrently, there is a good chance that ... — committed to apache/openwhisk by sven-lange-last 7 years ago
@tswift242 I think we figured it out. Here is our assumption:
I had forgotten, but `containerd` only allows starting 10 containers at a time; the other ones are queued. Since `docker` doesn’t account for this, the fifo opening on its side times out before `containerd-shim` even has time to start. This is the cause of the `stream copy error`. Then later, `containerd-shim` finally starts and times out on its side because `docker` doesn’t have the other end open anymore. This occurs because restoring a checkpoint slows down the whole container start time quite a bit.
As a workaround, could you update your system to only have 10 restore requests in flight at a time? That should prevent the issue without having too big of a performance cost.
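For example, if the restores are driven from a shell script, something like the following caps the concurrency at 10 (this assumes the `cr_1` … `cr_N` containers from the repro script above have already been created):

```bash
# Restore at most 10 containers at a time; xargs -P bounds the number of
# concurrent `docker start` invocations instead of firing all of them at once.
seq 1 45 | xargs -P 10 -I{} \
  docker start --checkpoint-dir /var/lib/docker/checkpoints --checkpoint test_cp cr_{}
```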