moby: Docker hang after intensive run/remove operations
Problem description: We create a container (https://github.com/prasmussen/glot-containers) to compile and execute a source file, read out the output, and then remove the container. Normally each container exits within seconds, and we kill the container if it does not finish within 30 seconds. We use three threads to repeat the whole process in parallel. Docker hangs after thousands of such operations. One process called ‘exe’ consumes 100% of one CPU core. I have to reboot the machine to get the docker daemon working again.
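For illustration, here is a minimal sketch of that workload, assuming the glot runner image and the 30-second deadline described above (not the exact code we run):
#!/bin/bash
# Sketch of the create/run/kill/remove cycle described above (illustrative only).
# Run this script in three terminals at once to mimic the three parallel threads.
for i in $(seq 1 10000); do
  cid=$(docker create glot/clang:latest /home/glot/runner)           # create the runner container
  docker start "$cid" > /dev/null
  timeout 30 docker wait "$cid" > /dev/null || docker kill "$cid"    # kill it if not done in 30s
  docker logs "$cid"                                                 # read out the output
  docker rm -f "$cid"                                                # remove the container
done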
uname -a
Linux xuyun-workpc 4.2.0-25-generic #30-Ubuntu SMP Mon Jan 18 12:31:50 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
docker version
Client:
Version: 1.9.1
API version: 1.21
Go version: go1.4.2
Git commit: a34a1d5
Built: Fri Nov 20 13:20:08 UTC 2015
OS/Arch: linux/amd64
Server:
Version: 1.9.1
API version: 1.21
Go version: go1.4.2
Git commit: a34a1d5
Built: Fri Nov 20 13:20:08 UTC 2015
OS/Arch: linux/amd64
docker info
Containers: 4
Images: 131
Server Version: 1.9.1
Storage Driver: overlay
Backing Filesystem: extfs
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.2.0-25-generic
Operating System: Ubuntu 15.10
CPUs: 4
Total Memory: 7.672 GiB
Name: xuyun-workpc
ID: OI5K:OD3L:U44C:4ZWM:N3LZ:BARZ:KDEQ:5DIY:LLBE:UKHP:VBEN:AY6L
WARNING: No swap limit support
After I rebooted the machine, I found there were some containers left:
docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f72001c02ff1 glot/clang:latest "/home/glot/runner" 15 hours ago Created amazing_engelbart
7806233fbb4f glot/java:latest "/home/glot/runner" 15 hours ago Exited (137) 14 hours ago amazing_tesla
59694504b3e8 glot/java:latest "/home/glot/runner" 15 hours ago Dead elated_galileo
64e63df44436 glot/php:latest "/home/glot/runner" 15 hours ago Exited (137) 14 hours ago thirsty_knuth
And from container 596 I found a clue related to the overlay driver.
docker logs 596
Error response from daemon: open /var/lib/docker/overlay/59694504b3e88365c9323237c4418c51350941e7498de05584cc37d69f139d09/lower-id: no such file or directory
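To double-check that the layer metadata is really gone on disk, one can look for the lower-id file that the daemon complains about (a quick diagnostic sketch, using the path taken from the error message above):
# Does the overlay directory for container 596... still exist, and is lower-id missing?
ls -l /var/lib/docker/overlay/59694504b3e88365c9323237c4418c51350941e7498de05584cc37d69f139d09/
cat /var/lib/docker/overlay/59694504b3e88365c9323237c4418c51350941e7498de05584cc37d69f139d09/lower-id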
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Reactions: 1
- Comments: 37 (13 by maintainers)
Commits related to this issue
- Use cpuguy83's patch and add a global cleanup lock I got stuck in #5618 and #17443 for a week without finding a good solution, except for the great cpuguy83's PR #23178 that unlocked at least the read... — committed to etamponi/docker by etamponi 8 years ago
- soft lockup https://github.com/docker/docker/issues/19758 appeared to be unrelated to overlay — committed to AkihiroSuda/issues-docker by AkihiroSuda 8 years ago
I also reproduced this with Docker 1.10.0 on Ubuntu 15.10 (overlay).
How to reproduce
Run
for f in $(seq 1 1000);do docker run -it --rm ubuntu echo $f; done
concurrently in 3 terminals. (Still not sure whether this is related to concurrency, but I could not reproduce it when running 10k times in a single terminal.)
dmesg
Had a similar issue to this including
However, I was not yet able to reproduce on this kernel using the for loop mentioned above. I even worsened the load by adding -d and starting 3000 containers concurrently.
In the meantime I recommend setting the kernel to panic on soft lockup, as well as to reboot on panic, so that your systems can auto-recover to a certain degree. If you started your containers with --restart=always they should come back up automatically. Here is how you enable the flags:
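The exact commands from the original comment are not preserved here; the following is one way to set those kernel parameters (kernel.softlockup_panic and kernel.panic are standard Linux sysctls; the 10-second delay and the 99-softlockup.conf file name are example choices):
# Panic when a soft lockup is detected, and reboot 10 seconds after a panic (apply now):
sudo sysctl -w kernel.softlockup_panic=1
sudo sysctl -w kernel.panic=10
# Persist the settings across reboots:
printf 'kernel.softlockup_panic = 1\nkernel.panic = 10\n' | sudo tee /etc/sysctl.d/99-softlockup.conf
After the automatic reboot, containers started with --restart=always will be brought back up by the daemon, as noted above.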
I’ve created a repo that reproduces this: https://github.com/CVTJNII/docker_lockup_test I had to wind up the concurrency from @AkihiroSuda’s numbers to get it to reproduce consistently, but it is consistent. Note that when the VirtualBox VM goes down, it goes down hard. I recommend pre-instrumenting, as after it hangs often the only thing to do is a hard reset. This is slightly different than the production failures, where the host is usually still up but highly crippled; I’m chalking that up to the VirtualBox VM being smaller and more heavily loaded to reduce the time to failure.