moby: [1.11.0] Possible deadlock on container object
Originally reported by @mblaschke in https://github.com/docker/docker/issues/13885#issuecomment-210639112
Creating a different issue because it may be a 1.11 regression.
The goroutine trace at https://gist.github.com/tonistiigi/9d79de62b2f7919f33a9e987619b9de8 suggests that many goroutines are waiting on a container lock. No goroutine in that trace obviously holds the lock, so we may have a code path that returns without releasing it.
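To illustrate the suspected failure mode (a function that returns while still holding a lock), here is a minimal Go sketch. It is not taken from the Docker source: the `container` type, `stopLeaky`, `stopSafe`, and `lockLeaked` are hypothetical names invented for this example. It also uses `sync.Mutex.TryLock`, which requires Go 1.18 or later and so was not available in the go1.5.4 toolchain reported below; it appears here only so the demo can detect the leaked lock without blocking.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// container is a hypothetical stand-in for the daemon's container object;
// it is not Docker's actual type.
type container struct {
	mu      sync.Mutex
	running bool
}

var errNotRunning = errors.New("container is not running")

// stopLeaky shows the suspected bug pattern: the early return on the
// error path skips Unlock, so the mutex stays held and every later
// caller of Lock blocks forever.
func stopLeaky(c *container) error {
	c.mu.Lock()
	if !c.running {
		return errNotRunning // BUG: returns with c.mu still locked
	}
	c.running = false
	c.mu.Unlock()
	return nil
}

// stopSafe releases the lock on every return path by deferring Unlock.
func stopSafe(c *container) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.running {
		return errNotRunning
	}
	c.running = false
	return nil
}

// lockLeaked reports whether c.mu is still held after a call returned.
// TryLock (Go 1.18+) is used only so the demo does not block.
func lockLeaked(c *container) bool {
	if c.mu.TryLock() {
		c.mu.Unlock()
		return false
	}
	return true
}

func main() {
	leaky := &container{}
	_ = stopLeaky(leaky)
	fmt.Println("leaky path left lock held:", lockLeaked(leaky))

	safe := &container{}
	_ = stopSafe(safe)
	fmt.Println("safe path left lock held:", lockLeaked(safe))
}
```

A leak like this would match the trace described above: the goroutine that took the lock has already returned, so it does not show up as a holder, while every later caller stays parked in `Lock`.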
Original report:
Since we updated to 1.11.0, running rspec docker image tests (~10 parallel containers running these tests on a 4 CPU machine) sometimes freezes and fails with a timeout. Docker freezes completely and doesn’t respond (e.g. docker ps). This happens on a vserver with Debian stretch (btrfs) and on a (Vagrant) Parallels VM with Ubuntu 14.04 (backported kernel 3.19.0-31-generic, ext4).
The filesystem for /var/lib/docker on both servers was cleared (btrfs was recreated) after the first freeze. The freeze happens randomly when running these tests.
Stack trace is attached from both servers: docker-log.zip
strace from docker-containerd and docker daemons:
# strace -p 21979 -p 22536
Process 21979 attached
Process 22536 attached
[pid 22536] futex(0x219bd90, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 21979] futex(0xf9b170, FUTEX_WAIT, 0, NULL
Docker info (Ubuntu 14.04 with backported kernel)
Client:
Version: 1.11.0
API version: 1.23
Go version: go1.5.4
Git commit: 4dc5990
Built: Wed Apr 13 18:34:23 2016
OS/Arch: linux/amd64
Server:
Version: 1.11.0
API version: 1.23
Go version: go1.5.4
Git commit: 4dc5990
Built: Wed Apr 13 18:34:23 2016
OS/Arch: linux/amd64
root@DEV-VM:/var/lib/docker# docker info
Containers: 11
Running: 1
Paused: 0
Stopped: 10
Images: 877
Server Version: 1.11.0
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 400
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge null host
Kernel Version: 3.19.0-31-generic
Operating System: Ubuntu 14.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 3.282 GiB
Name: DEV-VM
ID: KCQP:OGCT:3MLX:TAQD:2XG6:HBG2:DPOM:GJXY:NDMK:BXCK:QEIT:D6KM
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/
About this issue
- State: closed
- Created 8 years ago
- Comments: 26 (10 by maintainers)
This is confirmed fixed in 1.11.2, as far as I’m concerned.
On Mon, Jun 20, 2016, 03:10 Daniel Huhn notifications@github.com wrote:
I updated our prod cluster to 1.11.2. Now our monitoring (Datadog) reports the daemons going down sometimes, but they become responsive again after a minute or two.
However, this does not apply to all hosts, even though they all run Ubuntu 14.04.4 LTS (with KVM) (3.13.0-88-generic).