moby: Containers cannot be stopped / removed due to rpc error code 2 (setns process caused "exit status 15")
Description
Steps to reproduce the issue:
- Unfortunately we haven’t found a way to reproduce the issue
Describe the results you received:
docker exec results in an error message:
root@docker-linux-1-dh:~# docker exec -it prod_m1af_appserver_paf-as1 ping 8.8.8.8 -c 2
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:83: executing setns process caused \"exit status 15\""
docker stop and docker restart run forever (until Ctrl+C is pressed), and docker attach locks the terminal (Ctrl+C, Ctrl+Z and Ctrl+D don’t work).
This behavior affected only this container.
Describe the results you expected: docker exec, restart, stop, and attach work normally.
Additional information you deem important (e.g. issue happens only occasionally):
- This issue seems to be related to https://github.com/moby/moby/issues/29794, but I wasn’t sure, so I have created a new ticket
- Output of docker top was empty:
root@docker-linux-1-dh:~# docker top prod_m1af_appserver_paf-as1
UID PID PPID C STIME TTY TIME CMD
- The container had 3 shims associated with it:
root@docker-linux-1-dh:~# ps -ef | grep 0ce69c426d21
root 1030 1 0 Aug03 ? 00:00:16 docker-containerd-shim 0ce69c426d213b4ad3e07ba6da934555a6ec36a7edcb3050b2951b1b4a4ca445 /var/run/docker/libcontainerd/0ce69c426d213b4ad3e07ba6da934555a6ec36a7edcb3050b2951b1b4a4ca445 docker-runc
root 4250 1 0 Aug01 ? 00:00:00 docker-containerd-shim 0ce69c426d213b4ad3e07ba6da934555a6ec36a7edcb3050b2951b1b4a4ca445 /var/run/docker/libcontainerd/0ce69c426d213b4ad3e07ba6da934555a6ec36a7edcb3050b2951b1b4a4ca445 docker-runc
root 63381 1 0 Aug03 ? 00:00:00 docker-containerd-shim 0ce69c426d213b4ad3e07ba6da934555a6ec36a7edcb3050b2951b1b4a4ca445 /var/run/docker/libcontainerd/0ce69c426d213b4ad3e07ba6da934555a6ec36a7edcb3050b2951b1b4a4ca445 docker-runc
root 93186 25181 0 23:39 pts/229 00:00:00 grep --color=auto 0ce69c426d21
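A quick way to spot containers in this state is to count shims per container ID. With the containerd shipped in this Docker release, each exec session appears to get its own docker-containerd-shim (the maintainer comment further down speaks of "exec shims"), so several shims for one container with no exec in progress may point to execs that never returned. A minimal sketch, assuming ps output whose first field is the shim binary and second field is the container ID, as in the listing above; count_shims is a hypothetical helper, not a Docker tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count docker-containerd-shim processes per container ID.
# Reads `ps -eo args=`-style lines on stdin and prints "<count> <container-id>".
# Assumes field 1 is the shim binary and field 2 the container ID, as above.
count_shims() {
  awk '$1 ~ /docker-containerd-shim$/ { n[$2]++ }
       END { for (id in n) print n[id], id }'
}
```

For example, ps -eo args= | count_shims | awk '$1 > 1' would list only the container IDs that have more than one shim.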
- We decided to kill -9 the shim processes and restart the daemon. This unlocked the container, and it could then be restarted.
- Some other (potentially) useful info:
– Output of strace
root@docker-linux-1-dh:~# strace -p 1030
strace: Process 1030 attached
futex(0x7abf70, FUTEX_WAIT, 0, NULL^Cstrace: Process 1030 detached
<detached ...>
root@docker-linux-1-dh:~# strace -p 4250
strace: Process 4250 attached
futex(0x7abf70, FUTEX_WAIT, 0, NULL^Cstrace: Process 4250 detached
<detached ...>
root@docker-linux-1-dh:~# strace -p 30069
strace: Process 30069 attached
futex(0x11604d0, FUTEX_WAIT, 0, NULL^Cstrace: Process 30069 detached
<detached ...>
root@docker-linux-1-dh:~# strace -p 63381
strace: Process 63381 attached
futex(0x7abf70, FUTEX_WAIT, 0, NULL^Cstrace: Process 63381 detached
<detached ...>
– Output of top
root@docker-linux-1-dh:~# top
top - 00:04:54 up 20 days, 8:51, 13 users, load average: 98.12, 97.76, 84.41
Tasks: 6674 total, 16 running, 5690 sleeping, 4 stopped, 964 zombie
%Cpu(s): 41.7 us, 12.0 sy, 0.0 ni, 40.2 id, 3.5 wa, 0.0 hi, 2.7 si, 0.0 st
KiB Mem : 10073190+total, 11610988 free, 79294969+used, 20275830+buff/cache
KiB Swap: 18749896+total, 18439233+free, 31066464 used. 20504608+avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
60084 root 20 0 12.384g 8.853g 70512 S 608.6 0.9 1814:56 java
17747 root 20 0 75.994g 0.012t 0.996g S 356.8 1.3 36915:24 java
54170 ubuntu 20 0 24.525g 1.339g 7972 S 198.8 0.1 139055:24 java
47273 1100 20 0 59.725g 3.919g 9984 S 149.4 0.4 3236:15 java
3754 root 20 0 38.225g 0.022t 306736 S 128.4 2.3 267:14.08 java
44864 root 20 0 9869424 1.039g 4908 S 103.7 0.1 28554:27 java
7981 root 20 0 4404252 421344 3924 S 98.8 0.0 8878:45 prometheus-node
57728 root 20 0 49508 10664 3156 R 98.8 0.0 0:06.39 top
88137 991 20 0 92.637g 4.614g 116568 S 97.5 0.5 54493:43 java
75603 root 20 0 59.463g 0.011t 22556 S 87.7 1.2 108:26.28 java
20143 root 20 0 23.169g 841940 10904 S 77.8 0.1 54:06.26 java
75431 www-data 20 0 438664 63096 47672 S 72.8 0.0 0:04.31 apache2
86880 root 20 0 59.582g 0.013t 14844 S 54.3 1.4 55:37.37 java
77324 root 20 0 50.901g 0.014t 11572 S 51.9 1.5 46:58.80 java
65663 root 20 0 7039944 28180 15064 S 49.4 0.0 0:00.40 java
16548 root 20 0 24.247g 1.575g 6528 S 39.5 0.2 12267:38 java
63027 telegraf 20 0 37.089g 192008 3560 S 38.3 0.0 4381:27 beam.smp
6183 root 20 0 23.548g 1.310g 5384 S 24.7 0.1 7467:03 java
98831 root 20 0 51.066g 8.217g 23260 S 22.2 0.9 18:16.70 java
56114 _apt 20 0 80.593g 0.021t 626812 S 21.0 2.3 6070:12 java
83182 root 20 0 59.552g 5.003g 12796 S 21.0 0.5 22:17.87 java
93232 root 20 0 51.076g 0.013t 14684 S 21.0 1.4 33:26.77 java
57953 root 20 0 55.767g 279596 27384 S 19.8 0.0 3:38.82 dockerd
92413 root 20 0 51.083g 0.014t 14504 S 17.3 1.5 33:00.10 java
26343 root 20 0 14.911g 706740 5080 S 16.0 0.1 494:17.90 java
76737 root 20 0 7045688 177884 24416 S 16.0 0.0 0:20.76 java
89283 root 20 0 8727428 171168 3576 S 16.0 0.0 3751:20 beam.smp
128453 root 20 0 59.476g 4.092g 18612 S 14.8 0.4 8:42.28 java
50447 www-data 20 0 216480 38368 27216 S 11.1 0.0 0:02.17 php-fpm
– Output of docker-runc exec
root@docker-linux-1-dh:~/tmp/CENTRAL-7165# docker-runc exec --cwd / -e PATH=/bin 0ce69c426d213b4ad3e07ba6da934555a6ec36a7edcb3050b2951b1b4a4ca445 ls
nsenter: failed to open ipc: No such file or directory
exec failed: container_linux.go:247: starting container process caused "process_linux.go:83: executing setns process caused \"exit status 15\""
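The nsenter message suggests that runc could not open the IPC namespace of the container’s init process. One plausible reading, not confirmed in this thread, is that the init process had already exited or become a zombie, whose /proc/<pid>/ns entries can no longer be resolved. A minimal sketch to check this for a given PID; check_ns is a hypothetical helper, and the namespace list (ipc, uts, net, pid, mnt) is an assumption about which namespaces matter here:

```shell
#!/usr/bin/env bash
# Hypothetical helper: check whether the namespace entries of a PID can be
# resolved. For each namespace it prints "ok" or "MISSING" and returns
# non-zero if any entry is missing (e.g. the process no longer exists).
check_ns() {
  local pid=$1 ns rc=0
  for ns in ipc uts net pid mnt; do
    # -e follows the /proc/<pid>/ns/<ns> link; it fails if the link is dead.
    if [ -e "/proc/$pid/ns/$ns" ]; then
      echo "ok      $ns"
    else
      echo "MISSING $ns"
      rc=1
    fi
  done
  return "$rc"
}
```

The PID can be taken from docker inspect -f '{{.State.Pid}}' <container>; running ps -o pid,stat,cmd -p <pid> on it would then show whether the process is a zombie (Z).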
– (Truncated) output of docker run hello-world:
root@docker-linux-1-dh:/opt/scripts# docker run hello-world
Hello from Docker!
This message shows that your installation appears to be working correctly.
(...)
– Other containers could be restarted normally:
root@docker-linux-1-dh:/opt/scripts# docker run --name test1 -d -p 80 nginx:alpine
8b1d528a8cd6a12ccfe5dab532f91f14dc3adb1c3d0ad21c1a9416e900369d52
root@docker-linux-1-dh:/opt/scripts# docker restart 8b1d528a8cd6a12ccf
8b1d528a8cd6a12ccf
– The host had almost 1k zombie processes (this seems unrelated to this issue and is consistent with my comment in https://github.com/moby/moby/issues/31007):
root@docker-linux-1-dh:/opt/scripts# ps -eo uid,pid,ppid,state,wchan:32,cmd | awk '$4 ~ "Z" {print $5}' | sort | uniq -c
964 exit
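The awk one-liner above groups zombies by kernel wait channel. Grouping them by parent PID instead shows which processes are not reaping their children, which is what the maintainer’s waitpid remark further down is about. A minimal sketch; zombies_by_parent is a hypothetical helper that reads ps -eo ppid=,state= output on stdin:

```shell
#!/usr/bin/env bash
# Hypothetical helper: group zombie processes by parent PID.
# Reads `ps -eo ppid=,state=` lines on stdin and prints "<zombies> <ppid>",
# largest count first. A state starting with Z marks a zombie.
zombies_by_parent() {
  awk '$2 ~ /^Z/ { n[$1]++ } END { for (p in n) print n[p], p }' | sort -rn
}
```

For example, ps -eo ppid=,state= | zombies_by_parent | head, followed by ps -o pid,cmd -p <ppid> for the top entries, would show whether the parents are docker-containerd-shim processes or something else.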
More debugging info can be found at https://gist.github.com/thiagoalves/09c25222e2115fcc6a2d219c5f773a41 and in 2017-08-11-000850.tar.gz.
Output of docker version:
root@docker-linux-1-dh:~# docker version
Client:
Version: 17.03.2-ee-4
API version: 1.27
Go version: go1.7.5
Git commit: 1e6d71e
Built: Fri May 19 20:27:23 2017
OS/Arch: linux/amd64
Server:
Version: 17.03.2-ee-4
API version: 1.27 (minimum version 1.12)
Go version: go1.7.5
Git commit: 1e6d71e
Built: Fri May 19 20:27:23 2017
OS/Arch: linux/amd64
Experimental: false
Output of docker info:
root@docker-linux-1-dh:~# docker info
Containers: 391
Running: 376
Paused: 0
Stopped: 15
Images: 279
Server Version: 17.03.2-ee-4
Storage Driver: aufs
Root Dir: /opt/io1/docker/aufs
Backing Filesystem: extfs
Dirs: 5401
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 4ab9917febca54791c5f071a9d1f404867857fcc
runc version: 54296cf40ad8143b62dbcaa1d90e520a2136ddfe
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-83-generic
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 64
Total Memory: 960.7 GiB
Name: docker-linux-1-dh
ID: UGZS:UFD3:GB4C:W5MX:JU2L:K7PH:6ZWS:4GPM:27Q5:UNNN:X3DC:YDT7
Docker Root Dir: /opt/io1/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 3615
Goroutines: 2259
System Time: 2017-08-11T21:33:20.219422409Z
EventsListeners: 8
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: true
Additional environment details (AWS, VirtualBox, physical, etc.): AWS EC2 - x1.16xlarge
About this issue
- State: closed
- Created 7 years ago
- Reactions: 7
- Comments: 34 (16 by maintainers)
This one is reproducible on many docker versions (17.03-ee, 17.06-ee, 17.07-ce) with different Linux distros (Ubuntu / CentOS) and different environments (AWS, VirtualBox).
It is actually pretty easy to reproduce. Just create a Linux box with Vagrant (4GB RAM + 4GB swap), install any Docker version, and execute the following commands:
for i in {1..300}; do docker run -d -it --restart=always --name poc_$i talves/health_poc; done
docker kill -s TERM $(docker ps -q)
docker ps
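After these commands, the affected containers can be found by trying a trivial exec in each one that is still running, with a time limit so a hung exec does not block the loop. A minimal sketch, assuming GNU coreutils timeout; probe_execs is a hypothetical helper name and the 10-second default limit is an arbitrary choice:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run `ls` in every running container, as in the
# reproduction steps in this thread, and report containers whose exec
# fails or hangs. Assumes GNU coreutils `timeout`; the 10s default is arbitrary.
probe_execs() {
  local limit=${1:-10} c rc
  for c in $(docker ps -q); do
    rc=0
    timeout "$limit" docker exec "$c" ls >/dev/null 2>&1 || rc=$?
    if [ "$rc" -eq 124 ]; then
      # timeout exits with 124 when the command ran past the limit.
      echo "$c timeout"
    elif [ "$rc" -ne 0 ]; then
      echo "$c failed($rc)"
    fi
  done
}
```

Running probe_execs 5 would print one line per container whose exec either timed out or exited with a non-zero code; containers where ls succeeds produce no output.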
I was able to consistently reproduce the behavior:
1. for i in {1..300}; do docker run -d -it --restart=always --name poc_$i talves/health_poc; done
2. docker kill -s TERM $(docker ps -q)
3. docker ps
4. docker exec on the remaining containers: for c in $(docker ps -q); do docker exec $c ls; done
(if step 3 results in 0 containers, start and stop all containers again a few times)
You will get an output like this one:
We haven’t seen this issue since we moved from aufs to overlay2, and it is not reproducible in our test environment on overlay2.
@hernandanielg no need, the snippet you gave seems to indicate that you have at least 2 execs that haven’t returned (I’m assuming that the 3rd docker-containerd-shim also has a child). Could you give me the output of docker version so I know which revision to look at? It looks like the exec shims are neither calling waitpid nor exiting (exiting would have let init reap the defunct processes).
Thanks a lot. This confirms that it’s not a memory exhaustion issue! I’m going to try to find the best person to debug this on our side. Thanks once again! 👍🏻
I am facing the same issue when I try to exec a command in an unstoppable container.
The process is in zombie state.
This is the process stack:
These are the parent and grandparent processes; the shim process is in sleeping state:
@domano I advise that you move to overlay2 if possible. It fixes the problem.
Guys, there is an easy way to get a stuck container:
1. Deploy a container with docker-compose, using a custom network / bridge.
2. When the container starts, remove it with docker rm -f (without removing it from the network first).
3. Check the container status: it will be stuck, and you cannot stop, remove, or kill it.
4. docker top will display 0 processes running inside the container.