moby: Docker 1.13 unable to kill unresponsive containers
**Daemon unable to kill containers**
Steps to reproduce the issue:
- Run Docker 1.13.
- Try to stop a container; the stop fails.
- Try to kill the container; the kill fails as well.
Describe the results you received:
~# docker kill ss-manager-5-0-0-premium-170302-191041
~# tailf -n 50 /var/log/upstart/docker.log
time="2017-03-12T19:55:32.674806975Z" level=info msg="Container 4aa932810626 failed to exit within 10 seconds of kill - trying direct SIGKILL"
time="2017-03-12T19:55:32.674828202Z" level=info msg="Container 8b0adfb7e123 failed to exit within 10 seconds of kill - trying direct SIGKILL"
time="2017-03-12T19:55:32.676849792Z" level=info msg="Container 94d4b2ed6fba failed to exit within 10 seconds of kill - trying direct SIGKILL"
time="2017-03-12T19:55:32.676537069Z" level=info msg="Container bc30500c7abb failed to exit within 10 seconds of kill - trying direct SIGKILL"
time="2017-03-12T19:55:32.677315493Z" level=info msg="Container 5b142373205f failed to exit within 10 seconds of kill - trying direct SIGKILL"
time="2017-03-12T19:55:32.677341269Z" level=info msg="Container aeae119b7ab6 failed to exit within 10 seconds of kill - trying direct SIGKILL"
time="2017-03-12T19:55:32.677361586Z" level=info msg="Container 04a96afad4c8 failed to exit within 10 seconds of kill - trying direct SIGKILL"
time="2017-03-12T19:55:32.677610496Z" level=info msg="Container fcc9dbb773a4 failed to exit within 10 seconds of kill - trying direct SIGKILL"
time="2017-03-12T19:55:32.677673471Z" level=info msg="Container 61040a84271e failed to exit within 10 seconds of kill - trying direct SIGKILL"
time="2017-03-12T19:55:32.677693876Z" level=info msg="Container bc8447a998e9 failed to exit within 10 seconds of kill - trying direct SIGKILL"
time="2017-03-12T19:55:32.677707919Z" level=info msg="Container 643223aca1bd failed to exit within 10 seconds of kill - trying direct SIGKILL"
time="2017-03-12T19:55:32.677722431Z" level=info msg="Container 722241c15b54 failed to exit within 10 seconds of kill - trying direct SIGKILL"
time="2017-03-12T19:55:32.677804912Z" level=info msg="Container f4131dedc151 failed to exit within 10 seconds of kill - trying direct SIGKILL"
time="2017-03-12T19:55:39.848401195Z" level=info msg="Container 3458aed3490ed82b834af68eef2aff05c7026f2e5e372b6704e621e071a4f24d failed to exit within 10 seconds of signal 15 - using the force"
time="2017-03-12T19:55:39.849196760Z" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container 3458aed3490ed82b834af68eef2aff05c7026f2e5e372b6704e621e071a4f24d: rpc error: code = 2 desc = containerd: container not found"
time="2017-03-12T19:55:49.849866345Z" level=info msg="Container 3458aed3490e failed to exit within 10 seconds of kill - trying direct SIGKILL"
time="2017-03-12T19:55:49.869222190Z" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container a7c3c05fc8dfb462fb796baf2670837f5bb46fbf08602257c72945b1a3963649: rpc error: code = 2 desc = containerd: container not found"
time="2017-03-12T19:55:56.660633065Z" level=error msg="Handler for POST /containers/1ea92200f734d73e501a2289f0ad7b35877cdd5e452d6402901ff7eb9ba9c62b/start returned error: failed to create endpoint simple-ss-windows.9E7097DC-C95F-A2D5-0FD8-6C8BD0DB8716 on network bridge: adding interface vethf174bd5 to bridge docker0 failed: exchange full"
time="2017-03-12T19:55:59.870261160Z" level=info msg="Container a7c3c05fc8dfb462fb796baf2670837f5bb46fbf08602257c72945b1a3963649 failed to exit within 10 seconds of signal 15 - using the force"
time="2017-03-12T19:55:59.871896923Z" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container a7c3c05fc8dfb462fb796baf2670837f5bb46fbf08602257c72945b1a3963649: rpc error: code = 2 desc = containerd: container not found"
time="2017-03-12T19:56:09.740266272Z" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container b0fa7e915b55a85b1c99decea3675242b81722620393861644c79e8dd9ee184f: rpc error: code = 2 desc = containerd: container not found"
time="2017-03-12T19:56:09.873369121Z" level=info msg="Container a7c3c05fc8df failed to exit within 10 seconds of kill - trying direct SIGKILL"
time="2017-03-12T19:56:09.889438359Z" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container 04a96afad4c8ed5c0ded57507460b83c8a0ef5e4447b205dd702b003fc11bc67: rpc error: code = 2 desc = containerd: container not found"
time="2017-03-12T19:56:19.740961133Z" level=info msg="Container b0fa7e915b55a85b1c99decea3675242b81722620393861644c79e8dd9ee184f failed to exit within 10 seconds of signal 15 - using the force"
time="2017-03-12T19:56:19.741804980Z" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container b0fa7e915b55a85b1c99decea3675242b81722620393861644c79e8dd9ee184f: rpc error: code = 2 desc = containerd: container not found"
time="2017-03-12T19:56:19.890020497Z" level=info msg="Container 04a96afad4c8ed5c0ded57507460b83c8a0ef5e4447b205dd702b003fc11bc67 failed to exit within 10 seconds of signal 15 - using the force"
time="2017-03-12T19:56:19.891176708Z" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container 04a96afad4c8ed5c0ded57507460b83c8a0ef5e4447b205dd702b003fc11bc67: rpc error: code = 2 desc = containerd: container not found"
time="2017-03-12T19:56:29.743224528Z" level=info msg="Container b0fa7e915b55 failed to exit within 10 seconds of kill - trying direct SIGKILL"
time="2017-03-12T19:56:29.900078298Z" level=info msg="Container 04a96afad4c8 failed to exit within 10 seconds of kill - trying direct SIGKILL"
time="2017-03-12T19:56:29.958444420Z" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container 3458aed3490ed82b834af68eef2aff05c7026f2e5e372b6704e621e071a4f24d: rpc error: code = 2 desc = containerd: container not found"
time="2017-03-12T19:56:39.962634702Z" level=info msg="Container 3458aed3490ed82b834af68eef2aff05c7026f2e5e372b6704e621e071a4f24d failed to exit within 10 seconds of signal 15 - using the force"
time="2017-03-12T19:56:39.965354089Z" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container 3458aed3490ed82b834af68eef2aff05c7026f2e5e372b6704e621e071a4f24d: rpc error: code = 2 desc = containerd: container not found"
time="2017-03-12T19:56:49.966826818Z" level=info msg="Container 3458aed3490e failed to exit within 10 seconds of kill - trying direct SIGKILL"
time="2017-03-12T19:56:49.977822475Z" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container a7c3c05fc8dfb462fb796baf2670837f5bb46fbf08602257c72945b1a3963649: rpc error: code = 2 desc = containerd: container not found"
ERRO[377793] containerd: deleting container error=exit status 1: "container 421284b55846a2443d939f1a1ccf1542b2598e907371fc7df3ff3c56cb51aa53 does not exist\none or more of the container deletions failed\n"
time="2017-03-12T19:56:59.978711321Z" level=info msg="Container a7c3c05fc8dfb462fb796baf2670837f5bb46fbf08602257c72945b1a3963649 failed to exit within 10 seconds of signal 15 - using the force"
time="2017-03-12T19:56:59.980489891Z" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container a7c3c05fc8dfb462fb796baf2670837f5bb46fbf08602257c72945b1a3963649: rpc error: code = 2 desc = containerd: container not found"
time="2017-03-12T19:57:06.500490355Z" level=info msg="Container 421284b55846a2443d939f1a1ccf1542b2598e907371fc7df3ff3c56cb51aa53 failed to exit within 10 seconds of signal 15 - using the force"
time="2017-03-12T19:57:06.502819988Z" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container 421284b55846a2443d939f1a1ccf1542b2598e907371fc7df3ff3c56cb51aa53: rpc error: code = 2 desc = containerd: container not found"
time="2017-03-12T19:57:09.981052370Z" level=info msg="Container a7c3c05fc8df failed to exit within 10 seconds of kill - trying direct SIGKILL"
time="2017-03-12T19:57:16.503962723Z" level=info msg="Container 421284b55846 failed to exit within 10 seconds of kill - trying direct SIGKILL"
time="2017-03-12T19:57:33.957198699Z" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container 421284b55846a2443d939f1a1ccf1542b2598e907371fc7df3ff3c56cb51aa53: rpc error: code = 2 desc = containerd: container not found"
time="2017-03-12T19:57:43.958293987Z" level=info msg="Container 421284b55846a2443d939f1a1ccf1542b2598e907371fc7df3ff3c56cb51aa53 failed to exit within 10 seconds of signal 15 - using the force"
time="2017-03-12T19:57:43.959917250Z" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container 421284b55846a2443d939f1a1ccf1542b2598e907371fc7df3ff3c56cb51aa53: rpc error: code = 2 desc = containerd: container not found"
time="2017-03-12T19:57:53.963104972Z" level=info msg="Container 421284b55846 failed to exit within 10 seconds of kill - trying direct SIGKILL"
time="2017-03-12T20:00:32.370642737Z" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container 421284b55846a2443d939f1a1ccf1542b2598e907371fc7df3ff3c56cb51aa53: rpc error: code = 2 desc = containerd: container not found"
time="2017-03-12T20:00:42.373388749Z" level=info msg="Container 421284b55846 failed to exit within 10 seconds of kill - trying direct SIGKILL"
time="2017-03-12T20:01:36.605491268Z" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container 421284b55846a2443d939f1a1ccf1542b2598e907371fc7df3ff3c56cb51aa53: rpc error: code = 2 desc = containerd: container not found"
time="2017-03-12T20:03:59.049313727Z" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container 421284b55846a2443d939f1a1ccf1542b2598e907371fc7df3ff3c56cb51aa53: rpc error: code = 2 desc = containerd: container not found"
time="2017-03-12T20:04:09.050248921Z" level=info msg="Container 421284b55846 failed to exit within 10 seconds of kill - trying direct SIGKILL"
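The log above repeats the same cycle (signal 15, then SIGKILL, then "container not found") for a handful of containers. A minimal sketch for tallying those warnings per container follows; the function name `count_kill_failures` and the regular expression are illustrative assumptions based only on the message format visible in this log, not on anything in Docker itself.

```python
import re
from collections import Counter

# Assumption: the warning format is exactly what appears in the log above,
# i.e. "Cannot kill container <64 hex chars>: ... container not found".
KILL_FAILED = re.compile(r"Cannot kill container ([0-9a-f]{64}): .*container not found")


def count_kill_failures(lines):
    """Return a Counter mapping short (12-character) container IDs to the
    number of 'container not found' kill warnings seen for each."""
    counts = Counter()
    for line in lines:
        match = KILL_FAILED.search(line)
        if match:
            counts[match.group(1)[:12]] += 1
    return counts


# Small demo on two lines shaped like the ones in the report.
sample_lines = [
    'time="2017-03-12T20:01:36.605491268Z" level=warning msg="container kill failed '
    "because of 'container not found' or 'no such process': Cannot kill container "
    "421284b55846a2443d939f1a1ccf1542b2598e907371fc7df3ff3c56cb51aa53: rpc error: "
    'code = 2 desc = containerd: container not found"',
    'time="2017-03-12T20:04:09.050248921Z" level=info msg="Container 421284b55846 '
    'failed to exit within 10 seconds of kill - trying direct SIGKILL"',
]
for short_id, hits in count_kill_failures(sample_lines).most_common():
    print(short_id, hits)
```

To run it against the real log, pass an open file handle (for example one opened on /var/log/upstart/docker.log) instead of `sample_lines`; the function only needs an iterable of lines.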
Describe the results you expected:
Additional information you deem important (e.g. issue happens only occasionally):
Output of docker version:
~# docker version
Client:
Version: 1.13.0
API version: 1.25
Go version: go1.7.3
Git commit: 49bf474
Built: Tue Jan 17 09:50:17 2017
OS/Arch: linux/amd64
Server:
Version: 1.13.0
API version: 1.25 (minimum version 1.12)
Go version: go1.7.3
Git commit: 49bf474
Built: Tue Jan 17 09:50:17 2017
OS/Arch: linux/amd64
Experimental: false
Output of docker info:
Containers: 1024
Running: 1023
Paused: 0
Stopped: 1
Images: 35
Server Version: 1.13.0
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 2120
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 03e5862ec0d8d3b3f750e19fca3ee367e13c090e
runc version: 2f7393a47307a16f8cee44a37b262e8b81021e3e
init version: 949e6fa
Security Options:
apparmor
Kernel Version: 4.4.0-64-generic
Operating System: Ubuntu 14.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.859 GiB
Name: docker-sg-singapore-5.connectify.me
ID: 4W6K:UJSX:HMHT:EK47:2IAV:OH6Z:QQXJ:5UMR:QW6V:GJYH:3F4T:6VHN
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Additional environment details (AWS, VirtualBox, physical, etc.): Simplercloud VPS
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Reactions: 8
- Comments: 36 (15 by maintainers)
@seb-elico This fix has been around for a while. It would be difficult to figure out which releases it is in, but it is certainly in both 17.09 and 17.12, and it should be in 17.06 and likely 17.03.
@cpuguy83 Do you have anything specific about that fix, like a commit or something?
We’re getting this semi-regularly on 17.06.0-ce, 17.06.2-ce and 17.09.0-ce (Ubuntu 16.04) on kernels 4.11 and 4.13.
Specifically, we are seeing containers in the output of docker ps, but the Pid referenced in the output of docker inspect is gone.
When trying to kill such containers, we get:
`docker-containerd` is running, but there is no `docker-containerd-shim` for the container that I’m trying to kill. Running `docker rm -f [container_id]` seems to work to clean up the container.

Sadly, I do not have a way to reproduce this issue on demand. The workload on these nodes is quite CPU and memory intensive (I’ve seen this issue occur for container processes which had been killed by the OOM killer, and memory pressure is a regular thing on the nodes…)
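The symptom described here (a container that docker ps still lists while the Pid reported by docker inspect is gone) can be checked for with a short script. This is only a sketch under some assumptions: it runs on the Docker host itself, in the host PID namespace, and with enough privileges to call the docker CLI. The helper names `pid_alive`, `find_stale` and `inspect_running` are made up for illustration.

```python
import json
import os
import subprocess


def pid_alive(pid):
    """Return True if a process with this PID exists on the local host."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def find_stale(inspect_records, is_alive=pid_alive):
    """Return the IDs of containers marked as running whose State.Pid is gone."""
    stale = []
    for record in inspect_records:
        state = record.get("State", {})
        if state.get("Running") and not is_alive(state.get("Pid", 0)):
            stale.append(record["Id"])
    return stale


def inspect_running():
    """Collect `docker inspect` JSON for every container listed by `docker ps -q`."""
    ids = subprocess.run(
        ["docker", "ps", "-q"], capture_output=True, text=True, check=True
    ).stdout.split()
    if not ids:
        return []
    out = subprocess.run(
        ["docker", "inspect", *ids], capture_output=True, text=True, check=True
    ).stdout
    return json.loads(out)
```

Calling `find_stale(inspect_running())` on an affected node should list the containers in this state; each one can then be cleaned up with `docker rm -f`, as noted above. The `is_alive` parameter exists so the check can be exercised without a Docker daemon.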
I’d say this is causing the issue:
There doesn’t seem to be anyone reading from that channel anymore, but I admit I’m getting lost in the logging code. @cpuguy83 is that an issue that we know we fixed in a newer version?