moby: Container stuck, can't be stopped or killed, can't exec into it either

Description

Steps to reproduce the issue:

1. Stop a container from Rancher.
2. The status is reported as Stopping.
3. Log in to the Rancher host and use docker commands to try to stop or kill the container.

Describe the results you received:

# docker ps |grep robin
6fde7857082f        brightpowersoftware/robin-statement:27       "/.r/r java -XX:+U..."   20 hours ago        Up 15 hours                             r-bp-robin-statement-robin-statement-1-389a5813

[root@ip-10-30-0-193 log]# docker stop 6fde7857082f
6fde7857082f
[root@ip-10-30-0-193 log]# docker kill 6fde7857082f
6fde7857082f
# docker ps |grep robin
6fde7857082f        brightpowersoftware/robin-statement:27       "/.r/r java -XX:+U..."   20 hours ago        Up 15 hours                             r-bp-robin-statement-robin-statement-1-389a5813

[root@ip-10-30-0-193 log]# docker exec -ti 6fde7857082f bash
rpc error: code = 2 desc = containerd: container not found


docker.log:
time="2017-04-25T22:32:58.671506327Z" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container 6fde7857082f06185279934dadc939749584d8aac911156b6fae5b962a05010a: rpc error: code = 2 desc = containerd: container not found" 
time="2017-04-25T22:33:03.005621630Z" level=info msg="Container 6fde7857082f failed to exit within 10 seconds of kill - trying direct SIGKILL" 
time="2017-04-25T22:33:03.005961036Z" level=error msg="collecting stats for 6fde7857082f06185279934dadc939749584d8aac911156b6fae5b962a05010a: rpc error: code = 2 desc = containerd: container not found" 
time="2017-04-25T22:33:03.013659966Z" level=error msg="collecting stats for 6fde7857082f06185279934dadc939749584d8aac911156b6fae5b962a05010a: rpc error: code = 2 desc = containerd: container not found" 
time="2017-04-25T22:33:03.851538781Z" level=error msg="collecting stats for 6fde7857082f06185279934dadc939749584d8aac911156b6fae5b962a05010a: rpc error: code = 2 desc = containerd: container not found" 
time="2017-04-25T22:33:04.653255006Z" level=error msg="collecting stats for 6fde7857082f06185279934dadc939749584d8aac911156b6fae5b962a05010a: rpc error: code = 2 desc = containerd: container not found" 
time="2017-04-25T22:33:05.767981020Z" level=error msg="collecting stats for 6fde7857082f06185279934dadc939749584d8aac911156b6fae5b962a05010a: rpc error: code = 2 desc = containerd: container not found" 
time="2017-04-25T22:33:06.279304801Z" level=error msg="collecting stats for 6fde7857082f06185279934dadc939749584d8aac911156b6fae5b962a05010a: rpc error: code = 2 desc = containerd: container not found" 
time="2017-04-25T22:33:08.189192027Z" level=error msg="collecting stats for 6fde7857082f06185279934dadc939749584d8aac911156b6fae5b962a05010a: rpc error: code = 2 desc = containerd: container not found" 
time="2017-04-25T22:33:09.692933913Z" level=error msg="collecting stats for 6fde7857082f06185279934dadc939749584d8aac911156b6fae5b962a05010a: rpc error: code = 2 desc = containerd: container not found" 
time="2017-04-25T22:33:10.511364640Z" level=error msg="collecting stats for 6fde7857082f06185279934dadc939749584d8aac911156b6fae5b962a05010a: rpc error: code = 2 desc = containerd: container not found" 
time="2017-04-25T22:33:11.134226201Z" level=error msg="collecting stats for 6fde7857082f06185279934dadc939749584d8aac911156b6fae5b962a05010a: rpc error: code = 2 desc = containerd: container not found" 
time="2017-04-25T22:33:12.225999176Z" level=error msg="collecting stats for 6fde7857082f06185279934dadc939749584d8aac911156b6fae5b962a05010a: rpc error: code = 2 desc = containerd: container not found" 
time="2017-04-25T22:36:15.829833261Z" level=warning msg="libcontainerd: client is out of sync, restore was called on a fully synced container (6fde7857082f06185279934dadc939749584d8aac911156b6fae5b962a05010a)." 
time="2017-04-25T22:36:15.830257050Z" level=warning msg="libcontainerd: failed to retrieve container 6fde7857082f06185279934dadc939749584d8aac911156b6fae5b962a05010a state: rpc error: code = 2 desc = containerd: container not found" 
time="2017-04-25T22:36:28.544300083Z" level=info msg="Removing stale sandbox 70784eabf5de958d6930a4d951db3f8b18d08cd581a238d705c0c9ff2c00216e (6fde7857082f06185279934dadc939749584d8aac911156b6fae5b962a05010a)" 
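The log excerpt above repeats the same `containerd: container not found` error for a single container ID. As a rough aid for spotting which containers are affected, here is a minimal sketch that counts such lines per full container ID. The function name `count_not_found` and the log path in the usage comment are assumptions for illustration, not something taken from this report.

```shell
#!/bin/sh
# count_not_found (hypothetical helper): read daemon log lines on stdin and
# print "<64-hex container ID> <count>" for every line that contains
# "containerd: container not found". Lines without that message are ignored.
count_not_found() {
    grep 'containerd: container not found' |
        grep -oE '[0-9a-f]{64}' |
        sort | uniq -c | sort -rn |
        awk '{ print $2, $1 }'
}

# Usage (the log path is an assumption; it differs between distributions):
#   count_not_found < /var/log/docker.log
```

A container whose ID shows up many times here while `docker ps` still lists it as Up is a likely candidate for the state mismatch described in this issue.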

Describe the results you expected: Container should be stopped/killed

Additional information you deem important (e.g. issue happens only occasionally): Happens occasionally, almost daily as of this week, with several unrelated containers

Output of docker version:

# docker version
Client:
 Version:      17.03.1-ce
 API version:  1.27
 Go version:   go1.7.5
 Git commit:   c6d412e
 Built:        Tue Mar 28 00:40:02 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.03.1-ce
 API version:  1.27 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   c6d412e
 Built:        Tue Mar 28 00:40:02 2017
 OS/Arch:      linux/amd64
 Experimental: false

Output of docker info:

# docker info
Containers: 79
 Running: 33
 Paused: 0
 Stopped: 46
Images: 137
Server Version: 17.03.1-ce
Storage Driver: overlay
 Backing Filesystem: extfs
 Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local rancher-nfs
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 4ab9917febca54791c5f071a9d1f404867857fcc
runc version: 54296cf40ad8143b62dbcaa1d90e520a2136ddfe
init version: N/A (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.21-rancher
Operating System: RancherOS v1.0.0
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 14.94 GiB
Name: ip-10-30-0-193.ec2.internal
ID: DVSV:NLW5:VD5U:UOYY:FIOB:4RTH:7NGQ:GIBO:KKTR:VPYO:J6A2:4SZE
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.): AWS

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 4
  • Comments: 56 (24 by maintainers)

Most upvoted comments

Running into the same issue on 17.09.1:

  • container is “running” but the process isn’t
  • docker kill and docker stop return successfully after sending KILL twice, but docker ps still shows the container
  • only known workaround is restarting the Docker daemon
  • the daemon logs show: container kill failed because of 'container not found' or 'no such process'
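The symptoms listed above (the daemon reports the container as running while its process is gone) can be checked from the host. The following is a sketch only: it assumes a Linux host with `/proc` mounted, that it runs in the host PID namespace, and that the daemon still answers `docker inspect`. The function names `pid_is_alive` and `check_container` are made up for illustration.

```shell
#!/bin/sh
# pid_is_alive (hypothetical helper): succeed if a process with the given
# PID exists on this host. Empty, non-numeric, and zero PIDs are rejected.
pid_is_alive() {
    [ -n "$1" ] && [ "$1" -gt 0 ] 2>/dev/null && [ -d "/proc/$1" ]
}

# check_container (hypothetical helper): compare the state the daemon
# reports with what the kernel sees for the container's main process.
# Returns 1 on a mismatch, 2 if docker inspect itself fails.
check_container() {
    state=$(docker inspect --format '{{.State.Running}} {{.State.Pid}}' "$1") || return 2
    running=${state% *}
    pid=${state#* }
    if [ "$running" = "true" ] && ! pid_is_alive "$pid"; then
        echo "$1: daemon says running, but PID $pid does not exist"
        return 1
    fi
    echo "$1: consistent (running=$running pid=$pid)"
}

# Usage on the Docker host:
#   check_container 6fde7857082f
```

A mismatch only confirms that the daemon's view is stale; it does not say why the exit event was lost.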

@thaJeztah I use Docker for Windows:

> docker -v
Docker version 17.12.0-ce, build c97c6d6

I updated Docker several days ago. Before that I had never seen this error. It happens periodically after several hours of work, and I cannot find any specific reason why it happens.

I may be experiencing this issue in a single-node Swarm setup with some services that are memory hogs and often reach their memory limit. Sometimes manually using container kill and exec works. Note that such containers always seem to be listed with desired state Shutdown when running docker service ps ourservice (but they are definitely running and handling requests, or trying to). This may be unrelated, but in some cases inspecting these containers shows Health check exceeded timeout. Even containers that do not hit that timeout (they use a different, faster healthcheck script) behave the same way and keep running as unhealthy.

System information:

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=17.10
DISTRIB_CODENAME=artful
DISTRIB_DESCRIPTION="Ubuntu 17.10"

Linux linux 4.13.0-41-generic #46-Ubuntu SMP Wed May 2 13:38:30 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Output of docker version:

Client:
 Version:      18.03.1-ce
 API version:  1.37
 Go version:   go1.9.5
 Git commit:   9ee9f40
 Built:        Thu Apr 26 07:17:38 2018
 OS/Arch:      linux/amd64
 Experimental: false
 Orchestrator: swarm

Server:
 Engine:
  Version:      18.03.1-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.5
  Git commit:   9ee9f40
  Built:        Thu Apr 26 07:15:45 2018
  OS/Arch:      linux/amd64
  Experimental: false

Output of docker info:

Containers: 93
 Running: 21
 Paused: 0
 Stopped: 72
Images: 43
Server Version: 18.03.1-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 384
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: jejb6my7n50ulnilktgd2fxof
 Is Manager: true
 ClusterID: 0yus20uq607uzugzqbv9a1vzr
 Managers: 1
 Nodes: 1
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 10
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 192.168.2.102
 Manager Addresses:
  192.168.2.102:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.13.0-41-generic
Operating System: Ubuntu 17.10
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 47.16GiB
Name: linux
ID: FLM4:OCTS:BWQR:ZLRQ:HVGG:VBSW:O5NE:IA2W:4Z6T:SS47:4BSE:A6ZT
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username:
Registry:
Labels:
 provider=generic
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Daemon logs:
May 16 15:21:09 linux dockerd[997]: time="2018-05-16T15:21:09.172490443+02:00" level=warning msg="Health check for container 91084a44ab9c1002f2f1e65c3ec998906b6a9b438f60ba475e6bb2e12d7c5a75 error: context cancelled"
May 16 15:21:10 linux dockerd[997]: time="2018-05-16T15:21:10.327939529+02:00" level=warning msg="Health check for container bc9b6dddb8321340e0a5c8080da0099471e125785e519813420534462cd39fee error: context cancelled"
May 16 15:21:10 linux dockerd[997]: time="2018-05-16T15:21:10.389474082+02:00" level=warning msg="Ignoring Exit Event, no such exec command found" container=91084a44ab9c1002f2f1e65c3ec998906b6a9b438f60ba475e6bb2e12d7c5a75 exec-id=7ca40d114b118b085dd0c7fa097c24cb8b12b8f4acab8adf4c4aaddd7962ab1b exec-pid=28793
May 16 15:21:14 linux dockerd[997]: time="2018-05-16T15:21:14.840198354+02:00" level=warning msg="Ignoring Exit Event, no such exec command found" container=bc9b6dddb8321340e0a5c8080da0099471e125785e519813420534462cd39fee exec-id=3843a8c2f70d3074327d5d092b614033737967a94a1adea1b27768c459110b42 exec-pid=28950
May 16 15:21:16 linux dockerd[997]: time="2018-05-16T15:21:16.310862168+02:00" level=warning msg="Health check for container 2112097b7e1f2f67d87dca97a961bbed30c8f531d1a0031a542e978f1dba6eb1 error: context cancelled"
May 16 15:21:19 linux dockerd[997]: time="2018-05-16T15:21:19.235450078+02:00" level=warning msg="Ignoring Exit Event, no such exec command found" container=2112097b7e1f2f67d87dca97a961bbed30c8f531d1a0031a542e978f1dba6eb1 exec-id=f5980a924b59e20b23b5e9372e5406cdca7c5d3c6a151a9b4cdc08f96291ade1 exec-pid=29446
May 16 15:21:22 linux dockerd[997]: time="2018-05-16T15:21:22.487333194+02:00" level=warning msg="Health check for container bc9b6dddb8321340e0a5c8080da0099471e125785e519813420534462cd39fee error: context cancelled"
May 16 15:21:25 linux dockerd[997]: time="2018-05-16T15:21:25.166545043+02:00" level=warning msg="Ignoring Exit Event, no such exec command found" container=bc9b6dddb8321340e0a5c8080da0099471e125785e519813420534462cd39fee exec-id=a9b0d0a9801eafcd602b192643435a580ab103160bd3c0f2c617b72f54fed600 exec-pid=30039
May 16 15:21:28 linux dockerd[997]: time="2018-05-16T15:21:28.523770366+02:00" level=warning msg="Health check for container 2112097b7e1f2f67d87dca97a961bbed30c8f531d1a0031a542e978f1dba6eb1 error: context cancelled"
May 16 15:21:32 linux dockerd[997]: time="2018-05-16T15:21:32.848645567+02:00" level=warning msg="Health check for container 91084a44ab9c1002f2f1e65c3ec998906b6a9b438f60ba475e6bb2e12d7c5a75 error: context cancelled"
May 16 15:21:34 linux dockerd[997]: time="2018-05-16T15:21:34.529881094+02:00" level=warning msg="Health check for container bc9b6dddb8321340e0a5c8080da0099471e125785e519813420534462cd39fee error: context cancelled"
May 16 15:21:35 linux dockerd[997]: time="2018-05-16T15:21:35+02:00" level=info msg="shim reaped" id=9c683c7f9981d7d006bb1fe09cbfdde76d92fc407fc21e4865d2937a498c0e70 module="containerd/tasks"
May 16 15:21:35 linux dockerd[997]: time="2018-05-16T15:21:35.578677366+02:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
May 16 15:21:35 linux dockerd[997]: time="2018-05-16T15:21:35.579311445+02:00" level=warning msg="rmServiceBinding df802cc4ee547f621a8fbbc651b42068187bedfe35bc2724dbb3060ca750556f possible transient state ok:false entries:0 set:false "
May 16 15:21:36 linux dockerd[997]: time="2018-05-16T15:21:36.557342920+02:00" level=error msg="fatal task error" error="task: non-zero exit (1): dockerexec: unhealthy container" module=node/agent/taskmanager node.id=jejb6my7n50ulnilktgd2fxof service.id=im375yj0ykpzybijttf2zibuo task.id=wtqpz6oxuf0jsv9ncqlbbppnj
May 16 15:21:39 linux dockerd[997]: time="2018-05-16T15:21:39.402154129+02:00" level=warning msg="Ignoring Exit Event, no such exec command found" container=bc9b6dddb8321340e0a5c8080da0099471e125785e519813420534462cd39fee exec-id=6255a92ba781ea06c74ff79dbb7a890d58ac38d4384d610d2543fbe0efdb2d5b exec-pid=30962
May 16 15:21:39 linux dockerd[997]: time="2018-05-16T15:21:39.872877635+02:00" level=warning msg="Ignoring Exit Event, no such exec command found" container=2112097b7e1f2f67d87dca97a961bbed30c8f531d1a0031a542e978f1dba6eb1 exec-id=eb6dfbc8d66fe40c12e2dd213c6bba125d4b04eb9c7adae250abcc1f204e88b0 exec-pid=30411

Output of docker stats:

docker stats 2112097b7e1f2f67d87dca97a961bbed30c8f531d1a0031a542e978f1dba6eb1 --no-stream
CONTAINER ID        NAME                                              CPU %               MEM USAGE / LIMIT   MEM %               NET I/O             BLOCK I/O           PIDS
2112097b7e1f        ourwebapp_web_test.2.zuuewzn08kb6x0f327s8snq1j   17.68%              499.9MiB / 500MiB   99.99%              36MB / 3.82MB       28.9GB / 26.5GB     116
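The `docker stats` output above shows the container at 99.99% of its 500MiB limit. To watch for other containers in the same situation, something like the following sketch could filter `docker stats` output by memory percentage. The function name `near_mem_limit` and the default threshold of 95 are assumptions. The `--format` placeholders `.Name` and `.MemPerc` in the usage comment are standard `docker stats` fields.

```shell
#!/bin/sh
# near_mem_limit (hypothetical helper): read "NAME MEM%" lines on stdin and
# print those whose memory percentage is at or above the threshold given as
# the first argument (default 95).
near_mem_limit() {
    awk -v limit="${1:-95}" '{
        pct = $2
        sub(/%$/, "", pct)          # strip the trailing percent sign
        if (pct + 0 >= limit + 0) print $1, $2
    }'
}

# Usage on a Docker host:
#   docker stats --no-stream --format '{{.Name}} {{.MemPerc}}' | near_mem_limit 95
```

This only narrows down which containers sit at their limit; whether memory pressure is related to the stuck state is not established in this thread.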

We hit the same problem with Docker version 17.06.0-ce. Any updates?

Experiencing the same issue with Docker version 20.10.23, build 7155243. Containers suddenly hang: can’t stop, can’t kill, can’t exec. Super frustrating. I can’t seem to figure out where the problem is. How do I start to debug?

Running on Windows 11, 24 GB memory, 2 CPUs, WSL 2 config, using the default docker-desktop distro.