moby: docker cannot stop or kill a container

This originated from https://github.com/docker/docker/issues/10589; I am opening this new issue based on @thaJeztah’s request (https://github.com/docker/docker/issues/10589#issuecomment-214859917).

docker cannot rm, stop, or kill the container:

➜  app git:(master) ✗ docker info  
Containers: 91
 Running: 2
 Paused: 0
 Stopped: 89
Images: 296
Server Version: 1.10.2
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 490
 Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Plugins: 
 Volume: local
 Network: bridge null host
Kernel Version: 3.19.0-49-generic
Operating System: Ubuntu 14.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 3.844 GiB
Name: ubuntu
ID: 2HH2:BL4I:ZPYE:BIXT:656Y:R7FQ:I6RQ:7H7U:YHLI:BO6K:TY2Z:6FGB
WARNING: No swap limit support

➜  app git:(master) ✗ docker logs zleek-neo4j
Starting Neo4j Server...process [139]... waiting for server to be ready............ OK.
http://localhost:7474/ is ready.
Stopping Neo4j Server [139].... done
Starting Neo4j Server console-mode...
2016-03-03 21:48:25.629+0000 INFO  Successfully started database
2016-03-03 21:48:25.719+0000 INFO  Starting HTTP on port 7474 (4 threads available)
2016-03-03 21:48:26.301+0000 INFO  Enabling HTTPS on port 7473
2016-03-03 21:48:26.510+0000 INFO  Mounting static content at /webadmin
2016-03-03 21:48:26.711+0000 INFO  Mounting static content at /browser
2016-03-03 21:48:29.938+0000 INFO  Remote interface ready and available at http://0.0.0.0:7474/
2016-03-03 21:59:29.931+0000 INFO  Neo4j Server shutdown initiated by request
2016-03-03 21:59:30.035+0000 INFO  Successfully shutdown Neo4j Server
2016-03-03 21:59:30.125+0000 INFO  Successfully stopped database
2016-03-03 21:59:30.126+0000 INFO  Successfully shutdown database
Starting Neo4j Server...process [142]... waiting for server to be ready........... OK.
http://localhost:7474/ is ready.
Stopping Neo4j Server [142].... done
Starting Neo4j Server console-mode...
2016-03-03 22:02:14.884+0000 INFO  Successfully started database
2016-03-03 22:02:14.927+0000 INFO  Starting HTTP on port 7474 (4 threads available)
2016-03-03 22:02:15.295+0000 INFO  Enabling HTTPS on port 7473
2016-03-03 22:02:15.431+0000 INFO  Mounting static content at /webadmin
2016-03-03 22:02:15.529+0000 INFO  Mounting static content at /browser
2016-03-03 22:02:17.219+0000 INFO  Remote interface ready and available at http://0.0.0.0:7474/
Starting Neo4j Server...process [141]... waiting for server to be ready.................. OK.
http://localhost:7474/ is ready.
Stopping Neo4j Server [141].... done
Starting Neo4j Server console-mode...
2016-03-04 14:17:36.572+0000 INFO  Successfully started database
2016-03-04 14:17:36.611+0000 INFO  Starting HTTP on port 7474 (4 threads available)
2016-03-04 14:17:37.014+0000 INFO  Enabling HTTPS on port 7473
2016-03-04 14:17:37.133+0000 INFO  Mounting static content at /webadmin
2016-03-04 14:17:37.209+0000 INFO  Mounting static content at /browser
2016-03-04 14:17:38.582+0000 INFO  Remote interface ready and available at http://0.0.0.0:7474/
2016-03-04 14:24:48.444+0000 INFO  Neo4j Server shutdown initiated by request
2016-03-04 14:24:48.519+0000 INFO  Successfully shutdown Neo4j Server
2016-03-04 14:24:48.592+0000 INFO  Successfully stopped database
2016-03-04 14:24:48.592+0000 INFO  Successfully shutdown database
Starting Neo4j Server...process [141]... waiting for server to be ready.......... OK.
http://localhost:7474/ is ready.
Stopping Neo4j Server [141].... done
Starting Neo4j Server console-mode...
2016-03-04 14:25:42.498+0000 INFO  Successfully started database
2016-03-04 14:25:42.541+0000 INFO  Starting HTTP on port 7474 (4 threads available)
2016-03-04 14:25:42.916+0000 INFO  Enabling HTTPS on port 7473
2016-03-04 14:25:43.043+0000 INFO  Mounting static content at /webadmin
2016-03-04 14:25:43.126+0000 INFO  Mounting static content at /browser
2016-03-04 14:25:44.578+0000 INFO  Remote interface ready and available at http://0.0.0.0:7474/
2016-03-04 14:30:23.639+0000 INFO  Neo4j Server shutdown initiated by request
2016-03-04 14:30:23.688+0000 INFO  Successfully shutdown Neo4j Server
2016-03-04 14:30:23.763+0000 INFO  Successfully stopped database
2016-03-04 14:30:23.763+0000 INFO  Successfully shutdown database

➜  app git:(master) ✗ docker --version
Docker version 1.10.2, build c3959b1

➜  app git:(master) ✗ uname -a
Linux ubuntu 3.19.0-49-generic #55~14.04.1-Ubuntu SMP Fri Jan 22 11:24:31 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Some suggested that it might be related to https://github.com/docker/docker/issues/18180, but I am on kernel 3.19.0-49-generic and still hit this issue with the neo4j/neo4j:2.3.2 image from the official Docker registry.

➜  ~ docker info 
Containers: 6
 Running: 1
 Paused: 0
 Stopped: 5
Images: 69
Server Version: 1.10.3
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 155
 Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Plugins: 
 Volume: local
 Network: bridge null host
Kernel Version: 3.19.0-49-generic
Operating System: Ubuntu 14.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 3.844 GiB
Name: ubuntu
ID: 2HH2:BL4I:ZPYE:BIXT:656Y:R7FQ:I6RQ:7H7U:YHLI:BO6K:TY2Z:6FGB
WARNING: No swap limit support

The container is stuck, and docker-compose stop gives the following output:

➜  foo git:(master) ✗ docker-compose stop
Stopping foo-neo4j ... 

ERROR: for foo-neo4j  UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=70) 
ERROR: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).
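As the error message hints, the Compose client timeout can be raised via the COMPOSE_HTTP_TIMEOUT environment variable (120 below is an arbitrary example value); note this only gives the daemon more time to respond, it does not unwedge a hung container:

```shell
# Raise the Compose client HTTP timeout (seconds; Compose's default is 60)
# so the client waits longer before giving up on the daemon.
export COMPOSE_HTTP_TIMEOUT=120
# Then retry the command that timed out:
#   docker-compose stop
```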

docker kill {container-id} also doesn’t work.
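When stop and kill both hang, the daemon itself is usually wedged; dockerd writes a goroutine stack dump to its log when sent SIGUSR1, which is useful diagnostic data for an issue like this. A minimal sketch, where the helper name `dockerd_pid` is hypothetical and only automates the PID-grep step:

```shell
#!/bin/sh
# Hypothetical helper: pull the dockerd PID out of `ps`-style output so the
# daemon can be sent SIGUSR1, which makes dockerd write a goroutine stack
# dump to the daemon log (useful evidence when stop/kill hang).
dockerd_pid() {
  # Reads `ps ax`-style lines on stdin; prints the PID of the dockerd line.
  awk '/[d]ockerd/ {print $1; exit}'
}

# On a real host:
#   sudo kill -SIGUSR1 "$(ps ax | dockerd_pid)"
#   # then check the daemon log (e.g. journalctl -u docker) for the dump
```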

Repro Steps

I am not entirely sure, as this doesn’t happen all the time. Let me know if there is anything more I can do to narrow it down.

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Reactions: 22
  • Comments: 49 (8 by maintainers)

Most upvoted comments

Same problem here

$ sudo docker exec -it aa90c3c6c2c9 bash
OCI runtime exec failed: exec failed: container_linux.go:296: starting container process caused "process_linux.go:86: executing setns process caused \"exit status 21\"": unknown
$ sudo docker kill aa90c3c6c2c9
(hangs)
$ sudo docker stop aa90c3c6c2c9
(hangs)
$ sudo docker rm aa90c3c6c2c9
Error response from daemon: You cannot remove a running container aa90c3c6c2c9e1af1fbb4da692bd55aeec8d1e89d5958635286ca67238654560. Stop the container before attempting removal or force remove
$ docker -v
Docker version 17.12.0-ce, build c97c6d6
$ uname -a

Linux DL4 3.16.0-41-generic #57~14.04.1-Ubuntu SMP Thu Jun 18 18:01:13 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Upd: docker info says I am using aufs.

You are the man, @BenHall 🎆 🎉 I updated the storage driver to overlay and everything is happy, as you also indicated.

Exact steps I have followed for anyone having the same issue:

  • After a while of researching OverlayFS as a n00b, I ended up here. I verified with lsmod | grep overlay that the overlay kernel module was not loaded, which made me end up here, and I followed the instructions and installed the module.

  • After a VM restart, I ran lsmod | grep overlay again to confirm that the overlay kernel module was loaded. After seeing it there, I added the --storage-driver=overlay flag to the DOCKER_OPTS line in my Docker config file (/etc/default/docker) and ran the commands below to restart Docker:

    sudo /etc/init.d/docker stop
    sudo /etc/init.d/docker start
    
  • Now Docker is up and running; I ran docker info, which confirmed that overlay is configured as the storage driver:

    Containers: 0
     Running: 0
     Paused: 0
     Stopped: 0
    Images: 9
    Server Version: 1.11.1
    Storage Driver: overlay
     Backing Filesystem: extfs
    Logging Driver: json-file
    Cgroup Driver: cgroupfs
    Plugins: 
     Volume: local
     Network: bridge null host
    Kernel Version: 3.19.0-49-generic
    Operating System: Ubuntu 14.04.4 LTS
    OSType: linux
    Architecture: x86_64
    CPUs: 4
    Total Memory: 3.844 GiB
    Name: ubuntu
    ID: 2HH2:BL4I:ZPYE:BIXT:656Y:R7FQ:I6RQ:7H7U:YHLI:BO6K:TY2Z:6FGB
    Docker Root Dir: /var/lib/docker
    Debug mode (client): false
    Debug mode (server): false
    Registry: https://index.docker.io/v1/
    

At the end, I ran the build command (docker build -t foo1 .) for this sample, which used to hang on dotnet restore, and it worked smoothly ✨ ✨ ✨
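For reference, the DOCKER_OPTS edit in the steps above amounts to a single line; this is a sketch of /etc/default/docker as read by the sysvinit/upstart Docker scripts on Ubuntu 14.04 (systemd-based hosts typically configure the daemon differently, e.g. via /etc/docker/daemon.json):

```shell
# /etc/default/docker — picked up by the sysvinit/upstart Docker scripts
DOCKER_OPTS="--storage-driver=overlay"
```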

@thaJeztah I spoke with @cpuguy83 at OSCON and he suggested it was an AUFS problem. I’ve upgraded to Docker 1.11, kernel 4.2.0-36-generic, and overlay, and everything appears to be happy.

Yep, facing the same problem, but on Windows with Windows containers.

This happens to me with the Grafana Loki log driver. I usually update it, but turning it off and back on again might work too.

docker plugin disable loki -f
docker plugin upgrade loki --grant-all-permissions
docker plugin enable loki

After that, I can stop the hung service if it didn’t stop automatically.

I had the same problem: the container was running and listed in docker ps, but was not listed in docker service ls. If I tried to kill it, stop it, or get info on it, the command hung. I tried shutting down Docker; that seemed to hang too, so I used kill -9 on the Docker processes and then restarted. All was good thereafter.

$ docker -v
Docker version 18.03.0-ce, build 0520e24
$ uname -a
Linux Brian 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 (2018-05-07) x86_64 GNU/Linux

I didn’t; read the last comment on #40063. He suggests sticking with Docker 18.09 for now.

There are several pending potential fixes.

Same problem… what was the fix here?

Hanging with docker kill here as well:

CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                          PORTS                              NAMES
00ecc6875de3        0bd13774a10b        "python server.py"       16 minutes ago      Restarting (1) 13 minutes ago                                      dashboard
a77be03b6123        postgres:11.5       "docker-entrypoint.s…"   16 minutes ago      Up 16 minutes                   5432/tcp, 0.0.0.0:5432->5433/tcp   db
2d6da0d955b7        redis:alpine        "docker-entrypoint.s…"   16 minutes ago      Up 16 minutes                   0.0.0.0:6379->6379/tcp             dash_starter_redis_1

And after a while, with docker-compose down, I get the following message:

ERROR: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).

Any fix?

2020 and for those who are facing this problem: sudo service docker stop && sudo killall dockerd && sudo service docker start

Now you can start your containers again.

I know this happens because Docker freezes (I don’t know why) and leaves dummy child processes locked in memory, so we need to force the container processes to be killed before restarting the containers…

Same here! Try running ps ax | grep docker and then killing the hung docker stop process:

3639 ?  Sl 0:00 docker stop Android-8.0

kill 3639
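The ps-and-kill approach above can be wrapped in a small helper; `hung_docker_pids` is a hypothetical name, and it only automates the grep step:

```shell
#!/bin/sh
# Hypothetical helper: list PIDs of hung `docker stop` / `docker kill`
# client processes from `ps`-style output so they can be killed manually.
hung_docker_pids() {
  # Reads `ps ax`-style lines on stdin; prints the PID column of any
  # line whose command is `docker stop` or `docker kill`.
  awk '/docker (stop|kill)/ {print $1}'
}

# On a real host:
#   ps ax | hung_docker_pids | xargs -r kill
```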

Only a server reboot helped me. I was able to stop Docker with sudo systemctl stop docker but could not start it back up.

Client&server: 17.12.1-ce

Same story. I had a container that was updating a local psql database. The container froze at some point; I had made no real changes to psql at that point.

I can’t stop, kill --signal=9, or rm the container. Restarting the Docker service with systemctl changes nothing. I can kill the container by killing one of the ps ax | grep docker processes.

The bug is extremely rare for me, but it’s just weird.

@arogozhnikov I saw exactly the same issue with much later Docker versions. @thaJeztah can we reopen the issue and start an investigation?

# docker info
Containers: 73
 Running: 42
 Paused: 0
 Stopped: 31
Images: 23
Server Version: 18.06.2-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: efs local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: nvidia runc
Default Runtime: nvidia
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
......
# uname -a
Linux node-k8s-use1-prod-shared-001-kubecluster-3-0a0c5466 4.15.0-15-generic #16~16.04.1-Ubuntu SMP Thu Apr 5 12:19:23 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Hello,

I may be wrong, but who knows… it may help someone.

I was playing with Docker swarm, and that’s where I started having this problem with the registry:2 image.

I couldn’t kill, stop, or remove the registry container; it kept restarting even when I restarted the Docker daemon and/or updated the restart policy.

Finally, what I found/remembered is that the container had been created by a service: docker service create --name registry --publish published=5000,target=5000 registry:2.

I just ran docker service ls:

42zlzbmpmi7z | registry | replicated | 1/1 | registry:2 | *:5000->5000/tcp

I grabbed the ID (the first two or three digits are enough to identify the service) and removed the service: docker service rm 42

et voilà… it automatically removed the container.

Hope it helps someone 😃
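The fix described above can be sketched as a pipeline; `service_id` is a hypothetical helper that extracts the service ID from `docker service ls` output so it can be handed to `docker service rm` (removing the service is what actually removes the container, since the orchestrator keeps recreating it otherwise):

```shell
#!/bin/sh
# Hypothetical helper: find the service ID for a given service name in
# `docker service ls`-style output, for passing to `docker service rm`.
service_id() {
  # $1 = service name; reads `docker service ls`-style lines on stdin
  # and prints the ID column of the row whose NAME column matches.
  awk -v name="$1" '$2 == name {print $1}'
}

# On a real host:
#   docker service ls | service_id registry | xargs -r docker service rm
```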

Here’s the output:

> docker kill 4691404a3536_transformationrunner
4691404a3536_transformationrunner
> docker rm 4691404a3536_transformationrunner  
Error response from daemon: You cannot remove a running container 4691404a353682d084ac4cf48142177a77fceef988a2eaab6658b017f836a404. Stop the container before attempting removal or force remove

Diagnostics ID: C18223FD-44DC-4ACD-B197-B660FC33917F/20200221163606

I discovered that it only happens when I use the fluentd logging driver. I’m trying to create a minimal example, and then I’ll file a new bug.

Happens to me occasionally with mongo. Here’s the log: docker.log

Docker version is 19.03.2.

@tugberkugurlu Hi, that works a treat now; I need to understand why, though.