moby: Docker system prune stuck in locked mode

I ran docker system prune yesterday; it took some time, and then my SSH session was disconnected for an unrelated reason.

Unfortunately, I am now getting:

Error response from daemon: a prune operation is already running.

Apparently a lock was left behind, even though the prune command is no longer running.

Steps to reproduce the issue:

  1. docker system prune

Describe the results you received: “a prune operation is already running.”

Describe the results you expected: Automatic unlock after a certain amount of time, self-healing, or a possibility to unlock manually

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

Client:
 Version:	17.12.0-ce
 API version:	1.35
 Go version:	go1.9.2
 Git commit:	c97c6d6
 Built:	Wed Dec 27 20:11:19 2017
 OS/Arch:	linux/amd64

Server:
 Engine:
  Version:	17.12.0-ce
  API version:	1.35 (minimum version 1.12)
  Go version:	go1.9.2
  Git commit:	c97c6d6
  Built:	Wed Dec 27 20:09:53 2017
  OS/Arch:	linux/amd64
  Experimental:	false

Output of docker info:

Containers: 40
 Running: 25
 Paused: 0
 Stopped: 15
Images: 261
Server Version: 17.12.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: bu0xu0nf5r9g1ydlblnjf4rdi
 Is Manager: true
 ClusterID: zkqa5nrgqn042xedq172mwz5v
 Managers: 3
 Nodes: 3
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 2
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 10.10.10.3
 Manager Addresses:
  10.10.10.1:2377
  10.10.10.2:2377
  10.10.10.3:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 89623f28b87a6004d4b785663257362d1658a729
runc version: b2567b37d7b75eb4cf325b77297b140ea686ce8f
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-112-generic
Operating System: Ubuntu 16.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 62.88GiB
Name: xxxxx
ID: CW4M:4OEM:N3QG:UHYR:NF64:SZVT:IDGC:7O6L:LILC:UPYG:S6TG:5URD
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.): Physical, private cluster

About this issue

  • State: open
  • Created 6 years ago
  • Reactions: 22
  • Comments: 47 (16 by maintainers)

Most upvoted comments

I can confirm that prune got stuck because of a non-responding container. It worked again once I first killed the container with kill -9 PROCESS_ID, where the process ID is the one of the docker-containerd-shim process found via ps aux | grep "docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/CONTAINER_ID".
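
A minimal sketch of that lookup-and-kill step, assuming pgrep from procps is available (CONTAINER_ID is a placeholder, not a real ID):

# find the shim process whose command line references the stuck container
CONTAINER_ID="<stuck-container-id>"
SHIM_PID=$(pgrep -f "docker-containerd-shim.*${CONTAINER_ID}")
# force-kill the shim so dockerd can reap the container
sudo kill -9 "$SHIM_PID"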

The problem is that you first need to know that there is a container that no longer responds to Docker 😕 The container itself works (i.e. the Node.js app inside runs fine), but Docker is not even able to inspect it.

By the way, this container should not even have been there, because we run docker service update... with the :latest image. Docker created another container and the old one was not killed, so there were two running containers with two different versions.

The solution described in https://github.com/moby/moby/issues/36447#issuecomment-373273071 worked for me as well. Thanks very much for that detailed report!

By the way, finding defective containers is very easy with

for i in $(docker ps -q); do echo "$i" && docker inspect "$i"; done

If the last line printed is a container hash rather than a new prompt, that hash is the defective container (the loop hangs on docker inspect for it).
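
A variant of the same idea that does not leave the shell hanging on the stuck container is sketched below; it assumes GNU coreutils timeout is available, and the 5-second limit is an arbitrary choice:

for i in $(docker ps -q); do
  # timeout kills docker inspect if it hangs; any non-zero exit
  # (including timeout's 124) flags the container as suspect
  timeout 5 docker inspect "$i" > /dev/null || echo "container $i appears stuck"
done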

Hi! I have the same problem on the exact same version, with Docker running in swarm mode. The problem is that there is no /var/lib/docker/ directory, and I can't kill the process. Any ideas?

I'd just like to share my two cents...

for i in $(docker ps -q) ; do echo -n "$i: " && docker inspect $i | wc -l  ; done
a008c638bb1a: 269
415d2bd82440: 269
1499b90f284e: 236
7c09611b437d: 261
e88e7fe19cc5: 236
406f273b8380: 236
2f26cff6688a: 242
0cb943e3292a: 236
1e9f2fc1d6c8: 269
417ec30ce9b2: 236
7eb727195e0f: 245
86c93b11c213: 269
ff5582645e2b: 245

No stuck containers!

docker version
Client:
 Version:           18.09.5
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        e8ff056dbc
 Built:             Thu Apr 11 04:44:28 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.5
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       e8ff056
  Built:            Thu Apr 11 04:10:53 2019
  OS/Arch:          linux/amd64
  Experimental:     true

docker system prune
WARNING! This will remove:
        - all stopped containers
        - all networks not used by at least one container
        - all dangling images
        - all dangling build cache
Are you sure you want to continue? [y/N] y
Error response from daemon: a prune operation is already running

@cgoeller Thanks. Have you tried 17.12.1 yet? It might be fixed there, but no guarantees: this deadlock can occur in a few places, some of which were fixed in 17.12.1. Several more patches are coming in 18.03.1.

Looks like the right commits, yes.

I can confirm that prune hangs because of an unresponsive container that got stuck during exec. We are running Kubernetes with Docker 17.12.0-ce and sometimes hit the same issue, triggered by a daily cron job that prunes the system.

The hung container shows up in docker ps -a with the Created status:

17492dcb9f49 gcr.io/google_containers/pause-amd64:3.0 "/pause" 2 days ago Created k8s_POD_...
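
For what it's worth, containers left in that state can also be listed directly, since created is one of the documented status values for the docker ps filter:

docker ps -a --filter status=created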

There are also the corresponding containerd-shim and docker-runc processes:

root 12651 0.0 0.0 7512 3632 ? Sl Mar16 0:00 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/17492dcb9f4989953cac832453b0132d28bafa622ff93bac2824ce0bebf698f9 -address /var/run/docker/containerd/docker-containerd.sock -containerd-binary /usr/bin/docker-containerd -runtime-root /var/run/docker/runtime-runc

root 13405 0.0 0.0 122168 9520 ? Sl Mar16 0:00 docker-runc --root /var/run/docker/runtime-runc/moby --log /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/17492dcb9f4989953cac832453b0132d28bafa622ff93bac2824ce0bebf698f9/log.json --log-format json start 17492dcb9f4989953cac832453b0132d28bafa622ff93bac2824ce0bebf698f9

Dockerd stack trace: goroutine-stacks-2018-03-19T032810-0500.log
Containerd stack trace: containerd-trace.log
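
For anyone who needs to collect the same diagnostics: sending SIGUSR1 to dockerd makes it write a goroutine stack dump (a goroutine-stacks-<timestamp>.log file like the one above); the daemon log shows the path where the dump was written:

# ask dockerd to dump all goroutine stacks for deadlock analysis
sudo kill -SIGUSR1 $(pidof dockerd)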

Killing the containerd-shim process removes the unresponsive container. Then, after terminating the hung prune command, prune can be executed again without the "a prune operation is already running" message.
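
Put together, the recovery sequence from this comment looks roughly like this (12651 and 13405 are the PIDs from the ps output above; substitute your own):

sudo kill -9 12651    # the docker-containerd-shim of the stuck container
sudo kill -9 13405    # the leftover docker-runc start process, if still present
# then terminate the hung client and retry
docker system prune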

Unfortunately, I cannot find a way to reproduce the issue, so I cannot tell whether bumping the Docker version will help; similar issues with hung containers apparently exist even on 18.03.

It would be very helpful if somebody could confirm whether prune still gets stuck on >=17.12.1.

Update: I had to restart the server. service docker restart did not work and got stuck as well; I killed it after 10 minutes.