moby: live-restore in combination with docker compose is broken with docker-ce version 20.10.19 and newer
Description
Hello,
we rely heavily on the live-restore feature in our production environments and are running into problems with automated deployments from our CI. In our environment, all operating system packages are updated automatically. After the docker engine has been updated and someone restarts a container in a running compose stack, docker compose down -v --remove-orphans fails.
Update: This particularly affects named volumes that use the local driver with a bind-mounted directory.
docker-compose.yml example
version: '2.4'
services:
  app:
    image: appimage
    networks:
      app-network:
        aliases:
          - app
    restart: "unless-stopped"
    volumes:
      - log-app:/log
volumes:
  log-app:
    driver: local
    driver_opts:
      type: none
      device: /srv/docker/stack/prod/log/app
      o: bind
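For experimenting with this kind of volume outside of compose, the volume definition above corresponds roughly to a docker volume create call with the same driver options. The following is only a sketch: the function name is hypothetical, compose would additionally prefix the volume name with the project name (prod_log-app in this report) and attach its own labels, and whether the bug reproduces without compose is not established here.

```shell
# Sketch: create a local-driver volume backed by a bind-mounted host directory.
# create_bind_volume is a hypothetical helper name, not part of any Docker tooling.
# $1 = volume name, $2 = host directory (should exist before a container mounts it)
create_bind_volume() {
    docker volume create --driver local \
        --opt type=none --opt device="$2" --opt o=bind "$1"
}
```

Usage, with the path from the example: create_bind_volume log-app /srv/docker/stack/prod/log/app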
Operating System: Debian GNU/Linux 11.5 (Bullseye)
Reproduce
root@host:/srv/docker/stack/prod# su - dockerdeploy
dockerdeploy@host:~# cd /srv/docker/stack/prod
dockerdeploy@host:/srv/docker/stack/prod# docker compose up -d
dockerdeploy@host:/srv/docker/stack/prod# exit
root@host:/srv/docker/stack/prod# systemctl restart docker.service
root@host:/srv/docker/stack/prod# su - dockerdeploy
dockerdeploy@host:~# cd /srv/docker/stack/prod
dockerdeploy@host:/srv/docker/stack/prod$ docker compose restart app
[+] Running 1/1
⠿ Container app Started
dockerdeploy@host:/srv/docker/stack/prod$ docker compose down -v --remove-orphans
[+] Running 21/37
.....
Error response from daemon: remove prod_log-app: volume has active mounts
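The steps above can be condensed into a small script for repeated testing. This is a sketch under assumptions: the function name is made up, it is meant to be run from the stack directory by a user who may both talk to the docker daemon and restart docker.service (the transcript switches between root and dockerdeploy instead), and app is the service name from the example compose file.

```shell
# Sketch of the reproduction as one POSIX sh function.
# repro_live_restore_volume_bug is a hypothetical name.
repro_live_restore_volume_bug() {
    # Start the stack; this creates the named volume.
    docker compose up -d || return 1
    # Restart the daemon while live-restore keeps the containers running.
    systemctl restart docker.service || return 1
    # Restart one container of the still-running stack.
    docker compose restart app || return 1
    # In the report, this step fails with "volume has active mounts".
    docker compose down -v --remove-orphans
}
```

The function returns the exit status of the final docker compose down, so a non-zero status after the earlier steps succeeded indicates the failure described here.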
Expected behavior
docker compose down -v --remove-orphans should reliably tear down the stack
docker version
Client: Docker Engine - Community
 Version:           20.10.21
 API version:       1.41
 Go version:        go1.18.7
 Git commit:        baeda1f
 Built:             Tue Oct 25 18:02:28 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true
Server: Docker Engine - Community
 Engine:
  Version:          20.10.21
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.18.7
  Git commit:       3056208
  Built:            Tue Oct 25 18:00:19 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.9
  GitCommit:        1c90a442489720eec95342e1789ee8a5e1b9536f
 runc:
  Version:          1.1.4
  GitCommit:        v1.1.4-0-g5fd4c4d
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
docker info
Client:
 Context: default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.9.1-docker)
Server:
 Containers: 34
  Running: 34
  Paused: 0
  Stopped: 0
 Images: 28
 Server Version: 20.10.21
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 1c90a442489720eec95342e1789ee8a5e1b9536f
 runc version: v1.1.4-0-g5fd4c4d
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.10.0-19-amd64
 Operating System: Debian GNU/Linux 11 (bullseye)
 OSType: linux
 Architecture: x86_64
 CPUs: 60
 Total Memory: 157.2GiB
 Name: prod
 ID: XXXX:XXXX:XXXX:XXXX:XXXX:XXXX:XXXX:XXXX:XXXX:XXXX:XXXX:XXXX
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: true
Additional Info
Docker Compose version v2.12.2
daemon.json
{
  "experimental": true,
  "fixed-cidr-v6": "fd00:dead:beef:c0::/80",
  "ipv6": true,
  "ip6tables": true,
  "live-restore": true,
  "mtu": 1400
}
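Since the problem depends on live-restore being active, it can help to confirm that the running daemon actually picked up the setting from daemon.json. A minimal sketch, assuming the docker CLI is on the PATH; the function name is hypothetical, and LiveRestoreEnabled is the field behind the "Live Restore Enabled" line of docker info:

```shell
# Sketch: succeed (exit status 0) only if the running daemon reports live-restore as enabled.
# live_restore_enabled is a hypothetical helper name.
live_restore_enabled() {
    [ "$(docker info --format '{{.LiveRestoreEnabled}}')" = "true" ]
}
```

Usage: live_restore_enabled && echo "live-restore is on"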
About this issue
- State: closed
- Created 2 years ago
- Comments: 17 (11 by maintainers)
I have a working patch for this. It still needs tests, and I’d like to bisect to find the exact commit that caused the issue, so it’s going to take a bit of time to get this in, but I at least see what’s happening.
And sure enough, I can reproduce this with the graphdriver backend; it does not reproduce with containerd snapshotters.
I can confirm that this is still an issue on 24.0.2. Reproduced with this setup.
Versions
Docker compose version
OS
Compose file
Start up the stack, auto-creating the volume
Restart the docker daemon. Note that this bug occurs after any restart; updating the docker daemon is just one possible trigger of a restart.
Bring down the container stack, leaving volumes in place so we can see their state before trying to delete them. Note that running docker compose down -v here also triggers the same error as described in the original bug report.
Current status of the volume
Now try to delete the volume
This failure should not happen. This appears to be a bug in the internal reference counting in the docker daemon.
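One way to tell a stale reference count apart from a volume that is legitimately in use is to ask the daemon which containers still reference it. A sketch, assuming the docker CLI is available; both function names are hypothetical, and the volume name is passed in as an argument (prod_log-app in the original report):

```shell
# Sketch: list the names of all containers (running or stopped) that reference a volume.
# volume_users and volume_is_unreferenced are hypothetical helper names.
volume_users() {
    docker ps -a --filter "volume=$1" --format '{{.Names}}'
}

# Succeeds only if no container references the volume.
volume_is_unreferenced() {
    [ -z "$(volume_users "$1")" ]
}
```

If volume_is_unreferenced succeeds for a volume and docker volume rm still fails for it, that matches the failure described here rather than a container genuinely holding the volume.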
Restart the docker daemon again to try to clear up the error in reference counting.
Now notice that we can delete the volume with no error.
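The sequence described in this comment (removal fails, the daemon is restarted again, removal then succeeds) can be written down as a workaround sketch. This is not a fix and rests on assumptions: the function name is made up, the caller may restart docker.service, and a second daemon restart is acceptable on the host. It refuses to restart the daemon if a container still references the volume.

```shell
# Workaround sketch: retry volume removal after one daemon restart.
# remove_volume_with_daemon_restart is a hypothetical helper name.
# $1 = volume name
remove_volume_with_daemon_restart() {
    vol="$1"
    # First attempt; nothing more to do if this already works.
    docker volume rm "$vol" && return 0
    # Do not restart the daemon if a container really uses the volume.
    if [ -n "$(docker ps -a --filter "volume=$vol" --format '{{.Names}}')" ]; then
        echo "volume $vol is still referenced by a container" >&2
        return 1
    fi
    # Restart the daemon as described above, then retry once.
    systemctl restart docker.service || return 1
    docker volume rm "$vol"
}
```

Usage: remove_volume_with_daemon_restart prod_log-app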