moby: Restoring containers from a custom checkpoint-dir is broken

Copied from https://github.com/containerd/containerd/issues/2406

Description

This was first reported in moby/moby#35694 but no issue was ever created for it.

The containerd 1.0 integration broke the --checkpoint-dir option for restoring a container from a particular checkpoint location. This means this option is broken in docker releases 17.12 and up.

See https://github.com/moby/moby/commit/ddae20c032#diff-3cb140026df40998ea29c5bcb6bb292eR118

Steps to reproduce the issue:

  1. docker run --name crtest -d busybox /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'
  2. docker checkpoint create --checkpoint-dir /var/lib/docker/checkpoints crtest checkpoint1
  3. docker stop crtest (see moby/moby#35690)
  4. docker start --checkpoint-dir /var/lib/docker/checkpoints --checkpoint checkpoint1 crtest

Describe the results you received:

“Error response from daemon: custom checkpointdir is not supported”

Describe the results you expected:

docker start with a custom --checkpoint-dir should succeed.

Note that C/R is currently broken in docker even without using a custom checkpoint-dir; see moby/moby#35691

Output of docker version:

Client:
 Version:           18.05.0-ce
 API version:       1.37
 Go version:        go1.9.5
 Git commit:        f150324
 Built:             Wed May 9 22:17:48 2018
 OS/Arch:           linux/amd64
 Experimental:      false
 Orchestrator:      swarm

Server:
 Engine:
  Version:          18.05.0-ce
  API version:      1.37 (minimum version 1.12)
  Go version:       go1.9.5
  Git commit:       f150324
  Built:            Wed May 9 22:15:57 2018
  OS/Arch:          linux/amd64
  Experimental:     true

Output of docker info:

Containers: 1
 Running: 0
 Paused: 0
 Stopped: 1
Images: 4
Server Version: 18.05.0-ce
Storage Driver: overlay
 Backing Filesystem: extfs
 Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: nvidia runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options: apparmor
Kernel Version: 4.14.43-041443-generic
Operating System: Ubuntu 14.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 14.92GiB
Name: ip-10-97-0-215
ID: BEEB:4M2D:QUZT:CXJW:WZ4H:WPYV:3BNT:ZGR6:ZGQU:S6ZM:EML5:SNC6
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

AWS

CC @crosbymichael @dmcgowan @mlaventure @thaJeztah

About this issue

  • Original URL
  • State: open
  • Created 6 years ago
  • Reactions: 5
  • Comments: 30 (7 by maintainers)

Most upvoted comments

Hi @harishanand95

The workaround would be to manually copy the checkpoint directory inside /var/lib/docker/containers/<CONTAINER ID>/checkpoints/

For example:

docker run -d --name looper2 busybox \
         /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'

docker checkpoint create --checkpoint-dir=/tmp looper2 checkpoint2

mv /tmp/checkpoint2 /var/lib/docker/containers/$(docker ps -aq --no-trunc --filter name=looper2)/checkpoints/

docker start --checkpoint=checkpoint2 looper2
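The manual steps above can be wrapped in a small helper. A sketch, assuming the default Docker data root of /var/lib/docker and root privileges; the function names (checkpoint_dest, restore_from_dir) are my own, not part of Docker:

```shell
# Hypothetical helper: compute where Docker expects a container's
# checkpoints to live (default data root assumed).
checkpoint_dest() {
    # $1 = full (untruncated) container ID
    echo "/var/lib/docker/containers/$1/checkpoints"
}

# Sketch of the workaround: copy a checkpoint that was created with
# --checkpoint-dir into the container's own checkpoints directory,
# then start from it. Requires root and a stopped container.
restore_from_dir() {
    dir="$1"; name="$2"; container="$3"
    id=$(docker ps -aq --no-trunc --filter "name=$container")
    cp -r "$dir/$name" "$(checkpoint_dest "$id")/"
    docker start --checkpoint="$name" "$container"
}
```

For example, `restore_from_dir /tmp checkpoint2 looper2` would perform the same three steps shown above.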

I have a patch to fix this issue. I'm going to create a pull request once https://github.com/containerd/containerd/pull/2425 is merged.

@cnnrznn I was able to get this working in a newly created container by copying the checkpoint into the newly created container before running it. The following code snippet worked for me:

docker create --name looper3 busybox
sudo cp -r /tmp/checkpoint2 /var/lib/docker/containers/$(docker ps -aq --no-trunc --filter name=looper3)/checkpoints/
docker start --checkpoint=checkpoint2 looper3

Is anyone still using the workaround? Creating a checkpoint does not seem to work when the container was itself restored from a checkpoint. I tried the workaround posted by @rst0git and the scripts by @MihaelBercic for manually copying the checkpoint.

Steps to reproduce:

docker run --security-opt=seccomp:unconfined --name cr -d busybox /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'
docker logs -n1 -f cr # Logs work, shows an increasing number (Ctrl+c to stop)
docker checkpoint create --checkpoint-dir /tmp/ cr checkpoint1
docker stop -t0 cr && docker rm cr
docker create --security-opt=seccomp:unconfined --name cr busybox /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'
cp -r /tmp/checkpoint1 /var/lib/docker/containers/$(docker ps -aq --no-trunc --filter name=cr)/checkpoints/
docker start --checkpoint checkpoint1 cr # remove the container to restore in a new one
docker logs -n1 -f cr # Logs work, shows an increasing number (Ctrl+c to stop)
docker checkpoint create --checkpoint-dir /tmp/ cr checkpoint2
Error response from daemon: Cannot checkpoint container cr: OCI runtime pause failed: freezer not supported: openat2 /sys/fs/cgroup/system.slice/docker-d4cae4eaeea3a2d93155029a112fbe4062b7c0402355f662f3ea5ae329441b46.scope/cgroup.freeze: no such file or directory: unknown

The container fails to pause after it is restored in a new container.
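The openat2 failure in the error suggests runc cannot find the restored container's cgroup.freeze file under its systemd scope. A rough diagnostic sketch, assuming cgroup v2 with the systemd cgroup driver; the path layout simply mirrors the one in the error message, and the helper names are made up:

```shell
# Hypothetical: build the cgroup v2 freezer path that runc opens when
# pausing a container under the systemd cgroup driver.
freezer_file() {
    # $1 = full (untruncated) container ID
    echo "/sys/fs/cgroup/system.slice/docker-$1.scope/cgroup.freeze"
}

# Usage sketch: report whether the freezer file exists for a running
# container, to confirm whether the scope was recreated after restore.
check_freezer() {
    id=$(docker ps -q --no-trunc --filter "name=$1")
    f=$(freezer_file "$id")
    if [ -f "$f" ]; then
        echo "freezer present: $f"
    else
        echo "freezer missing: $f"
    fi
}
```

If the file is missing for the restored container, that would be consistent with the pause failure above.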

docker info:

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc., 0.0.0+unknown)

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 20
 Server Version: 20.10.25
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 0cae528dd6cb557f7201036e9f43420650207b58
 runc version: f19387a6bec4944c770f7668ab51c4348d9c2f38
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 6.1.38-59.109.amzn2023.x86_64
 Operating System: Amazon Linux 2023
 OSType: linux
 Architecture: x86_64
 CPUs: 1
 Total Memory: 949.8MiB
 Name: ip-10-0-139-156.ec2.internal
 ID: 54UQ:MNHU:YLFT:E5ZG:OJTX:XVYP:YUFB:R2LQ:3HY4:ZERB:OSUU:THDA
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Tested with criu 3.17 and 3.18

criu --version
Version: 3.18

containerd

containerd --version
containerd github.com/containerd/containerd 1.7.2 0cae528dd6cb557f7201036e9f43420650207b58

Hi @MihaelBercic, you should be able to use docker create to create the container first, and then move the checkpoint into place.

However, I would recommend using Podman for container migration as it is actively developed, supported and maintained.

@adrianreber is the author of the checkpoint/restore functionality in Podman and also a CRIU maintainer. Adrian has a few very good talks and articles on this topic:

I hope this helps.

@avagin I wonder if that PR is ready to be made now that containerd/containerd#2425 is ready to roll? I can try to tackle it if not - thank you!