containerd: Restoring containers from a custom checkpoint-dir is broken

Description

This was first reported in https://github.com/moby/moby/pull/35694 but no issue was ever created for it.

Containerd 1.0 (I believe) broke the --checkpoint-dir option for restoring a container from a particular checkpoint location. This means this option is broken in docker releases 17.12 and up.

See https://github.com/moby/moby/commit/ddae20c032#diff-3cb140026df40998ea29c5bcb6bb292eR118

Steps to reproduce the issue:

  1. docker run --name crtest -d busybox /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'
  2. docker checkpoint create --checkpoint-dir /var/lib/docker/checkpoints crtest checkpoint1
  3. docker stop crtest (see https://github.com/moby/moby/issues/35690)
  4. docker start --checkpoint-dir /var/lib/docker/checkpoints --checkpoint checkpoint1 crtest

Describe the results you received:

“Error response from daemon: custom checkpointdir is not supported”
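The steps and the observed error can be wrapped in a small shell function, which may be handy for checking whether a given Docker release is affected. This is only a sketch: the function name and the DOCKER and CKPT_DIR variables are introduced here for illustration and are not part of the original report. Checkpoint support also requires the daemon to run in experimental mode.

```shell
#!/bin/sh
# Hypothetical helper for this report. DOCKER and CKPT_DIR are overridable
# variables introduced for the sketch, not options of docker itself.
DOCKER="${DOCKER:-docker}"
CKPT_DIR="${CKPT_DIR:-/var/lib/docker/checkpoints}"

# Runs steps 1-4 from the report.
# Returns 0 if the restore prints the reported error (bug reproduced),
# 1 if the restore prints anything else, 2 if the setup steps fail.
repro_custom_checkpoint_dir() {
  "$DOCKER" run --name crtest -d busybox /bin/sh -c \
    'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done' || return 2
  "$DOCKER" checkpoint create --checkpoint-dir "$CKPT_DIR" crtest checkpoint1 || return 2

  # Explicit stop, as in step 3; ignore the failure if it is already stopped.
  "$DOCKER" stop crtest || true

  # Capture stdout and stderr of the restore attempt, whatever its exit code.
  out=$("$DOCKER" start --checkpoint-dir "$CKPT_DIR" --checkpoint checkpoint1 crtest 2>&1) || true
  printf '%s\n' "$out"

  case "$out" in
    *"custom checkpointdir is not supported"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Calling repro_custom_checkpoint_dir on an affected host should exit with status 0; the container name crtest and checkpoint name checkpoint1 are the ones used in the steps above.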

Describe the results you expected:

Docker start with custom checkpoint-dir should succeed.

Note that C/R is currently broken in docker even without using a custom checkpoint-dir; see https://github.com/moby/moby/issues/35691

Output of docker version and docker info (in place of containerd --version):

Client:
 Version: 18.05.0-ce
 API version: 1.37
 Go version: go1.9.5
 Git commit: f150324
 Built: Wed May 9 22:17:48 2018
 OS/Arch: linux/amd64
 Experimental: false
 Orchestrator: swarm

Server:
 Engine:
  Version: 18.05.0-ce
  API version: 1.37 (minimum version 1.12)
  Go version: go1.9.5
  Git commit: f150324
  Built: Wed May 9 22:15:57 2018
  OS/Arch: linux/amd64
  Experimental: true

Containers: 1
 Running: 0
 Paused: 0
 Stopped: 1
Images: 4
Server Version: 18.05.0-ce
Storage Driver: overlay
 Backing Filesystem: extfs
 Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: nvidia runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
 apparmor
Kernel Version: 4.14.43-041443-generic
Operating System: Ubuntu 14.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 14.92GiB
Name: ip-10-97-0-215
ID: BEEB:4M2D:QUZT:CXJW:WZ4H:WPYV:3BNT:ZGR6:ZGQU:S6ZM:EML5:SNC6
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

@crosbymichael @dmcgowan

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 1
  • Comments: 18 (11 by maintainers)

Most upvoted comments

Because of the way we now handle checkpoints, as images, this option is no longer supported. You could still persist that image to disk, like any other image, and load it that way; we just don’t have directory support.

@crosbymichael, hello. I’m wondering how exactly I am supposed to restore a CRIU checkpoint stored in a custom directory. You mentioned persisting that image to disk. Which image do you mean here? If you have a simple snippet, I’d really appreciate it.

Thank you!

@tswift242 I’d have to refresh my memory too, but I believe it’s still relevant. If one wants to save the checkpoint in an external directory instead of having it stored in containerd’s snapshotter, then restoring from that external directory should work by creating the manifest/snapshot in containerd and passing those in (that is basically what the client does when creating the checkpoint in the first place, if I’m not mistaken).
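The idea raised in the comments above, treating a checkpoint as an image that can be persisted to disk and loaded back, might look roughly like this with containerd's ctr CLI. This is an untested sketch, not a maintainer-confirmed recipe: the function names and the CTR variable are hypothetical, the subcommands and flags used are those of recent containerd releases and may be missing from the versions in this report, and whether a checkpoint reference exports and imports cleanly is an assumption.

```shell
#!/bin/sh
# CTR is a hypothetical override variable introduced for this sketch.
CTR="${CTR:-ctr}"

# save_checkpoint CONTAINER REF TARBALL
# Checkpoints the container's task into an image named REF, then exports
# that image to TARBALL (assumed to live in the custom directory).
save_checkpoint() {
  "$CTR" containers checkpoint --task "$1" "$2" || return 1
  "$CTR" images export "$3" "$2"
}

# load_checkpoint TARBALL CONTAINER REF
# Imports the tarball back into containerd, then restores CONTAINER from
# the checkpoint image REF, including its runtime and memory state.
load_checkpoint() {
  "$CTR" images import "$1" || return 1
  "$CTR" containers restore --live "$2" "$3"
}
```

For example, save_checkpoint crtest checkpoint1 /var/lib/docker/checkpoints/checkpoint1.tar followed later by load_checkpoint with the same tarball, container name, and reference. Containers started through Docker live in a separate containerd namespace, so the ctr invocations would likely need the matching namespace option.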

@tswift242 let me refresh my knowledge on the docker integration today and get back with ya.

@crosbymichael My team is very interested in fixing this issue and getting C/R to work in docker master in the short term. We may have some bandwidth in the coming weeks to contribute a fix for this issue, if we get some pointers.

Is it perhaps as simple as changing the value of checkpointDir here: https://github.com/moby/moby/blob/master/daemon/start.go#L185 ?

@mlaventure previously left this comment: https://github.com/moby/moby/pull/35694#issuecomment-353107559. Is that still relevant? If so, can you comment/elaborate on that?

@stevvooe ya, still working on the interface and getting new requirements from @tswift242.

I’ll probably close this one after I create a tracking issue for c/r work. We have a few things that we need/want before we move it out of experimental in docker.