moby: ZFS failed to register layer dataset does not exist

Description Trying to pull some images (I’m still unsure why some work and others do not) results in the following error

failed to register layer: exit status 2: "/sbin/zfs zfs snapshot pond/docker/4522c1adffaf266062442dba2eb347084d774267a124d1bd7a04e6b3cb9634f0@371563767" => cannot open 'pond/docker/4522c1adffaf266062442dba2eb347084d774267a124d1bd7a04e6b3cb9634f0': dataset does not exist
usage:
	snapshot|snap [-r] [-o property=value] ... <filesystem|volume>@<snap> ...

For the property list, run: zfs set|get

For the delegated permission list, run: zfs allow|unallow

Not all images have this problem

› docker image pull nginx
Using default tag: latest
latest: Pulling from library/nginx
5040bd298390: Already exists
333547110842: Pull complete
4df1e44d2a7a: Pull complete
Digest: sha256:f2d384a6ca8ada733df555be3edc427f2e5f285ebf468aae940843de8cf74645
Status: Downloaded newer image for nginx:latest
› docker image pull nginx:alpine
alpine: Pulling from library/nginx
b7f33cc0b48e: Already exists
9a57e9207914: Pull complete
79f62f9c7236: Pull complete
50a2334db9bc: Pull complete
Digest: sha256:d34e2176dab830485b0cb79340e1d5ebf7d530b34ad7bfe05d269ffed20a88f4
Status: Downloaded newer image for nginx:alpine

But others do

› docker image pull cantino/huginn
Using default tag: latest
latest: Pulling from cantino/huginn
c60055a51d74: Already exists
755da0cdb7d2: Already exists
969d017f67e6: Already exists
37c9a9113595: Already exists
a3d9f8479786: Already exists
d7d0c2c8ec3c: Extracting [==================================================>]    783 B/783 B
3480f3638ec5: Download complete
dc00c829c370: Download complete
52c5f6110241: Download complete
65f3a34b31c0: Download complete
4970b4c36b25: Download complete
df6da8726ead: Download complete
236e60444501: Download complete
94fb09e7b96d: Download complete
aac028a46934: Download complete
c78dd42f78d9: Download complete
failed to register layer: exit status 2: "/sbin/zfs zfs snapshot pond/docker/4522c1adffaf266062442dba2eb347084d774267a124d1bd7a04e6b3cb9634f0@371563767" => cannot open 'pond/docker/4522c1adffaf266062442dba2eb347084d774267a124d1bd7a04e6b3cb9634f0': dataset does not exist
usage:
	snapshot|snap [-r] [-o property=value] ... <filesystem|volume>@<snap> ...

For the property list, run: zfs set|get

For the delegated permission list, run: zfs allow|unallow

Steps to reproduce the issue:

  1. Ensure you are using docker with the zfs driver (see the config sketch after this list)
  2. Pull an image (cantino/huginn)
  3. Observe ZFS error
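
For step 1, a minimal sketch of selecting the zfs storage driver (assuming the daemon is configured via /etc/docker/daemon.json; passing --storage-driver=zfs to dockerd works as well) and confirming it is active:

$ cat /etc/docker/daemon.json
{
  "storage-driver": "zfs"
}
$ sudo systemctl restart docker
$ docker info | grep -i 'storage driver'
Storage Driver: zfs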

Describe the results you received: A "dataset does not exist" error when trying to create a snapshot of a dataset that does not exist.

Describe the results you expected: The dataset to be created correctly.

Additional information you deem important (e.g. issue happens only occasionally): The images I’ve had this happen on so far are plexinc/pms-docker:plexpass, plexinc/pms-docker:public, and cantino/huginn. As the output above shows, nginx and friends do not have this issue.

Output of docker version:

› docker version
Client:
 Version:      1.13.1
 API version:  1.26
 Go version:   go1.7.5
 Git commit:   092cba3
 Built:        Wed Feb  8 06:50:14 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.13.1
 API version:  1.26 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   092cba3
 Built:        Wed Feb  8 06:50:14 2017
 OS/Arch:      linux/amd64
 Experimental: false

Output of docker info:

› docker info
Containers: 5
 Running: 5
 Paused: 0
 Stopped: 0
Images: 12
Server Version: 1.13.1
Storage Driver: zfs
 Zpool: pond
 Zpool Health: ONLINE
 Parent Dataset: pond/docker
 Space Used By Parent: 1008901120
 Space Available: 501376187904
 Parent Quota: no
 Compression: lz4
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1
runc version: 9df8b306d01f59d3a8029be411de015b7304dd8f
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-62-generic
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 31.32 GiB
Name: cortex
ID: ZHTI:AFNB:DZ56:WQ2P:VSZL:5EWB:NLLR:4Q3M:4YWV:NV4C:J52R:UFKK
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.): Running on Ubuntu 16.04 with ZFS on Linux

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Reactions: 1
  • Comments: 26 (11 by maintainers)

Most upvoted comments

One possible solution is:

  1. Stop docker service
  2. Remove /var/lib/docker
  3. Start docker service again

BUT, this will remove all data!
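
A minimal sketch of those steps, assuming docker is managed by systemd and nothing under /var/lib/docker needs to be kept (on a ZFS-backed setup you may need to zfs destroy docker's child datasets before, or instead of, removing the directory):

$ sudo systemctl stop docker
$ sudo rm -rf /var/lib/docker     # removes ALL images, containers and volumes
$ sudo systemctl start docker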

I get the following error message while trying to delete a docker container:

Error response from daemon: container aec632519d964766878bd703c69b407320bb9f5883d5ffde47a5452b23f5cd85: driver "zfs" failed to remove root filesystem: exit status 1: "/usr/sbin/zfs fs destroy -r rpool/ROOT/ubuntu_50dmnj/var/lib/23d5a3ca6f8c3ce2a92f46ada48c2d1144864cca9e20ad65c00b02052f40583f" => cannot open 'rpool/ROOT/ubuntu_50dmnj/var/lib/23d5a3ca6f8c3ce2a92f46ada48c2d1144864cca9e20ad65c00b02052f40583f': dataset does not exist

I already tried restarting docker and the host system… it did not work. Any guesses?

EDIT:

Using:

Docker version 19.03.8, build afacb8b7f0
Ubuntu 20.04 with Kernel 5.4.0-21-generic

@chrishoage at the moment you can only manually fix the inconsistencies in docker’s image layer “database”. The problem is that docker thinks this dataset exists because it is referenced in plain text in /var/lib/docker/image/zfs/layerdb/sha256/<layer_sha256>/cache-id. This directory (/var/lib/docker/image/zfs/layerdb/sha256/<layer_sha256>/cache-id) must be removed, as well as all layers where <layer_sha256> is the parent. All files in layerdb are JSON or plain text, so grep will work here. Last but not least, you also need to remove any container file in /var/lib/docker/containers/ which uses a layer you have just deleted, as well as the corresponding mount in /var/lib/docker/image/zfs/layerdb/mounts/.
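
As a rough sketch of that manual cleanup (paths as above; the dataset id is the one from the original error, and docker should be stopped first):

$ sudo systemctl stop docker
# find the layer whose cache-id references the missing dataset
$ grep -rl 4522c1adffaf266062442dba2eb347084d774267a124d1bd7a04e6b3cb9634f0 /var/lib/docker/image/zfs/layerdb/sha256/
# then remove that layer directory, every layer whose "parent" file points at it,
# the matching entry in /var/lib/docker/image/zfs/layerdb/mounts/,
# and any container in /var/lib/docker/containers/ that used those layers
$ sudo systemctl start docker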

Update: an alternative is to just build a new docker image with different content. It will then get a different id.

This worked for me.

1. Stop Docker service: `sudo systemctl stop docker`
2. Check the existence of Docker ZFS snapshots with something like `sudo zfs list -H -o name -t snapshot | grep docker`
3. Delete the aforementioned ZFS snapshots: `sudo zfs list -H -o name -t snapshot | grep docker | xargs -n1 sudo zfs destroy -R`
4. Start Docker service again: `sudo systemctl start docker`

I had the same problem. This worked for me:

1. Stop docker service

2. Remove /var/lib/docker

3. Start docker service again

This has happened several times and I did similar things each time. I find it a very obscure way to handle this situation. Does anyone know what is going on here?

It seems like the workaround for now, while upstream ponders implementing a proper fix, is to stop docker, destroy the /var/lib/docker zfs volume, then start docker and recreate all the containers.

The error happens when a zfs dataset is destroyed without a corresponding update to docker’s image layer metadata:

$ zfs destroy -R zroot/root/nixos/0368bc77be5e5db69270386d15137fb971a081590c5f436b21916da09d633c50
$ docker run nginx          
/nix/store/9glgvsw1i76lxzblwdlm659lv1bg065c-docker-1.13.1/libexec/docker/docker: Error response from daemon: exit status 2: "/nix/store/mddh2ffqi53f3sw96pj6v4f0mzn04lcd-zfs-user-0.6.5.9/bin/zfs zfs snapshot zroot/root/nixos/0368bc77be5e5db69270386d15137fb971a081590c5f436b21916da09d633c50@256866051" => cannot open 'zroot/root/nixos/0368bc77be5e5db69270386d15137fb971a081590c5f436b21916da09d633c50': dataset does not exist
usage:
        snapshot|snap [-r] [-o property=value] ... <filesystem|volume>@<snap> ...

For the property list, run: zfs set|get

For the delegated permission list, run: zfs allow|unallow.
See '/nix/store/9glgvsw1i76lxzblwdlm659lv1bg065c-docker-1.13.1/libexec/docker/docker run --help'.