moby: docker build extremely slow with zfs

Description

Docker builds with uncached layers are extremely slow when using the zfs storage driver. Our images have multiple layers. Images which used to build in 10-20 seconds now take over two minutes with zfs. We have some images with over 50 layers that now take 15 minutes to build.

Steps to reproduce the issue:

  1. Launch docker with the zfs storage driver: docker daemon -D --storage-driver=zfs

  2. Create a Dockerfile with lots of layers. Example:

     FROM scratch
     CMD [date]
     CMD [date]
     CMD [date]
     CMD [date]
     CMD [date]
     CMD [date]
     CMD [date]
     CMD [date]
     CMD [date]
     CMD [date]
     CMD [date]
     CMD [date]
     CMD [date]
     CMD [date]
     CMD [date]
     CMD [date]
     CMD [date]
     CMD [date]
     CMD [date]
     CMD [date]

  3. Time the build step
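For reference, the whole reproduction can be scripted roughly like this (a sketch; the directory and tag names are arbitrary, and it assumes the daemon from step 1 is already running with --storage-driver=zfs):

    # Generate a Dockerfile with one FROM and 20 CMD layers
    mkdir -p /tmp/zfs-build-test && cd /tmp/zfs-build-test
    printf 'FROM scratch\n' > Dockerfile
    for i in $(seq 1 20); do printf 'CMD [date]\n' >> Dockerfile; done

    # Time the build; add --no-cache on repeat runs so the layer cache is not used
    time docker build -t date_image .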

Describe the results you received: With zfs, the above build takes 11 seconds

docker daemon -D --storage-driver=zfs

time docker build . -t date_image

…
real    0m11.807s
user    0m0.008s
sys     0m0.008s

Describe the results you expected: When running with the default storage driver (aufs), the above build takes 2 seconds

docker daemon -D -g /var/lib/docker_aufs

time docker build . -t date_image

…
real    0m2.402s
user    0m0.012s
sys     0m0.008s

Additional information you deem important (e.g. issue happens only occasionally): This is just a basic example to demonstrate the timing difference. We experience the same sort of problems with larger, more complex images.

Output of docker version:

Client:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:        Thu Aug 18 05:33:38 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:        Thu Aug 18 05:33:38 2016
 OS/Arch:      linux/amd64

Output of docker info:

with zfs:
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 53
Server Version: 1.12.1
Storage Driver: zfs
 Zpool: zpool-docker
 Zpool Health: ONLINE
 Parent Dataset: zpool-docker/docker
 Space Used By Parent: 787317248
 Space Available: 7465794560
 Parent Quota: no
 Compression: off
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: null overlay bridge host
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.0-47-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 3.859 GiB
Name: linux-vm
ID: QJ3P:OINM:LIYM:26J3:IAU3:S2JW:UVU6:BBAJ:SH5Q:6N22:UV3W:PRLL
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 14
 Goroutines: 22
 System Time: 2017-02-22T00:43:16.86026941-05:00
 EventsListeners: 0
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8


Additional environment details (AWS, VirtualBox, physical, etc.): The test above was run on VirtualBox, but we experience this behavior on raw hardware as well.

About this issue

  • State: open
  • Created 7 years ago
  • Reactions: 18
  • Comments: 36 (9 by maintainers)

Most upvoted comments

Got fed up with this - builds that should take seconds take minutes per stage. Docker metadata performance decreases dramatically over time as the number of legacy snapshots gets out of hand. System stability decreases as Docker activity increases; I'm pretty sure the metadata acts as a global lock - docker hangs while waiting for ZFS to list snapshots.

The ZFS driver is worse than useless right now.

I understand that things are complicated and a proper fix to this issue might be very difficult to build, and that docker might not have the developer resources to spend on this right now. However, I find it unacceptable that this crippling issue has been open for over a year while the docs cheerfully advertise the ZFS storage driver as something that seems like a good idea, without any mention of performance limitations. It never should have made it out of beta. The docs say

at this point in time it is not recommended to use the zfs Docker storage driver for production use unless you have substantial experience with ZFS on Linux

Baloney. Substantial ZFS experience cannot do anything about the fact that the ZFS storage driver uses legacy mountpoints.

I’m now using overlay2 on top of xfs, on top of ZFS. It’s several orders of magnitude faster now.

No excuses.
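For anyone who wants to try the same workaround, a rough sketch (assuming the pool name zpool-docker from the report above and a disposable /var/lib/docker; the zvol name and size are placeholders, not taken from the comment):

    # Create a zvol and format it with XFS (ftype=1 is required by overlay2)
    zfs create -V 50G zpool-docker/docker-xfs
    mkfs.xfs -n ftype=1 /dev/zvol/zpool-docker/docker-xfs

    # Mount it as Docker's root directory and switch the storage driver
    systemctl stop docker
    mount /dev/zvol/zpool-docker/docker-xfs /var/lib/docker
    echo '{ "storage-driver": "overlay2" }' > /etc/docker/daemon.json
    systemctl start docker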

@felipelalli how is docker supposed to make zfs faster? Please stop adding useless messages here; this is not helping anyone.

I ran a couple of performance tests with bonnie++ that I thought I'd share here. The host machine runs Ubuntu 18.04 on bare metal and has a ZFS pool mirroring two Samsung SSDs.

First I benchmarked the ZFS filesystem directly, then I repeated the benchmark inside a docker container.
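The runs were roughly of this shape (a sketch rather than the exact invocations; the dataset path, user, and container image are placeholders):

    # Directly against the ZFS filesystem on the host
    bonnie++ -d /tank/bench -u someuser

    # Repeated inside a container, writing to the container's zfs-backed filesystem
    docker run --rm -it ubuntu:18.04 \
        bash -c 'apt-get update && apt-get install -y bonnie++ && bonnie++ -d /tmp -u root'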

Results: bonnie_zfs.pdf

Most values just show a slight performance hit, except for:

  • Sequential Create -> Create -> Latency: 14x slower
  • Sequential Output -> Rewrite -> Latency: 7x slower

Docker version 18.05.0-ce, storage driver zfs, kernel 4.15.0-23.

@cpuguy83

This is completely unhelpful and does not belong here. If you want to see a change, please propose a change

I hear you! But I'm going to defend this. There's nothing for one to get entitled about when it comes to open source projects which they are literally using for free, nor anyone to be frustrated towards, so on the one hand it's pretty ridiculous to have strong feelings. On the other hand, this is such a fundamental tool for modern software development, and this issue is so ridiculous, that I think it's not unreasonable to say "not good enough, we can do better" (who are you talking to, man?).

Furthermore, although you and I may be just as empowered to fix this as anyone else in theory, we did not write docker and ZFS. And I’m guessing you also have a lot of other things you need to be doing… so in practice…

No excuses! And especially not that technical ownership has been federated out of existence.

At any rate I did propose a change, and if I get the chance it would be nice to fix the docs…

Does this get slower with the increase in layers stored? Is the slowness dependent on the number of layers in a particular image or is it global to the number of layers stored in the driver?

It’s global - I’m pretty sure that a good amount of the slowness results from ZFS’s metadata structures not efficiently handling a bajillion legacy mountpoints. zfs list takes forever. I believe it also affects snapshot performance.
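One way to see the scale of the problem is to count what the driver has accumulated and to time the listing itself (a sketch, assuming the parent dataset zpool-docker/docker from the report above):

    # Datasets (one per image layer and container) and snapshots under Docker's parent dataset
    zfs list -r -t filesystem zpool-docker/docker | wc -l
    zfs list -r -t snapshot zpool-docker/docker | wc -l

    # The listing itself is where the metadata slowdown shows up
    time zfs list -r zpool-docker/docker > /dev/null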

I can confirm that the difference is night and day. I now run an EXT4 fs on a ZVOL and things that would take minutes now take seconds.

Which is odd considering that the backing FS is still ZFS.

zfs/spl layers using datasets, whilst efficient and powerful, are pretty heavyweight compared to overlay2 layers.

Do we know why/if overlay2 works over zfs/zpl? And if not… what's required to fix it?

That doesn't resolve the core issue people are reporting here, but it would mean /var/lib/docker/ would exist on a zfs/zpl dataset and behave much as it does for xfs and others.

Hi! We are facing the same issue here. Please fix it.

Can someone who gets fast docker builds with zfs please post their debug dockerd log for the example image?
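Something along these lines would capture it (a sketch; on 1.12-era installs the daemon is started as docker daemon rather than dockerd):

    # Run the daemon in the foreground with debug logging and the zfs driver,
    # then build the example image from another terminal
    dockerd -D --storage-driver=zfs 2>&1 | tee dockerd-zfs-debug.log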

overlay2 has a check that stops the daemon from starting if the backing filesystem is ZFS, but AUFS doesn’t.

I’ve been using AUFS on ZFS for close to 2 years of moderate usage, with no issues. It is way faster than the ZFS backend.

I am seeing exaggerated slowness on Ubuntu 16.x (with 32G of RAM) using the ZFS storage driver as well.

FYI: OpenZFS 2.2 has been released. It adds whiteout support, so the overlay2 driver is now performant.

There is still one unresolved performance issue that makes overlayfs unmounts slow, so there is sometimes a penalty of up to 10s during container creation/destruction: https://github.com/openzfs/zfs/issues/15581. Eyeballs welcome.
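With that in place the switch is just a daemon configuration change (a sketch; it assumes an OpenZFS >= 2.2 module, a Docker version whose overlay2 driver accepts ZFS as the backing filesystem, and that existing zfs-driver images can be re-pulled, since nothing is migrated):

    # Confirm the running module supports overlay whiteouts (needs OpenZFS >= 2.2)
    zfs version

    # Switch the storage driver and restart the daemon
    echo '{ "storage-driver": "overlay2" }' | sudo tee /etc/docker/daemon.json
    sudo systemctl restart docker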

Fast Clone Detection (which should land in the next major ZFS release) provides a nice speedup when building Dockerfiles: https://github.com/zfsonlinux/zfs/commit/37f03da8ba6e1ab074b503e1dd63bfa7199d0537. I didn't do any benchmarks, but it's noticeable.

Well, letting ZFS use its native mounts on Linux would delay boot and shutdown, because the system would import & mount all datasets, even those of unused container images.

Not exactly. You could set the canmount property to off to prevent a volume from being mounted.
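For example (a sketch, using the parent dataset name from the report above; canmount also accepts noauto, which skips automatic mounting but still allows an explicit zfs mount):

    # Prevent the dataset from being mounted when the pool is imported at boot
    zfs set canmount=off zpool-docker/docker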

I’m also facing this same performance issue.

Hey guys,

I'm facing the same “issue”: the build operation of Docker images is taking too long on ZFS storage too. Our setup is a bit different (VMs that run on NFS, but the ZFS pool is created from local drives of the host machines).