moby: docker build extremely slow with zfs
Description
Docker builds with uncached layers are extremely slow when using the zfs storage driver. Our images have multiple layers. Images which used to build in 10-20 seconds now take over two minutes with zfs. We have some images with over 50 layers that now take 15 minutes to build.
Steps to reproduce the issue:
- Launch docker with the zfs storage driver: docker daemon -D --storage-driver=zfs
- Create a Dockerfile with lots of layers. Example:
  FROM scratch
  CMD ["date"]
  CMD ["date"]
  CMD ["date"]
  CMD ["date"]
  CMD ["date"]
  CMD ["date"]
  CMD ["date"]
  CMD ["date"]
  CMD ["date"]
  CMD ["date"]
  CMD ["date"]
  CMD ["date"]
  CMD ["date"]
  CMD ["date"]
  CMD ["date"]
  CMD ["date"]
  CMD ["date"]
  CMD ["date"]
  CMD ["date"]
  CMD ["date"]
- Time the build step
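For convenience, the repro Dockerfile above can be generated with a short POSIX shell loop (file and image names follow the example; --no-cache forces the uncached-layer case the report describes):

```shell
#!/bin/sh
# Generate the 20-layer repro Dockerfile from the steps above.
printf 'FROM scratch\n' > Dockerfile
i=0
while [ "$i" -lt 20 ]; do
  printf 'CMD ["date"]\n' >> Dockerfile
  i=$((i + 1))
done
# Then time an uncached build:
#   time docker build --no-cache -t date_image .
```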
Describe the results you received: With zfs, the above build takes 11 seconds
docker daemon -D --storage-driver=zfs
time docker build . -t date_image
…
real 0m11.807s
user 0m0.008s
sys 0m0.008s
Describe the results you expected: When running with the default storage driver the above build takes 2 seconds
docker daemon -D -g /var/lib/docker_aufs
time docker build . -t date_image
…
real 0m2.402s
user 0m0.012s
sys 0m0.008s
Additional information you deem important (e.g. issue happens only occasionally): This is just a basic example to demonstrate the timing difference. We experience the same sort of problems with larger, more complex images.
Output of docker version:
Client:
Version: 1.12.1
API version: 1.24
Go version: go1.6.3
Git commit: 23cf638
Built: Thu Aug 18 05:33:38 2016
OS/Arch: linux/amd64
Server:
Version: 1.12.1
API version: 1.24
Go version: go1.6.3
Git commit: 23cf638
Built: Thu Aug 18 05:33:38 2016
OS/Arch: linux/amd64
Output of docker info:
with zfs:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 53
Server Version: 1.12.1
Storage Driver: zfs
Zpool: zpool-docker
Zpool Health: ONLINE
Parent Dataset: zpool-docker/docker
Space Used By Parent: 787317248
Space Available: 7465794560
Parent Quota: no
Compression: off
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: null overlay bridge host
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.0-47-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 3.859 GiB
Name: linux-vm
ID: QJ3P:OINM:LIYM:26J3:IAU3:S2JW:UVU6:BBAJ:SH5Q:6N22:UV3W:PRLL
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 14
Goroutines: 22
System Time: 2017-02-22T00:43:16.86026941-05:00
EventsListeners: 0
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
127.0.0.0/8
without zfs:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 53
Server Version: 1.12.1
Storage Driver: zfs
Zpool: zpool-docker
Zpool Health: ONLINE
Parent Dataset: zpool-docker/docker
Space Used By Parent: 787317248
Space Available: 7465794560
Parent Quota: no
Compression: off
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: null overlay bridge host
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.0-47-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 3.859 GiB
Name: linux-vm
ID: QJ3P:OINM:LIYM:26J3:IAU3:S2JW:UVU6:BBAJ:SH5Q:6N22:UV3W:PRLL
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 14
Goroutines: 22
System Time: 2017-02-22T00:43:16.86026941-05:00
EventsListeners: 0
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
127.0.0.0/8
Additional environment details (AWS, VirtualBox, physical, etc.): The test above was run on virtual box. But we experience this behavior on raw hardware as well.
About this issue
- Original URL
- State: open
- Created 7 years ago
- Reactions: 18
- Comments: 36 (9 by maintainers)
Got fed up with this - builds that should take seconds take minutes per stage. Docker metadata performance decreases dramatically over time as the number of legacy snapshots gets out of hand. System stability decreases as docker activity increases; I’m pretty sure the metadata acts as a global lock - docker hangs while waiting for zfs to list snapshots.
The ZFS driver is worse than useless right now.
I understand that things are complicated and a proper fix to this issue might be very difficult to build, and that docker might not have the developer resources to spend on this right now. However, I find it unacceptable that this crippling issue has been open for over a year while the docs cheerfully advertise the ZFS storage driver as something that seems like a good idea, without any mention of performance limitations. It never should have made it out of beta. The docs say:
Baloney. Substantial ZFS experience cannot do anything about the fact that the ZFS storage driver uses legacy mountpoints.
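For anyone who wants to check the legacy-mountpoint claim on their own system, the layer datasets can be listed directly (a sketch; requires root and the ZFS tools, and assumes the parent dataset name zpool-docker/docker from the report above):

```shell
# Each image layer gets its own ZFS dataset under the parent dataset,
# and the zfs graph driver creates them all with mountpoint=legacy.
zfs list -r -o name,mountpoint zpool-docker/docker

# The metadata cost shows up when listing snapshots on a busy daemon:
time zfs list -t snapshot -r zpool-docker/docker | wc -l
```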
I’m now using overlay2 on top of xfs, on top of ZFS. It’s several orders of magnitude faster now. No excuses.
@felipelalli how is docker supposed to make zfs faster? Please stop adding useless messages here, this is not helping anyone.
I ran a couple of performance tests with bonnie++ that I thought I’d share here. The host machine runs Ubuntu 18.04 bare metal and has a ZFS pool mirroring two Samsung SSDs. First I benchmarked the ZFS filesystem directly, then I repeated the benchmark inside a docker container.
Results: bonnie_zfs.pdf
Most values just show a slight performance hit, except for:
- Sequential Create -> Create -> Latency: 14x slower
- Sequential Output -> Rewrite -> Latency: 7x slower
Docker version 18.05.0-ce, storage driver zfs, kernel 4.15.0-23.
@cpuguy83 I hear you! But I’m going to defend this. There’s nothing for one to get entitled about when it comes to open source projects which they are literally using for free, nor anyone to be frustrated towards, so on the one hand it’s pretty ridiculous to have strong feelings. On the other hand, this is such a fundamental tool to modern software development, and this issue is so ridiculous, that I think it’s not unreasonable to say “not good enough, we can do better”.
Furthermore, although you and I may be just as empowered to fix this as anyone else in theory, we did not write docker and ZFS. And I’m guessing you also have a lot of other things you need to be doing… so in practice…
No excuses! And especially not that technical ownership has been federated out of existence.
At any rate I did propose a change, and if I get the chance it would be nice to fix the docs…
It’s global - I’m pretty sure that a good amount of the slowness results from ZFS’s metadata structures not efficiently handling a bajillion legacy mountpoints. zfs list takes forever. I believe it also affects snapshot performance.
I can confirm that the difference is night and day. I now run an EXT4 fs on a ZVOL, and things that would take minutes now take seconds.
Which is odd considering that the backing FS is still ZFS.
zfs/spl layers using datasets, whilst efficient and powerful, are pretty heavyweight compared to overlay2 layers.
do we know why/if overlay2 works over zfs/zpl? and if not … what’s required to fix it?
that doesn’t resolve the core issue people are reporting here, but it would mean /var/lib/docker/ would exist on a zfs/zpl ds and behave much as it does for xfs and others.
Hi! We are facing the same issue here. Please fix it.
Can someone who gets fast docker builds with zfs please post their debug dockerd log for the example image?
overlay2 has a check that stops the daemon from starting if the backing filesystem is ZFS, but AUFS doesn’t.
I’ve been using AUFS on ZFS for close to 2 years of moderate usage, with no issues. It is way faster than the ZFS backend.
I am seeing severe slowness on Ubuntu 16.x (with 32G of RAM) using the ZFS storage driver as well.
FYI OpenZFS 2.2 has been released; it adds whiteout support, so the overlay2 driver is now performant.
There is still one unresolved performance issue that makes overlayfs unmount slow, sometimes adding up to a 10s penalty during container creation / destruction: https://github.com/openzfs/zfs/issues/15581 Eyeballs welcome.
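With whiteout support in place, switching an existing install over is mostly a daemon config change (a sketch; the file is written locally here and would normally live at /etc/docker/daemon.json — note that changing storage drivers hides existing images and containers until you switch back):

```shell
# Minimal daemon.json selecting the overlay2 driver.
cat > daemon.json <<'EOF'
{
  "storage-driver": "overlay2"
}
EOF
# Then, as root:
#   install -m 0644 daemon.json /etc/docker/daemon.json
#   systemctl restart docker
```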
Fast Clone Detection (it should land in the next major ZFS release) provides a nice speedup when building Dockerfiles: https://github.com/zfsonlinux/zfs/commit/37f03da8ba6e1ab074b503e1dd63bfa7199d0537 I didn’t do any benchmarks but it’s noticeable.
Not exactly. You could set the canmount property to off to prevent a dataset from being mounted.
I’m also facing this same performance issue.
Hey guys,
I’m facing the same “issue”: the build operation of docker images is taking too long on ZFS storage too. Our setup is kinda different (VMs that run on NFS, but the ZFS pool is created from local drives of the host machines).