skopeo: Sync is very, very slow.

Hello,

Does anybody have any thoughts on how to speed up a sync? We mirror a lot of external repos en masse, for all versions, and the process can take several hours as it works layer by layer, tag by tag, repo by repo.

Is there any way to run in parallel for every repo? I’ve thought about splitting each external repo into a separate YAML file and running multiple skopeo commands at the same time (see the sketch after the config below).

skopeo --insecure-policy sync --src yaml --dest docker src_mirrors.yaml dest.internal.repo/mirrors/ --scoped --dest-creds X:X

docker.io:
    images:
        istio/proxyv2: []
        sergrua/kube-tagger: []
        splunk/fluentd-hec: []
        hazelcast/management-center: []
        k8srestdev/scaling: []
        hjacobs/kube-resource-report: []
        tutum/dnsutils: []
        bitnami/external-dns: []
        falcosecurity/falco: []
        praqma/helmsman: []
        weaveworks/flagger: []
        weaveworks/flagger-loadtester: []
quay.io:
    images:
        coreos/prometheus-config-reloader: []
        prometheus/prometheus: []
        prometheus/alertmanager: []
        prometheus/node-exporter: []
        kiali/kiali: []
        coreos/configmap-reload: []
        coreos/prometheus-operator: []
        thanos/thanos: []
        calico/node: []
        coreos/kube-state-metrics: []
        calico/typha: []
        external_storage/efs-provisioner: []
gcr.io:
    images: 
        kubernetes-helm/tiller: []
        google_containers/busybox: []
        k8s-artifacts-prod/autoscaling: []
k8s.gcr.io:
    images:
        cluster-proportional-autoscaler-amd64: []
        metrics-server-amd64: [] 
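For what it’s worth, here is a minimal sketch of that splitting idea, assuming the config above is broken into one hypothetical file per registry (docker.io.yaml, quay.io.yaml, and so on) and that the same flags apply to every run; skopeo itself is unchanged and is simply invoked concurrently:

// parallel_sync.go - run one `skopeo sync` per registry config file concurrently.
// The per-registry file names are hypothetical; the destination and flags are
// taken from the command above.
package main

import (
    "log"
    "os/exec"
    "sync"
)

func main() {
    configs := []string{"docker.io.yaml", "quay.io.yaml", "gcr.io.yaml", "k8s.gcr.io.yaml"}

    var wg sync.WaitGroup
    for _, cfg := range configs {
        wg.Add(1)
        go func(cfg string) {
            defer wg.Done()
            cmd := exec.Command("skopeo", "--insecure-policy", "sync",
                "--src", "yaml", "--dest", "docker",
                cfg, "dest.internal.repo/mirrors/",
                "--scoped", "--dest-creds", "X:X")
            // Report failures per registry instead of aborting the whole run.
            if out, err := cmd.CombinedOutput(); err != nil {
                log.Printf("sync of %s failed: %v\n%s", cfg, err, out)
            }
        }(cfg)
    }
    wg.Wait()
}

Each per-registry run still walks its own tags serially, so this only helps to the extent that the registries (or repos, if you split further) are independent bottlenecks.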

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 4
  • Comments: 23 (8 by maintainers)

Most upvoted comments

Some good news - https://github.com/containers/image/pull/1041 reduces the sync time of about 650 images from ~35mins to ~7mins (!)

It makes roughly the simple change you described in containers/image/copy.copyOneImage.

So this seems like a nice performance win for the use case of synchronizing a large number of images which are expected to already exist.

The PR is still very much a draft, but this functionality would certainly be good to have, IMHO, for this usage pattern. Please take a look if you’d like to consider it, @mtrmac.

[attached image: skopeo_sync]

Thanks, that’s very encouraging. I’m not sure I’ll have the time to look into the details of the behavior this week, but it’s definitely an important improvement to prioritize.

I think it’s just too many round trips. Some images have 4,000 tags, so maybe it’s simply that everything runs serially and therefore takes a very long time.

Ouch. Yeah, in the best possible case, reading 4,000 manifests to see whether the tags have moved (and that can only be done one tag at a time) is going to take some time. OTOH, if most of the time is spent waiting, parallelizing this across a few threads might noticeably help. The code structure right now makes that a bit difficult (the checks are part of a copy, and a copy automatically parallelizes copies of individual layers, so a dozen copies * 6 layer threads ≈ 70 parallel TCP streams); it might make sense to do a “presence” check before starting a copy if almost all images are not added/updated.
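As a rough illustration of what parallelizing those per-tag checks could look like (this is not existing skopeo or containers/image code; fetchManifestDigest is a hypothetical placeholder for the one-tag round trip to the source registry):

// presence_check.go - sketch of checking many tags with bounded concurrency.
package main

import (
    "context"
    "fmt"
    "sync"
)

// fetchManifestDigest is a hypothetical stand-in for one round trip that reads
// the manifest digest of repo:tag from the source registry.
func fetchManifestDigest(ctx context.Context, repo, tag string) (string, error) {
    return "", fmt.Errorf("not implemented: %s:%s", repo, tag)
}

// checkTags reads manifests for all tags with at most `workers` requests in
// flight and returns the digest seen for each tag that could be read.
func checkTags(ctx context.Context, repo string, tags []string, workers int) map[string]string {
    var (
        mu      sync.Mutex
        digests = make(map[string]string, len(tags))
        sem     = make(chan struct{}, workers) // limits parallel round trips
        wg      sync.WaitGroup
    )
    for _, tag := range tags {
        wg.Add(1)
        sem <- struct{}{}
        go func(tag string) {
            defer wg.Done()
            defer func() { <-sem }()
            if d, err := fetchManifestDigest(ctx, repo, tag); err == nil {
                mu.Lock()
                digests[tag] = d
                mu.Unlock()
            }
        }(tag)
    }
    wg.Wait()
    return digests
}

func main() {
    _ = checkTags(context.Background(), "docker.io/istio/proxyv2", []string{"1.6.0", "1.6.1"}, 6)
}

With a small worker count like 6, the total number of parallel TCP streams stays bounded even when a repo has thousands of tags.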

@samdoran And I forgot to add: TYVM for sharing the results!

I can confirm that with skopeo 1.2.3 syncing is much faster, because nothing is copied if it already exists in the destination. Our daily skopeo sync went from ~20 minutes to less than five minutes. 🎊

Faster sync

Would this be difficult to implement? If it’s easy enough and you can point me to roughly where the changes could be made, I could patch a local build and see what improvement this might make.

That would be interesting. Very roughly, in containers/image/copy.copyOneImage, somewhere after IsRunningImageAllowed (exact placement TBD, and it probably affects correctness, but this could be good enough for a basic performance estimate), compare the byte arrays returned by unparsedImage.Manifest() and c.dest.Reference().NewImageSource().GetManifest() (with appropriate error checking on all the calls), and claim success if they are equal.
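A very rough sketch of that suggestion as a standalone helper, assuming containers/image v5-style signatures (they differ between versions, so the exact parameters may not match), intended only for the kind of local experiment discussed here, not as the actual code that ended up in containers/image:

package copy

import (
    "bytes"
    "context"

    "github.com/containers/image/v5/types"
)

// destinationAlreadyMatches is a sketch of the check described above, written
// as a standalone helper; in copyOneImage it would run somewhere after
// IsRunningImageAllowed.
func destinationAlreadyMatches(ctx context.Context, sys *types.SystemContext,
    src types.UnparsedImage, dest types.ImageDestination) (bool, error) {
    srcManifest, _, err := src.Manifest(ctx)
    if err != nil {
        return false, err
    }
    destSource, err := dest.Reference().NewImageSource(ctx, sys)
    if err != nil {
        // The destination tag may simply not exist yet; treat that as "needs a copy".
        return false, nil
    }
    defer destSource.Close()
    destManifest, _, err := destSource.GetManifest(ctx, nil)
    if err != nil {
        return false, nil
    }
    // Identical manifest bytes imply identical config and layers, so the copy can be skipped.
    return bytes.Equal(srcManifest, destManifest), nil
}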

From looking at the source, it seems sync implements a ‘copy’ for each tag (reasonable, of course). But are there any possible optimizations for the case where the destination is expected to already exist unmodified? In particular, writing the manifest seems to always take ~300 ms, but it should be unchanged?

The “Copying blob/Writing manifest” entries record the starts of that activity, and we do up to 6 blob copies in parallel. So, going by this specific record, if I’m reading it right: reading the original manifest and config took 0 ms, some unidentified setup took 260 ms, checking that the layers are present took 523 ms, writing the config 263 ms, writing the manifest 0 ms. Several of those times just don’t make sense to me; but it’s certainly true that the layer checks do take a big part of that.

Yes, copies are implemented in a way that assumes the destination needs to be updated; an extra “is the destination already exactly what we want it to be” check would, for most copies, be just an extra delay. OTOH for sync it’s probably the other way around, and it may well make sense to opt in to doing that check (maybe just assuming that the image does not need converting). Or, in the extreme, don’t ask the destination at all, and maintain a local state file that records the last manifest digest written to a particular tag. (We would probably still have to query each individual tag at the source, all 4000 or however many of them.)
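A tiny sketch of what such a local state file might look like, assuming nothing more than a JSON map from the scoped destination reference to the manifest digest last written there (nothing like this exists in skopeo; the file name and digest below are purely illustrative):

// syncstate.go - hypothetical local record of the last manifest digest pushed per tag.
package main

import (
    "encoding/json"
    "os"
)

// syncState maps "registry/repo:tag" to the manifest digest last written there.
type syncState map[string]string

func loadState(path string) (syncState, error) {
    data, err := os.ReadFile(path)
    if os.IsNotExist(err) {
        return syncState{}, nil // first run: nothing recorded yet
    } else if err != nil {
        return nil, err
    }
    var s syncState
    return s, json.Unmarshal(data, &s)
}

func (s syncState) save(path string) error {
    data, err := json.MarshalIndent(s, "", "  ")
    if err != nil {
        return err
    }
    return os.WriteFile(path, data, 0o600)
}

// needsCopy reports whether a tag has to be pushed again, given the digest
// currently seen at the source.
func (s syncState) needsCopy(ref, srcDigest string) bool {
    return s[ref] != srcDigest
}

func main() {
    const ref = "dest.internal.repo/mirrors/docker.io/istio/proxyv2:1.6.0"
    s, _ := loadState("sync-state.json")
    if s.needsCopy(ref, "sha256:...") { // illustrative digest
        // ...perform the copy, then record it:
        s[ref] = "sha256:..."
        _ = s.save("sync-state.json")
    }
}

The trade-off is the one noted above: a local state file avoids querying the destination at all, but each tag would still need to be queried at the source to learn its current digest.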

This sync thing is starting to grow fairly complex…

I have the same sort of problem with sync (which is very nice, by the way!). The use case is to ensure ‘remote’ registries are in sync with an internal/source registry, and we run a periodic job which ‘duplicates’ our internal images/tags, but generally the majority of the content is unmodified.

The timestamps below are for a specific image copy (where the destination image already exists), which took just over 1 s. That works out to 11 minutes or so for the 685 tags (and there are a few more images).

From looking at the source, it seems sync implements a ‘copy’ for each tag (reasonable, of course). But are there any possible optimizations for the case where the destination is expected to already exist unmodified? In particular, writing the manifest seems to always take ~300 ms, but it should be unchanged?

[2020-08-25T15:45:20.030Z] time="2020-08-25T15:45:19Z" level=info msg="Copying image tag 2/685" from="docker://registry.mycompany.com/myimage:1.104.3700-224" to="docker://123456.dkr.ecr.us-east-1.amazonaws.com/mydept/myimage:1.104.3700-224"
[2020-08-25T15:45:20.030Z] Getting image source signatures
[2020-08-25T15:45:20.290Z] Copying blob sha256:1d2c4ce43b78cb9a97ede7f19ad1406a43ee50532568bda660193e4a404b424f
[2020-08-25T15:45:20.290Z] Copying blob sha256:ab36d12d348c0bea37250f280d104b217e4a41eb0138810bf08a4bc2217ebc9a
[2020-08-25T15:45:20.290Z] Copying blob sha256:1bb3415aff5dfdb4d01df8cd5fc5e8e83a020fd94ba26666dfdf8f2210cf1128
[2020-08-25T15:45:20.551Z] Copying blob sha256:8675d36fe29c085721a38ecb2904f1eee2d961f10bbd571da44f285237a9bdc4
[2020-08-25T15:45:20.551Z] Copying blob sha256:36b5a23020d4b97a70093feaa7d20908a9df33a0c25e18b2d25475a602014f4d
[2020-08-25T15:45:20.551Z] Copying blob sha256:1c9f515fc6ab2b7ebfcaffd8af681b68869d78a3b19c69e87c296363ab1bc2fe
[2020-08-25T15:45:20.551Z] Copying blob sha256:8f0a11903a588d724fc210eee6332b02c21d419d44928be199f8221597019b88
[2020-08-25T15:45:20.551Z] Copying blob sha256:e0190a56de136e88d9d03def88bac4be75183c25092c414b7a763baaf69b1366
[2020-08-25T15:45:20.551Z] Copying blob sha256:2da361b4af751864acdb25852798a7111747ba7d8553efe81db13c3c65594e54
[2020-08-25T15:45:20.551Z] Copying blob sha256:022f7122d0d11a32f6494b8b56d43139f79ea412c9fb83ac2bce7a67dd904748
[2020-08-25T15:45:20.551Z] Copying blob sha256:5fb3ed35e1ccebc8b1c114d7dd1ddb5eab87ea8cac4151cf26c6871e2eed102f
[2020-08-25T15:45:20.813Z] Copying blob sha256:1ca6ec59c1f4bad31d655fd2dcc7542d2419170cbb98b18a3965f668cacdf8e2
[2020-08-25T15:45:20.813Z] Copying blob sha256:8686faffd55b78f473033906abbe5304d5484c04c2e129124df32483d1052678
[2020-08-25T15:45:20.813Z] Copying config sha256:eb21d52c12296d28abaae33ee35de840c7d11d0e190f1e45091f5a3325327d14
[2020-08-25T15:45:21.076Z] Writing manifest to image destination
[2020-08-25T15:45:21.076Z] Storing signatures
[2020-08-25T15:45:21.076Z] time="2020-08-25T15:45:21Z" level=info msg="Copying image tag 3/685" from="...