spegel: imagePull from private repo is slow second time

imagepull second time from another node in same cluster should be fast but it took same time as of downloading from remote repo, looks like caching didnt work

here are the logs from spegel pod for xxx.yyy.io/app-image-k8:dev_123 image (masked repo and image details)

any pointers to look into what can be the issue?

{"level":"error","ts":1696247240.4278097,"caller":"gin@v0.0.9/logger.go:62","msg":"","path":"/v2/app-image-k8/manifests/dev_123","status":404,"method":"HEAD","latency":5.000973923,"ip":"10.14.130.153","handler":"mirror","error":"could not resolve mirror for key: xxx.yyy.io/app-image-k8:dev_123","stacktrace":"github.com/xenitab/pkg/gin.Logger.func1\n\t/go/pkg/mod/github.com/xenitab/pkg/gin@v0.0.9/logger.go:62\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174\ngithub.com/gin-gonic/gin.(*Engine).handleHTTPRequest\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/gin.go:620\ngithub.com/gin-gonic/gin.(*Engine).ServeHTTP\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/gin.go:576\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2936\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1995"}

{"level":"error","ts":1696247731.0288842,"caller":"registry/registry.go:211","msg":"mirror failed attempting next","error":"expected mirror to respond with 200 OK but received: 500 Internal Server Error","stacktrace":"github.com/xenitab/spegel/internal/registry.(*Registry).handleMirror.func2\n\t/build/internal/registry/registry.go:211\nnet/http/httputil.(*ReverseProxy).modifyResponse\n\t/usr/local/go/src/net/http/httputil/reverseproxy.go:324\nnet/http/httputil.(*ReverseProxy).ServeHTTP\n\t/usr/local/go/src/net/http/httputil/reverseproxy.go:490\ngithub.com/xenitab/spegel/internal/registry.(*Registry).handleMirror\n\t/build/internal/registry/registry.go:217\ngithub.com/xenitab/spegel/internal/registry.(*Registry).registryHandler\n\t/build/internal/registry/registry.go:137\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174\ngithub.com/xenitab/spegel/internal/registry.(*Registry).metricsHandler\n\t/build/internal/registry/registry.go:271\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174\ngithub.com/gin-gonic/gin.CustomRecoveryWithWriter.func1\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/recovery.go:102\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174\ngithub.com/slok/go-http-metrics/middleware/gin.Handler.func1.1\n\t/go/pkg/mod/github.com/slok/go-http-metrics@v0.10.0/middleware/gin/gin.go:17\ngithub.com/slok/go-http-metrics/middleware.Middleware.Measure\n\t/go/pkg/mod/github.com/slok/go-http-metrics@v0.10.0/middleware/middleware.go:117\ngithub.com/slok/go-http-metrics/middleware/gin.Handler.func1\n\t/go/pkg/mod/github.com/slok/go-http-metrics@v0.10.0/middleware/gin/gin.go:16\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174\ngithub.com/xenitab/pkg/gin.Logger.func1\n\t/go/pkg/mod/github.com/xenitab/pkg/gin@v0.0.9/logger.go:28\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174\ngithub.com/gin-gonic/gin.(*Engine).handleHTTPRequest\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/gin.go:620\ngithub.com/gin-gonic/gin.(*Engine).ServeHTTP\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/gin.go:576\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2936\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1995"}

About this issue

  • Original URL
  • State: closed
  • Created 9 months ago
  • Reactions: 1
  • Comments: 31 (11 by maintainers)

Most upvoted comments

discard_unpacked_layers = true in the containerd config is the issue. which repulls the blob layer every time making spegel slow

https://github.com/awslabs/amazon-eks-ami/blob/915ce2222e692a4d7b5904ab020d3c28e1e3e511/files/containerd-config.toml#L10

@phillebaba I’m also facing the same issue in EKS (v1.27) with AWS CNI.

But what I found was, spegel was able to successfully mirror all non-blob layers, but when the mirror request comes-in for an blob layer spegel returns 500 stating content digest sha256 not found <sha256_value> and this issue is because the containerd library spegel is using returns 404 for all blob layers.

I ssh’d into the EKS node and used the ctr cli, even with ctr I’m facing the same issue. I can able to get index/manifest/config layers using ./ctr content get sha256:<sha256_vaule_of_non_blob_layer>, but when I try to get a blob layer with digest ctr returns not found. I’ve used “quay.io/prometheus/alertmanager:v0.24.0” as reference, but the behavior is same will all registries.

  • verifying “alertmanager” image is already present in the node debug1

  • regctl cli output for list of layers present for “linux/amd” arch debug2

  • I was able to get the content of the image’s “manifest” type digest debug3

  • When I try to retrieve a blob layer ctr returns not found, which the same behaviour we getting from containerd’s golang package debug4

  • But, if I use code with function similar to this https://github.com/XenitAB/spegel/blob/4a190529ade0eabbfcdc107c8173ef39b6c2f3b8/internal/oci/containerd.go#L150, I was able to get the list of all layers, but still not able to retrieve the blob layer’s data debug5.

If I use ctr to pull the image like ctr image pull quay.io/quay/busybox:latest then everything is working fine, I can able to retrieve the content of blob layers. But if I use an image which is already present in the machine (pulled via kubelet), then I can’t able to get the blob data.

If I try to repull the same alertmanager image via ctr It freshly pulls all the blob layers, but it reports all the non-blob layers as “exists” image

@tico24 in your example you create a new config file. Is this acutally required? I think that it would be nice to have an example that overrides the existing configuration file.

Yup, it’s required. The old configfile gets re-written to default after this script runs. Blame Jeff Bezos for that one.

@infa-nang thanks for the great detective work determining the root cause of this issue!

The kind cluster used for the e2e tests sets the same configuration. https://github.com/XenitAB/spegel/blob/4a190529ade0eabbfcdc107c8173ef39b6c2f3b8/e2e/kind-config.yaml#L7-L11

There are now three main tasks that need to be done.

  • Document the containerd configuration in the compatibility guide to help others.
  • Document configuration required to override setting in EKS.
  • Implement a check and maybe log a warning if this is detected, alternatively exit.
  • Create a performance test to document expected performance from Spegel.

I have found a regression causing this issue and am working on a fix. In the meantime the best solution is to reduce the mirrorResolveTimeout to a low value like “50ms”.

@phillebaba Updated above. @tico24 I had to omit the newline character from the sed inline amend to get it to work, not sure if karpenter was mangling it when launching the instance but confirmed on the instance via SSM shell, that field had a prefix n character on the bootstrap.sh file.

Additionally I can confirm spegel’s logs are much more 200 and much less 500/404 so that’s encouraging. Container start speed is better anecdotally. I’m also using nvme backed instances with the containerd layer cache on the local nvme disk for even more performance.

Another caveat here is I’ve only deployed AL2 backed AMIs, it’s going to need to be tested against other types…

Just confirming that my world is much happier after tweaking the containerd config. If it’s useful, here’s how I did it using the eks terraform module:

      pre_bootstrap_user_data = <<-EOT
        #!/bin/bash
        cat /etc/containerd/config.toml | sed -E 's/discard_unpacked_layers = true/discard_unpacked_layers = false/g' > /etc/containerd/config-new.toml

        set -ex
        cat <<-EOF > /etc/profile.d/bootstrap.sh
        export CONTAINERD_CONFIG_FILE=/etc/containerd/config-new.toml
        EOF
        # Source extra environment variables in bootstrap script
        sed -i '/^set -o errexit/a\\nsource /etc/profile.d/bootstrap.sh' /etc/eks/bootstrap.sh
      EOT

I downgraded AWS CNI from 0.15.x to 0.14.x and 0.13.x in the hopes that maybe the newer feature whereby it obeys networkPolicies might be causing the issue.

Unfortunately it is still not working with AWS CNI 0.13.

spegel-78c2f registry {"level":"info","ts":1696918978.0747795,"caller":"gin@v0.0.9/logger.go:53","msg":"","path":"/v2/library/ubuntu/manifests/latest","status":200,"method":"HEAD","latency":0.003605546,"ip":"10.207.61.216","handler":"mirror"}                                                                                                                                                     spegel-2kz5z registry {"level":"info","ts":1696918978.074557,"caller":"gin@v0.0.9/logger.go:53","msg":"","path":"/v2/library/ubuntu/manifests/latest","status":200,"method":"HEAD","latency":0.001763525,"ip":"10.207.61.216","handler":"manifest"}
spegel-78c2f registry {"level":"info","ts":1696918978.0826175,"caller":"gin@v0.0.9/logger.go:53","msg":"","path":"/v2/library/ubuntu/manifests/sha256:9b8dec3bf938bc80fbe758d856e96fdfab5f56c39d44b0cff351e847bb1b01ea","status":200,"method":"GET","latency":0.002599054,"ip":"10.207.61.216","handler":"mirror"}                                                                                     spegel-2kz5z registry {"level":"info","ts":1696918978.0823524,"caller":"gin@v0.0.9/logger.go:53","msg":"","path":"/v2/library/ubuntu/manifests/sha256:9b8dec3bf938bc80fbe758d856e96fdfab5f56c39d44b0cff351e847bb1b01ea","status":200,"method":"GET","latency":0.0013331,"ip":"10.207.61.216","handler":"manifest"}
spegel-78c2f registry {"level":"error","ts":1696918983.0943055,"caller":"gin@v0.0.9/logger.go:62","msg":"","path":"/v2/library/ubuntu/manifests/sha256:f58b48967ecc767fc238144ffdb7eb668cefcc8438de8f8a59c4cefbbf29b323","status":404,"method":"GET","latency":5.000980193,"ip":"10.207.61.216","handler":"mirror","error":"could not resolve mirror for key: sha256:f58b48967ecc767fc238144ffdb7eb668cefcc8438de8f8a59c4cefbbf29b323","stacktrace":"github.com/xenitab/pkg/gin.Logger.func1\n\t/go/pkg/mod/github.com/xenitab/pkg/gin@v0.0.9/logger.go:62\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174\ngithub.com/gin-gonic/gin.(*Engine).handleHTTPRequest\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/gin.go:620\ngithub.com/gin-gonic/gin.(
*Engine).ServeHTTP\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/gin.go:576\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2936\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1995"}                                                                                                                                                                     spegel-2kz5z registry {"level":"info","ts":1696918983.0951376,"caller":"registry/registry.go:174","msg":"handling mirror request from external node","path":"/v2/library/ubuntu/manifests/sha256:f58b48967ecc767fc238144ffdb7eb668cefcc8438de8f8a59c4cefbbf29b323","ip":"10.207.61.216"}
spegel-2kz5z registry {"level":"error","ts":1696918988.095955,"caller":"gin@v0.0.9/logger.go:62","msg":"","path":"/v2/library/ubuntu/manifests/sha256:f58b48967ecc767fc238144ffdb7eb668cefcc8438de8f8a59c4cefbbf29b323","status":404,"method":"GET","latency":5.000852283,"ip":"10.207.61.216","handler":"mirror","error":"could not resolve mirror for key: sha256:f58b48967ecc767fc238144ffdb7eb668cefcc8438de8f8a59c4cefbbf29b323","stacktrace":"github.com/xenitab/pkg/gin.Logger.func1\n\t/go/pkg/mod/github.com/xenitab/pkg/gin@v0.0.9/logger.go:62\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174\ngithub.com/gin-gonic/gin.(*Engine).handleHTTPRequest\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/gin.go:620\ngithub.com/gin-gonic/gin.(*
Engine).ServeHTTP\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/gin.go:576\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2936\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1995"}                                                                                                                                                                      spegel-2kz5z registry {"level":"info","ts":1696918993.2968013,"caller":"registry/registry.go:174","msg":"handling mirror request from external node","path":"/v2/library/ubuntu/blobs/sha256:bf9e5b7213b4e0d99cddc039011cc60bfd76ed5ef63bbd837ab0b8416c305c39","ip":"10.207.61.216"}
spegel-78c2f registry {"level":"error","ts":1696918993.2962723,"caller":"gin@v0.0.9/logger.go:62","msg":"","path":"/v2/library/ubuntu/blobs/sha256:bf9e5b7213b4e0d99cddc039011cc60bfd76ed5ef63bbd837ab0b8416c305c39","status":404,"method":"GET","latency":5.00064066,"ip":"10.207.61.216","handler":"mirror","error":"could not resolve mirror for key: sha256:bf9e5b7213b4e0d99cddc039011cc60bfd76ed5ef63bbd837ab0b8416c305c39","stacktrace":"github.com/xenitab/pkg/gin.Logger.func1\n\t/go/pkg/mod/github.com/xenitab/pkg/gin@v0.0.9/logger.go:62\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174\ngithub.com/gin-gonic/gin.(*Engine).handleHTTPRequest\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/gin.go:620\ngithub.com/gin-gonic/gin.(*Engi
ne).ServeHTTP\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/gin.go:576\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2936\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1995"}                                                                                                                                                                          spegel-2kz5z registry {"level":"error","ts":1696918998.2970965,"caller":"gin@v0.0.9/logger.go:62","msg":"","path":"/v2/library/ubuntu/blobs/sha256:bf9e5b7213b4e0d99cddc039011cc60bfd76ed5ef63bbd837ab0b8416c305c39","status":404,"method":"GET","latency":5.000317961,"ip":"10.207.61.216","handler":"mirror","error":"could not resolve mirror for key: sha256:bf9e5b7213b4e0d99cddc039011cc60bfd76ed
5ef63bbd837ab0b8416c305c39","stacktrace":"github.com/xenitab/pkg/gin.Logger.func1\n\t/go/pkg/mod/github.com/xenitab/pkg/gin@v0.0.9/logger.go:62\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174\ngithub.com/gin-gonic/gin.(*Engine).handleHTTPRequest\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/gin.go:620\ngithub.com/gin-gonic/gin.(*Engine).ServeHTTP\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/gin.go:576\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2936\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1995"}
spegel-2kz5z registry {"level":"info","ts":1696919003.6734476,"caller":"registry/registry.go:174","msg":"handling mirror request from external node","path":"/v2/library/ubuntu/blobs/sha256:c391327a0f1df5790bcb00ac010a1a3924c9e4387ce36b290bc16fd9f4cc5740","ip":"10.207.61.216"}                                                                                                                   spegel-78c2f registry {"level":"error","ts":1696919003.6729622,"caller":"gin@v0.0.9/logger.go:62","msg":"","path":"/v2/library/ubuntu/blobs/sha256:c391327a0f1df5790bcb00ac010a1a3924c9e4387ce36b290bc16fd9f4cc5740","status":404,"method":"GET","latency":5.001155381,"ip":"10.207.61.216","handler":"mirror","error":"could not resolve mirror for key: sha256:c391327a0f1df5790bcb00ac010a1a3924c9e4
387ce36b290bc16fd9f4cc5740","stacktrace":"github.com/xenitab/pkg/gin.Logger.func1\n\t/go/pkg/mod/github.com/xenitab/pkg/gin@v0.0.9/logger.go:62\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174\ngithub.com/gin-gonic/gin.(*Engine).handleHTTPRequest\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/gin.go:620\ngithub.com/gin-gonic/gin.(*Engine).ServeHTTP\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/gin.go:576\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2936\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1995"}
spegel-2kz5z registry {"level":"error","ts":1696919008.6743145,"caller":"gin@v0.0.9/logger.go:62","msg":"","path":"/v2/library/ubuntu/blobs/sha256:c391327a0f1df5790bcb00ac010a1a3924c9e4387ce36b290bc16fd9f4cc5740","status":404,"method":"GET","latency":5.000890982,"ip":"10.207.61.216","handler":"mirror","error":"could not resolve mirror for key: sha256:c391327a0f1df5790bcb00ac010a1a3924c9e4387ce36b290bc16fd9f4cc5740","stacktrace":"github.com/xenitab/pkg/gin.Logger.func1\n\t/go/pkg/mod/github.com/xenitab/pkg/gin@v0.0.9/logger.go:62\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174\ngithub.com/gin-gonic/gin.(*Engine).handleHTTPRequest\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/gin.go:620\ngithub.com/gin-gonic/gin.(*Eng
ine).ServeHTTP\n\t/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/gin.go:576\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2936\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1995"}                                                                                                                                                                         spegel-78c2f registry {"level":"info","ts":1696919009.5786955,"caller":"state/state.go:49","msg":"received image event","image":"docker.io/library/ubuntu:latest@sha256:9b8dec3bf938bc80fbe758d856e96fdfab5f56c39d44b0cff351e847bb1b01ea"}
spegel-78c2f registry {"level":"info","ts":1696919009.5839336,"caller":"state/state.go:49","msg":"received image event","image":"docker.io/library/ubuntu:latest@sha256:9b8dec3bf938bc80fbe758d856e96fdfab5f56c39d44b0cff351e847bb1b01ea"}                                                                                                                                                             spegel-78c2f registry {"level":"info","ts":1696919009.5862403,"caller":"state/state.go:49","msg":"received image event","image":"docker.io/library/ubuntu"}