distribution: Registry becomes permanently unavailable after 30 seconds when using the S3 driver

When we spin up a registry on an EC2 instance with an attached IAM instance role (granting access to the backing S3 bucket), the registry consistently becomes unavailable after about 30 seconds, which lines up with the health check configuration below (a threshold of 3 failures at a 10s interval). During the initial “grace period” it seems to work fine, and thereafter it only responds with HTTP 503.

This happens every time with registry:2.6.1; so far we have not been able to reproduce it with registry:2.5.1.

I suspect this is the same problem described in #2283.

The specifics of our setup:

config.yml:

version: 0.1
log:
  level: debug
  fields:
    service: registry
storage:
  cache:
    blobdescriptor: inmemory
http:
  addr: :5000
  secret: REDACTED
  debug:
    addr: :5001
  headers:
    X-Content-Type-Options: [nosniff]
health:
  storagedriver:
    enabled: true
    interval: 10s
    threshold: 3

The invocation:

docker run -d -p 5000:5000 -p 5001:5001 --restart=always --name registry \
 -v `pwd`/config.yml:/etc/docker/registry/config.yml \
 -e "REGISTRY_STORAGE=s3" \
 -e "REGISTRY_STORAGE_S3_REGION=eu-west-1" \
 -e "REGISTRY_STORAGE_S3_BUCKET=REDACTED" \
 registry:2.6.1

Right after starting or restarting the container, we see this:

$ curl -v http://localhost:5000
* Rebuilt URL to: http://localhost:5000/
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 5000 (#0)
> GET / HTTP/1.1
> Host: localhost:5000
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Cache-Control: no-cache
< Date: Tue, 30 May 2017 15:20:21 GMT
< Content-Length: 0
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host localhost left intact

$ curl -v http://localhost:5001/debug/health
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 5001 (#0)
> GET /debug/health HTTP/1.1
> Host: localhost:5001
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Length: 2
< Content-Type: application/json; charset=utf-8
< Date: Tue, 30 May 2017 15:20:25 GMT
<
* Connection #0 to host localhost left intact
{}
$

Then after about 30 seconds, the following:

$ curl -v http://localhost:5000
* Rebuilt URL to: http://localhost:5000/
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 5000 (#0)
> GET / HTTP/1.1
> Host: localhost:5000
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 503 Service Unavailable
< Content-Type: application/json; charset=utf-8
< Date: Tue, 30 May 2017 14:56:51 GMT
< Content-Length: 125
<
{"errors":[{"code":"UNAVAILABLE","message":"service unavailable","detail":"health check failed: please see /debug/health"}]}

$ curl -v http://localhost:5001/debug/health
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 5001 (#0)
> GET /debug/health HTTP/1.1
> Host: localhost:5001
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 503 Service Unavailable
< Content-Length: 47
< Content-Type: application/json; charset=utf-8
< Date: Tue, 30 May 2017 14:56:54 GMT
<
* Connection #0 to host localhost left intact
{"storagedriver_s3":"s3aws: Path not found: /"}

The registry logs:

$ docker logs --tail=200 -f -t registry
2017-05-30T14:40:33.327662022Z time="2017-05-30T14:40:33.327382019Z" level=info msg="redis not configured" go.version=go1.7.3 instance.id=42b8d328-f5cb-4bb3-b3f2-7a93d77febe0 service=registry version=v2.6.1
2017-05-30T14:40:33.336419060Z time="2017-05-30T14:40:33.336028387Z" level=info msg="debug server listening :5001"
2017-05-30T14:40:33.336430327Z time="2017-05-30T14:40:33.336173487Z" level=info msg="Starting upload purge in 45m0s" go.version=go1.7.3 instance.id=42b8d328-f5cb-4bb3-b3f2-7a93d77febe0 service=registry version=v2.6.1
2017-05-30T14:40:33.348947098Z time="2017-05-30T14:40:33.348727425Z" level=info msg="listening on [::]:5000" go.version=go1.7.3 instance.id=42b8d328-f5cb-4bb3-b3f2-7a93d77febe0 service=registry version=v2.6.1
2017-05-30T14:40:43.448078780Z time="2017-05-30T14:40:43.447797929Z" level=debug msg="s3aws.Stat(\"/\")" go.version=go1.7.3 instance.id=42b8d328-f5cb-4bb3-b3f2-7a93d77febe0 service=registry trace.duration=98.746543ms trace.file="/go/src/github.com/docker/distribution/registry/storage/driver/base/base.go" trace.func="github.com/docker/distribution/registry/storage/driver/base.(*Base).Stat" trace.id=11f43f27-dfb4-4fde-b67b-b40eb3b27d8e trace.line=137 version=v2.6.1
2017-05-30T14:40:53.238568918Z 172.17.0.1 - - [30/May/2017:14:40:53 +0000] "GET / HTTP/1.1" 200 0 "" "curl/7.47.0"
2017-05-30T14:40:53.493194665Z time="2017-05-30T14:40:53.492892001Z" level=debug msg="s3aws.Stat(\"/\")" go.version=go1.7.3 instance.id=42b8d328-f5cb-4bb3-b3f2-7a93d77febe0 service=registry trace.duration=143.883466ms trace.file="/go/src/github.com/docker/distribution/registry/storage/driver/base/base.go" trace.func="github.com/docker/distribution/registry/storage/driver/base.(*Base).Stat" trace.id=b25a47e5-f0ad-4f85-adc0-53eed3245b0d trace.line=137 version=v2.6.1
2017-05-30T14:40:54.873493591Z 172.17.0.1 - - [30/May/2017:14:40:54 +0000] "GET / HTTP/1.1" 200 0 "" "curl/7.47.0"
2017-05-30T14:40:55.717273385Z 172.17.0.1 - - [30/May/2017:14:40:55 +0000] "GET / HTTP/1.1" 200 0 "" "curl/7.47.0"
2017-05-30T14:40:56.392247414Z 172.17.0.1 - - [30/May/2017:14:40:56 +0000] "GET / HTTP/1.1" 200 0 "" "curl/7.47.0"
2017-05-30T14:41:00.510609780Z 172.17.0.1 - - [30/May/2017:14:41:00 +0000] "GET / HTTP/1.1" 200 0 "" "curl/7.47.0"
2017-05-30T14:41:03.380975062Z time="2017-05-30T14:41:03.380695246Z" level=debug msg="s3aws.Stat(\"/\")" go.version=go1.7.3 instance.id=42b8d328-f5cb-4bb3-b3f2-7a93d77febe0 service=registry trace.duration=31.651662ms trace.file="/go/src/github.com/docker/distribution/registry/storage/driver/base/base.go" trace.func="github.com/docker/distribution/registry/storage/driver/base.(*Base).Stat" trace.id=49db908e-074b-40f4-8471-3150ff080abf trace.line=137 version=v2.6.1
2017-05-30T14:41:06.701383438Z 172.17.0.1 - - [30/May/2017:14:41:06 +0000] "GET / HTTP/1.1" 503 125 "" "curl/7.47.0"
2017-05-30T14:41:13.379890009Z time="2017-05-30T14:41:13.379588757Z" level=debug msg="s3aws.Stat(\"/\")" go.version=go1.7.3 instance.id=42b8d328-f5cb-4bb3-b3f2-7a93d77febe0 service=registry trace.duration=30.570087ms trace.file="/go/src/github.com/docker/distribution/registry/storage/driver/base/base.go" trace.func="github.com/docker/distribution/registry/storage/driver/base.(*Base).Stat" trace.id=1d3db1d6-7e5b-4fa4-aee5-3d743a94e2e0 trace.line=137 version=v2.6.1
2017-05-30T14:41:23.377456357Z time="2017-05-30T14:41:23.377151322Z" level=debug msg="s3aws.Stat(\"/\")" go.version=go1.7.3 instance.id=42b8d328-f5cb-4bb3-b3f2-7a93d77febe0 service=registry trace.duration=28.111547ms trace.file="/go/src/github.com/docker/distribution/registry/storage/driver/base/base.go" trace.func="github.com/docker/distribution/registry/storage/driver/base.(*Base).Stat" trace.id=8d5fd464-e31f-495e-9dc6-fb3710f823a5 trace.line=137 version=v2.6.1

Registry logs after restarting the container; it works at first, then eventually returns 503:

2017-05-30T15:20:18.474925398Z time="2017-05-30T15:20:18.474654839Z" level=info msg="redis not configured" go.version=go1.7.3 instance.id=84453db6-8e3d-429c-a98b-53cb599946ce service=registry version=v2.6.1
2017-05-30T15:20:18.484491091Z time="2017-05-30T15:20:18.484105064Z" level=info msg="debug server listening :5001"
2017-05-30T15:20:18.484502292Z time="2017-05-30T15:20:18.484249629Z" level=info msg="Starting upload purge in 8m0s" go.version=go1.7.3 instance.id=84453db6-8e3d-429c-a98b-53cb599946ce service=registry version=v2.6.1
2017-05-30T15:20:18.496314674Z time="2017-05-30T15:20:18.496090308Z" level=info msg="listening on [::]:5000" go.version=go1.7.3 instance.id=84453db6-8e3d-429c-a98b-53cb599946ce service=registry version=v2.6.1
2017-05-30T15:20:21.930669941Z 172.17.0.1 - - [30/May/2017:15:20:21 +0000] "GET / HTTP/1.1" 200 0 "" "curl/7.47.0"
2017-05-30T15:20:28.539811199Z time="2017-05-30T15:20:28.539504682Z" level=debug msg="s3aws.Stat(\"/\")" go.version=go1.7.3 instance.id=84453db6-8e3d-429c-a98b-53cb599946ce service=registry trace.duration=43.133214ms trace.file="/go/src/github.com/docker/distribution/registry/storage/driver/base/base.go" trace.func="github.com/docker/distribution/registry/storage/driver/base.(*Base).Stat" trace.id=c4dbf494-f9fa-4306-9847-cc50909347f5 trace.line=137 version=v2.6.1
2017-05-30T15:20:38.527759086Z time="2017-05-30T15:20:38.527461076Z" level=debug msg="s3aws.Stat(\"/\")" go.version=go1.7.3 instance.id=84453db6-8e3d-429c-a98b-53cb599946ce service=registry trace.duration=31.106894ms trace.file="/go/src/github.com/docker/distribution/registry/storage/driver/base/base.go" trace.func="github.com/docker/distribution/registry/storage/driver/base.(*Base).Stat" trace.id=be574080-bbe8-49f0-bc79-795ae328db68 trace.line=137 version=v2.6.1
2017-05-30T15:20:48.534808753Z time="2017-05-30T15:20:48.534495526Z" level=debug msg="s3aws.Stat(\"/\")" go.version=go1.7.3 instance.id=84453db6-8e3d-429c-a98b-53cb599946ce service=registry trace.duration=38.070819ms trace.file="/go/src/github.com/docker/distribution/registry/storage/driver/base/base.go" trace.func="github.com/docker/distribution/registry/storage/driver/base.(*Base).Stat" trace.id=4088eccc-a3fa-47ef-b56d-e56a156005b8 trace.line=137 version=v2.6.1
2017-05-30T15:20:58.528801376Z time="2017-05-30T15:20:58.528503898Z" level=debug msg="s3aws.Stat(\"/\")" go.version=go1.7.3 instance.id=84453db6-8e3d-429c-a98b-53cb599946ce service=registry trace.duration=32.12672ms trace.file="/go/src/github.com/docker/distribution/registry/storage/driver/base/base.go" trace.func="github.com/docker/distribution/registry/storage/driver/base.(*Base).Stat" trace.id=988f1141-e51a-4afa-a333-d21ad135ca1c trace.line=137 version=v2.6.1
2017-05-30T15:21:05.321744770Z 172.17.0.1 - - [30/May/2017:15:21:05 +0000] "GET / HTTP/1.1" 503 125 "" "curl/7.47.0"
2017-05-30T15:21:08.527621977Z time="2017-05-30T15:21:08.527319714Z" level=debug msg="s3aws.Stat(\"/\")" go.version=go1.7.3 instance.id=84453db6-8e3d-429c-a98b-53cb599946ce service=registry trace.duration=30.94061ms trace.file="/go/src/github.com/docker/distribution/registry/storage/driver/base/base.go" trace.func="github.com/docker/distribution/registry/storage/driver/base.(*Base).Stat" trace.id=773c73dc-b972-4bcd-9a39-8c688d768b3f trace.line=137 version=v2.6.1
2017-05-30T15:21:18.535229553Z time="2017-05-30T15:21:18.5349485Z" level=debug msg="s3aws.Stat(\"/\")" go.version=go1.7.3 instance.id=84453db6-8e3d-429c-a98b-53cb599946ce service=registry trace.duration=38.546592ms trace.file="/go/src/github.com/docker/distribution/registry/storage/driver/base/base.go" trace.func="github.com/docker/distribution/registry/storage/driver/base.(*Base).Stat" trace.id=83dfce3b-1eeb-4e38-ba86-1e7e2c04fc7b trace.line=137 version=v2.6.1

Some details about the docker install:

$ docker version
Client:
 Version:      17.03.1-ce
 API version:  1.27
 Go version:   go1.7.5
 Git commit:   c6d412e
 Built:        Mon Mar 27 17:14:09 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.03.1-ce
 API version:  1.27 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   c6d412e
 Built:        Mon Mar 27 17:14:09 2017
 OS/Arch:      linux/amd64
 Experimental: false


$ docker exec registry registry -v
registry github.com/docker/distribution v2.6.1



$ docker info
Containers: 4
 Running: 1
 Paused: 0
 Stopped: 3
Images: 4
Server Version: 17.03.1-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 20
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 4ab9917febca54791c5f071a9d1f404867857fcc
runc version: 54296cf40ad8143b62dbcaa1d90e520a2136ddfe
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-1017-aws
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 990.7 MiB
Name: REDACTED
ID: REDACTED
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 32 (1 by maintainers)

Most upvoted comments

@ulm0, the actual problem is with docker distribution itself. If you are using a newly created bucket, you will get a 500 error. Unfortunately, docker distribution hasn’t fixed it, but there is a workaround: just upload an empty file into the root of your bucket. That fixes the 500 error.
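
A minimal sketch of that workaround using the AWS CLI (BUCKET is a placeholder for your bucket name, and .keep is an arbitrary key; any empty object at the bucket root should do, assuming your credentials or instance role allow s3:PutObject):

# create an empty local file and upload it to the root of the bucket
touch .keep
aws s3 cp .keep s3://BUCKET/.keep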

Same issue here with Swift.

Btw, for anyone new here (like me): to check whether your problem is related, run your container with -p 5001:5001 -e REGISTRY_HTTP_DEBUG_ADDR=0.0.0.0:5001 and then curl localhost:5001/debug/health (see the sketch below).
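
For concreteness, a sketch of that check built on the invocation from this issue (the image tag, region, bucket, and config mount are just the values used above; adjust them for your environment):

docker run -d -p 5000:5000 -p 5001:5001 --name registry \
 -v `pwd`/config.yml:/etc/docker/registry/config.yml \
 -e "REGISTRY_HTTP_DEBUG_ADDR=0.0.0.0:5001" \
 -e "REGISTRY_STORAGE=s3" \
 -e "REGISTRY_STORAGE_S3_REGION=eu-west-1" \
 -e "REGISTRY_STORAGE_S3_BUCKET=REDACTED" \
 registry:2.6.1

# {} means all health checks pass; on failure the body names the failing check,
# e.g. {"storagedriver_s3":"s3aws: Path not found: /"}
curl http://localhost:5001/debug/health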

Disabling the health check (with -e REGISTRY_HEALTH_STORAGEDRIVER_ENABLED=false) fixes the issue, as sketched below (unless you use Let’s Encrypt and they issued too many certs for you while you were debugging… but that’s another story - any idea how to avoid that?).
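
A sketch of that override applied to the invocation from this issue, plus the equivalent config.yml change (flipping the health section shown above). Note this only hides the failing check; seeding the bucket with an empty object, or the fix in #2377, addresses the underlying cause.

docker run -d -p 5000:5000 --restart=always --name registry \
 -e "REGISTRY_HEALTH_STORAGEDRIVER_ENABLED=false" \
 -e "REGISTRY_STORAGE=s3" \
 -e "REGISTRY_STORAGE_S3_REGION=eu-west-1" \
 -e "REGISTRY_STORAGE_S3_BUCKET=REDACTED" \
 registry:2.6.1

config.yml equivalent:

health:
  storagedriver:
    enabled: false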

Additionally, if you’re looking for an interim workaround, pushing an empty .keep file into the bucket resolved the issue for us.

Looks like the health check against the backend is failing. You can disable the health check and this should go away.

The problem seems to be related to a regression from 8e915d69f and https://github.com/docker/distribution/pull/2115. List was changed to a Stat to reduce load. We should probably consider a not found error as healthy, as that represents a response from the backend.

Confirming this still exists on K8s in combination with MinIO and registry:2.8.1

2.8 is in maintenance mode at this point, i.e. not getting any updates other than high-severity security patches. I’d encourage you to try distribution/distribution:edge.

For anyone who might stumble upon this error message and is using AWS IAM to restrict access: please ensure that your policy includes not only BUCKET/* but also BUCKET itself as a resource. This did the trick for me (an illustrative policy is sketched below).
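
An illustrative, non-exhaustive policy skeleton showing the distinction: bucket-level actions such as s3:ListBucket must be allowed on the bucket ARN itself, while object-level actions need the /* resource. BUCKET is a placeholder and the exact action list depends on your registry configuration:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::BUCKET"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::BUCKET/*"
    }
  ]
}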

@farshidtz I believe 2.6.2 was released a bit before this fix was merged, so a new release will probably be necessary.

@stevvooe Testing with a custom registry build including the #2377 changes, I can no longer reproduce the issue, so it seems fixed. 👍 😄