moby: Docker Engine's Swarm failing to use credential helper when scaling.

Description

When using docker service scale or docker service update --replicas X, Docker Engine’s Swarm does not fetch a new authentication token via the credential helper if the token stored in the service definition has expired. For example, AWS ECR auth tokens expire after 12 hours. As a result, new replicas only spawn on nodes that already have the image downloaded.

This is probably also an issue when deploying a new version of an image, but I have not tested that yet.

I posted a workaround at the end of this report (using --with-registry-auth together with docker service update --replicas X), but it has the side effect of restarting all running containers, which is disruptive.

Steps to reproduce the issue:

Setup

Create a repo in ECR called ‘redis’. I’m using the new Ohio region (us-east-2).
Ensure your AWS user has push/pull access to this repo. The easiest way is to attach a managed policy.
Tag a new latest redis image based on the official redis image: docker tag redis:latest REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis:latest
Generate a local ECR token and push it real good.

$ eval $(aws ecr get-login --region us-east-2)
$ docker push REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis:latest
The push refers to a repository [REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis]
a58d4434732b: Pushed
741b78d804b7: Pushed
78731fd42c78: Pushed
c235d5b4caa3: Pushed
307248831aca: Pushed
387483b2c715: Pushed
a2ae92ffcd29: Pushed
latest: digest: sha256:40f100b5d60bffceddd1a5635ce52fe0aa39c229feed8c2c6b641d85bc6baa14 size: 1783

Create Service

Set up a new Swarm with 1 Manager and 1 Worker, using Ubuntu 16.04 and Docker 1.13.1 (installed with the official steps). If you are creating these nodes in EC2, ensure they have an IAM Role you can use for testing.
Your IAM Role or User should have read access to the ECR repo. I used the Managed Policy AmazonEC2ContainerRegistryReadOnly.
Install aws-cli ONLY on the manager (needed by the credential helper). Run as ROOT.

sudo su
apt-get install -y python-pip && pip install awscli
mkdir -p /home/ubuntu/.aws && \
  printf "[default]\noutput = json\nregion = us-east-2" > /home/ubuntu/.aws/config

If your Manager node is NOT in AWS, make sure your read-only IAM User is set up with aws configure.
Install Amazon ECR Credential Helper ONLY on the manager. Run as ROOT.

sudo su
apt-get install -y make
cd ~ && \
  git clone https://github.com/awslabs/amazon-ecr-credential-helper.git && \
  cd amazon-ecr-credential-helper && \
  make docker && \
  mv ./bin/local/docker-credential-ecr-login /usr/local/bin/
mkdir -p /home/ubuntu/.docker && printf '{\n  "credsStore": "ecr-login"\n}' > /home/ubuntu/.docker/config.json
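A possible variation, assuming your Docker client supports the per-registry credHelpers key in config.json: instead of making ecr-login the global credsStore, route only the ECR registry through it, so other registries (such as Docker Hub for manomarks/visualizer) keep their default credential handling. The write_docker_config function below is a hypothetical helper, not part of Docker or the ECR credential helper, and it overwrites any existing config.json.

```shell
# Hypothetical helper: write a Docker client config.json that sends only the
# given ECR registry through the ecr-login credential helper.
#   $1 = directory holding config.json (e.g. /home/ubuntu/.docker)
#   $2 = ECR registry hostname (e.g. REDACTED.dkr.ecr.us-east-2.amazonaws.com)
# WARNING: overwrites any existing config.json in that directory.
write_docker_config() {
  dir=$1
  registry=$2
  mkdir -p "$dir" || return 1
  printf '{\n  "credHelpers": {\n    "%s": "ecr-login"\n  }\n}\n' \
    "$registry" > "$dir/config.json"
}

# Example, run as ROOT like the steps above:
#   write_docker_config /home/ubuntu/.docker REDACTED.dkr.ecr.us-east-2.amazonaws.com
```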

Create a visualizer service. This ensures that the Manager already has a container running, which will hopefully push our redis service to spawn on the Worker instead of the Manager later.

docker service create \
  --name=viz \
  --publish=8080:8080/tcp \
  --constraint=node.role==manager \
  --mount=type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock \
  manomarks/visualizer

Create the redis service. You can see it spawn on the Worker (its ip- hostname is not the same as the Manager we’re on). The credential helper’s log files also show that it was used.

ubuntu@ip-10-2-0-38:~/.ecr/log$ docker service create --with-registry-auth --name redis --replicas 1 REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis:latest
v6oozr2ki7tirslfgvqxzhyve
ubuntu@ip-10-2-0-38:~/.ecr/log$ docker service ps redis
ID            NAME     IMAGE                                                      NODE          DESIRED STATE  CURRENT STATE            ERROR  PORTS
v9fy6b9t02c6  redis.1  REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis:latest  ip-10-2-0-98  Running        Preparing 5 seconds ago
ubuntu@ip-10-2-0-38:~/.ecr/log$ ll
total 12
drwxrw-r-x 2 ubuntu ubuntu 4096 Feb 15 20:05 ./
drwxrw-r-x 3 ubuntu ubuntu 4096 Feb 15 20:05 ../
-rw-rw-r-- 1 ubuntu ubuntu  736 Feb 15 20:08 ecr-login.log.2017-02-15-20
ubuntu@ip-10-2-0-38:~/.ecr/log$

# Further verification
ubuntu@ip-10-2-0-38:~/.ecr/log$ docker node ls
ID                           HOSTNAME      STATUS  AVAILABILITY  MANAGER STATUS
wpipkd3orljqyyrftntzy7rlg *  ip-10-2-0-38  Ready   Active        Leader
zh7hmacxjor2uvfqvq0p0bdg3    ip-10-2-0-98  Ready   Active
ubuntu@ip-10-2-0-38:~/.ecr/log$ docker node ps wpipkd3orljqyyrftntzy7rlg zh7hmacxjor2uvfqvq0p0bdg3
ID            NAME       IMAGE                                                      NODE          DESIRED STATE  CURRENT STATE          ERROR  PORTS
v9fy6b9t02c6  redis.1    REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis:latest  ip-10-2-0-98  Running        Running 2 minutes ago
rphsczqpqqqm  viz.1      manomarks/visualizer:latest                                ip-10-2-0-38  Running        Running 4 minutes ago
rphsczqpqqqm   \_ viz.1  manomarks/visualizer:latest                                ip-10-2-0-38  Running        Running 4 minutes ago

ubuntu@ip-10-2-0-38:~/.ecr/log$ cat ecr-login.log.2017-02-15-20
2017-02-15T20:05:19Z [DEBUG] Retrieving credentials for REDACTED in us-east-2 (REDACTED.dkr.ecr.us-east-2.amazonaws.com)
2017-02-15T20:05:19Z [DEBUG] GetCredentials for REDACTED
2017-02-15T20:05:19Z [DEBUG] Checking file cache for REDACTED
2017-02-15T20:05:19Z [DEBUG] Calling ECR.GetAuthorizationToken for REDACTED
2017-02-15T20:05:19Z [DEBUG] Saving credentials to file cache for REDACTED
2017-02-15T20:08:49Z [DEBUG] Retrieving credentials for REDACTED in us-east-2 (REDACTED.dkr.ecr.us-east-2.amazonaws.com)
2017-02-15T20:08:49Z [DEBUG] GetCredentials for REDACTED
2017-02-15T20:08:49Z [DEBUG] Checking file cache for REDACTED
2017-02-15T20:08:49Z [DEBUG] Using cached token for REDACTED

At this point, the Manager has an ECR token in hand that won’t expire for 12 hours. You can wait 12 hours before proceeding to the next step, but I found another way to reproduce this issue: detach the AmazonEC2ContainerRegistryReadOnly policy from your Role or User (alternatively, use “Revoke Sessions” in IAM to temporarily disable the user/role). I’ve seen the same behavior whether I waited 12 hours or removed the policy.
For good measure, back up or remove the credential helper’s own cache: mv ~/.ecr/cache.json ~/.ecr/cache.json.bak
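To reason about the 12-hour window, a token’s age can be compared against 43200 seconds. The token_expired function below is a hypothetical sketch that works on Unix epoch seconds; it only does the arithmetic and does not read the credential helper’s cache file.

```shell
# ECR authorization tokens are valid for 12 hours = 12 * 60 * 60 seconds.
ECR_TOKEN_TTL=$((12 * 60 * 60))   # 43200

# token_expired ISSUED_EPOCH NOW_EPOCH  (hypothetical helper)
# Exit status 0 (true) if a token issued at ISSUED_EPOCH has expired by NOW_EPOCH.
token_expired() {
  issued=$1
  now=$2
  [ $((now - issued)) -ge "$ECR_TOKEN_TTL" ]
}

# Example with GNU date (BSD/macOS date uses different flags):
#   issued=$(date -u -d '2017-02-15 20:05:19 UTC' +%s)
#   token_expired "$issued" "$(date -u +%s)" && echo "token expired"
```

For instance, the token fetched at 20:05:19Z in the log above stays valid until 08:05:19Z the next day, so at the roughly 22:40 UTC scale attempt it was only about two and a half hours old. That is why removing the policy is needed to reproduce the failure without waiting.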

# After removing read access to ECR, verify
ubuntu@ip-10-2-0-38:~/.ecr/log$ docker pull REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis:latest
Error response from daemon: repository REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis not found: does not exist or no pull access

Ensure the redis image does not exist on the Manager, in case you accidentally downloaded it in the verification step above.
If you removed the policy, the Manager can no longer download the image. This is the same behavior you get when your token expires. Try to scale the redis service up to 3 or more replicas, which should make the Swarm try to start a copy on the Manager. It will fail.

ubuntu@ip-10-2-0-38:~/.ecr/log$ docker service scale redis=3
redis scaled to 3
ubuntu@ip-10-2-0-38:~/.ecr/log$ docker service ps redis --no-trunc
ID                         NAME         IMAGE                                                                                                                              NODE          DESIRED STATE  CURRENT STATE                ERROR                                                                                                                                        PORTS
v9fy6b9t02c6530jf5pkrcmp0  redis.1      REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis:latest@sha256:40f100b5d60bffceddd1a5635ce52fe0aa39c229feed8c2c6b641d85bc6baa14  ip-10-2-0-98  Running        Running 2 hours ago
ibib73k99fl32og49hniv7k8m  redis.2      REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis:latest@sha256:40f100b5d60bffceddd1a5635ce52fe0aa39c229feed8c2c6b641d85bc6baa14  ip-10-2-0-98  Running        Running 41 seconds ago
okpnq0vw6801g851o1q2v13sp   \_ redis.2  REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis:latest@sha256:40f100b5d60bffceddd1a5635ce52fe0aa39c229feed8c2c6b641d85bc6baa14  ip-10-2-0-38  Shutdown       Rejected 50 seconds ago      "No such image: REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis@sha256:40f100b5d60bffceddd1a5635ce52fe0aa39c229feed8c2c6b641d85bc6baa14"
kja20wvbs6mcowe61wza56e07   \_ redis.2  REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis:latest@sha256:40f100b5d60bffceddd1a5635ce52fe0aa39c229feed8c2c6b641d85bc6baa14  ip-10-2-0-38  Shutdown       Rejected 55 seconds ago      "No such image: REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis@sha256:40f100b5d60bffceddd1a5635ce52fe0aa39c229feed8c2c6b641d85bc6baa14"
75cbvvf56qnq4hb21d2tf6pcp   \_ redis.2  REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis:latest@sha256:40f100b5d60bffceddd1a5635ce52fe0aa39c229feed8c2c6b641d85bc6baa14  ip-10-2-0-38  Shutdown       Rejected about a minute ago  "No such image: REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis@sha256:40f100b5d60bffceddd1a5635ce52fe0aa39c229feed8c2c6b641d85bc6baa14"
lemeqny01e6ffjdrbid8xx6m9   \_ redis.2  REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis:latest@sha256:40f100b5d60bffceddd1a5635ce52fe0aa39c229feed8c2c6b641d85bc6baa14  ip-10-2-0-38  Shutdown       Rejected about a minute ago  "No such image: REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis@sha256:40f100b5d60bffceddd1a5635ce52fe0aa39c229feed8c2c6b641d85bc6baa14"
kp607hy0hqrupnfx0ggxtbkug  redis.3      REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis:latest@sha256:40f100b5d60bffceddd1a5635ce52fe0aa39c229feed8c2c6b641d85bc6baa14  ip-10-2-0-98  Running        Running about a minute ago

# Verify that the credential helper was not used to regenerate authentication.
ubuntu@ip-10-2-0-38:~/.ecr/log$ ll
total 12
drwxrw-r-x 2 ubuntu ubuntu 4096 Feb 15 22:36 ./
drwxrw-r-x 3 ubuntu ubuntu 4096 Feb 15 22:38 ../
-rw-rw-r-- 1 ubuntu ubuntu 1057 Feb 15 20:17 ecr-login.log.2017-02-15-20
ubuntu@ip-10-2-0-38:~/.ecr/log$ date
Wed Feb 15 22:42:46 UTC 2017
ubuntu@ip-10-2-0-38:~/.ecr/log$ cat ecr-login.log.2017-02-15-20
2017-02-15T20:05:19Z [DEBUG] Retrieving credentials for REDACTED in us-east-2 (REDACTED.dkr.ecr.us-east-2.amazonaws.com)
2017-02-15T20:05:19Z [DEBUG] GetCredentials for REDACTED
2017-02-15T20:05:19Z [DEBUG] Checking file cache for REDACTED
2017-02-15T20:05:19Z [DEBUG] Calling ECR.GetAuthorizationToken for REDACTED
2017-02-15T20:05:19Z [DEBUG] Saving credentials to file cache for REDACTED
2017-02-15T20:08:49Z [DEBUG] Retrieving credentials for REDACTED in us-east-2 (REDACTED.dkr.ecr.us-east-2.amazonaws.com)
2017-02-15T20:08:49Z [DEBUG] GetCredentials for REDACTED
2017-02-15T20:08:49Z [DEBUG] Checking file cache for REDACTED
2017-02-15T20:08:49Z [DEBUG] Using cached token for REDACTED
2017-02-15T20:17:30Z [DEBUG] Retrieving credentials for REDACTED in us-east-2 (REDACTED.dkr.ecr.us-east-2.amazonaws.com)
2017-02-15T20:17:30Z [DEBUG] GetCredentials for REDACTED
2017-02-15T20:17:30Z [DEBUG] Checking file cache for REDACTED
2017-02-15T20:17:30Z [DEBUG] Using cached token for REDACTED
ubuntu@ip-10-2-0-38:~/.ecr/log$ ll ~/.ecr
total 20
drwxrw-r-x 3 ubuntu ubuntu 4096 Feb 15 22:38 ./
drwxr-xr-x 7 ubuntu ubuntu 4096 Feb 15 22:39 ../
-rw------- 1 ubuntu ubuntu 4884 Feb 15 22:24 cache.json.bak
drwxrw-r-x 2 ubuntu ubuntu 4096 Feb 15 22:36 log/

As a side note, the Swarm does eventually run all replicas on the Worker after failing to launch them on the Manager. This is not what I want, but at least it doesn’t give up trying to scale.

ubuntu@ip-10-2-0-38:~/.ecr/log$ docker node ls
ID                           HOSTNAME      STATUS  AVAILABILITY  MANAGER STATUS
wpipkd3orljqyyrftntzy7rlg *  ip-10-2-0-38  Ready   Active        Leader
zh7hmacxjor2uvfqvq0p0bdg3    ip-10-2-0-98  Ready   Active
ubuntu@ip-10-2-0-38:~/.ecr/log$ docker node ps zh7hmacxjor2uvfqvq0p0bdg3
ID            NAME     IMAGE                                                      NODE          DESIRED STATE  CURRENT STATE          ERROR  PORTS
v9fy6b9t02c6  redis.1  REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis:latest  ip-10-2-0-98  Running        Running 2 hours ago
ibib73k99fl3  redis.2  REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis:latest  ip-10-2-0-98  Running        Running 3 minutes ago
kp607hy0hqru  redis.3  REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis:latest  ip-10-2-0-98  Running        Running 3 minutes ago

Describe the results you received:

Docker Engine’s Swarm did not attempt to use the credential helper when the credentials in the service definition were invalid. Instead, the tasks were rejected with the error “No such image” (that error message could also be improved, since the underlying problem is authentication).

Describe the results you expected:

If the authentication token stored in the service definition fails, Docker Engine’s Swarm should use the installed credential helper again to generate a new token and retry. This assumes that all Managers have the credential helper installed. If the new token also fails (or no credential helper is installed), THEN it should proceed with error messaging and distribute the replicas to workers that already have the image downloaded.
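The expected behavior described here amounts to a refresh-and-retry step. A rough shell sketch of that control flow follows; it is an illustration only, not how Docker Engine implements pulls, and pull_with_refresh plus the commands it receives are hypothetical names.

```shell
# Illustrative sketch only: NOT Docker Engine's implementation.
# pull_with_refresh PULL_CMD REFRESH_CMD
#   PULL_CMD    - hypothetical command that pulls using the stored token
#   REFRESH_CMD - hypothetical command that asks the credential helper
#                 for a fresh token
pull_with_refresh() {
  pull_cmd=$1
  refresh_cmd=$2

  # 1. Try the token already stored on the service definition.
  if "$pull_cmd"; then
    return 0
  fi

  # 2. Stored token failed: ask the credential helper for a new one.
  if ! "$refresh_cmd"; then
    echo "pull failed and no fresh credentials are available" >&2
    return 1   # caller falls back to nodes that already have the image
  fi

  # 3. Retry once with the fresh token; its exit status is returned.
  "$pull_cmd"
}
```

The design choice is a single retry: if the freshly issued token also fails, the error is real (for example, missing IAM permissions) and retrying further would not help.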

Additional information you deem important (e.g. issue happens only occasionally):

The same results happen with docker service update as with docker service scale, which is expected, since scale is just an alias for update --replicas.

However, if I run docker service update --with-registry-auth --replicas X, it does seem to fetch fresh authentication tokens. I can then scale and watch the replicas spread across the Swarm nodes. This would be a valid workaround, but I don’t like that it also seems to restart all currently running containers, which could be disruptive.

ubuntu@ip-10-2-0-38:~/.ecr/log$ ls -l
total 8
-rw-rw-r-- 1 ubuntu ubuntu 1057 Feb 15 20:17 ecr-login.log.2017-02-15-20
-rw-rw-r-- 1 ubuntu ubuntu  415 Feb 16 21:19 ecr-login.log.2017-02-16-21
ubuntu@ip-10-2-0-38:~/.ecr/log$ docker images
REPOSITORY             TAG                 IMAGE ID            CREATED             SIZE
manomarks/visualizer   <none>              137b9c6f7977        2 weeks ago         325 MB
ubuntu@ip-10-2-0-38:~/.ecr/log$ docker service update --with-registry-auth --replicas 5 redis
redis
ubuntu@ip-10-2-0-38:~/.ecr/log$ docker images
REPOSITORY                                           TAG                 IMAGE ID            CREATED             SIZE
REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis   <none>              74d8f543ac97        2 weeks ago         184 MB
manomarks/visualizer                                 <none>              137b9c6f7977        2 weeks ago         325 MB
ubuntu@ip-10-2-0-38:~/.ecr/log$ ll
total 20
drwxrw-r-x 2 ubuntu ubuntu 4096 Feb 17 17:59 ./
drwxrw-r-x 3 ubuntu ubuntu 4096 Feb 17 17:59 ../
-rw-rw-r-- 1 ubuntu ubuntu 1057 Feb 15 20:17 ecr-login.log.2017-02-15-20
-rw-rw-r-- 1 ubuntu ubuntu  415 Feb 16 21:19 ecr-login.log.2017-02-16-21
-rw-rw-r-- 1 ubuntu ubuntu  415 Feb 17 17:59 ecr-login.log.2017-02-17-17
# You can see credential helper was hit above
# And the containers are spread across the nodes below
ubuntu@ip-10-2-0-38:~/.ecr/log$ docker service ps redis
ID            NAME         IMAGE                                                      NODE          DESIRED STATE  CURRENT STATE           ERROR  PORTS
j1uct8p1wwpw  redis.1      REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis:latest  ip-10-2-0-98  Running        Running 2 minutes ago
t4eyy2ydm2xm   \_ redis.1  REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis:latest  ip-10-2-0-98  Shutdown       Shutdown 2 minutes ago
51w4qbbugpmm  redis.2      REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis:latest  ip-10-2-0-98  Running        Running 2 minutes ago
lublj14k8780  redis.3      REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis:latest  ip-10-2-0-38  Running        Running 2 minutes ago
e8ad3wzgbahb  redis.4      REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis:latest  ip-10-2-0-98  Running        Running 2 minutes ago
1ru8wm46qf2r  redis.5      REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis:latest  ip-10-2-0-38  Running        Running 2 minutes ago

# On the worker, you can see it went from 1 container to 3, but it also restarted the original container, which is not desired.
ubuntu@ip-10-2-0-98:~$ docker ps
CONTAINER ID        IMAGE                                                                                                                        COMMAND                  CREATED             STATUS              PORTS               NAMES
9af4f6ed97ad        REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis@sha256:40f100b5d60bffceddd1a5635ce52fe0aa39c229feed8c2c6b641d85bc6baa14   "docker-entrypoint..."   20 hours ago        Up 20 hours         6379/tcp            redis.1.t4eyy2ydm2xmfknh8ftmboh8m
ubuntu@ip-10-2-0-98:~$ docker ps
CONTAINER ID        IMAGE                                                                                                                        COMMAND                  CREATED             STATUS              PORTS               NAMES
9681d803ba89        REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis@sha256:40f100b5d60bffceddd1a5635ce52fe0aa39c229feed8c2c6b641d85bc6baa14   "docker-entrypoint..."   3 seconds ago       Up 2 seconds        6379/tcp            redis.1.j1uct8p1wwpwnsjr462wt3ysw
7ee493bfb6f8        REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis@sha256:40f100b5d60bffceddd1a5635ce52fe0aa39c229feed8c2c6b641d85bc6baa14   "docker-entrypoint..."   3 seconds ago       Up 3 seconds        6379/tcp            redis.2.51w4qbbugpmmgsvgqpce03aqw
605ae7b500fb        REDACTED.dkr.ecr.us-east-2.amazonaws.com/redis@sha256:40f100b5d60bffceddd1a5635ce52fe0aa39c229feed8c2c6b641d85bc6baa14   "docker-entrypoint..."   3 seconds ago       Up 3 seconds        6379/tcp            redis.4.e8ad3wzgbahb4kd0hs2mje3zd

Output of docker version:

ubuntu@ip-10-2-0-38:~$ docker version
Client:
 Version:      1.13.1
 API version:  1.26
 Go version:   go1.7.5
 Git commit:   092cba3
 Built:        Wed Feb  8 06:50:14 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.13.1
 API version:  1.26 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   092cba3
 Built:        Wed Feb  8 06:50:14 2017
 OS/Arch:      linux/amd64
 Experimental: false

Output of docker info:

ubuntu@ip-10-2-0-38:~$ docker info
Containers: 1
 Running: 1
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 1.13.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 15
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: active
 NodeID: wpipkd3orljqyyrftntzy7rlg
 Is Manager: true
 ClusterID: vws4u9zsjaug5c1xvfniflkgi
 Managers: 1
 Nodes: 2
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 10.2.0.38
 Manager Addresses:
  10.2.0.38:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1
runc version: 9df8b306d01f59d3a8029be411de015b7304dd8f
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-57-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 990.6 MiB
Name: ip-10-2-0-38
ID: 6P54:KGSB:NGZA:RCFO:BOE7:3TYQ:NFEB:CDON:YMTT:ZECH:IAZW:TTTK
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

AWS, as described above, using IAM Role with Policy AmazonEC2ContainerRegistryReadOnly to gain pull access to ECR repo.

About this issue

State: open
Created 7 years ago
Reactions: 9
Comments: 33 (9 by maintainers)

Most upvoted comments

I think the problem is bigger than described here, because the scaling scenario you discuss involves the user: the user is there and can do something. However, there is another tricky use case which cannot be resolved with --with-registry-auth.

Consider a case where you have three nodes in a swarm and you create a service with a scale factor of two. At that point in time, the service is distributed across two of the three nodes and everything works without problems. A week later, the container on one of the nodes starts failing and Docker decides to move it to the third node, which has never run this container and so does not have the image. By now the old token has already expired and there is no user to provide a new one. Docker ends up going crazy trying to start the container on different nodes to restore the scale factor.

I don’t really understand how to deal with ECR’s concept of expiring tokens. It makes the swarm feature unusable with ECR.

alexander-frolov on Sep 13, 2017

@hamiltont – I gave up trying to figure this out and went with deploying the aws ecr proxy in the swarm (https://hub.docker.com/r/esailors/aws-ecr-http-proxy/). That allows me to pull from localhost and never have to worry about the creds timing out, etc.

Also note that AWS CLI v2 no longer has aws ecr get-login, so logging in to ECR now works like this:

aws --region us-east-1 ecr get-login-password | docker login --username AWS --password-stdin <me>.dkr.ecr.us-east-1.amazonaws.com

gudlyf on Mar 2, 2020
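The AWS CLI v2 login command quoted above can be wrapped in a small function so the registry hostname is built from the account ID and region. This is a sketch: ecr_registry and ecr_login are hypothetical names, 123456789012 is only a placeholder account ID, and docker login --password-stdin may require a newer Docker client than the 1.13.1 used in the original report.

```shell
# Build an ECR registry hostname (hypothetical helper).
#   $1 = AWS account ID, $2 = region
ecr_registry() {
  printf '%s.dkr.ecr.%s.amazonaws.com' "$1" "$2"
}

# Log the local Docker client in to ECR with the AWS CLI v2 command
# quoted above (hypothetical wrapper).
#   $1 = AWS account ID, $2 = region
ecr_login() {
  registry=$(ecr_registry "$1" "$2")
  aws --region "$2" ecr get-login-password \
    | docker login --username AWS --password-stdin "$registry"
}

# Example:
#   ecr_login 123456789012 us-east-1
```

Logging in only refreshes the client’s credentials on that manager. Handing them to the Swarm would still go through the report’s workaround (docker service update --with-registry-auth --replicas X), with the container-restart caveat described there.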