portainer: New nodes are unable to pull images from registry with authentication

Bug description

New nodes are unable to pull images from a remote registry that requires authentication.

I have a private registry in AWS ECR. The nodes run outside of AWS, so they have to authenticate with ECR. When a new node joins, Portainer attempts to start containers on it, including those defined as global. The automatic pull fails every time until I manually pull the image via the Portainer dashboard, specifying that registry. The Docker logs show that the pull from that registry fails, which makes me think that the saved auth credentials are not used in this case, but are passed to the node when I pull the image manually.

Expected behavior

Image pulled and containers started

Steps to reproduce the issue:

  1. Add a registry with authentication, with images in it, in the Portainer dashboard
  2. Deploy a new stack from the dashboard with some images in global mode
  3. Join a new node to the cluster

Technical details:

  • Portainer version: 1.20.0
  • Docker version: mix of 18.09.1-ce and 18.06.1-ce
  • Platform: Linux
  • Command used to start Portainer: curl -L https://downloads.portainer.io/portainer-agent-stack.yml -o portainer-agent-stack.yml && docker stack deploy --compose-file=portainer-agent-stack.yml portainer
  • Browser: Chrome 71.0.3578.98

Additional context

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 16 (2 by maintainers)

Most upvoted comments

I’m struggling to run Docker Swarm properly on AWS EC2 instances with a private ECR repository. Portainer isn’t much help because it adds another layer of “something can go wrong”.

What I’ve found (or done) so far:

  1. I’m using cron to run aws ecr get-login on each node (manager and workers), just to be sure Docker has access to the repository.
  2. Fact: never pull a :latest image on any worker! Portainer/Swarm will always use it as a fallback, even if a newer version is available in the repository. You should let Swarm always fetch the latest version of an image based on its tag but referenced by sha (and this is correct!).
  3. Portainer isn’t aware of changed registry credentials. In fact, it isn’t aware even if you change the password directly in Portainer’s “Registries” console.
  4. Clicking “Update the service” does not seem to add the --with-registry-auth parameter, which ends up with No such image: ... and the sha being removed from the service’s image!
  5. Manually calling docker service update --force --with-registry-auth service-name fixes the problem until the next ECR credentials rotation.
  6. After a manual service update --with-registry-auth you can successfully “Update the service” from Portainer until the next ECR credentials rotation.

I have no idea if I’m doing something wrong, but Portainer seems to have problems with private registries. One solution (while using a manual docker login for ECR) could be to add --with-registry-auth to each “Update the service” whenever a registry other than Docker Hub is selected.
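
The workaround from items 1 and 5 above could be scripted roughly like this. It is only a sketch under assumptions, not something Portainer does itself: REGION and REGISTRY are placeholder values, the function names are made up for illustration, and it uses aws ecr get-login-password, the newer replacement for the get-login command from item 1. By default it is a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch only: refresh the ECR login, then push the fresh credentials to a service.
# REGION and REGISTRY are placeholders (assumptions), not values from this thread.
REGION="${REGION:-us-east-1}"
REGISTRY="${REGISTRY:-123456789012.dkr.ecr.us-east-1.amazonaws.com}"
DRY_RUN="${DRY_RUN:-1}"   # set DRY_RUN=0 to actually run the commands

# Log the local Docker client in to ECR (meant to run on a swarm manager).
refresh_login() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $REGISTRY"
  else
    aws ecr get-login-password --region "$REGION" |
      docker login --username AWS --password-stdin "$REGISTRY"
  fi
}

# Re-send the registry credentials to the nodes running the given service.
update_service() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "docker service update --force --with-registry-auth $1"
  else
    docker service update --force --with-registry-auth "$1"
  fi
}

# Example: refresh_login && update_service service-name
```

Run from cron on a manager with DRY_RUN=0, this would probably need to fire more often than the ECR token lifetime, since the credentials stop working after rotation.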

I had the same issue when deploying with the docker stack command. The solution was to add the --with-registry-auth argument; maybe the Portainer engine will have to do the same:

docker stack deploy --with-registry-auth --prune -c docker-compose-prod.yml my-stack

Worked for me. Thanks. I have a swarm cluster with three nodes, with one registry deployed on each, a shared volume, and Traefik as a load balancer. The registry uses HTTP authentication. The problem was pulling the image on any node other than node 1. Deploying the stack with --with-registry-auth worked for me.

While the solution is correct and it works, the issue still remains for new nodes added to the swarm: they fail to pull images, and even running aws ecr get-login on that node does not help at all. The only way I found was to redeploy the same stack with --with-registry-auth as stated before, which then seems to propagate the auth token to the recently added node, and it starts working as expected.

In fact, it doesn’t look like a Portainer-related issue (I am not currently using Portainer); it seems to be a Docker Swarm issue. The tests were conducted by issuing plain docker commands from the terminal.

Hope this helps
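
The redeploy step described above could be wrapped in a small helper like the one below. This is a sketch, assuming it runs on a manager after the new node has joined; the function name is invented here, and the compose file and stack name in the example are the ones from the command quoted earlier in the thread. It prints the command by default instead of running it.

```shell
#!/bin/sh
# Sketch only: redeploy a stack so the managers hand the registry credentials
# to a recently joined node. redeploy_stack is a hypothetical helper name.
DRY_RUN="${DRY_RUN:-1}"   # set DRY_RUN=0 to actually run the command

redeploy_stack() {
  compose_file="$1"
  stack_name="$2"
  # Build the command once so the dry run prints exactly what would execute.
  set -- docker stack deploy --with-registry-auth --prune -c "$compose_file" "$stack_name"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Example: redeploy_stack docker-compose-prod.yml my-stack
```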

Experiencing similar behaviour. We have a swarm cluster on AWS. Each node has access to pull images from ECR and uses ecr-credential-helper.

When deploying an image from ECR onto our swarm, it fails to schedule it. Once I pull the image manually on a swarm node, the container is successfully scheduled.
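
To see which nodes are rejecting the image, the task errors can be listed per node. The filter below is a rough sketch: failed_tasks is a made-up helper name, my-service is a placeholder, and the '|' separator is just a choice for the --format template passed to docker service ps.

```shell
#!/bin/sh
# Sketch only: read "node|error" lines on stdin and print the tasks
# that carry a non-empty error message.
failed_tasks() {
  awk -F'|' '$2 != "" { print $1 ": " $2 }'
}

# Example (on a manager; my-service is a placeholder):
# docker service ps --no-trunc --format '{{.Node}}|{{.Error}}' my-service | failed_tasks
```

If the error is a "No such image" message on the nodes where nothing was pulled manually, that would point at missing registry credentials on those nodes.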

I am also having the same issue, but I am running my own swarm with a Nexus Docker repository. Portainer fails to find the image until I run docker pull <repo>/<image>.

I’m thinking that this issue is actually related to https://github.com/portainer/portainer/issues/1533

Thanks for the report. We’re aware that Portainer does not really interact well with AWS, and we’ll investigate a solution for this.
