moby: Swarm does not consider health checks while updating
Hi,
I'm playing around with Docker 1.12 and the HEALTHCHECK directive in Dockerfiles.
What I do is:
$ docker swarm init
Dockerfile
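FROM nginx:latest
# Marker file that the first health check run consumes
RUN touch /marker
ADD ./check_running.sh /check_running.sh
RUN chmod +x /check_running.sh
# Absolute path, so the check does not depend on the working directory
HEALTHCHECK --interval=5s --timeout=3s CMD /check_running.sh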
check_running.sh
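#!/usr/bin/env bash
# First run: remove the marker and fail. Every later run succeeds.
# Note: Docker reserves exit code 2; 1 is the code for "unhealthy".
if [[ -e /marker ]]; then
  rm /marker
  exit 2
else
  exit 0
fi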
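For reference, Docker reads the health check command's exit status as 0 = healthy, 1 = unhealthy and 2 = reserved (the HEALTHCHECK reference says not to use it), and the script above exits with 2 on its first run. Below is a minimal sketch of the same check that reports "unhealthy" with 1 instead. MARKER_FILE is a hypothetical override added only so the marker path can be changed; it defaults to the original /marker. The logic sits in a function so it can be tested on its own.

```shell
#!/usr/bin/env bash
# Variant of check_running.sh using the documented exit codes:
# 0 = healthy, 1 = unhealthy (2 is reserved by Docker and should not be used).
# MARKER_FILE is a hypothetical override; it defaults to the original /marker.
MARKER_FILE="${MARKER_FILE:-/marker}"

check_running() {
  if [ -e "$MARKER_FILE" ]; then
    # First run after the container starts: consume the marker, report unhealthy.
    rm -f "$MARKER_FILE"
    return 1
  fi
  # Every later run reports healthy.
  return 0
}

# A script's exit status is the status of its last command, so HEALTHCHECK
# sees whatever check_running returned.
check_running
```

Because the marker is removed on the first run, the container reports unhealthy once and healthy afterwards, which is the same startup behaviour the original script was meant to simulate.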
Build two images from this Dockerfile:
$ docker build -t mynginx:1 .
$ docker build -t mynginx:2 .
Start a service with 5 replicas:
$ docker service create --name web --replicas=5 -p 8080:80 mynginx:1
Wait some time (5s) until all replicas are healthy. Now I want to update the image to v2. The update procedure should be rolling with a parallelism of 1. So I run:
$ docker service update --image mynginx:2 --update-parallelism 1 web
From my understanding, swarm should execute the update with this algorithm:
1. Swarm stops one container.
2. Swarm starts a container with mynginx:2 and waits until it is healthy (not just starting).
3. Go to 1.
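The "wait until it is healthy" step of the expected algorithm can at least be observed from outside with the Docker CLI. Below is a minimal sketch, assuming `docker inspect --format '{{.State.Health.Status}}'`, which prints starting, healthy or unhealthy for a container that has a HEALTHCHECK. The helper names, the probe-as-argument design and the return codes are illustrative assumptions. The sketch only watches health; it does not make `docker service update` wait.

```shell
#!/usr/bin/env bash
# docker_health CONTAINER: print the health status Docker records for it.
docker_health() {
  docker inspect --format '{{.State.Health.Status}}' "$1"
}

# wait_until_healthy PROBE TARGET MAX_TRIES DELAY
# Calls PROBE TARGET until it prints "healthy" (return 0). Returns 1 as soon
# as the status is "unhealthy", and 2 if MAX_TRIES polls give no verdict.
wait_until_healthy() {
  local probe="$1" target="$2" max_tries="$3" delay="$4" status i=0
  while [ "$i" -lt "$max_tries" ]; do
    status="$("$probe" "$target")"
    case "$status" in
      healthy) return 0 ;;
      unhealthy) return 1 ;;
    esac
    # Still "starting" (or no status yet): wait and poll again.
    i=$((i + 1))
    sleep "$delay"
  done
  return 2
}
```

For example, `wait_until_healthy docker_health web.1.apenirwkjp96xiyhhspx134wx 12 5` would poll one replica for roughly a minute. Passing the probe as an argument keeps the loop independent of Docker, so a fake probe can stand in for `docker inspect` when testing.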
But what I get is:
$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED                  STATUS                                     PORTS             NAMES
eee2b1b184ac        mynginx:2           "nginx -g 'daemon off"   Less than a second ago   Up Less than a second (health: starting)   80/tcp, 443/tcp   web.3.6kn82dul8gnyvzgz6cje2byfj
b3797cdb5e05        mynginx:2           "nginx -g 'daemon off"   3 seconds ago            Up 2 seconds (health: starting)            80/tcp, 443/tcp   web.5.46ll6b2lhrhr98utowuifccei
0c85128b031e        mynginx:2           "nginx -g 'daemon off"   6 seconds ago            Up 5 seconds (health: starting)            80/tcp, 443/tcp   web.1.apenirwkjp96xiyhhspx134wx
9dcec24ae32a        mynginx:2           "nginx -g 'daemon off"   9 seconds ago            Up 8 seconds (health: starting)            80/tcp, 443/tcp   web.2.5oi6fk1c6u25s3rpqbal39i3b
f37332162d40        mynginx:2           "nginx -g 'daemon off"   12 seconds ago           Up 11 seconds (healthy)                    80/tcp, 443/tcp   web.4.7mgs62e6s78iijcqwv4cose11
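Counting replicas per health state makes the problem in this output explicit. Below is a minimal sketch that tallies the STATUS column, assuming it is fed the output of `docker ps --filter name=web. --format '{{.Status}}'`. The function name and the patterns it matches are assumptions based on the status strings shown above.

```shell
#!/usr/bin/env bash
# health_summary: read container Status strings (one per line) on stdin and
# print how many replicas are starting, healthy and unhealthy.
health_summary() {
  local line starting=0 healthy=0 unhealthy=0
  while IFS= read -r line; do
    case "$line" in
      *"(health: starting)"*) starting=$((starting + 1)) ;;
      *"(healthy)"*) healthy=$((healthy + 1)) ;;
      *"(unhealthy)"*) unhealthy=$((unhealthy + 1)) ;;
    esac
  done
  echo "starting=$starting healthy=$healthy unhealthy=$unhealthy"
}
```

Fed the five STATUS values above, it prints `starting=4 healthy=1 unhealthy=0`: only one of the five replicas had passed its health check at that moment, even though all of them were already running mynginx:2.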
Sometimes all replicas are in the starting state at the same time, which results in downtime for the service. My expectation was that I could get "native" zero-downtime deployments with HEALTHCHECK and rolling updates on my swarm cluster.
What's wrong with my attempt?
Thanks!
Update: Sorry for closing and reopening this issue. My mobile is somehow broken today.
About this issue
- State: closed
- Created 8 years ago
- Comments: 18 (11 by maintainers)
@runshenzhu Yeah, this is my non-native workaround. But it is very hard to predict the bootstrap time in some circumstances, so real health checking would be much better.
It is not mentioned explicitly in the docs, but it was my expectation.
When the health check exits with 1, it works for me: swarm reschedules unhealthy containers.
This is my most wanted feature, because it allows native zero-downtime deployments for the first time.