moby: Usage of max-replicas-per-node not compatible with start-first update_config
Description
Usage of max-replicas-per-node is not compatible with order: start-first in update_config. The per-node replica limit prevents the new replacement containers from starting.
Steps to reproduce the issue:
- Define a docker-compose.yml with a max-replicas-per-node constraint:
version: "3.8"
services:
  web:
    image: nginx:1.16
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints: [node.labels.role == web]
        max_replicas_per_node: 1
      update_config:
        parallelism: 1
        order: start-first
        failure_action: rollback
        delay: 10s
- Deploy stack to swarm
$ docker stack deploy -c docker-compose.yml repro_max_replicas_bug
Creating network repro_max_replicas_bug_default
Creating service repro_max_replicas_bug_web
$ docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
v2yd365lmwvb repro_max_replicas_bug_web replicated 1/1 (max 1 per node) nginx:1.16
- Update the image version in the docker-compose.yml and deploy the stack again:
$ sed -i 's/nginx:1.16/nginx:1.17/' docker-compose.yml
$ docker stack deploy -c docker-compose.yml repro_max_replicas_bug
Updating service repro_max_replicas_bug_web (id: v2yd365lmwvbbu0i4r0g6f026)
- Verify the deployment status with docker service ps:
$ docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
v2yd365lmwvb repro_max_replicas_bug_web replicated 1/1 (max 1 per node) nginx:1.17
$ docker service ps v2yd365lmwvb --no-trunc
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
qaheo9ufk24tged4biq9hlvfo repro_max_replicas_bug_web.1 nginx:1.17@sha256:282530fcb7cd19f3848c7b611043f82ae4be3781cb00105a1d593d7e6286b596 Running Pending 32 seconds ago "no suitable node (max replicas per node limit exceed)"
n0z0c2dpt4kih8x91h8vm2e35 \_ repro_max_replicas_bug_web.1 nginx:1.16@sha256:8723f69d18865756716b1b6a7cebae0107c39c7ad9b9b310875a3a0a5be235a1 aldebaran Running Running about a minute ago
Describe the results you received:
The new container running the newer image is not started and does not replace the old container.
Describe the results you expected:
The new container is started and replaces the old one.
Additional information you deem important (e.g. issue happens only occasionally):
This issue happens every time on different servers.
Output of docker version:
Client:
Version: 19.03.6
API version: 1.40
Go version: go1.13.8
Git commit: 369ce74
Built: Wed, 26 Feb 2020 11:20:11 +1100
OS/Arch: linux/amd64
Experimental: false
Server:
Engine:
Version: 19.03.6
API version: 1.40 (minimum version 1.12)
Go version: go1.13.8
Git commit: 369ce74
Built: Wed Feb 26 00:20:11 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 19.03.6
GitCommit: 7c1e88399ec0b0b077121d9d5ad97e647b11c870
runc:
Version: 1.0.0~rc10+dfsg1
GitCommit: 1.0.0~rc10+dfsg1-1
docker-init:
Version: 0.18.0
GitCommit:
Output of docker info:
Client:
Debug Mode: false
Server:
Containers: 10
Running: 9
Paused: 0
Stopped: 1
Images: 52
Server Version: 19.03.6
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: active
NodeID: mwh8whayw7caqnr543ypvgmgt
Is Manager: true
ClusterID: bx8s8ubbmvevmcpsc3fnsamdo
Managers: 1
Nodes: 1
Default Address Pool: 10.0.0.0/8
SubnetSize: 24
Data Path Port: 4789
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 10
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: 192.168.1.160
Manager Addresses:
192.168.1.160:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 7c1e88399ec0b0b077121d9d5ad97e647b11c870
runc version: 1.0.0~rc10+dfsg1-1
init version:
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 5.4.0-4-amd64
Operating System: Debian GNU/Linux bullseye/sid
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.58GiB
Name: aldebaran
ID: DB6L:27Z6:HDCS:XJND:WUFH:UZ5R:53TZ:COAN:PMMP:X75E:DFBE:3IOI
Docker Root Dir: /home/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
Additional environment details (AWS, VirtualBox, physical, etc.):
Verified on:
- Physical machine:
$ lsb_release -a
Distributor ID: Debian
Description: Debian GNU/Linux bullseye/sid
Release: unstable
Codename: sid
- Google Cloud VM:
$ lsb_release -a
Distributor ID: Ubuntu
Description: Ubuntu 18.04.3 LTS
Release: 18.04
Codename: bionic
About this issue
- Original URL
- State: open
- Created 4 years ago
- Reactions: 1
- Comments: 16 (5 by maintainers)
In my opinion, the max-replicas-per-node option should not prevent services from updating. We should distinguish between the intended end state and what is needed to get that intended state running. I think everyone would blindly accept that it works this way, and that it is perfectly logical for it to result in temporarily more tasks per host when using start-first. Why should I have to keep spare capacity around just to make the update work? Capacity costs money.
At least, consider something like a surge flag for that, as sketched below.
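To illustrate the proposal only: the compose sketch below is hypothetical, since no surge-style option exists in Swarm today, and the commented-out flag name is invented.

version: "3.8"
services:
  web:
    image: nginx:1.17
    deploy:
      replicas: 1
      placement:
        max_replicas_per_node: 1
      update_config:
        order: start-first
        # surge: 1   # hypothetical flag: temporarily allow one extra task per node during the update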
From a very quick look, whoever ends up implementing this will need the following: the original PR can be found at https://github.com/moby/swarmkit/pull/2758. If you need more tips or help with that work, you can find me on the Docker community Slack.
I like that surge idea. For the sake of possibly getting that feature in this century, I'd suggest this as a flag for the time being. Hence I updated my original answer to use this terminology (surge): https://github.com/moby/moby/issues/40797#issuecomment-1289995338
I still think it's not much more than an ignore/skip check at the right place in the code.
We too stumbled across this issue with max-replicas-per-node. Perhaps a sensible way to go about this would be a mechanism similar to k8s' (sorry) maxSurge functionality. It essentially allows replicas to surge above the defined replica count during deployment rollouts, and thus offers a solution without causing issues with start-first.
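For comparison, this is roughly what the maxSurge mechanism mentioned above looks like in a Kubernetes Deployment; the names and values here are illustrative only.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # temporarily allow one Pod above the desired replica count during the rollout
      maxUnavailable: 0  # never drop below the desired replica count
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.17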
@sgohl @GCSBOSS, to be honest, I don't understand what the problem is with the default stop-first option. In a fault-tolerant system you need at least two replicas of each application anyway, running on two different nodes, which is also why you can update them one by one while the application stays up the whole time. Just make sure that you have included the appropriate update settings in your config (a sketch follows below).
If your application is slow to start, then also make sure that you use a reasonable delay between updates. A longer healthcheck start period can also help.
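A minimal sketch of the kind of stop-first rollout being described; the service name, replica count, and delay are assumptions, not the commenter's exact snippet.

version: "3.8"
services:
  web:
    image: nginx:1.17
    deploy:
      mode: replicated
      replicas: 2                 # at least two replicas so one stays up during the update
      placement:
        max_replicas_per_node: 1  # spread the replicas across different nodes
      update_config:
        order: stop-first         # the default: stop the old task before starting the new one
        parallelism: 1            # update one replica at a time
        delay: 30s                # assumed value: pause between replica updates
      # a healthcheck with a longer start_period on the service also helps slow-starting apps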
As for @sgohl's proposal: it is reasonable, but someone needs to take action and implement it if you want to see it, or you need a support contract with Mirantis and can ask them to do so. That is because there is no active feature development happening in swarmkit by Docker Inc. anymore.
Perhaps someone is interested in contributing to the documentation to describe this scenario (i.e., with start-first, the swarm cluster must have a node running fewer than max-replicas-per-node tasks of the service in use, so that the new instance can start before the old one is stopped).
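A hedged sketch of one way to satisfy that requirement, namely leaving headroom under the per-node limit rather than adding nodes; the values are illustrative and adapted from the reproduction above.

version: "3.8"
services:
  web:
    image: nginx:1.17
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints: [node.labels.role == web]
        max_replicas_per_node: 2   # one free slot so the start-first task has somewhere to start
      update_config:
        parallelism: 1
        order: start-first
        failure_action: rollback
        delay: 10s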