moby: Service update may fail with "update out of sequence" error

Description

Repeatedly calling docker service update may trigger an update out of sequence error. This seems to happen because the API call to service inspect (GET /services/{id}) returns an old Version.Index even after the successful return of a previous call to service update (POST /services/{id}/update).

Steps to reproduce the issue:

docker swarm init
docker service create --name test busybox tail -f /dev/null
while docker service update test --constraint-add "node.labels.a != b"; do true; done

Describe the results you received:

After some time spent repeatedly updating the service (~30s on my machine), the last command will fail with the error Error response from daemon: rpc error: code = 2 desc = update out of sequence.

Describe the results you expected:

I expected that a successful call to POST /services/{id}/update would guarantee that subsequent calls to GET /services/{id} return an updated Version.Index. This seems not to be the case. I’m not sure whether this behavior is intended; if it is working as expected, a clarification in the API documentation would be nice.
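
For reference, here is the same read-then-update cycle at the API level, as a minimal sketch against the Go SDK (github.com/docker/docker/client). The client constructor and option types vary between SDK versions, and the original report exercised the REST endpoints through the CLI; the sketch assumes the test service from the steps above already exists.

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/docker/docker/api/types"
    "github.com/docker/docker/client"
)

func main() {
    ctx := context.Background()

    cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
    if err != nil {
        log.Fatal(err)
    }

    // Read the service, including the Version.Index used for optimistic locking.
    svc, _, err := cli.ServiceInspectWithRaw(ctx, "test", types.ServiceInspectOptions{})
    if err != nil {
        log.Fatal(err)
    }
    before := svc.Version.Index

    // Submit an update built from the version we just read.
    if _, err := cli.ServiceUpdate(ctx, svc.ID, svc.Version, svc.Spec, types.ServiceUpdateOptions{}); err != nil {
        log.Fatal(err)
    }

    // Re-inspect immediately. Per this report, the returned index may still be
    // the old one, and any update built from it is "out of sequence".
    svc, _, err = cli.ServiceInspectWithRaw(ctx, svc.ID, types.ServiceInspectOptions{})
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("Version.Index before update: %d, after update: %d\n", before, svc.Version.Index)
}

If the second inspect prints the same index as the first, the next update built from that stale object reproduces the error above.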

Output of docker version:

$ docker version
Client:
 Version:      1.13.1-rc1
 API version:  1.25
 Go version:   go1.7.4
 Git commit:   2527cfc
 Built:        Fri Jan 27 21:54:54 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.13.1-rc1
 API version:  1.25 (minimum version 1.12)
 Go version:   go1.7.4
 Git commit:   2527cfc
 Built:        Fri Jan 27 21:54:54 2017
 OS/Arch:      linux/amd64
 Experimental: false

The problem also happens with 1.13.0 and 1.12.6.

Output of docker info:

$ docker info
Containers: 26
 Running: 4
 Paused: 0
 Stopped: 22
Images: 264
Server Version: 1.13.1-rc1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 397
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: active
 NodeID: n2osx48m66gag48ggoyg1w1pc
 Is Manager: true
 ClusterID: 66490aoqoo9cdx00t5n95hw34
 Managers: 1
 Nodes: 1
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 192.168.50.4
 Manager Addresses:
  192.168.50.4:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 03e5862ec0d8d3b3f750e19fca3ee367e13c090e
runc version: 2f7393a47307a16f8cee44a37b262e8b81021e3e
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-31-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 2.915 GiB
Name: vagrant
ID: XXOL:4PPB:VZV3:W7ZD:QRDT:FY6D:L2WN:OI5T:3Z3I:HZS5:TI6Z:BXJN
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 58
 Goroutines: 153
 System Time: 2017-02-07T17:31:31.46294919Z
 EventsListeners: 1
Username: cezarsa
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Experimental: false
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

Tested on Vagrant + VirtualBox and also on Ubuntu 14.04 on a private CloudStack.

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 19
  • Comments: 21 (9 by maintainers)

Most upvoted comments

This is a real problem for automation. Without getting into the details of the optimistic locking in the API, would it make sense for the CLI to (optionally) block after a successful update until the version it created becomes visible?
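
For illustration, client-side blocking could look roughly like the sketch below, using the Go SDK. Since ServiceUpdate does not return the new version, about all a client can do is poll inspect until Version.Index moves past the index it submitted; the package name, function name, and polling interval here are invented for the example.

package swarmutil

import (
    "context"
    "time"

    "github.com/docker/docker/api/types"
    "github.com/docker/docker/client"
)

// waitForNewVersion polls the service until its Version.Index advances past
// the index that was submitted with the update, or the context is cancelled.
func waitForNewVersion(ctx context.Context, cli *client.Client, serviceID string, usedIndex uint64) error {
    for {
        svc, _, err := cli.ServiceInspectWithRaw(ctx, serviceID, types.ServiceInspectOptions{})
        if err != nil {
            return err
        }
        if svc.Version.Index > usedIndex {
            return nil // the new version is now visible to readers
        }
        select {
        case <-ctx.Done():
            return ctx.Err()
        case <-time.After(100 * time.Millisecond):
        }
    }
}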

Still an issue for us. Makes CI review deployments next to impossible without some nasty hacks.

This is discussed in detail here: https://github.com/docker/swarmkit/issues/1379

I’ve tried very hard to push for fixing this by versioning the service spec separately from the service object (https://github.com/docker/swarmkit/pull/1392), but we were never able to reach a consensus on the fine details, so this hasn’t moved forward.

@manziman It’s a sequence number. When you do an update, you need to specify the current version of the service. If the version of the service in swarm is different from the version you passed in, it causes an out of sequence error.

The reason for this is to ensure that two updates don’t conflict with each other. Once the update is completed, the service gets a new version, which should only ever be generated by swarm.
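
Until this is resolved, a common client-side workaround is to wrap the update in a retry loop: re-inspect, reapply the change to the fresh spec, and resubmit whenever the version check trips. A hedged sketch with the Go SDK follows; matching on the error string is a shortcut of mine, since the SDK does not expose a typed error for this case.

package swarmutil

import (
    "context"
    "strings"

    "github.com/docker/docker/api/types"
    "github.com/docker/docker/api/types/swarm"
    "github.com/docker/docker/client"
)

// UpdateWithRetry re-reads the service and reapplies mutate until swarm
// accepts the update, retrying when the optimistic version check fails.
func UpdateWithRetry(ctx context.Context, cli *client.Client, name string, mutate func(*swarm.ServiceSpec)) error {
    for {
        svc, _, err := cli.ServiceInspectWithRaw(ctx, name, types.ServiceInspectOptions{})
        if err != nil {
            return err
        }
        spec := svc.Spec
        mutate(&spec)
        _, err = cli.ServiceUpdate(ctx, svc.ID, svc.Version, spec, types.ServiceUpdateOptions{})
        if err == nil {
            return nil
        }
        if !strings.Contains(err.Error(), "update out of sequence") {
            return err // some other failure; don't retry
        }
        if ctx.Err() != nil {
            return ctx.Err()
        }
        // Stale Version.Index: loop to re-inspect and try again.
    }
}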

The same issue sometimes happens when executing the following command while the stack and its services are active (i.e. updating them because a new image has been pulled):

#!/usr/bin/env bash
docker stack deploy ourapp --with-registry-auth --compose-file=docker-compose.yml

Two global services, five replicated services.

No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 17.10
Release:	17.10
Codename:	artful

Output of docker version:

Client:
 Version:      18.03.1-ce
 API version:  1.37
 Go version:   go1.9.5
 Git commit:   9ee9f40
 Built:        Thu Apr 26 07:17:38 2018
 OS/Arch:      linux/amd64
 Experimental: false
 Orchestrator: swarm

Server:
 Engine:
  Version:      18.03.1-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.5
  Git commit:   9ee9f40
  Built:        Thu Apr 26 07:15:45 2018
  OS/Arch:      linux/amd64
  Experimental: false

Output of docker info:

Containers: 27
 Running: 24
 Paused: 0
 Stopped: 3
Images: 11
Server Version: 18.03.1-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 160
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: jejb6my7n50ulnilktgd2fxof
 Is Manager: true
 ClusterID: 0yus20uq607uzugzqbv9a1vzr
 Managers: 1
 Nodes: 1
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 10
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 192.168.2.102
 Manager Addresses:
  192.168.2.102:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.13.0-41-generic
Operating System: Ubuntu 17.10
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 23.54GiB
Name: linuxcompany
ID: FLM4:OCTS:BWQR:ZLRQ:HVGG:VBSW:O5NE:IA2W:4Z6T:SS47:4BSE:A6ZT
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: ourcompany
Registry: https://index.docker.io/v1/
Labels:
 provider=generic
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false