moby: docker service create doesn't work when network and generic-resource are both attached
Description
Issue
In Docker Swarm, --generic-resource does not work when it is used alongside --network. This is due to an incorrect condition, if genericresource.HasResource(ta, available.Generic), in the constraint_enforcer.go code that runs when the service is brought up.
for _, ta := range t.AssignedGenericResources {
	// Type change or no longer available
	if genericresource.HasResource(ta, available.Generic) {
		removeTasks[t.ID] = t
		break loop
	}
}
The condition should read if !genericresource.HasResource(ta, available.Generic) { so that a task whose assigned GenericResource is still available on the node is not removed.
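The effect of the inverted condition can be illustrated with a small, self-contained sketch. The types resource and task, the helper hasResource, and the kind name gpu_a100 are hypothetical stand-ins invented for this example; they are not swarmkit's real types or API, and the real enforcer does more than this.

```go
package main

import "fmt"

// resource is a simplified, hypothetical stand-in for a generic resource.
// It is NOT swarmkit's real type.
type resource struct {
	Kind  string // e.g. "gpu_a100" (made-up kind name)
	Value string // e.g. "GPU-sample-id"
}

// task is a hypothetical stand-in for a swarm task.
type task struct {
	ID       string
	Assigned []resource
}

// hasResource reports whether r is present in available. It plays the role
// that genericresource.HasResource plays in the snippet above.
func hasResource(r resource, available []resource) bool {
	for _, a := range available {
		if a == r {
			return true
		}
	}
	return false
}

// tasksToRemove returns the IDs of tasks the enforcer would reject.
// With buggy=true it reproduces the inverted check from the issue:
// a task is removed when its assigned resource IS available.
func tasksToRemove(tasks []task, available []resource, buggy bool) []string {
	var remove []string
	for _, t := range tasks {
		for _, ta := range t.Assigned {
			reject := !hasResource(ta, available) // intended condition
			if buggy {
				reject = !reject // inverted condition
			}
			if reject {
				remove = append(remove, t.ID)
				break
			}
		}
	}
	return remove
}

func main() {
	gpu := resource{Kind: "gpu_a100", Value: "GPU-sample-id"}
	tasks := []task{{ID: "task-1", Assigned: []resource{gpu}}}
	available := []resource{gpu} // the node still offers the assigned GPU

	fmt.Println("buggy:", tasksToRemove(tasks, available, true))
	fmt.Println("fixed:", tasksToRemove(tasks, available, false))
}
```

In this sketch the inverted check rejects task-1 even though the node still offers its GPU, which is consistent with the "assigned node no longer meets constraints" loop reported in this issue. With the negated check the task is kept.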
This bug is important because it prevents the use of generic resources in Docker Swarm, which is particularly relevant for assigning services to nodes based on GPU availability.
The generic-resources feature worked properly in swarm in version 18.06.1.
Reproduce
Bug Investigation + Reproduction steps
This functionality worked in version 18.06.1 but not in any later version.
Each release was tested with the steps below. An additional condition for the bug to occur is that the service must be brought up on a non-manager swarm node:
- Initialize the swarm
docker swarm init --advertise-addr <host_ip>
- Create an overlay network.
docker network create -d overlay --scope swarm test-network
- Modify /etc/docker/daemon.json on the worker to add an item to node-generic-resources. Restart the docker service.
{
  ...
  "node-generic-resources": [
    "gpu_<type>=GPU-sample-id"
  ]
}
- Add a worker node to the swarm.
- Create a service with the network attached as well as a generic-resource. This step will fail and the service will never reach the running state.
docker service create --network test-network --generic-resource "gpu_<type>=1" --name test-service quay.io/centos/centos:stream8 bash -c "env && sleep infinity"
- Observe the error by running docker service ps. The errors repeat in a loop in which a new task is created and subsequently rejected. The error does not resolve by itself and the service never reaches the Running state.
docker service ps --no-trunc test-service
# Example output
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
4tqh3odt54qhp3mzumwjt3mpj test-service.1 quay.io/centos/centos:stream8@<hash> <worker_node_hostname> Ready Rejected 4 seconds ago "assigned node no longer meets constraints"
j6d0qwxv2kglyzugpgd98qx1f \_ test-service.1 quay.io/centos/centos:stream8@<hash> <worker_node_hostname> Shutdown Rejected 9 seconds ago "assigned node no longer meets constraints"
etrnb3per1lgej9wgp9az2u4d \_ test-service.1 quay.io/centos/centos:stream8@<hash> <worker_node_hostname> Shutdown Rejected 9 seconds ago "assigned node no longer meets constraints"
inyz4ez2i95rd5vyxggo0mgbk \_ test-service.1 quay.io/centos/centos:stream8@<hash> <worker_node_hostname> Shutdown Rejected 14 seconds ago "assigned node no longer meets constraints"
ymy14a55fbv6cgdaxl4nzy06h \_ test-service.1 quay.io/centos/centos:stream8@<hash> <worker_node_hostname> Shutdown Rejected 19 seconds ago "assigned node no longer meets constraints"
- Remove the worker node from the swarm and repeat steps 4-6 for different versions of docker-ce
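Because the reproduction depends on the worker's daemon.json being valid, it can help to check the node-generic-resources entries before restarting the daemon. The following is a minimal sketch, not part of Docker itself: the daemonConfig struct, the parseGenericResources helper, the strict kind=value validation, and the kind name gpu_a100 are all assumptions made for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// daemonConfig models only the key discussed in the steps above;
// every other daemon.json key is ignored by json.Unmarshal.
type daemonConfig struct {
	NodeGenericResources []string `json:"node-generic-resources"`
}

// parseGenericResources (hypothetical helper) decodes a daemon.json
// document and splits each "kind=value" entry into its two parts.
func parseGenericResources(raw []byte) (map[string][]string, error) {
	var cfg daemonConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, fmt.Errorf("invalid daemon.json: %w", err)
	}
	out := make(map[string][]string)
	for _, entry := range cfg.NodeGenericResources {
		kind, value, ok := strings.Cut(entry, "=")
		if !ok || kind == "" || value == "" {
			return nil, fmt.Errorf("malformed entry %q, want kind=value", entry)
		}
		out[kind] = append(out[kind], value)
	}
	return out, nil
}

func main() {
	// "gpu_a100" is a made-up kind standing in for gpu_<type>.
	raw := []byte(`{"node-generic-resources": ["gpu_a100=GPU-sample-id"]}`)
	res, err := parseGenericResources(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(res)
}
```

Note that Go's encoding/json rejects a trailing comma after the last array element, so a fragment such as ["gpu_a100=GPU-sample-id",] is reported as invalid by this check.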
Expected behavior
docker service create should create a service with a network and generic-resource attached.
docker version
Client: Docker Engine - Community
Version: 20.10.17
API version: 1.41
Go version: go1.17.11
Git commit: 100c701
Built: Mon Jun 6 23:03:11 2022
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.17
API version: 1.41 (minimum version 1.12)
Go version: go1.17.11
Git commit: a89b842
Built: Mon Jun 6 23:01:29 2022
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.8
GitCommit: 9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6
runc:
Version: 1.1.4
GitCommit: v1.1.4-0-g5fd4c4d
docker-init:
Version: 0.19.0
GitCommit: de40ad0
docker info
Client:
Context: default
Debug Mode: false
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Docker Buildx (Docker Inc., v0.9.1-docker)
scan: Docker Scan (Docker Inc., v0.17.0)
Server:
Containers: 28
Running: 6
Paused: 0
Stopped: 22
Images: 91
Server Version: 20.10.18
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: local
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: active
NodeID: <node_id>
Is Manager: false
Node Address: <node_address>
Manager Addresses:
<manager_1_address>
<manager_2_address>
<manager_3_address>
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux nvidia runc
Default Runtime: nvidia
Init Binary: docker-init
containerd version: 0197261a30bf81f1ee8e6a4dd2dea0ef95d67ccb
runc version: v1.1.3-0-g6724737
init version: de40ad0
Kernel Version: <redacted>
Operating System: <redacted>
OSType: linux
Architecture: x86_64
CPUs: <redacted>
Total Memory: <redacted>
Name: <node_name>
ID: <redacted>
Debug Mode: false
Experimental: false
Live Restore Enabled: false
Additional Info
No response
About this issue
- State: closed
- Created 2 years ago
- Comments: 23 (14 by maintainers)
Commits related to this issue
- Fixes https://github.com/moby/moby/issues/44378 AssignedGenericResources in constraint_enforcer.go were falsely checked inside a case that enforced Reservations to be set Furthermore, the if statemen... — committed to s4ke/swarmkit by s4ke 2 years ago
- Merge pull request #3082 from s4ke/44378-moby-fix-constraint-enforcer-generic-resources Fixes https://github.com/moby/moby/issues/44378 — committed to moby/swarmkit by dperny a year ago
The fix for this should be included in 23.0.2.