moby: Upgrading Swarm Manager to 17.12.0 from 17.09.01 breaks ingress network
Description
Steps to reproduce the issue:
- Create a 3 manager swarm cluster with 17.09.01-ce
- Drain the first manager you want to upgrade
- Upgrade that swarm manager to 17.12.0-ce
- Set the manager back to active and deploy a service to it that uses the ingress network
- View status of the ingress network on the new manager
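The steps above can be sketched as a command sequence. This is a minimal Python sketch, not a tested upgrade procedure: it assumes every command is run on the manager being upgraded, the upgrade command itself is distro-specific and must be supplied by the caller, and the service name `ingress-test`, the `8080:80` port mapping, and the `nginx` image are illustrative placeholders that do not come from this report.

```python
import subprocess


def upgrade_steps(node, upgrade_cmd):
    """Build the drain / upgrade / reactivate / inspect sequence for one manager."""
    return [
        ["docker", "node", "update", "--availability", "drain", node],
        list(upgrade_cmd),  # distro-specific package upgrade, supplied by the caller
        ["docker", "node", "update", "--availability", "active", node],
        # Publishing a port makes the service use the ingress network.
        # Name, ports, and image are placeholders (assumptions).
        ["docker", "service", "create", "--name", "ingress-test",
         "--publish", "8080:80", "nginx"],
        ["docker", "network", "inspect", "ingress"],
    ]


def run_steps(steps):
    """Run each command in order and stop at the first failure."""
    for argv in steps:
        subprocess.run(argv, check=True)
```

Calling `upgrade_steps("dev-swarm-manager-1", [...])` only builds the list, so the sequence can be reviewed before `run_steps` executes anything.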
Describe the results you received: On the upgraded swarm manager, the ingress network no longer works correctly and the manager cannot find its peers.
"Failed to find a load balancer IP to use for network: jttyybmsk9k45p8o2w95huz52"
The Created date is also wrong (it shows the zero value), and no peers are listed.
docker network inspect ingress
[
{
"Name": "ingress",
"Id": "jttyybmsk9k45p8o2w95huz52",
"Created": "0001-01-01T00:00:00Z",
"Scope": "swarm",
"Driver": "overlay",
"EnableIPv6": false,
"IPAM": {
"Driver": "",
"Options": null,
"Config": [
{
"Subnet": "10.255.0.0/16",
"Gateway": "10.255.0.1"
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": true,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": null,
"Options": {
"com.docker.network.driver.overlay.vxlanid_list": "4096"
},
"Labels": null
}
]
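The symptoms visible in this output (a zero-value Created timestamp, an empty IPAM driver, no attached containers, and no Peers list) can be checked mechanically. A minimal Python sketch, assuming its input is the JSON printed by `docker network inspect ingress`; the helper name `ingress_symptoms` is hypothetical, and the checks only mirror what this report observed, not any official health criteria.

```python
import json

# Zero-value timestamp seen in the broken output of this report.
ZERO_TIME = "0001-01-01T00:00:00Z"


def ingress_symptoms(inspect_json):
    """Return the symptoms from this report found in `docker network inspect` output."""
    networks = json.loads(inspect_json)  # the CLI prints a JSON array
    symptoms = []
    for net in networks:
        if net.get("Created") == ZERO_TIME:
            symptoms.append("zero Created timestamp")
        if not (net.get("IPAM") or {}).get("Driver"):
            symptoms.append("empty IPAM driver")
        if not net.get("Containers"):
            symptoms.append("no attached containers")
        if not net.get("Peers"):
            symptoms.append("no peers listed")
    return symptoms
```

An empty result means none of these particular symptoms is present; it does not prove the network is healthy.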
> journalctl -u docker.service
Jan 05 15:54:58 dev-swarm-manager-1 dockerd[2565]: time="2018-01-05T15:54:58.292600030Z" level=error msg="error receiving response" error="rpc error: code = Unimplemented desc = unknown method StreamRaftMessage"
Jan 05 15:54:59 dev-swarm-manager-1 dockerd[2565]: time="2018-01-05T15:54:59.292307212Z" level=error msg="error receiving response" error="rpc error: code = Unimplemented desc = unknown method StreamRaftMessage"
Jan 05 15:55:00 dev-swarm-manager-1 dockerd[2565]: time="2018-01-05T15:55:00.292482839Z" level=error msg="error receiving response" error="rpc error: code = Unimplemented desc = unknown method StreamRaftMessage"
Jan 05 15:55:01 dev-swarm-manager-1 dockerd[2565]: time="2018-01-05T15:55:01.292923988Z" level=error msg="error receiving response" error="rpc error: code = Unimplemented desc = unknown method StreamRaftMessage"
Jan 05 15:55:02 dev-swarm-manager-1 dockerd[2565]: time="2018-01-05T15:55:02.293461514Z" level=error msg="error receiving response" error="rpc error: code = Unimplemented desc = unknown method StreamRaftMessage"
Jan 05 15:55:03 dev-swarm-manager-1 dockerd[2565]: time="2018-01-05T15:55:03.294105186Z" level=error msg="error receiving response" error="rpc error: code = Unimplemented desc = unknown method StreamRaftMessage"
Jan 05 15:55:04 dev-swarm-manager-1 dockerd[2565]: time="2018-01-05T15:55:04.294598332Z" level=error msg="error receiving response" error="rpc error: code = Unimplemented desc = unknown method StreamRaftMessage"
Jan 05 15:55:05 dev-swarm-manager-1 dockerd[2565]: time="2018-01-05T15:55:05.295028437Z" level=error msg="error receiving response" error="rpc error: code = Unimplemented desc = unknown method StreamRaftMessage"
Jan 05 15:55:06 dev-swarm-manager-1 dockerd[2565]: time="2018-01-05T15:55:06.295668929Z" level=error msg="error receiving response" error="rpc error: code = Unimplemented desc = unknown method StreamRaftMessage"
Jan 05 15:55:07 dev-swarm-manager-1 dockerd[2565]: time="2018-01-05T15:55:07.295948610Z" level=error msg="error receiving response" error="rpc error: code = Unimplemented desc = unknown method StreamRaftMessage"
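Because the same error repeats about once per second, it can help to collapse the journal into counts per distinct error. A minimal Python sketch, assuming log lines in the key="value" shape shown above; `count_errors` is a hypothetical helper, not part of Docker.

```python
import re
from collections import Counter

# Matches the quoted key="value" pairs in a dockerd log line.
FIELD_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def count_errors(lines):
    """Count level=error entries, keyed by their error= text (or msg= if absent)."""
    counts = Counter()
    for line in lines:
        if "level=error" not in line:
            continue
        fields = dict(FIELD_RE.findall(line))
        counts[fields.get("error", fields.get("msg", "unknown"))] += 1
    return counts
```

Feeding it the output of `journalctl -u docker.service` line by line would reduce the ten entries above to a single key with a count of 10.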
Describe the results you expected:
[
{
"Name": "ingress",
"Id": "jttyybmsk9k45p8o2w95huz52",
"Created": "2018-01-05T15:59:00.330647797Z",
"Scope": "swarm",
"Driver": "overlay",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "10.255.0.0/16",
"Gateway": "10.255.0.1"
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": true,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {
"ingress-sbox": {
"Name": "ingress-endpoint",
"EndpointID": "43c0062ea6cac4281985a5ae83c6924b7e4e5ddb493396a5bbc467e2fcdfec46",
"MacAddress": "02:42:0a:ff:00:03",
"IPv4Address": "10.255.0.3/16",
"IPv6Address": ""
}
},
"Options": {
"com.docker.network.driver.overlay.vxlanid_list": "4096"
},
"Labels": {},
"Peers": [
{
"Name": "dev-swarm-manager-2-59f2ef982566",
"IP": "10.21.5.6"
},
{
"Name": "dev-swarm-manager-3-0fd3c4b8bb56",
"IP": "10.21.5.3"
},
{
"Name": "dev-swarm-worker-2-4cedf0af0db1",
"IP": "10.21.5.9"
},
{
"Name": "dev-swarm-worker-1-e9ec013b553c",
"IP": "10.21.5.7"
},
{
"Name": "dev-swarm-worker-3-9a9051af1700",
"IP": "10.21.5.8"
},
{
"Name": "dev-swarm-manager-1-44e67a971c61",
"IP": "10.21.5.4"
}
]
}
]
> journalctl -u docker.service
should output the standard info msg showing peer joins
Jan 05 16:01:26 dev-swarm-manager-1 dockerd[2597]: time="2018-01-05T16:01:26.408714646Z" level=info msg="Node join event for dev-swarm-worker-1-e9ec013b553c/10.21.5.7"
Jan 05 16:01:29 dev-swarm-manager-1 dockerd[2597]: time="2018-01-05T16:01:29.631089430Z" level=info msg="Node join event for dev-swarm-manager-2-59f2ef982566/10.21.5.6"
Jan 05 16:01:56 dev-swarm-manager-1 dockerd[2597]: time="2018-01-05T16:01:56.411635308Z" level=info msg="Node join event for dev-swarm-worker-3-9a9051af1700/10.21.5.8"
Jan 05 16:02:26 dev-swarm-manager-1 dockerd[2597]: time="2018-01-05T16:02:26.414415630Z" level=info msg="Node join event for dev-swarm-manager-2-59f2ef982566/10.21.5.6"
Jan 05 16:02:56 dev-swarm-manager-1 dockerd[2597]: time="2018-01-05T16:02:56.417536038Z" level=info msg="Node join event for dev-swarm-worker-3-9a9051af1700/10.21.5.8"
Jan 05 16:02:59 dev-swarm-manager-1 dockerd[2597]: time="2018-01-05T16:02:59.630183224Z" level=info msg="Node join event for dev-swarm-manager-2-59f2ef982566/10.21.5.6"
Jan 05 16:03:26 dev-swarm-manager-1 dockerd[2597]: time="2018-01-05T16:03:26.420245662Z" level=info msg="Node join event for dev-swarm-worker-2-4cedf0af0db1/10.21.5.9"
Jan 05 16:03:29 dev-swarm-manager-1 dockerd[2597]: time="2018-01-05T16:03:29.177811962Z" level=info msg="Node join event for dev-swarm-worker-1-e9ec013b553c/10.21.5.7"
Jan 05 16:03:56 dev-swarm-manager-1 dockerd[2597]: time="2018-01-05T16:03:56.422717132Z" level=info msg="Node join event for dev-swarm-manager-2-59f2ef982566/10.21.5.6"
Jan 05 16:03:59 dev-swarm-manager-1 dockerd[2597]: time="2018-01-05T16:03:59.178312215Z" level=info msg="Node join event for dev-swarm-worker-1-e9ec013b553c/10.21.5.7"
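To confirm that every expected peer has announced itself, the join events can be reduced to a name-to-IP map. A minimal Python sketch, assuming the "Node join event for NAME/IP" message format shown above; `joined_peers` is a hypothetical helper.

```python
import re

# Captures the peer name and IP from a "Node join event" log message.
JOIN_RE = re.compile(r'msg="Node join event for (?P<name>[^/"]+)/(?P<ip>[^"]+)"')


def joined_peers(lines):
    """Map peer name to IP for every join event found in the given log lines."""
    peers = {}
    for line in lines:
        match = JOIN_RE.search(line)
        if match:
            peers[match.group("name")] = match.group("ip")
    return peers
```

The resulting names can then be compared with the Peers list reported by `docker network inspect ingress`.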
Additional information you deem important (e.g. issue happens only occasionally):
Output of docker version:
root@dev-swarm-manager-1:/# docker version
Client:
Version: 17.12.0-ce
API version: 1.35
Go version: go1.9.2
Git commit: c97c6d6
Built: Wed Dec 27 20:11:19 2017
OS/Arch: linux/amd64
Server:
Engine:
Version: 17.12.0-ce
API version: 1.35 (minimum version 1.12)
Go version: go1.9.2
Git commit: c97c6d6
Built: Wed Dec 27 20:09:53 2017
OS/Arch: linux/amd64
Experimental: false
Output of docker info:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 17.12.0-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
NodeID: 3i3pigwa7vtugntyi7iglrztg
Is Manager: true
ClusterID: 1pj89qddk5ttm4t7nxqb4kgld
Managers: 3
Nodes: 6
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 3
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: 10.21.5.4
Manager Addresses:
10.21.5.3:2377
10.21.5.4:2377
10.21.5.6:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 89623f28b87a6004d4b785663257362d1658a729
runc version: b2567b37d7b75eb4cf325b77297b140ea686ce8f
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.13.0-1002-gcp
Operating System: Ubuntu 16.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.301GiB
Name: dev-swarm-manager-1
ID: ZGSG:LCO4:MHJS:EAUT:OILG:GSSH:QGBZ:Q3M2:BMFL:2UIW:Q7LY:DQBP
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
Additional environment details (AWS, VirtualBox, physical, etc.): Google Cloud Platform GCE instances
About this issue
- State: closed
- Created 6 years ago
- Reactions: 9
- Comments: 40 (12 by maintainers)
Tested and run OK for me!
Tested as well. The whole cluster update to 17.12.1-CE went well. Thanks, guys!
Looks like 17.12.1-CE has addressed this and is available as of Feb 27. Anyone tried upgrading yet?
I can confirm that this issue has not been fixed yet: even after upgrading all the servers in the swarm, overlay networking still does NOT work.
I’ve just joined a 17.12.0 node to a swarm of 17.09.1 nodes. It was only when I drained all the older managers ready for upgrade that the entire cluster unexpectedly went dark at the network level. All nodes claim to be healthy; the new node is the only one that has no docker_gwbridge interface, and it shows the OP’s error in its logs. This should probably be mentioned in the “known issues”.
I encountered the same issue after upgrading to 17.12.0. This is quite broken for a release! I had to revert to 17.09 to get my global services to re-launch on a node that was drained before the upgrade. In the end I needed to force the node out of the swarm and rejoin it to recover, in case that helps anyone else in the same boat. Also, I’m running with --storage-driver=devicemapper, and I had to destroy and recreate my thinpool to revert.