moby: Swarm 17.07.0 unable to join nodes
Description
Hi,
Upgrading from docker-ce 17.06.2 to docker-ce 17.07.0 broke my cluster. My manager is up and running, but worker (or manager) nodes cannot join it.
Steps to reproduce the issue:
- install docker-ce 17.07.0 on 2 VMs (say M and W)
- on M, run:
docker swarm init --advertise-addr=<M_public_ip>:2377
and note the worker join command it prints
- run the join command on W
Describe the results you received:
Error response from daemon: rpc error: code = Unavailable desc = grpc: the connection is unavailable
Describe the results you expected:
This node joined a swarm as a worker.
Additional information you deem important (e.g. issue happens only occasionally):
I used a Terraform+Ansible setup that had successfully been run many times to create clusters. The problem even appears on an existing cluster after upgrading to 17.07.0.
Output of docker version:
Client:
Version: 17.07.0-ce
API version: 1.31
Go version: go1.8.3
Git commit: 8784753
Built: Tue Aug 29 17:42:01 2017
OS/Arch: linux/amd64
Server:
Version: 17.07.0-ce
API version: 1.31 (minimum version 1.12)
Go version: go1.8.3
Git commit: 8784753
Built: Tue Aug 29 17:43:23 2017
OS/Arch: linux/amd64
Experimental: false
Output of docker info:
Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 1
Server Version: 17.07.0-ce
Storage Driver: overlay
Backing Filesystem: xfs
Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
NodeID: eqt38cc9zdy5i15yflndsj0ze
Is Manager: true
ClusterID: u7yvqf60ew3r1n867bjstz2bh
Managers: 1
Nodes: 1
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 3
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Root Rotation In Progress: false
Node Address: 10.90.251.200
Manager Addresses:
10.90.251.200:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 3addd840653146c90a254301d6c3a663c7fd6429
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-514.21.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 1.797GiB
Name: swarm-mode-latest-master-0
ID: W4TM:QHM2:MWD4:52SK:XMIM:TJA5:2NCX:4WAB:77C5:TF66:WVKX:XGTD
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Http Proxy: http://some-proxy:3128
Https Proxy: http://some-proxy:3128
No Proxy: localhost,172.17.42.1,.sock
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Additional environment details (AWS, VirtualBox, physical, etc.):
The swarm mode cluster runs on KVM VMs.
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Reactions: 2
- Comments: 30 (9 by maintainers)
I fixed this on CentOS7 by adding firewalld rules on the manager nodes:
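A sketch of firewalld rules covering the standard Swarm ports (assuming a default firewalld setup; the port list is taken from Docker's swarm networking documentation, not from this comment):

```shell
# Open the standard Docker Swarm ports on each manager node
sudo firewall-cmd --permanent --add-port=2377/tcp   # cluster management (swarm join)
sudo firewall-cmd --permanent --add-port=7946/tcp   # node-to-node communication
sudo firewall-cmd --permanent --add-port=7946/udp
sudo firewall-cmd --permanent --add-port=4789/udp   # overlay network (VXLAN)
sudo firewall-cmd --reload
```

Workers need the same 7946 and 4789 rules if services will be scheduled on them.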
I don’t have a proxy between my manager and worker.
manager IP: 192.168.2.103, worker IP: 192.168.2.102
I SSH into the worker from the manager node and issue the swarm join command:
docker swarm join --token SWMTKN-1-578xkqwthbmkn7fnegmtx4pn73h0wh5qgvzw03gb50xmg1t0qu-a1k7myosfu4zkruw7y1x81l1k 192.168.2.103:2377
Error response from daemon: rpc error: code = Unavailable desc = grpc: the connection is unavailable
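A quick way to tell a blocked port from a proxy problem is to test raw TCP reachability of the manager's swarm port from the worker. A sketch using bash's built-in /dev/tcp (manager address taken from the comment above; adjust to your setup):

```shell
# Try a plain TCP connection to the manager's swarm management port.
# Success means the port is open and the problem is likely proxy-related;
# failure points at a firewall or routing issue instead.
if timeout 3 bash -c 'exec 3<>/dev/tcp/192.168.2.103/2377' 2>/dev/null; then
  echo "port 2377 reachable"
else
  echo "port 2377 NOT reachable"
fi
```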
How do I fix this? I can't understand how you solved this, even after following this thread.
Please help.
Hey guys, I hit the same issue after initializing the swarm from my local Mac.
If I first create a manager1 machine, then SSH into it and init the swarm from there, it works.
Then I create a worker1, SSH into it, and join it to the swarm cluster, and it works perfectly.
So the problem may be initializing the swarm from the local macOS host instead of from a manager machine we create.
BTW, I've published this post on the Docker forum too.
Is this issue going to be fixed? As it stands, if you want to add a new master to a swarm you have to restart all the other master nodes to update their no_proxy lists (no_proxy cannot use wildcards), which is really very annoying.
To be concrete: an http_proxy variable was set for the Docker daemon because a proxy is needed to reach the Docker registry. This unexpectedly caused all internal gRPC traffic to go via the outgoing proxy, causing much mayhem. If you know the IPs of your future nodes upfront you can mitigate it a bit, but it is very annoying.
AFAICS gRPC supports the concept of a "proxy mapper", which would allow Docker to control whether the proxy is used or not, and thus automatically exclude other nodes.
In any case, having to mutually add all potential swarm participant nodes to their NO_PROXY environment variables just means some more Ansible work for me…
Proxy issue… While it wasn't needed in versions < 17.07.0, you now have to configure the daemon proxy according to your infrastructure. Adding the master IP to no_proxy (in the docker daemon service) fixes the issue. Thx @ktoublanc
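On a systemd-based host this can be sketched as a daemon drop-in (the file path follows the usual systemd convention; the proxy URL and node address are the illustrative values reported in the docker info output above — substitute your own, listing every manager/worker IP):

```shell
# Create a systemd drop-in so the Docker daemon bypasses the proxy
# for swarm peers (proxy URL and IPs are illustrative)
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/http-proxy.conf <<'EOF'
[Service]
Environment="HTTP_PROXY=http://some-proxy:3128"
Environment="HTTPS_PROXY=http://some-proxy:3128"
# List every manager/worker IP so internal gRPC traffic skips the proxy
Environment="NO_PROXY=localhost,127.0.0.1,10.90.251.200"
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
```

Remember this has to be done on every node, which is exactly the maintenance burden the comments above complain about.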