moby: could not resolve address of member ID

Description

Hello,

I’m facing a strange issue with swarm and would appreciate some help with it. I added a new host as a worker to a swarm of 3 managers (version 17.03) without any problem. Afterwards, a VMware rollback was performed on that worker following a test upgrade to Debian stretch. The rollback seems to have broken something in my swarm.

No data available from the swarm --> Error: rpc error: code = 4 desc = context deadline exceeded

I cannot delete and recreate the swarm, as it is currently in use by some services.

Steps to reproduce the issue:

Add a host as a worker to a swarm (Debian 8.8)
Take a VMware snapshot of that host
Upgrade to Debian 9 (stretch)
Roll back to the snapshot
Promote the worker to manager, or add a new host as a manager

Describe the results you received: Now, when I try to promote the worker to manager, or even add a new host to my swarm as a manager, I get the following in my logs every second:

Jul 18 14:23:51 docker-5 dockerd[710]: time="2017-07-18T14:23:51.286450834+02:00" level=warning msg="sending message to an unrecognized member ID 5c9f1beef700d602" raft_id=6b9c136b96a2f655
Jul 18 14:23:51 docker-5 dockerd[710]: time="2017-07-18T14:23:51.286554667+02:00" level=error msg="could not resolve address of member ID 5c9f1beef700d602" error="rpc error: code = 9 desc = grpc: the client connection is closing" raft_id=6b9c136b96a2f655

The ID in “could not resolve address of member ID 5c9f1beef700d602” is the same on both of the hosts that I tried to add as managers to my swarm, but I could not find out what it refers to.
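To check whether the same member ID really shows up on each host, the dockerd log lines can be filtered with a small script. The following is only a sketch: it assumes the log lines have exactly the shape quoted above, and the names `MEMBER_RE` and `stale_member_ids` are made up for illustration.

```python
import re
from collections import Counter

# Matches the two dockerd messages quoted above and captures the member ID
# plus the raft_id of the node that logged the line.
# Assumption: the log format is the same as in the two sample lines.
MEMBER_RE = re.compile(
    r"(?:unrecognized member ID|could not resolve address of member ID) "
    r"(?P<member>[0-9a-f]+)"
    r".*?raft_id=(?P<raft>[0-9a-f]+)"
)


def stale_member_ids(log_lines):
    """Count (member_id, raft_id) pairs found in dockerd log lines."""
    counts = Counter()
    for line in log_lines:
        match = MEMBER_RE.search(line)
        if match:
            counts[(match.group("member"), match.group("raft"))] += 1
    return counts


# The two log lines from this report, used as sample input.
SAMPLE = [
    'Jul 18 14:23:51 docker-5 dockerd[710]: time="2017-07-18T14:23:51.286450834+02:00" '
    'level=warning msg="sending message to an unrecognized member ID 5c9f1beef700d602" '
    'raft_id=6b9c136b96a2f655',
    'Jul 18 14:23:51 docker-5 dockerd[710]: time="2017-07-18T14:23:51.286554667+02:00" '
    'level=error msg="could not resolve address of member ID 5c9f1beef700d602" '
    'error="rpc error: code = 9 desc = grpc: the client connection is closing" '
    'raft_id=6b9c136b96a2f655',
]

if __name__ == "__main__":
    for (member, raft), seen in stale_member_ids(SAMPLE).items():
        print(f"member {member} reported by raft_id {raft}: {seen} line(s)")
```

Running the same function over the logs of each affected host and comparing the member IDs would show whether they all point at one unknown raft member.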

When I add them, the operation takes a long time, as if something were timing out, and then ends normally. However, I cannot use any of the swarm commands on those 2 managers.

Describe the results you expected:

A responsive, working new manager in my swarm.

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

Client:
 Version:      17.03.1-ce
 API version:  1.27
 Go version:   go1.7.5
 Git commit:   c6d412e
 Built:        Mon Mar 27 17:07:28 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.03.1-ce
 API version:  1.27 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   c6d412e
 Built:        Mon Mar 27 17:07:28 2017
 OS/Arch:      linux/amd64
 Experimental: false

Output of docker info:

Swarm: active
NodeID: yhpvwcm60oko20leg9z7vieop
Error: rpc error: code = 4 desc = context deadline exceeded
Is Manager: true
ClusterID:
Managers: 0
Nodes: 0
Orchestration:
Task History Retention Limit: 0
Raft:
Snapshot Interval: 0
Heartbeat Tick: 0
Election Tick: 0
Dispatcher:
Heartbeat Period: Less than a second
CA Configuration:
Expiry Duration: Less than a second
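The combination visible in this output (the node says it is a manager, yet reports an error and zero managers) can be detected automatically when checking several hosts. The sketch below parses the `Key: value` lines of the Swarm section as pasted above; the function names and the heuristic itself are assumptions made for illustration, not something Docker defines.

```python
def parse_swarm_info(text):
    """Turn 'Key: value' lines into a dict; keys without a value map to ''."""
    info = {}
    for raw in text.splitlines():
        # Split on the first colon only, so error messages keep their colons.
        key, sep, value = raw.strip().partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def looks_unreachable(info):
    """Heuristic (an assumption): manager flag set, error present, no managers listed."""
    return (
        info.get("Is Manager") == "true"
        and "Error" in info
        and info.get("Managers") == "0"
    )


# First lines of the Swarm section from this report, used as sample input.
SAMPLE = """\
Swarm: active
NodeID: yhpvwcm60oko20leg9z7vieop
Error: rpc error: code = 4 desc = context deadline exceeded
Is Manager: true
ClusterID:
Managers: 0
Nodes: 0
"""

if __name__ == "__main__":
    info = parse_swarm_info(SAMPLE)
    if looks_unreachable(info):
        print("manager cannot reach the swarm:", info["Error"])
```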

Additional environment details (AWS, VirtualBox, physical, etc.): VMware

Many thanks!

Best regards,

Trafle73

About this issue

State: closed
Created 7 years ago
Comments: 16 (6 by maintainers)

Most upvoted comments

Any update on this? I’m encountering the same issue with 17.09. Is there a way to clean up the stale raft ID?

extrail on Dec 31, 2017