rqlite: Resuming instance from different network address with existing raft.db triggers unnecessary election and fails to join

What version are you running? Tested 5.9.0 and 6.0.0 on macOS. I also saw the same issue with the rqlite/rqlite:6.0.0 Docker image.

What did you do? Started three rqlited processes locally on different ports to form a three-node cluster, with node IDs rqlite-0, rqlite-1, and rqlite-2.

My intent was to remove rqlite-0, running on ports 4001/4002, and re-add it on other ports, say 4007/4008, to simulate a change of network address such as you might see when pods move around in Kubernetes.

What did you expect to happen?

When rejoining the cluster, whether via the network addresses in the existing Raft log or via the new -join parameters provided, I expected the node to connect to the other members, catch its log up to the latest point, and carry on at the new network address without needing to replicate the entire Raft log.

What happened instead?

No matter what I tried, the node kept triggering elections, and the other nodes rejected its vote requests because a healthy leader already existed.

The only way to get it to rejoin was to wipe out that node's raft.db and have it join fresh (shown at the end of the reproduction below).

Please include the Status, Nodes, and Expvar output from each node (or at least the Leader!)

Start three rqlited processes. Have the second and third join the first to form the cluster:

$ rqlited -node-id=rqlite-0 -http-addr=localhost:4001 -raft-addr=localhost:4002 0
$ rqlited -node-id=rqlite-1 -http-addr=localhost:4003 -raft-addr=localhost:4004 -join localhost:4001 1
$ rqlited -node-id=rqlite-2 -http-addr=localhost:4005 -raft-addr=localhost:4006 -join localhost:4001 2

Check cluster state:

$ curl "http://localhost:4003/nodes?pretty"
{
    "rqlite-0": {
        "api_addr": "http://localhost:4001",
        "addr": "127.0.0.1:4002",
        "reachable": true,
        "leader": true
    },
    "rqlite-1": {
        "api_addr": "http://localhost:4003",
        "addr": "127.0.0.1:4004",
        "reachable": true,
        "leader": false
    },
    "rqlite-2": {
        "api_addr": "http://localhost:4005",
        "addr": "127.0.0.1:4006",
        "reachable": true,
        "leader": false
    }
}

Kill rqlite-0 and check to see that the leader has moved.
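One way to kill it (purely illustrative; Ctrl-C in rqlite-0's terminal works just as well):

$ pkill -f 'node-id=rqlite-0'

Then check the cluster state: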

$ curl "http://localhost:4003/nodes?pretty"
{
    "rqlite-0": {
        "addr": "127.0.0.1:4002",
        "reachable": false,
        "leader": false
    },
    "rqlite-1": {
        "api_addr": "http://localhost:4003",
        "addr": "127.0.0.1:4004",
        "reachable": true,
        "leader": false
    },
    "rqlite-2": {
        "api_addr": "http://localhost:4005",
        "addr": "127.0.0.1:4006",
        "reachable": true,
        "leader": true
    }
}

Remove rqlite-0 from the cluster:

$ MASTER=$(curl -s "http://localhost:4001/nodes" "http://localhost:4003/nodes" "http://localhost:4005/nodes" | jq -r '.[] | select(.leader==true) | .api_addr' | head -n 1)
$ curl -XDELETE -Lv $MASTER/remove -d "{\"id\": \"rqlite-0\"}"

Check cluster state:

$ curl "http://localhost:4003/nodes?pretty"
{
  "rqlite-1": {
    "api_addr": "http://localhost:4003",
    "addr": "127.0.0.1:4004",
    "reachable": true,
    "leader": false
  },
  "rqlite-2": {
    "api_addr": "http://localhost:4005",
    "addr": "127.0.0.1:4006",
    "reachable": true,
    "leader": true
  }
}

Restart rqlite-0 and you'll see that it keeps triggering elections forever.
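The restart uses the same flags as the original start (it is also echoed as the launch command in the log below):

$ rqlited -node-id=rqlite-0 -http-addr=localhost:4001 -raft-addr=localhost:4002 0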

[rqlited] 2021/06/16 10:11:48 rqlited starting, version 6, commit unknown, branch unknown
[rqlited] 2021/06/16 10:11:48 go1.16.3, target architecture is amd64, operating system target is darwin
[rqlited] 2021/06/16 10:11:48 launch command: rqlited -node-id=rqlite-0 -http-addr=localhost:4001 -raft-addr=localhost:4002 0
[mux] 2021/06/16 10:11:48 received handler registration request for header 1
[mux] 2021/06/16 10:11:48 received handler registration request for header 2
[cluster] 2021/06/16 10:11:48 service listening on 127.0.0.1:4002
[mux] 2021/06/16 10:11:48 mux serving on 127.0.0.1:4002, advertising 127.0.0.1:4002
[rqlited] 2021/06/16 10:11:48 preexisting node state detected in /Users/brentaylor/rqlite-data/0
[rqlited] 2021/06/16 10:11:48 no join addresses set
[store] 2021/06/16 10:11:48 opening store with node ID rqlite-0
[store] 2021/06/16 10:11:48 ensuring directory at /Users/brentaylor/rqlite-data/0 exists
[store] 2021/06/16 10:11:48 0 preexisting snapshots present
[store] 2021/06/16 10:11:48 first log index: 1, last log index: 4, last command log index: 0:
2021-06-16T10:11:48.141-0700 [INFO]  raft: initial configuration: index=4 servers="[{Suffrage:Voter ID:rqlite-0 Address:127.0.0.1:4002} {Suffrage:Voter ID:rqlite-1 Address:127.0.0.1:4004} {Suffrage:Voter ID:rqlite-2 Address:127.0.0.1:4006}]"
[store] 2021/06/16 10:11:48 no cluster bootstrap requested
2021-06-16T10:11:48.141-0700 [INFO]  raft: entering follower state: follower="Node at 127.0.0.1:4002 [Follower]" leader=
2021-06-16T10:11:50.046-0700 [WARN]  raft: heartbeat timeout reached, starting election: last-leader=
2021-06-16T10:11:50.046-0700 [INFO]  raft: entering candidate state: node="Node at 127.0.0.1:4002 [Candidate]" term=3
2021-06-16T10:11:50.239-0700 [INFO]  raft: entering follower state: follower="Node at 127.0.0.1:4002 [Follower]" leader=
2021-06-16T10:11:52.016-0700 [WARN]  raft: heartbeat timeout reached, starting election: last-leader=
2021-06-16T10:11:52.016-0700 [INFO]  raft: entering candidate state: node="Node at 127.0.0.1:4002 [Candidate]" term=5
2021-06-16T10:11:53.850-0700 [WARN]  raft: Election timeout reached, restarting election
2021-06-16T10:11:53.850-0700 [INFO]  raft: entering candidate state: node="Node at 127.0.0.1:4002 [Candidate]" term=6
2021-06-16T10:11:55.282-0700 [WARN]  raft: Election timeout reached, restarting election
2021-06-16T10:11:55.282-0700 [INFO]  raft: entering candidate state: node="Node at 127.0.0.1:4002 [Candidate]" term=7
... etc etc
[rqlited] 2021/06/16 10:13:48 leader did not appear within timeout: timeout expired

And the other nodes reject its vote requests, since a healthy leader is already present:

2021-06-16T10:13:15.361-0700 [WARN]  raft: rejecting vote request since we have a leader: from=127.0.0.1:4002 leader=127.0.0.1:4006

Kill all the rqlited processes and delete the Raft state on disk, then restart the cluster fresh. Once the cluster is formed, kill rqlite-0, remove it from the cluster, and start it on new ports, telling it to join the others.
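A minimal sketch of the reset, assuming the per-node data directories are the 0, 1 and 2 passed on the original command lines:

$ pkill rqlited
$ rm -rf 0 1 2
$ rqlited -node-id=rqlite-0 -http-addr=localhost:4001 -raft-addr=localhost:4002 0
$ rqlited -node-id=rqlite-1 -http-addr=localhost:4003 -raft-addr=localhost:4004 -join localhost:4001 1
$ rqlited -node-id=rqlite-2 -http-addr=localhost:4005 -raft-addr=localhost:4006 -join localhost:4001 2

With the cluster re-formed and rqlite-0 killed again, remove it and start it at the new address: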

$ MASTER=$(curl -s "http://localhost:4001/nodes" "http://localhost:4003/nodes" "http://localhost:4005/nodes" | jq -r '.[] | select(.leader==true) | .api_addr' | head -n 1)
$ curl -XDELETE -Lv $MASTER/remove -d "{\"id\": \"rqlite-0\"}"
$ rqlited -node-id=rqlite-0 -http-addr=localhost:4007 -raft-addr=localhost:4008 -join localhost:4003,localhost:4005 0

The same thing happens as before: rqlite-0 triggers elections and the other nodes reject its vote requests. Eventually rqlite-0 gives up with the "leader did not appear within timeout" error. The other nodes never consider it joined.

$ curl "http://localhost:4003/nodes?pretty"
{
  "rqlite-1": {
    "api_addr": "http://localhost:4003",
    "addr": "127.0.0.1:4004",
    "reachable": true,
    "leader": true
  },
  "rqlite-2": {
    "api_addr": "http://localhost:4005",
    "addr": "127.0.0.1:4006",
    "reachable": true,
    "leader": false
  }
}

If we remove rqlite-0's on-disk state, it can rejoin at the new address without issue and simply replicates the Raft log from the other members. This doesn't even require the node to be removed from the cluster first; it gets replaced properly.

$ rm -rf 0
$ rqlited -node-id=rqlite-0 -http-addr=localhost:4007 -raft-addr=localhost:4008 -join localhost:4003,localhost:4005 0
[rqlited] 2021/06/16 10:28:04 rqlited starting, version 6, commit unknown, branch unknown
[rqlited] 2021/06/16 10:28:04 go1.16.3, target architecture is amd64, operating system target is darwin
[rqlited] 2021/06/16 10:28:04 launch command: rqlited -node-id=rqlite-0 -http-addr=localhost:4007 -raft-addr=localhost:4008 -join localhost:4003,localhost:4005 0
[mux] 2021/06/16 10:28:04 received handler registration request for header 1
[mux] 2021/06/16 10:28:04 received handler registration request for header 2
[cluster] 2021/06/16 10:28:04 service listening on 127.0.0.1:4008
[mux] 2021/06/16 10:28:04 mux serving on 127.0.0.1:4008, advertising 127.0.0.1:4008
[rqlited] 2021/06/16 10:28:04 no preexisting node state detected in /Users/brentaylor/rqlite-data/0, node may be bootstrapping
[rqlited] 2021/06/16 10:28:04 join addresses specified, node is not bootstrapping
[store] 2021/06/16 10:28:04 opening store with node ID rqlite-0
[store] 2021/06/16 10:28:04 ensuring directory at /Users/brentaylor/rqlite-data/0 exists
[store] 2021/06/16 10:28:04 0 preexisting snapshots present
[store] 2021/06/16 10:28:04 first log index: 0, last log index: 0, last command log index: 0:
2021-06-16T10:28:04.534-0700 [INFO]  raft: initial configuration: index=0 servers=[]
[store] 2021/06/16 10:28:04 no cluster bootstrap requested
[rqlited] 2021/06/16 10:28:04 join addresses are: [localhost:4003 localhost:4005]
2021-06-16T10:28:04.534-0700 [INFO]  raft: entering follower state: follower="Node at 127.0.0.1:4008 [Follower]" leader=
[rqlited] 2021/06/16 10:28:04 successfully joined cluster at http://localhost:4003/join
2021-06-16T10:28:04.675-0700 [WARN]  raft: failed to get previous log: previous-index=7 last-index=0 error="log not found"
[store] 2021/06/16 10:28:04 waiting for up to 2m0s for application of initial logs
[rqlited] 2021/06/16 10:28:04 store has reached consensus
[http] 2021/06/16 10:28:04 service listening on 127.0.0.1:4007
[rqlited] 2021/06/16 10:28:04 node is ready

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 36 (23 by maintainers)

Most upvoted comments

OK, I read your issue. Hashicorp Raft doesn't work like that AFAIK. If you explicitly remove the node, you must also remove its data directory. Removing a node means just that: it's completely removed, and if you start it up again with any preexisting state, it won't just become part of the cluster and reuse the state underneath.

I believe the best way to think about this is that once you remove the node, you should consider it gone. You can't decide to use it again. Don't think of a node "changing network addresses"; instead, think of adding a new node at the new network address.
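To make that concrete, here is a sketch of the supported flow, reusing the commands from the reproduction above (with $MASTER set to the leader's API address as earlier in this issue): remove the node, discard its on-disk state, then start what is effectively a new node.

$ curl -XDELETE -Lv $MASTER/remove -d '{"id": "rqlite-0"}'
$ rm -rf 0
$ rqlited -node-id=rqlite-0 -http-addr=localhost:4007 -raft-addr=localhost:4008 -join localhost:4003,localhost:4005 0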

I'm interested in making rqlite work better with Kubernetes. If you can provide examples of clustered systems that work with Kubernetes in the manner you wish, I'd be interested in studying them. But the behaviour you are seeing is the Hashicorp system, the same one that powers Consul.