rqlite: cannot run queries against cluster members - only the leader responds
After a while (a couple of hours or days), our rqlite clusters (v5.11.1) reach a point where only the leader can be queried.
For example, running a query against any member other than the leader returns:
bash-4.4$ /rqlite -H rqlite-0.rqlite
Welcome to the rqlite CLI. Enter ".help" for usage hints.
Connected to version v5.11.1
rqlite-0.rqlite:4001> select * from rules
ERR! server responded with 503 Service Unavailable: not leader
rqlite-0.rqlite:4001> select * from issues
ERR! server responded with 503 Service Unavailable: not leader
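As a stopgap on the client side, one option is to try each node in turn and use the first one that answers. The following is only a minimal sketch, not a fix: it assumes the rqlite HTTP API's `/db/query` endpoint, and the host list and the `query_any` / `http_get` names are hypothetical helpers made up for illustration.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def http_get(url, timeout=5):
    """Return (status_code, body) for a GET, including non-2xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        # A 503 "not leader" response arrives here as an HTTPError.
        return err.code, err.read().decode()


def query_any(hosts, sql, fetch=http_get):
    """Try each host in order; return (host, parsed_json) from the first 200."""
    qs = urllib.parse.urlencode({"q": sql})
    for host in hosts:
        url = f"http://{host}/db/query?{qs}"
        try:
            status, body = fetch(url)
        except OSError:
            # Connection refused, DNS failure, timeout: try the next node.
            continue
        if status == 200:
            return host, json.loads(body)
    raise RuntimeError("no node answered the query")


if __name__ == "__main__":
    # Hypothetical host list, taken from the pod names in this report.
    hosts = [f"rqlite-{i}.rqlite:4001" for i in range(5)]
    print(query_any(hosts, "select * from rules"))
```

With the cluster in the state described here, this would skip every follower that returns 503 and end up on the leader.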
The status response from one of the followers is:
curl http://rqlite-0.rqlite:4001/status?pretty
{
"build": {
"branch": "master",
"build_time": "2021-04-13T10:24:42-0400",
"commit": "927611c82c72056a99e20cde3279fac7fdf51484",
"version": "v5.11.1"
},
"http": {
"addr": "10.233.122.246:4001",
"auth": "disabled",
"redirect": ""
},
"node": {
"start_time": "2021-04-22T19:32:14.932309505Z",
"uptime": "16h38m28.56345708s"
},
"runtime": {
"GOARCH": "amd64",
"GOMAXPROCS": 8,
"GOOS": "linux",
"num_cpu": 8,
"num_goroutine": 17,
"version": "go1.15.7"
},
"store": {
"addr": "10.233.122.246:4002",
"apply_timeout": "10s",
"db_conf": {
"DSN": "",
"Memory": true
},
"dir": "/node",
"dir_size": 14560570,
"election_timeout": "5s",
"heartbeat_timeout": "4s",
"leader": {
"addr": "10.233.92.133:4002",
"node_id": "rqlite-4"
},
"metadata": {
"rqlite-0": {
"api_addr": "rqlite-0.rqlite:4001",
"api_proto": "http"
},
"rqlite-1": {
"api_addr": "rqlite-1.rqlite:4001",
"api_proto": "http"
},
"rqlite-2": {
"api_addr": "rqlite-2.rqlite:4001",
"api_proto": "http"
}
},
"node_id": "rqlite-0",
"nodes": [
{
"id": "rqlite-0",
"addr": "10.233.122.246:4002"
},
{
"id": "rqlite-1",
"addr": "10.233.89.103:4002"
},
{
"id": "rqlite-2",
"addr": "10.233.67.182:4002"
},
{
"id": "rqlite-3",
"addr": "10.233.100.143:4002"
},
{
"id": "rqlite-4",
"addr": "10.233.92.133:4002"
}
],
"raft": {
"applied_index": 496115,
"commit_index": 496115,
"fsm_pending": 0,
"last_contact": "21.858677ms",
"last_log_index": 496115,
"last_log_term": 28,
"last_snapshot_index": 494286,
"last_snapshot_term": 22,
"latest_configuration": "[{Suffrage:Voter ID:rqlite-3 Address:10.233.100.143:4002} {Suffrage:Voter ID:rqlite-4 Address:10.233.92.133:4002} {Suffrage:Voter ID:rqlite-0 Address:10.233.122.246:4002} {Suffrage:Voter ID:rqlite-2 Address:10.233.67.182:4002} {Suffrage:Voter ID:rqlite-1 Address:10.233.89.103:4002}]",
"latest_configuration_index": 0,
"log_size": 8388608,
"num_peers": 4,
"protocol_version": 3,
"protocol_version_max": 3,
"protocol_version_min": 0,
"snapshot_version_max": 1,
"snapshot_version_min": 0,
"state": "Follower",
"term": 28
},
"request_marshaler": {
"compression_batch": 5,
"compression_size": 150,
"force_compression": false
},
"snapshot_interval": 30000000000,
"snapshot_threshold": 4096,
"sqlite3": {
"db_size": 22736896,
"dsn": "",
"fk_constraints": "disabled",
"path": ":memory:",
"version": "3.34.0"
},
"trailing_logs": 5120
}
}
Please notice that the status metadata lists only 3 nodes; not all 5 members appear under:
.store.metadata
They do, however, appear under:
.store.nodes
The leader (rqlite-4) does not appear in .store.metadata at all.
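The mismatch can be checked mechanically by comparing the two sections of the `/status` output. This is a rough sketch that assumes the JSON layout shown above; the function names are made up for illustration, and the sample is a trimmed copy of the status response from this report.

```python
import json


def nodes_missing_metadata(status):
    """Return IDs listed in .store.nodes that have no entry in .store.metadata."""
    store = status.get("store", {})
    metadata = store.get("metadata") or {}
    return [n["id"] for n in store.get("nodes", []) if n["id"] not in metadata]


def leader_has_metadata(status):
    """Return True if the current leader has an entry in .store.metadata."""
    store = status.get("store", {})
    leader_id = (store.get("leader") or {}).get("node_id")
    return leader_id in (store.get("metadata") or {})


# Trimmed copy of the follower's status response shown above.
sample = json.loads("""
{
  "store": {
    "leader": {"addr": "10.233.92.133:4002", "node_id": "rqlite-4"},
    "metadata": {
      "rqlite-0": {"api_addr": "rqlite-0.rqlite:4001", "api_proto": "http"},
      "rqlite-1": {"api_addr": "rqlite-1.rqlite:4001", "api_proto": "http"},
      "rqlite-2": {"api_addr": "rqlite-2.rqlite:4001", "api_proto": "http"}
    },
    "nodes": [
      {"id": "rqlite-0", "addr": "10.233.122.246:4002"},
      {"id": "rqlite-1", "addr": "10.233.89.103:4002"},
      {"id": "rqlite-2", "addr": "10.233.67.182:4002"},
      {"id": "rqlite-3", "addr": "10.233.100.143:4002"},
      {"id": "rqlite-4", "addr": "10.233.92.133:4002"}
    ]
  }
}
""")

print(nodes_missing_metadata(sample))  # ['rqlite-3', 'rqlite-4']
print(leader_has_metadata(sample))     # False
```

Running the same check against each node's status output would show whether they all miss the same entries.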
Can you please assist?
Thanks in advance, L
About this issue
- State: closed
- Created 3 years ago
- Comments: 38 (21 by maintainers)
I’m about 33% of the way through this, with a few more days to go. Hopefully you can work around the issues in the meantime.
These changes will probably warrant the start of the 6.0 series. Upgrading from 5.0 will be straightforward, however.