openraft: How to handle and remove unreachable nodes
It seems that, currently, if a node disconnects abruptly and was not explicitly removed beforehand, the leader “spins” forever, retrying the request repeatedly.
I tried returning both RPCError::Timeout and RPCError::Network from my implementation, but there seems to be no upper limit on retries: the leader just retries the request forever. As a result, the leader cannot call remove_member after the fact (and even when it tries, the call fails the preflight is_leader() check).
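To make “no upper limit” concrete, here is a generic sketch of a retry loop that gives up after a fixed number of attempts with doubling backoff, instead of spinning forever on a peer that never answers. It is a toy model of the retry pattern only, not openraft's replication code or its network trait; RpcOutcome and send_with_retry_limit are hypothetical names. Note that, as described above, capping retries inside the application's own network implementation only changes which error is returned; the leader's own replication loop decides whether to try again.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical outcome of one RPC attempt (not an openraft type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum RpcOutcome {
    Ok,
    Unreachable,
}

/// Calls `send` at most `max_attempts` times, doubling the delay between
/// failed attempts. Returns Ok(attempt) on the first success, or
/// Err(max_attempts) once every attempt has failed.
fn send_with_retry_limit<F>(mut send: F, max_attempts: u32, base: Duration) -> Result<u32, u32>
where
    F: FnMut() -> RpcOutcome,
{
    let mut delay = base;
    for attempt in 1..=max_attempts {
        if send() == RpcOutcome::Ok {
            return Ok(attempt);
        }
        // Only sleep if another attempt will follow.
        if attempt < max_attempts {
            thread::sleep(delay);
            delay = delay.saturating_mul(2);
        }
    }
    Err(max_attempts)
}

fn main() {
    // A peer that never answers: the loop stops after 5 attempts.
    let dead = send_with_retry_limit(|| RpcOutcome::Unreachable, 5, Duration::from_millis(1));
    assert_eq!(dead, Err(5));

    // A peer that comes back on the third attempt.
    let mut calls = 0;
    let flaky = send_with_retry_limit(
        || {
            calls += 1;
            if calls >= 3 { RpcOutcome::Ok } else { RpcOutcome::Unreachable }
        },
        5,
        Duration::from_millis(1),
    );
    assert_eq!(flaky, Ok(3));
    println!("dead peer: {:?}, flaky peer: {:?}", dead, flaky);
}
```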
I tried to return RPCError::NodeNotFound, which resulted in a panic.
Am I doing something wrong? Is there a proper way to remove nodes that can no longer be reached?
(Side note: I am on the current main branch, if that is of any significance.)
About this issue
- State: closed
- Created 2 years ago
- Comments: 17 (7 by maintainers)
Commits related to this issue
- Feature: add config: remove-replication to specify when to stop replication to a unreachable removed node - Fix: #333 — committed to drmingdrmer/openraft by drmingdrmer 2 years ago
- Feature: remove replication stream for unreachable node The default trigger event to remove the replication to a node that is not in membership is uniform membership log being committed and replicate... — committed to drmingdrmer/openraft by drmingdrmer 2 years ago
- Feature: remove replication stream for unreachable node The default trigger event to remove the replication to a node that is not in membership is uniform membership log being committed and replicate... — committed to datafuselabs/openraft by drmingdrmer 2 years ago
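The commits above describe a config option that specifies when the leader stops replicating to an unreachable node that has already been removed from membership. A minimal sketch of that idea, assuming a simple policy where a replication stream is dropped after a fixed number of consecutive failed heartbeats, but only if the target is no longer a member (members keep being retried). RemovePolicy, Leader and on_heartbeat are hypothetical names; openraft's actual config field, policy variants and default trigger (the commit message describing it is truncated above) may differ.

```rust
use std::collections::{BTreeMap, BTreeSet};

type NodeId = u64;

/// Hypothetical policy, not openraft's config type: drop replication to a
/// removed node after this many consecutive failed heartbeats.
#[derive(Debug, Clone, Copy)]
struct RemovePolicy {
    max_consecutive_failures: u64,
}

#[derive(Debug, Default)]
struct Leader {
    /// Current voting membership.
    membership: BTreeSet<NodeId>,
    /// Active replication streams: target -> consecutive failure count.
    streams: BTreeMap<NodeId, u64>,
}

impl Leader {
    /// Records one heartbeat result for `target` and applies the policy.
    /// Returns true if the replication stream to `target` was removed.
    fn on_heartbeat(&mut self, target: NodeId, ok: bool, policy: RemovePolicy) -> bool {
        let count = match self.streams.get_mut(&target) {
            Some(failures) => {
                // A successful heartbeat resets the failure streak.
                if ok { *failures = 0 } else { *failures += 1 }
                *failures
            }
            None => return false, // no stream to this node
        };
        let removed_node = !self.membership.contains(&target);
        let should_remove = removed_node && count >= policy.max_consecutive_failures;
        if should_remove {
            self.streams.remove(&target);
        }
        should_remove
    }
}

fn main() {
    let policy = RemovePolicy { max_consecutive_failures: 3 };
    let mut leader = Leader::default();
    leader.membership.extend([1, 2]);
    // Node 3 was removed from membership but still has a replication stream.
    leader.streams.extend([(1, 0), (2, 0), (3, 0)]);

    // The removed, unreachable node is dropped on its third failure.
    assert!(!leader.on_heartbeat(3, false, policy));
    assert!(!leader.on_heartbeat(3, false, policy));
    assert!(leader.on_heartbeat(3, false, policy));
    assert!(!leader.streams.contains_key(&3));

    // A current member is never dropped by this policy, however often it fails.
    for _ in 0..5 {
        assert!(!leader.on_heartbeat(2, false, policy));
    }
    assert_eq!(leader.streams.get(&2), Some(&5));
    println!("remaining streams: {:?}", leader.streams);
}
```

The membership check is the key design choice in this sketch: failing heartbeats alone never drop a member, so a temporarily partitioned voter is not silently abandoned; only a node that has already been removed from membership is eligible.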
I just tested the fix, and it works! Thanks.
I have compiled a full log here: https://gist.github.com/indietyp/f899ef6b4e0ba10c4d4be987d6f12692 (it is quite long).
I will try to upload a complete log via gist (I hope that’s ok) as soon as possible.