qdrant: The seed server cannot re-join cluster after restarting.

In distributed mode, if the seed server gets restarted, it cannot re-join the cluster.

Current Behavior

The seed server cannot re-join the cluster after restarting.

Steps to Reproduce

1. Start three servers by running these commands:
   ./qdrant --uri host-0
   ./qdrant --bootstrap host-0 --uri host-1
   ./qdrant --bootstrap host-0 --uri host-2
2. The cluster works normally.
3. Restart the seed server without keeping any state (e.g. the container gets recreated), then try to re-join the cluster with the command ./qdrant --bootstrap host-2 --uri host-0

The following logs are from the first container.

[2023-06-24T11:39:27.919Z INFO storage::content_manager::consensus::persistent] Initializing new raft state at ./storage/raft_state
[2023-06-24T11:39:37.930Z ERROR qdrant::startup] Panic backtrace:
0: qdrant::startup::setup_panic_hook::{{closure}}
1: as core::ops::function::Fn>::call
at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/alloc/src/boxed.rs:1987:9
2: std::panicking::rust_panic_with_hook
at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:695:13
3: std::panicking::begin_panic_handler::{{closure}}
at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:582:13
4: std::sys_common::backtrace::__rust_end_short_backtrace
at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:150:18
5: rust_begin_unwind
at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
6: core::panicking::panic_fmt
at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
7: core::result::unwrap_failed
at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/result.rs:1687:5
8: qdrant::main
9: std::sys_common::backtrace::__rust_begin_short_backtrace
10: main
11:
12: __libc_start_main
13:

[2023-06-24T11:39:37.930Z ERROR qdrant::startup] Panic occurred in file src/main.rs at line 277: Can't initialize consensus: Failed to initialize Consensus for new Raft state: Failed to add peer to known: status: Internal, message: "Failed to add peer: Service internal error: Failed to propose operation: leader is not established within 10 secs", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Sat, 24 Jun 2023 11:39:37 GMT", "content-length": "0"} }

Expected Behavior

The seed server should join the cluster successfully.

About this issue

State: closed
Created a year ago
Reactions: 1
Comments: 15 (5 by maintainers)

Most upvoted comments

Some other distributed systems do work in this scenario, for example Apache Cassandra, if I remember correctly. I’d expect qdrant to work too. Thanks for letting me know about the limitation.

Happy to see that clarifies what’s going on. I assume that fixed the problem.

Currently this is required to make sure the same node (machine) gets the same peer ID assigned. Maybe there’s something we can improve upon here, but we haven’t planned anything like that.
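
To illustrate the point about peer IDs, here is a minimal Python sketch of the general "load or create" pattern. It is not Qdrant's actual code: the function name, the raft_state.json file name, the JSON layout, and the random 63-bit ID are all assumptions made for the example. It only shows why a node that keeps its storage directory comes back with the same identity, while a recreated container with empty storage looks like a brand-new peer.

```python
import json
import random
import tempfile
from pathlib import Path


def load_or_create_peer_id(storage_dir: Path) -> int:
    """Return the peer ID stored under storage_dir, creating one if absent.

    Hypothetical helper: file name and format are invented for illustration.
    """
    state_file = storage_dir / "raft_state.json"
    if state_file.exists():
        # Storage survived the restart: reuse the identity the cluster knows.
        return json.loads(state_file.read_text())["peer_id"]
    # No state found: treat this as a new node and pick a fresh random ID.
    peer_id = random.getrandbits(63)
    storage_dir.mkdir(parents=True, exist_ok=True)
    state_file.write_text(json.dumps({"peer_id": peer_id}))
    return peer_id


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        persistent = Path(tmp) / "persistent"
        first = load_or_create_peer_id(persistent)
        # Restart with the same storage directory: the ID is read back unchanged.
        assert load_or_create_peer_id(persistent) == first
        # Restart with an empty directory: a new ID is generated from scratch.
        recreated = Path(tmp) / "recreated"
        print("persistent:", first)
        print("recreated: ", load_or_create_peer_id(recreated))
```

In the scenario from this issue, the recreated seed container corresponds to the second case: the log line "Initializing new raft state at ./storage/raft_state" shows it started without any previous state.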

Closing this now, feel free to reopen it if you have further questions.

timvisee on Jan 3, 2024