qdrant: storage::content_manager::consensus_manager: Failed to apply collection meta operation entry with user error: Bad request: There is no transfer for shard X from N to M

Some time after deployment, there appears to be a bug with Raft consensus in Qdrant. I am using custom sharding with 50 shards, a replication factor of 3, and 3 nodes. After indexing about 700K vectors, the nodes can no longer sync, and I observe the following errors:

qdrant 2024-02-17T08:49:33.620466Z  WARN storage::content_manager::consensus_manager: Failed to apply collection meta operation entry with user error: Bad request: There is no transfer for shard 23 from 163633349341480 to 5558157984746553                               

qdrant 2024-02-17T08:49:33.688112Z  WARN storage::content_manager::consensus_manager: Failed to apply collection meta operation entry with user error: Bad request: There is no transfer for shard 5 from 163633349341480 to 5558157984746553                                   

qdrant 2024-02-17T08:49:33.740899Z  WARN storage::content_manager::consensus_manager: Failed to apply collection meta operation entry with user error: Bad request: There is no transfer for shard 49 from 163633349341480 to 5558157984746553                                  

qdrant 2024-02-17T08:49:33.811285Z  WARN storage::content_manager::consensus_manager: Failed to apply collection meta operation entry with user error: Bad request: There is no transfer for shard 18 from 163633349341480 to 5558157984746553                                  

 qdrant 2024-02-17T08:49:33.869499Z  WARN storage::content_manager::consensus_manager: Failed to apply collection meta operation entry with user error: Bad request: There is no transfer for shard 4 from 163633349341480 to 5558157984746553                                   

qdrant 2024-02-17T08:49:33.934505Z  WARN storage::content_manager::consensus_manager: Failed to apply collection meta operation entry with user error: Bad request: There is no transfer for shard 30 from 163633349341480 to 5558157984746553                                  

qdrant 2024-02-17T08:49:33.986275Z  WARN storage::content_manager::consensus_manager: Failed to apply collection meta operation entry with user error: Bad request: Shard 26 is already involved in transfer 163633349341480 -> 7098510233670791                                

qdrant 2024-02-17T08:49:34.038138Z  WARN storage::content_manager::consensus_manager: Failed to apply collection meta operation entry with user error: Bad request: There is no transfer for shard 40 from 163633349341480 to 5558157984746553                                  

qdrant 2024-02-17T08:49:34.092200Z  WARN storage::content_manager::consensus_manager: Failed to apply collection meta operation entry with user error: Bad request: There is no transfer for shard 23 from 163633349341480 to 5558157984746553                                  

The errors above appear on one of the nodes, and the other nodes are unable to synchronize with it, which produces the following error on the remaining nodes:

2024-02-17T08:52:24.668625Z ERROR qdrant::consensus: Failed to forward message Message { msg_type: MsgAppend, to: 163633349341480, from: 7098510233670791, term: 1088, log_term: 992, index: 237286, entries: [Entry { entry_type: EntryNormal, term: 992, index: 237287, data: [161, 110, 67, 111, 108, 108, 101, 99, 116, 105, 111, 110, 77, 101, 116, 97, 161, 110, 116, 114, 97, 110, 115, 102, 101, 114, 95, 115, 104, 97, 114, 100, 130, 107, 117, 114, 108, 115, 108, 97, 98, 95, 98, 111, 116, 161, 101, 65, 98, 111, 114, 116, 162, 104, 116, 114, 97, 110, 115, 102, 101, 114, 163, 104, 115, 104, 97, 114, 100, 95, 105, 100, 24, 30, 100, 102, 114, 111, 109, 27, 0, 0, 148, 210, 219, 169, 49, 40, 98, 116, 111, 27, 0, 19, 191, 29, 128, 73, 76, 57, 102, 114, 101, 97, 115, 111, 110, 111, 116, 114, 97, 110, 115, 102, 101, 114, 32, 102, 97, 105, 108, 101, 100], context: [], sync_log: false }], commit: 338981, commit_term: 0, snapshot: None, request_snapshot: 0, reject: false, reject_hint: 0, context: [], deprecated_priority: 0, priority: 0 } to message sender task 163633349341480: message sender task queue is full. Message will be dropped.    

Current Behavior

The cluster nodes can no longer synchronise after a few indexing runs.

Steps to Reproduce

  1. Create a collection with custom sharding and 50 shards (a minimal reproduction sketch follows this list)
  2. Index 700K or more vectors
  3. The nodes can no longer synchronise and the warnings above are thrown; however, the pods in K8s don’t die
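Roughly how the collection is created and the data indexed, as a simplified sketch using the Python client. The host, collection name, shard key names, and the random test vectors are placeholders for illustration, not the exact production code:

import random

from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")  # hypothetical address

COLLECTION = "repro_collection"  # hypothetical collection name

# Custom sharding: shard_number applies per shard key, so 50 keys with
# shard_number=1 yield 50 shards, each replicated 3 times.
client.create_collection(
    collection_name=COLLECTION,
    vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE),
    sharding_method=models.ShardingMethod.CUSTOM,
    shard_number=1,
    replication_factor=3,
    write_consistency_factor=1,
    on_disk_payload=True,
)

# Create 50 shard keys to match the 50 shards in the report.
for i in range(50):
    client.create_shard_key(COLLECTION, f"key_{i}")

# Index ~700K random vectors, routed round-robin over the shard keys.
BATCH = 1000
for start in range(0, 700_000, BATCH):
    points = [
        models.PointStruct(id=start + j, vector=[random.random() for _ in range(768)])
        for j in range(BATCH)
    ]
    client.upsert(
        collection_name=COLLECTION,
        points=points,
        shard_key_selector=f"key_{(start // BATCH) % 50}",
    )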

The following is my collection config:

{
  "params": {
    "vectors": {
      "size": 768,
      "distance": "Cosine"
    },
    "shard_number": 1,
    "sharding_method": "custom",
    "replication_factor": 3,
    "write_consistency_factor": 1,
    "on_disk_payload": true
  },
  "hnsw_config": {
    "m": 16,
    "ef_construct": 100,
    "full_scan_threshold": 10000,
    "max_indexing_threads": 0,
    "on_disk": false
  },
  "optimizer_config": {
    "deleted_threshold": 0.2,
    "vacuum_min_vector_number": 1000,
    "default_segment_number": 0,
    "max_segment_size": null,
    "memmap_threshold": 20000,
    "indexing_threshold": 20000,
    "flush_interval_sec": 5,
    "max_optimization_threads": 1
  },
  "wal_config": {
    "wal_capacity_mb": 32,
    "wal_segments_ahead": 0
  },
  "quantization_config": {
    "scalar": {
      "type": "int8",
      "always_ram": true
    }
  }
}

and /cluster:

{
  "result": {
    "status": "enabled",
    "peer_id": 7098510233670791,
    "peers": {
      "7098510233670791": {
        "uri": "http://qdrant-0.qdrant-headless:6335/"
      },
      "5558157984746553": {
        "uri": "http://qdrant-1.qdrant-headless:6335/"
      },
      "163633349341480": {
        "uri": "http://qdrant-2.qdrant-headless:6335/"
      }
    },
    "raft_info": {
      "term": 1088,
      "commit": 339032,
      "pending_operations": 0,
      "leader": 7098510233670791,
      "role": "Leader",
      "is_voter": true
    },
    "consensus_thread_status": {
      "consensus_thread_status": "working",
      "last_update": "2024-02-17T08:55:57.214602900Z"
    },
    "message_send_failures": {}
  },
  "status": "ok",
  "time": 0.000007711
}
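To watch for this state, a small script like the one below (the host is a placeholder; 6333 is the default REST port) can poll the /cluster endpoint and print the fields relevant here, raft_info and message_send_failures:

import requests

QDRANT_URL = "http://localhost:6333"  # hypothetical address

def check_cluster() -> None:
    """Fetch /cluster and print the fields that indicate consensus trouble."""
    info = requests.get(f"{QDRANT_URL}/cluster", timeout=5).json()["result"]

    raft = info["raft_info"]
    print(
        f"role={raft['role']} term={raft['term']} "
        f"commit={raft['commit']} pending_operations={raft['pending_operations']}"
    )

    # A non-empty map here usually means peers cannot be reached.
    failures = info.get("message_send_failures", {})
    if failures:
        print("message send failures:", failures)

if __name__ == "__main__":
    check_cluster()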

Expected Behavior

Consensus should work without errors and the nodes should stay in sync.

Context (Environment)

Using the Helm chart with the v1.7.4 Qdrant image.

About this issue

  • Original URL
  • State: closed
  • Created 4 months ago
  • Comments: 20 (9 by maintainers)

Most upvoted comments

I’ve seen this being asked before, so it might be good to implement this based on user demand. We’ll decide on this internally.

Information about optimizers is already exposed, however. It is not in Prometheus format at this time, but it is available via an HTTP endpoint.

Simply fetch this; note that the URL parameter is important here:

GET /telemetry?details_level=3

In the response it lists the optimization status per shard:

                "optimizations": {
                  "status": "ok",
                  "optimizations": {
                    "count": 0,
                    "fail_count": 2,
                    "last_responded": "2024-02-22T10:21:23.900Z"
                  },
                  "log": [
                    {
                      "name": "indexing",
                      "segment_ids": [
                        7510136630065649000,
                        491536206735816100
                      ],
                      "status": "done",
                      "start_at": "2024-02-22T10:21:23.856747599Z",
                      "end_at": "2024-02-22T10:21:23.904835842Z"
                    },
                    {
                      "name": "indexing",
                      "segment_ids": [
                        7510136630065649000,
                        491536206735816100
                      ],
                      "status": "optimizing",
                      "start_at": "2024-02-22T10:21:23.850787904Z",
                      "end_at": null
                    }
                  ]
                }

Note that this is per shard, so I imagine you’ll have a lot of them. Also, this log of optimizations is not persisted and is lost when a node restarts.
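As a sketch of consuming this, the snippet below (the host is a placeholder) fetches the detailed telemetry and prints every per-shard optimization status it finds. Since the exact nesting of the response can differ between versions, it simply walks the JSON looking for the "optimizations" objects shown above:

import requests

QDRANT_URL = "http://localhost:6333"  # hypothetical address

def find_optimizations(node, path=""):
    """Recursively yield every per-shard 'optimizations' status object in the telemetry JSON."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "optimizations" and isinstance(value, dict) and "status" in value:
                yield path, value
            else:
                yield from find_optimizations(value, f"{path}/{key}")
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_optimizations(item, f"{path}[{i}]")

response = requests.get(f"{QDRANT_URL}/telemetry", params={"details_level": 3}, timeout=10)
for path, opt in find_optimizations(response.json()["result"]):
    print(path, opt["status"], opt.get("optimizations"))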

Have you experimented with the e5 model to measure how much accuracy decreases when using binary quantization?

I don’t think we have. I’d recommend benchmarking this yourself to ensure your data is compatible. Binary quantization would compress the quantized data by another factor of 8.
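For anyone wanting to try it on an existing collection, here is a minimal sketch (host and collection name are placeholders, and it assumes your Qdrant version accepts quantization_config changes via the collection update endpoint) that swaps the scalar int8 quantization shown in the config above for binary quantization:

import requests

QDRANT_URL = "http://localhost:6333"   # hypothetical address
COLLECTION = "my_collection"           # hypothetical collection name

# Replace the scalar int8 quantization with binary quantization.
# Existing vectors should be re-quantized as segments are re-optimized.
response = requests.patch(
    f"{QDRANT_URL}/collections/{COLLECTION}",
    json={"quantization_config": {"binary": {"always_ram": True}}},
    timeout=30,
)
response.raise_for_status()
print(response.json())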

Update: Qdrant hasn’t crashed for more than 24 hours with the following configuration:

    "memmap_threshold": 4000,
    "indexing_threshold": 8000,

(We recently changed the vector size to 1024, and we now have 2.4M vectors.)
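For reference, a sketch of applying those two thresholds to a running collection through the collections API (host and collection name are placeholders):

import requests

QDRANT_URL = "http://localhost:6333"   # hypothetical address
COLLECTION = "my_collection"           # hypothetical collection name

# Lower memmap_threshold and indexing_threshold so segments are moved to
# mmap storage and indexed earlier, keeping individual optimization jobs smaller.
response = requests.patch(
    f"{QDRANT_URL}/collections/{COLLECTION}",
    json={"optimizers_config": {"memmap_threshold": 4000, "indexing_threshold": 8000}},
    timeout=30,
)
response.raise_for_status()
print(response.json())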

Even though there was a problem with 2 of the 3 nodes, the cluster eventually recovered from the Dead shards.

Qdrant crashes with this message at least once a day. When this happens, 2 of the 3 nodes are pegged at 100% CPU (they don’t even emit logs) and one node is accessible but overloaded, where we see messages like these: [Screenshot from 2024-02-20 05-46-30]

This is what performance looks like on the crashed nodes: [Screenshot from 2024-02-20 06-02-57] [Screenshot from 2024-02-20 06-02-52] [Screenshot from 2024-02-20 06-02-46]

Our server setup: 3 pods, 8 GB RAM, 4 CPUs

The cluster never recovers after the crash; it is not enough to restart a single pod or to restart them one by one. We always need to drop all pods and restart them at the same time. After the restart, it takes many minutes (20+) to recover.

Collection:

  • 20 shards with 3 replicas (each server should hold one copy of each shard)
  • vector length 768
  • about 800k documents
  • we store just 10 metadata fields with string values shorter than 200 characters (no file contents or similar)
  • we upsert documents and delete old ones
  • custom sharding: we hash the domain name into a shard key, so the number of documents per shard is not equal (see the sketch after this list)
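A sketch of that routing scheme (the hash choice, shard key names, collection name, and host are assumptions, not the exact setup): hash the domain deterministically into one of the 20 shard keys and pass it as the shard key selector on upsert.

import hashlib

from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")  # hypothetical address
COLLECTION = "documents"   # hypothetical collection name
NUM_SHARD_KEYS = 20        # matches the 20 shards above

def shard_key_for(domain: str) -> str:
    """Deterministically map a domain name to one of the shard keys."""
    digest = hashlib.sha256(domain.encode("utf-8")).hexdigest()
    return f"domain_{int(digest, 16) % NUM_SHARD_KEYS}"

# Route an upsert to the shard key that the document's domain hashes to.
client.upsert(
    collection_name=COLLECTION,
    points=[
        models.PointStruct(
            id=1,
            vector=[0.01] * 768,
            payload={"domain": "example.com", "title": "..."},
        )
    ],
    shard_key_selector=shard_key_for("example.com"),
)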