milvus: [Bug]: Loading multiple replicas succeeds but search gets empty results

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: master-20220512-08ad77c7
- Deployment mode(standalone or cluster): cluster
- SDK version(e.g. pymilvus v2.0.0rc2): 
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others: 
queryNode:
  replicas: 5
  resources:
    limits:
      memory: 6Gi

Current Behavior

  1. Deploy a Milvus cluster with 5 querynodes and a 6Gi memory limit per querynode
  2. Create collection shards=2, dim=512
  3. Insert 250,000 entities
  4. Load with replica_number=2. Collection.load() did not raise any exception and returned None, but loading_progress reported 0%, and search returned an empty result instead of an error (see the verification sketch at the end of this section).
[2022-05-12 16:14:21,580 - DEBUG - ci_test]: (api_request)  : [Collection.load] args: [None, 2, 20], kwargs: {} (api_request.py:55)
[2022-05-12 16:14:41,890 - DEBUG - ci_test]: (api_response) : None  (api_request.py:27)
[2022-05-12 16:14:41,892 - DEBUG - ci_test]: (api_request)  : [loading_progress] args: ['stress_replicas_eLGkWaPH', None, 'default'], kwargs: {} (api_request.py:55)
[2022-05-12 16:14:41,941 - DEBUG - ci_test]: (api_response) : {'loading_progress': '0%', 'num_loaded_partitions': 0, 'not_loaded_partitions': ['_default']}  (api_request.py:27)
[2022-05-12 16:14:41,944 - DEBUG - ci_test]: (api_request)  : [Collection.search] args: [[[0.05953799588918755, 0.014457956435659617, 0.07216101581145963, 0.050750420991643265, 0.023499699808140893, 0.05601602483594028, 0.04207865574093704, 0.0025386970858502257, 0.06873906193401776, 0.030596640603098225, 0.02152956230301486, 0.009472897494950696, 0.015672369503394006, 0.03309160718232......, kwargs: {} (api_request.py:55)
[2022-05-12 16:14:42,247 - DEBUG - ci_test]: (api_response) : ['[]']  (api_request.py:27)
c.get_replicas('stress_replicas_eLGkWaPH')
Replica groups:
- Group: <group_id:433151409781211167>, <group_nodes:(5, 3)>, <shards:[Shard: <channel_name:by-dev-rootcoord-dml_0_433151539763940673v0>, <shard_leader:3>, <shard_nodes:[3]>, Shard: <channel_name:by-dev-rootcoord-dml_1_433151539763940673v1>, <shard_leader:5>, <shard_nodes:[5]>]>
- Group: <group_id:433151409781211166>, <group_nodes:(2, 1, 10)>, <shards:[Shard: <channel_name:by-dev-rootcoord-dml_0_433151539763940673v0>, <shard_leader:1>, <shard_nodes:[1, 2]>, Shard: <channel_name:by-dev-rootcoord-dml_1_433151539763940673v1>, <shard_leader:10>, <shard_nodes:[10, 2, 1]>]>

Server logs:
mic_logs.tar.gz
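
As a cross-check outside the CI framework, a minimal verification sketch (assuming pymilvus 2.x and an already-established connection under the default alias) that confirms the load never completes even though Collection.load() returned:

# Verification sketch, assuming pymilvus 2.x and an existing connection
# under the "default" alias; collection name taken from the report.
from pymilvus import utility

name = "stress_replicas_eLGkWaPH"

try:
    # Blocks until all sealed segments are loaded; the exact exception
    # raised on timeout may differ across pymilvus versions.
    utility.wait_for_loading_complete(name, timeout=120)
except Exception as e:
    # In the reported run, progress stays at 0% even though
    # Collection.load(replica_number=2) returned None.
    print("load did not complete:", e)

print(utility.loading_progress(name))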

Expected Behavior

If the load fails, an exception should be raised; if the load call returns successfully but the data is not actually loaded, search should raise an error such as "collection not loaded" or "load failed" rather than silently returning an empty result.
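
For illustration, a minimal sketch of that contract written as a client-side guard (pymilvus 2.x assumed; in the expected behavior the server/SDK would enforce this instead of the client):

# Sketch of the expected contract, not of current behavior: either
# load() raises, or searching a collection that is not fully loaded
# raises instead of silently returning [].
from pymilvus import Collection, utility

collection = Collection("stress_replicas_eLGkWaPH")
collection.load(replica_number=2)  # expected: raise on load failure

progress = utility.loading_progress(collection.name)
if progress["loading_progress"] != "100%":
    # Expected from the server/SDK; here only a client-side stand-in.
    raise RuntimeError("collection not loaded / load failed")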

Steps To Reproduce

# Imports assume the layout of the Milvus python_client test framework
# (tests/python_client in the milvus repo).
import datetime

from base.collection_wrapper import ApiCollectionWrapper
from base.utility_wrapper import ApiUtilityWrapper
from common import common_func as cf
from common import common_type as ct
from utils.util_log import test_log as log

nb = 25000
dim = 512
collection_w = ApiCollectionWrapper()
utility_w = ApiUtilityWrapper()
c_name = "stress_replicas_eLGkWaPH"
collection_w.init_collection(name=c_name,
                             schema=cf.gen_default_collection_schema(dim=dim))

# insert 10 sealed segments (10 x 25,000 = 250,000 entities)
for i in range(10):
    t0 = datetime.datetime.now()
    df = cf.gen_default_dataframe_data(nb=nb, dim=dim)
    res = collection_w.insert(df)[0]
    assert res.insert_count == nb
    log.info(f'After {i + 1} insert, num_entities: {collection_w.num_entities}')
    tt = datetime.datetime.now() - t0
    log.info(f"{i} insert and flush data cost: {tt}")

log.debug(collection_w.num_entities)
collection_w.load(replica_number=2)
utility_w.loading_progress(collection_w.name)

collection_w.search(cf.gen_vectors(1, dim),
                    ct.default_float_vec_field_name, ct.default_search_params,
                    ct.default_limit, timeout=60)
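
For anyone without the Milvus CI test framework, a roughly equivalent standalone sketch using plain pymilvus 2.x; the collection name, field names, metric type, and search parameters below are illustrative assumptions rather than the framework defaults:

# Standalone repro sketch using plain pymilvus 2.x; schema, field names,
# metric type and search params are illustrative assumptions.
import random

from pymilvus import (
    Collection, CollectionSchema, FieldSchema, DataType, connections, utility,
)

connections.connect(host="127.0.0.1", port="19530")

dim = 512
nb = 25000
name = "stress_replicas_repro"

schema = CollectionSchema([
    FieldSchema("pk", DataType.INT64, is_primary=True, auto_id=False),
    FieldSchema("vec", DataType.FLOAT_VECTOR, dim=dim),
])
collection = Collection(name, schema, shards_num=2)

# 10 batches of 25,000 rows -> 250,000 entities in total.
for i in range(10):
    ids = list(range(i * nb, (i + 1) * nb))
    vectors = [[random.random() for _ in range(dim)] for _ in range(nb)]
    collection.insert([ids, vectors])
    print(f"after batch {i + 1}: num_entities={collection.num_entities}")

collection.load(replica_number=2)
print(utility.loading_progress(name))

res = collection.search(
    [[random.random() for _ in range(dim)]], "vec",
    {"metric_type": "L2", "params": {"nprobe": 16}}, limit=10, timeout=60,
)
print(res)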

Milvus Log

No response

Anything else?

No response

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 16 (16 by maintainers)

Most upvoted comments

@soothing-rain the posted log was produced after the collection was released. Before that, the segment with id 433333143667474433 had loaded successfully:

5465:[2022/05/20 08:19:20.905 +00:00] [INFO] [segment.go:748] ["load field done"] [fieldID=1] ["row count"=57] [segmentID=433333143667474433]
5466:[2022/05/20 08:19:20.905 +00:00] [INFO] [segment.go:748] ["load field done"] [fieldID=100] ["row count"=57] [segmentID=433333143667474433]
5467:[2022/05/20 08:19:20.905 +00:00] [INFO] [segment.go:748] ["load field done"] [fieldID=101] ["row count"=57] [segmentID=433333143667474433]
5468:[2022/05/20 08:19:20.905 +00:00] [INFO] [segment.go:748] ["load field done"] [fieldID=0] ["row count"=57] [segmentID=433333143667474433]
5469:[2022/05/20 08:19:20.905 +00:00] [DEBUG] [segment_loader.go:281] ["loading bloom filter..."] [segmentID=433333143667474433]
5476:[2022/05/20 08:19:20.917 +00:00] [DEBUG] [segment_loader.go:289] ["loading delta..."] [segmentID=433333143667474433]
5477:[2022/05/20 08:19:20.917 +00:00] [INFO] [segment_loader.go:586] ["there are no delta logs saved with segment, skip loading delete record"] [segmentID=433333143667474433]
5482:[2022/05/20 08:19:20.920 +00:00] [INFO] [partition.go:52] ["add a segment to replica"] [collectionID=433333143588569089] [partitionID=433333143588569090] [segmentID=433333143667474433]
5484:[2022/05/20 08:19:20.920 +00:00] [INFO] [impl.go:346] ["loadSegmentsTask WaitToFinish done"] [collectionID=433333143588569089] [segmentIDs="[433333143615245633,433333143667474433]"] [nodeID=1]
5574:[2022/05/20 08:19:21.133 +00:00] [INFO] [shard_cluster.go:253] ["ShardCluster sync segments"] ["replica segments"="[{\"node_id\":1,\"partition_id\":433333143588569090,\"segment_ids\":[433333143667474433]}]"] [state=3]
5575:[2022/05/20 08:19:21.133 +00:00] [INFO] [shard_cluster_service.go:142] ["successfully sync segments"] [channel=by-dev-rootcoord-dml_7_433333143588569089v1] [distribution="[{\"node_id\":1,\"partition_id\":433333143588569090,\"segment_ids\":[433333143667474433]}]"]
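
To check that ordering against the attached mic_logs, a small helper sketch (the log file name and keyword list are assumptions, not part of the report) that pulls out the load/release events for one segment in line order:

# Hedged helper sketch: scan a querynode log for lines that mention a
# segment ID plus load/release keywords; file name and keywords are
# assumptions about the attached mic_logs.
SEGMENT_ID = "433333143667474433"
KEYWORDS = ("load field done", "add a segment to replica", "release", "remove segment")

with open("querynode.log", encoding="utf-8", errors="replace") as f:
    for lineno, line in enumerate(f, 1):
        if SEGMENT_ID in line and any(k in line for k in KEYWORDS):
            print(lineno, line.rstrip())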