milvus: [Bug]: [benchmark] Concurrency stability test: search raises StatusCode.DEADLINE_EXCEEDED (Deadline Exceeded) and flush time keeps getting longer

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: master-20220513-168935f4
- Deployment mode(standalone or cluster):cluster
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus-2.1.0.dev39
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

argo task: benchmark-backup-hpw72

test yaml: client-configmap: client-random-locust-compaction, server-configmap: server-cluster-8c64m-compaction

server:

NAME                                                          READY   STATUS      RESTARTS   AGE     IP             NODE                      NOMINATED NODE   READINESS GATES
benchmark-backup-hpw72-1-etcd-0                               1/1     Running     0          39m     10.97.17.238   qa-node014.zilliz.local   <none>           <none>
benchmark-backup-hpw72-1-etcd-1                               1/1     Running     0          39m     10.97.16.65    qa-node013.zilliz.local   <none>           <none>
benchmark-backup-hpw72-1-etcd-2                               1/1     Running     0          39m     10.97.17.239   qa-node014.zilliz.local   <none>           <none>
benchmark-backup-hpw72-1-milvus-datacoord-64f99f8646-crb4q    1/1     Running     0          39m     10.97.18.218   qa-node017.zilliz.local   <none>           <none>
benchmark-backup-hpw72-1-milvus-datanode-6f4b555d4f-h8dzk     1/1     Running     0          39m     10.97.20.74    qa-node018.zilliz.local   <none>           <none>
benchmark-backup-hpw72-1-milvus-indexcoord-77bbf5ccf9-j459b   1/1     Running     0          39m     10.97.18.217   qa-node017.zilliz.local   <none>           <none>
benchmark-backup-hpw72-1-milvus-indexnode-55f84ff6b4-4qmkn    1/1     Running     0          39m     10.97.20.75    qa-node018.zilliz.local   <none>           <none>
benchmark-backup-hpw72-1-milvus-proxy-559878747c-c25tp        1/1     Running     0          39m     10.97.5.20     qa-node003.zilliz.local   <none>           <none>
benchmark-backup-hpw72-1-milvus-querycoord-59d5bf964-dgtdx    1/1     Running     0          39m     10.97.18.219   qa-node017.zilliz.local   <none>           <none>
benchmark-backup-hpw72-1-milvus-querynode-77b577f786-rprjw    1/1     Running     0          39m     10.97.19.8     qa-node016.zilliz.local   <none>           <none>
benchmark-backup-hpw72-1-milvus-rootcoord-784c89554b-96fqd    1/1     Running     0          39m     10.97.5.19     qa-node003.zilliz.local   <none>           <none>
benchmark-backup-hpw72-1-minio-0                              1/1     Running     0          39m     10.97.19.9     qa-node016.zilliz.local   <none>           <none>
benchmark-backup-hpw72-1-minio-1                              1/1     Running     0          39m     10.97.12.116   qa-node015.zilliz.local   <none>           <none>
benchmark-backup-hpw72-1-minio-2                              1/1     Running     0          39m     10.97.12.118   qa-node015.zilliz.local   <none>           <none>
benchmark-backup-hpw72-1-minio-3                              1/1     Running     0          39m     10.97.16.66    qa-node013.zilliz.local   <none>           <none>
benchmark-backup-hpw72-1-pulsar-bookie-0                      1/1     Running     0          39m     10.97.12.121   qa-node015.zilliz.local   <none>           <none>
benchmark-backup-hpw72-1-pulsar-bookie-1                      1/1     Running     0          39m     10.97.17.242   qa-node014.zilliz.local   <none>           <none>
benchmark-backup-hpw72-1-pulsar-bookie-2                      1/1     Running     0          39m     10.97.11.168   qa-node009.zilliz.local   <none>           <none>
benchmark-backup-hpw72-1-pulsar-bookie-init-9r2zz             0/1     Completed   0          39m     10.97.17.227   qa-node014.zilliz.local   <none>           <none>
benchmark-backup-hpw72-1-pulsar-broker-0                      1/1     Running     0          39m     10.97.17.235   qa-node014.zilliz.local   <none>           <none>
benchmark-backup-hpw72-1-pulsar-proxy-0                       1/1     Running     0          39m     10.97.17.236   qa-node014.zilliz.local   <none>           <none>
benchmark-backup-hpw72-1-pulsar-pulsar-init-9v49h             0/1     Completed   0          39m     10.97.17.226   qa-node014.zilliz.local   <none>           <none>
benchmark-backup-hpw72-1-pulsar-recovery-0                    1/1     Running     0          39m     10.97.17.233   qa-node014.zilliz.local   <none>           <none>
benchmark-backup-hpw72-1-pulsar-zookeeper-0                   1/1     Running     0          39m     10.97.5.21     qa-node003.zilliz.local   <none>           <none>
benchmark-backup-hpw72-1-pulsar-zookeeper-1                   1/1     Running     0          38m     10.97.18.221   qa-node017.zilliz.local   <none>           <none>
benchmark-backup-hpw72-1-pulsar-zookeeper-2                   1/1     Running     0          37m     10.97.5.23     qa-node003.zilliz.local   <none>           <none>

client pod: benchmark-backup-hpw72-2114025457

locust_report_2022-05-13_979.log

client log:

[2022-05-13 09:50:03,125] [   ERROR] - grpc RpcError: [_execute_search_requests], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-05-13 09:49:03.124009', 'gRPC error': '2022-05-13 09:50:03.125469'}> (pymilvus.decorators:81)
[2022-05-13 09:50:03,126] [   ERROR] - grpc RpcError: [search], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-05-13 09:49:03.117607', 'gRPC error': '2022-05-13 09:50:03.125993'}> (pymilvus.decorators:81)
[2022-05-13 09:51:03,133] [   ERROR] - grpc RpcError: [_execute_search_requests], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-05-13 09:50:03.131939', 'gRPC error': '2022-05-13 09:51:03.133210'}> (pymilvus.decorators:81)
[2022-05-13 09:51:03,133] [   ERROR] - grpc RpcError: [search], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-05-13 09:50:03.126620', 'gRPC error': '2022-05-13 09:51:03.133532'}> (pymilvus.decorators:81)
[2022-05-13 09:52:03,142] [   ERROR] - grpc RpcError: [_execute_search_requests], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-05-13 09:51:03.141215', 'gRPC error': '2022-05-13 09:52:03.142180'}> (pymilvus.decorators:81)
[2022-05-13 09:52:03,142] [   ERROR] - grpc RpcError: [search], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-05-13 09:51:03.134110', 'gRPC error': '2022-05-13 09:52:03.142481'}> (pymilvus.decorators:81)
[2022-05-13 09:53:03,149] [   ERROR] - grpc RpcError: [_execute_search_requests], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-05-13 09:52:03.148507', 'gRPC error': '2022-05-13 09:53:03.149408'}> (pymilvus.decorators:81)
[2022-05-13 09:53:03,149] [   ERROR] - grpc RpcError: [search], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-05-13 09:52:03.143086', 'gRPC error': '2022-05-13 09:53:03.149743'}> (pymilvus.decorators:81)
[2022-05-13 09:54:03,157] [   ERROR] - grpc RpcError: [_execute_search_requests], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-05-13 09:53:03.155895', 'gRPC error': '2022-05-13 09:54:03.157084'}> (pymilvus.decorators:81)
[2022-05-13 09:54:03,157] [   ERROR] - grpc RpcError: [search], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-05-13 09:53:03.150364', 'gRPC error': '2022-05-13 09:54:03.157360'}> (pymilvus.decorators:81)
[2022-05-13 09:59:03,183] [   ERROR] - grpc RpcError: [_execute_search_requests], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-05-13 09:54:03.180251', 'gRPC error': '2022-05-13 09:59:03.183795'}> (pymilvus.decorators:81)
[2022-05-13 09:59:03,184] [   ERROR] - grpc RpcError: [search], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-05-13 09:54:03.161304', 'gRPC error': '2022-05-13 09:59:03.184079'}> (pymilvus.decorators:81)
[2022-05-13 09:59:03,184] [   ERROR] - grpc RpcError: [query], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-05-13 09:54:03.176755', 'gRPC error': '2022-05-13 09:59:03.184459'}> (pymilvus.decorators:81)
[2022-05-13 10:04:03,185] [   ERROR] - grpc RpcError: [_execute_search_requests], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-05-13 09:59:03.186355', 'gRPC error': '2022-05-13 10:04:03.185349'}> (pymilvus.decorators:81)
[2022-05-13 10:04:03,185] [   ERROR] - grpc RpcError: [search], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-05-13 09:54:03.177134', 'gRPC error': '2022-05-13 10:04:03.185616'}> (pymilvus.decorators:81)
[2022-05-13 10:04:03,186] [   ERROR] - grpc RpcError: [query], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-05-13 09:54:03.176934', 'gRPC error': '2022-05-13 10:04:03.186000'}> (pymilvus.decorators:81)
[2022-05-13 10:08:57,787] [   ERROR] - grpc RpcError: [_execute_search_requests], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-05-13 10:04:03.194410', 'gRPC error': '2022-05-13 10:08:57.787903'}> (pymilvus.decorators:81)
[2022-05-13 10:08:57,788] [   ERROR] - grpc RpcError: [search], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-05-13 09:59:03.185953', 'gRPC error': '2022-05-13 10:08:57.788153'}> (pymilvus.decorators:81)
[2022-05-13 09:36:35,735] [   DEBUG] - Milvus flush run in 5.754s (milvus_benchmark.client:53)
[2022-05-13 09:36:42,265] [   DEBUG] - Milvus flush run in 6.5257s (milvus_benchmark.client:53)
[2022-05-13 09:36:49,482] [   DEBUG] - Milvus flush run in 7.1028s (milvus_benchmark.client:53)
[2022-05-13 09:36:56,261] [   DEBUG] - Milvus flush run in 6.8103s (milvus_benchmark.client:53)
[2022-05-13 09:37:02,750] [   DEBUG] - Milvus flush run in 6.512s (milvus_benchmark.client:53)
[2022-05-13 09:37:09,626] [   DEBUG] - Milvus flush run in 6.8717s (milvus_benchmark.client:53)
[2022-05-13 09:37:17,254] [   DEBUG] - Milvus flush run in 7.5172s (milvus_benchmark.client:53)
[2022-05-13 09:37:30,373] [   DEBUG] - Milvus flush run in 13.114s (milvus_benchmark.client:53)
[2022-05-13 09:37:48,964] [   DEBUG] - Milvus flush run in 17.0714s (milvus_benchmark.client:53)
[2022-05-13 09:38:01,543] [   DEBUG] - Milvus flush run in 24.8865s (milvus_benchmark.client:53)
[2022-05-13 09:38:01,546] [   DEBUG] - Milvus flush run in 24.7389s (milvus_benchmark.client:53)
[2022-05-13 09:38:14,033] [   DEBUG] - Milvus flush run in 30.994s (milvus_benchmark.client:53)
[2022-05-13 09:38:27,493] [   DEBUG] - Milvus flush run in 38.4083s (milvus_benchmark.client:53)
[2022-05-13 09:38:34,883] [   DEBUG] - Milvus flush run in 39.6026s (milvus_benchmark.client:53)
[2022-05-13 09:38:40,944] [   DEBUG] - Milvus flush run in 39.3927s (milvus_benchmark.client:53)
[2022-05-13 09:39:00,373] [   DEBUG] - Milvus flush run in 52.5873s (milvus_benchmark.client:53)
[2022-05-13 09:39:06,871] [   DEBUG] - Milvus flush run in 52.6703s (milvus_benchmark.client:53)
[2022-05-13 09:39:24,568] [   DEBUG] - Milvus flush run in 49.681s (milvus_benchmark.client:53)
[2022-05-13 09:39:24,572] [   DEBUG] - Milvus flush run in 49.6859s (milvus_benchmark.client:53)
[2022-05-13 09:40:01,422] [   DEBUG] - Milvus flush run in 80.5371s (milvus_benchmark.client:53)
[2022-05-13 09:40:01,425] [   DEBUG] - Milvus flush run in 80.3503s (milvus_benchmark.client:53)
[2022-05-13 09:40:21,634] [   DEBUG] - Milvus flush run in 93.8664s (milvus_benchmark.client:53)
[2022-05-13 09:41:09,958] [   DEBUG] - Milvus flush run in 129.6856s (milvus_benchmark.client:53)
[2022-05-13 09:41:10,444] [   DEBUG] - Milvus flush run in 128.8304s (milvus_benchmark.client:53)
[2022-05-13 09:41:40,440] [   DEBUG] - Milvus flush run in 154.0377s (milvus_benchmark.client:53)
[2022-05-13 09:42:10,939] [   DEBUG] - Milvus flush run in 175.2927s (milvus_benchmark.client:53)
[2022-05-13 09:42:45,034] [   DEBUG] - Milvus flush run in 200.3226s (milvus_benchmark.client:53)
[2022-05-13 09:43:22,931] [   DEBUG] - Milvus flush run in 231.9771s (milvus_benchmark.client:53)
[2022-05-13 09:44:03,428] [   DEBUG] - Milvus flush run in 263.1046s (milvus_benchmark.client:53)
[2022-05-13 09:44:45,771] [   DEBUG] - Milvus flush run in 298.2958s (milvus_benchmark.client:53)
[2022-05-13 09:46:22,541] [   DEBUG] - Milvus flush run in 381.8169s (milvus_benchmark.client:53)
[2022-05-13 09:46:24,006] [   DEBUG] - Milvus flush run in 382.93s (milvus_benchmark.client:53)
[2022-05-13 09:47:14,939] [   DEBUG] - Milvus flush run in 426.8938s (milvus_benchmark.client:53)
[2022-05-13 09:48:09,220] [   DEBUG] - Milvus flush run in 474.5191s (milvus_benchmark.client:53)
[2022-05-13 09:49:03,110] [   DEBUG] - Milvus flush run in 521.4714s (milvus_benchmark.client:53)
[2022-05-13 09:54:03,161] [   DEBUG] - Milvus flush run in 814.5621s (milvus_benchmark.client:53)
[2022-05-13 09:54:03,168] [   DEBUG] - Milvus flush run in 808.3961s (milvus_benchmark.client:53)
[2022-05-13 09:54:03,172] [   DEBUG] - Milvus flush run in 801.177s (milvus_benchmark.client:53)
[2022-05-13 10:04:03,194] [   DEBUG] - Milvus flush run in 1270.8543s (milvus_benchmark.client:53)
[2022-05-13 10:04:03,197] [   DEBUG] - Milvus flush run in 1365.4799s (milvus_benchmark.client:53)
[2022-05-13 10:04:03,201] [   DEBUG] - Milvus flush run in 1334.3907s (milvus_benchmark.client:53)
[2022-05-13 10:04:03,204] [   DEBUG] - Milvus flush run in 1305.8575s (milvus_benchmark.client:53)
[2022-05-13 10:04:03,207] [   DEBUG] - Milvus flush run in 1359.403s (milvus_benchmark.client:53)
[2022-05-13 10:04:03,211] [   DEBUG] - Milvus flush run in 1374.7734s (milvus_benchmark.client:53)
[2022-05-13 10:04:03,215] [   DEBUG] - Milvus flush run in 1388.1768s (milvus_benchmark.client:53)
[2022-05-13 10:04:03,219] [   DEBUG] - Milvus flush run in 1381.4467s (milvus_benchmark.client:53)
[2022-05-13 10:04:03,223] [   DEBUG] - Milvus flush run in 1387.9407s (milvus_benchmark.client:53)
[2022-05-13 10:09:13,611] [   DEBUG] - Milvus flush run in 1296.0893s (milvus_benchmark.client:53)
[2022-05-13 10:09:13,617] [   DEBUG] - Milvus flush run in 1544.3275s (milvus_benchmark.client:53)
[2022-05-13 10:09:13,623] [   DEBUG] - Milvus flush run in 1252.7102s (milvus_benchmark.client:53)
[2022-05-13 10:09:13,629] [   DEBUG] - Milvus flush run in 1356.4952s (milvus_benchmark.client:53)
[2022-05-13 10:09:13,634] [   DEBUG] - Milvus flush run in 1458.0787s (milvus_benchmark.client:53)
[2022-05-13 10:09:13,639] [   DEBUG] - Milvus flush run in 1503.0927s (milvus_benchmark.client:53)
[2022-05-13 10:09:13,643] [   DEBUG] - Milvus flush run in 1356.4098s (milvus_benchmark.client:53)

Expected Behavior

No response

Steps To Reproduce

1. create collection
2. build index of ivf_sq8
3. insert one hundred thousand vectors
4. flush collection
5. build index with the same params
6. load collection (steps 1-6 are sketched in code below)
7. locust concurrency: query<-search, load, get<-query, scene_insert_delete_flush <- raises error
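
For reference, a minimal pymilvus 2.x ORM sketch of steps 1-6 (the connection parameters, field names, and schema below are illustrative assumptions, not the benchmark's exact code):

# Hedged sketch of steps 1-6; host/port, field names, and schema are assumptions.
import random
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType

connections.connect(host="127.0.0.1", port="19530")

dim = 128
schema = CollectionSchema([
    FieldSchema("id", DataType.INT64, is_primary=True),
    FieldSchema("float1", DataType.FLOAT),                        # scalar field used by the range filter
    FieldSchema("float_vector", DataType.FLOAT_VECTOR, dim=dim),
])
collection = Collection("sift_10w_128_l2", schema)                # 1. create collection

index_params = {"index_type": "IVF_SQ8", "metric_type": "L2", "params": {"nlist": 2048}}
collection.create_index("float_vector", index_params)             # 2. build index of ivf_sq8

total, ni_per = 100_000, 50_000                                   # ni_per: 50000
for start in range(0, total, ni_per):
    ids = list(range(start, start + ni_per))
    scalars = [random.random() for _ in range(ni_per)]
    vectors = [[random.random() for _ in range(dim)] for _ in range(ni_per)]
    collection.insert([ids, scalars, vectors])                    # 3. insert 100k vectors

collection.flush()                                                # 4. flush collection
collection.create_index("float_vector", index_params)             # 5. build index with the same params
collection.load()                                                 # 6. load collection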

Milvus Log

No response

Anything else?

server-cluster-8c64m-compaction:

    server:
      server_tag: "8c64m"
    milvus:
      deploy_mode: "cluster"
    server_resource:
      dataCoordinator:
        compaction: 
          retentionDuration: 1800

client-random-locust-compaction:

locust_random_performance:
      collections:
        -
          collection_name: sift_10w_128_l2
          ni_per: 50000
          other_fields: float1
          build_index: true
          index_type: ivf_sq8
          index_param:
            nlist: 2048
          task:
            types:
              -
                type: query
                weight: 20
                params:
                  top_k: 10
                  nq: 10
                  search_param:
                    nprobe: 16
                  filters:
                    -
                      range: "{'range': {'float1': {'GT': -1.0, 'LT': collection_size * 0.5}}}"
              -
                type: load
                weight: 1
              -
                type: get
                weight: 10
                params:
                  ids_length: 10
              -
                type: scene_insert_delete_flush
                weight: 1
            connection_num: 1
            clients_num: 20
            spawn_rate: 2
            during_time: 30m

scene_insert_delete_flush in the benchmark client:

    @time_wrapper
    def scene_insert_delete_flush(self, collection_name=None, vectors=None, ids=None):
        if collection_name is None:
            collection_name = self._collection_name

        logger.debug("[scene_insert_delete_flush] Start scene_insert_delete_flush : %s" % collection_name)
        collection_info = self.get_info(collection_name)

        entities = utils.generate_entities(collection_info, vectors, ids)
        logger.debug("[scene_insert_delete_flush] Start insert : %s" % collection_name)
        self.insert(entities, collection_name=collection_name)

        logger.debug("[scene_insert_delete_flush] Start delete : %s" % collection_name)
        self.delete(expr="id in {}".format(ids[:30000]), collection_name=collection_name, timeout=300)

        logger.debug("[scene_insert_delete_flush] Start flush : %s" % collection_name)
        self.flush(collection_name=collection_name)
        logger.debug("[scene_insert_delete_flush] Finished scene_insert_delete_flush : %s" % collection_name)

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 24 (24 by maintainers)

Most upvoted comments

argo task: benchmark-backup-66c7g, server-configmap: server-cluster-8c64m-compaction, client-configmap: client-random-locust-compaction, Milvus: master-20220601-63a31ccb, pymilvus==2.1.0.dev67

[2022-06-01 07:00:39,506] [   DEBUG] - Milvus flush run in 531.876s (milvus_benchmark.client:54)
[2022-06-01 07:00:39,506] [   DEBUG] - [scene_insert_delete_flush] Finished scene_insert_delete_flush : sift_10w_128_l2 (milvus_benchmark.client:574)
[2022-06-01 07:00:39,509] [   DEBUG] - Milvus scene_insert_delete_flush run in 538.299s (milvus_benchmark.client:54)
[2022-06-01 07:01:39,524] [   ERROR] - grpc RpcError: [_execute_search_requests], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-06-01 07:00:39.522885', 'gRPC error': '2022-06-01 07:01:39.524278'}> (pymilvus.decorators:86)
[2022-06-01 07:01:39,525] [   ERROR] - grpc RpcError: [search], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-06-01 07:00:39.513383', 'gRPC error': '2022-06-01 07:01:39.525924'}> (pymilvus.decorators:86)
[2022-06-01 07:02:39,539] [   ERROR] - grpc RpcError: [_execute_search_requests], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-06-01 07:01:39.538530', 'gRPC error': '2022-06-01 07:02:39.539050'}> (pymilvus.decorators:86)
[2022-06-01 07:02:39,539] [   ERROR] - grpc RpcError: [search], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-06-01 07:01:39.527302', 'gRPC error': '2022-06-01 07:02:39.539705'}> (pymilvus.decorators:86)
[2022-06-01 07:07:39,546] [   ERROR] - grpc RpcError: [query], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-06-01 07:02:39.541333', 'gRPC error': '2022-06-01 07:07:39.546336'}> (pymilvus.decorators:86)
[2022-06-01 07:08:39,554] [   ERROR] - grpc RpcError: [_execute_search_requests], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-06-01 07:07:39.553384', 'gRPC error': '2022-06-01 07:08:39.554529'}> (pymilvus.decorators:86)
[2022-06-01 07:08:39,556] [   ERROR] - grpc RpcError: [search], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-06-01 07:07:39.547262', 'gRPC error': '2022-06-01 07:08:39.556221'}> (pymilvus.decorators:86)
[2022-06-01 07:14:41,054] [   ERROR] - grpc RpcError: [query], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-06-01 07:09:39.832003', 'gRPC error': '2022-06-01 07:14:41.054192'}> (pymilvus.decorators:86)
[2022-06-01 07:19:41,062] [   DEBUG] - Milvus flush run in 1626.0471s (milvus_benchmark.client:54)
[2022-06-01 07:19:41,063] [   DEBUG] - [scene_insert_delete_flush] Finished scene_insert_delete_flush : sift_10w_128_l2 (milvus_benchmark.client:574)
[2022-06-01 07:19:41,064] [   DEBUG] - Milvus scene_insert_delete_flush run in 1632.4253s (milvus_benchmark.client:54)
[2022-06-01 07:19:41,066] [   DEBUG] - Milvus flush run in 1598.1871s (milvus_benchmark.client:54)
[2022-06-01 07:19:41,067] [   DEBUG] - [scene_insert_delete_flush] Finished scene_insert_delete_flush : sift_10w_128_l2 (milvus_benchmark.client:574)
[2022-06-01 07:19:41,069] [   DEBUG] - Milvus scene_insert_delete_flush run in 1606.7004s (milvus_benchmark.client:54)
[2022-06-01 07:19:41,071] [   DEBUG] - Milvus flush run in 1618.6452s (milvus_benchmark.client:54)
[2022-06-01 07:19:41,072] [   DEBUG] - [scene_insert_delete_flush] Finished scene_insert_delete_flush : sift_10w_128_l2 (milvus_benchmark.client:574)
[2022-06-01 07:19:41,074] [   DEBUG] - Milvus scene_insert_delete_flush run in 1624.9866s (milvus_benchmark.client:54)
[2022-06-01 07:19:41,076] [   DEBUG] - Milvus flush run in 1608.7748s (milvus_benchmark.client:54)
[2022-06-01 07:19:41,076] [   DEBUG] - [scene_insert_delete_flush] Finished scene_insert_delete_flush : sift_10w_128_l2 (milvus_benchmark.client:574)
[2022-06-01 07:19:41,078] [   DEBUG] - Milvus scene_insert_delete_flush run in 1616.3535s (milvus_benchmark.client:54)
[2022-06-01 07:19:41,080] [   DEBUG] - Milvus insert run in 601.2245s (milvus_benchmark.client:54)
[2022-06-01 07:19:41,080] [   DEBUG] - [scene_insert_delete_flush] Start delete : sift_10w_128_l2 (milvus_benchmark.client:569)
[2022-06-01 07:19:41,083] [   ERROR] - grpc RpcError: [_execute_search_requests], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-06-01 07:14:39.847173', 'gRPC error': '2022-06-01 07:19:41.083693'}> (pymilvus.decorators:86)
[2022-06-01 07:19:41,084] [   ERROR] - grpc RpcError: [search], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-06-01 07:09:39.835141', 'gRPC error': '2022-06-01 07:19:41.084021'}> (pymilvus.decorators:86)
[2022-06-01 07:19:41,084] [   ERROR] - grpc RpcError: [_execute_search_requests], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-06-01 07:14:41.046662', 'gRPC error': '2022-06-01 07:19:41.084752'}> (pymilvus.decorators:86)
[2022-06-01 07:19:41,085] [   ERROR] - grpc RpcError: [search], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-06-01 07:09:39.834858', 'gRPC error': '2022-06-01 07:19:41.085223'}> (pymilvus.decorators:86)
[2022-06-01 07:19:41,085] [   ERROR] - grpc RpcError: [_execute_search_requests], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-06-01 07:14:41.055295', 'gRPC error': '2022-06-01 07:19:41.085890'}> (pymilvus.decorators:86)
[2022-06-01 07:19:41,086] [   ERROR] - grpc RpcError: [search], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-06-01 07:09:39.835013', 'gRPC error': '2022-06-01 07:19:41.086403'}> (pymilvus.decorators:86)
[2022-06-01 07:19:41,086] [   ERROR] - grpc RpcError: [query], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-06-01 07:14:39.846290', 'gRPC error': '2022-06-01 07:19:41.086903'}> (pymilvus.decorators:86)
[2022-06-01 07:19:41,089] [   DEBUG] - [scene_insert_delete_flush] Start scene_insert_delete_flush : sift_10w_128_l2 (milvus_benchmark.client:562)
[2022-06-01 07:19:41,090] [   ERROR] - grpc RpcError: [has_collection], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-06-01 07:14:41.055913', 'gRPC error': '2022-06-01 07:19:41.090706'}> (pymilvus.decorators:86)
[2022-06-01 07:19:41,207] [   DEBUG] - Milvus delete run in 0.1245s (milvus_benchmark.client:54)
[2022-06-01 07:19:41,208] [   DEBUG] - [scene_insert_delete_flush] Start flush : sift_10w_128_l2 (milvus_benchmark.client:572)
[2022-06-01 07:19:41,226] [   DEBUG] - Milvus get_info run in 0.1359s (milvus_benchmark.client:54)
[2022-06-01 07:19:41,231] [   DEBUG] - [scene_insert_delete_flush] Start insert : sift_10w_128_l2 (milvus_benchmark.client:566)

NAME                                                              READY   STATUS      RESTARTS   AGE     IP             NODE                      NOMINATED NODE   READINESS GATES
benchmark-backup-66c7g-1-etcd-0                                   1/1     Running     0          6m7s    10.97.16.170   qa-node013.zilliz.local   <none>           <none>
benchmark-backup-66c7g-1-etcd-1                                   1/1     Running     0          6m6s    10.97.17.200   qa-node014.zilliz.local   <none>           <none>
benchmark-backup-66c7g-1-etcd-2                                   1/1     Running     0          6m6s    10.97.16.174   qa-node013.zilliz.local   <none>           <none>
benchmark-backup-66c7g-1-milvus-datacoord-74bc75856b-cp945        1/1     Running     0          6m6s    10.97.4.96     qa-node002.zilliz.local   <none>           <none>
benchmark-backup-66c7g-1-milvus-datanode-6bf787f74c-dkd56         1/1     Running     0          6m6s    10.97.10.23    qa-node008.zilliz.local   <none>           <none>
benchmark-backup-66c7g-1-milvus-indexcoord-7cb869b669-mf9c4       1/1     Running     0          6m7s    10.97.4.95     qa-node002.zilliz.local   <none>           <none>
benchmark-backup-66c7g-1-milvus-indexnode-c64b6c68d-lqh87         1/1     Running     0          6m7s    10.97.20.225   qa-node018.zilliz.local   <none>           <none>
benchmark-backup-66c7g-1-milvus-proxy-5dc8fcd585-dl6g2            1/1     Running     0          6m7s    10.97.4.97     qa-node002.zilliz.local   <none>           <none>
benchmark-backup-66c7g-1-milvus-querycoord-57987cccbb-v9n4m       1/1     Running     0          6m7s    10.97.4.94     qa-node002.zilliz.local   <none>           <none>
benchmark-backup-66c7g-1-milvus-querynode-6b745654b-49k4x         1/1     Running     0          6m7s    10.97.12.146   qa-node015.zilliz.local   <none>           <none>
benchmark-backup-66c7g-1-milvus-rootcoord-8b6c55f4b-8zxts         1/1     Running     0          6m7s    10.97.5.193    qa-node003.zilliz.local   <none>           <none>
benchmark-backup-66c7g-1-minio-0                                  1/1     Running     0          6m7s    10.97.12.148   qa-node015.zilliz.local   <none>           <none>
benchmark-backup-66c7g-1-minio-1                                  1/1     Running     0          6m6s    10.97.12.153   qa-node015.zilliz.local   <none>           <none>
benchmark-backup-66c7g-1-minio-2                                  1/1     Running     0          6m6s    10.97.12.155   qa-node015.zilliz.local   <none>           <none>
benchmark-backup-66c7g-1-minio-3                                  1/1     Running     0          6m6s    10.97.19.200   qa-node016.zilliz.local   <none>           <none>
benchmark-backup-66c7g-1-pulsar-bookie-0                          1/1     Running     0          6m6s    10.97.12.158   qa-node015.zilliz.local   <none>           <none>
benchmark-backup-66c7g-1-pulsar-bookie-1                          1/1     Running     0          6m5s    10.97.9.35     qa-node007.zilliz.local   <none>           <none>
benchmark-backup-66c7g-1-pulsar-bookie-2                          1/1     Running     0          6m4s    10.97.11.141   qa-node009.zilliz.local   <none>           <none>
benchmark-backup-66c7g-1-pulsar-bookie-init-7nglt                 0/1     Completed   0          6m7s    10.97.9.30     qa-node007.zilliz.local   <none>           <none>
benchmark-backup-66c7g-1-pulsar-broker-0                          1/1     Running     0          6m7s    10.97.3.152    qa-node001.zilliz.local   <none>           <none>
benchmark-backup-66c7g-1-pulsar-proxy-0                           1/1     Running     0          6m6s    10.97.18.173   qa-node017.zilliz.local   <none>           <none>
benchmark-backup-66c7g-1-pulsar-pulsar-init-kqgnd                 0/1     Completed   0          6m7s    10.97.9.31     qa-node007.zilliz.local   <none>           <none>
benchmark-backup-66c7g-1-pulsar-recovery-0                        1/1     Running     0          6m7s    10.97.5.192    qa-node003.zilliz.local   <none>           <none>
benchmark-backup-66c7g-1-pulsar-zookeeper-0                       1/1     Running     0          6m7s    10.97.9.32     qa-node007.zilliz.local   <none>           <none>
benchmark-backup-66c7g-1-pulsar-zookeeper-1                       1/1     Running     0          5m18s   10.97.5.195    qa-node003.zilliz.local   <none>           <none>
benchmark-backup-66c7g-1-pulsar-zookeeper-2                       1/1     Running     0          4m51s   10.97.3.154    qa-node001.zilliz.local   <none>           <none>

It just occurred to me, looking at this log, that there are multiple threads running [scene_insert_delete_flush].

The client-side log shows at least 3 threads running [scene_insert_delete_flush] concurrently: at least 3 [scene_insert_delete_flush] operations finished within the same second (07:19:41), while each of them took ~1600 sec.

Here is my guess below; theoretically, this issue might occur with 3 or more threads (please correct me if I made a mistake).

ideal case (1 thread):

- t1: start [insert_delete_flush], generating segments s1, s2.
- t2: flush done on s1, s2; start [insert_delete_flush], generating segments s3, s4.
- t3: flush done on s3, s4; start [insert_delete_flush], generating segments s5, s6.

ideal case (2 threads):

- t1: thread1 starts [insert_delete_flush] and generates segments s1, s2.
- t2: thread2 starts another [insert_delete_flush] and generates segments s3, s4.
- t3: flush done on s1, s2; thread1 starts another [insert_delete_flush], but since flush is not done on s3, s4, the collection ends up with s3, s4, s5, s6 to flush (where s5, s6 are the newly inserted segments).
- t4: flush done on s3, s4; thread2 starts another [insert_delete_flush], but since flush is not done on s5, s6, the collection ends up with s5, s6, s7, s8.
- t5: flush done on s3, s4, s5, s6; thread1 starts another [insert_delete_flush], but since flush is not done on s7, s8, the collection ends up with s7, s8, s9, s10.
- t6: flush done on s5, s6, s7, s8; thread2 starts another [insert_delete_flush], but since flush is not done on s9, s10, the collection ends up with s9, s10, s11, s12.

possible bad case (with 3 or more threads):

- t1: thread1 starts [insert_delete_flush] and generates segments s1, s2.
- t2: thread2 starts another [insert_delete_flush] and generates segments s3, s4.
- t3: thread3 starts another [insert_delete_flush] and generates segments s5, s6.
- t4: flush done on s1, s2; thread1 starts another [insert_delete_flush], but since flush is not done on s3, s4, s5, s6, the collection ends up with s3, s4, s5, s6, s7, s8 to flush (where s7, s8 are the newly inserted segments).
- t5: flush done on s3, s4; thread2 starts another [insert_delete_flush], but since flush is not done on s5, s6, s7, s8, the collection ends up with s5, s6, s7, s8, s9, s10.
- t6: flush done on s5, s6; thread3 starts another [insert_delete_flush], but since flush is not done on s7, s8, s9, s10, the collection ends up with s7, s8, s9, s10, s11, s12.
- t7: flush done on s3, s4, s5, s6; thread1 starts another [insert_delete_flush], but since flush is not done on s7, s8, s9, s10, s11, s12, the collection ends up with s7, s8, s9, s10, s11, s12, s13, s14.

So the number of segments can pile up with 3 or more threads, and you could end up with more and more segments to flush in each [insert_delete_flush] loop.
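
To make this concrete, here is a small toy simulation of my own (an illustrative assumption, not Milvus code): each loop inserts 2 segments and then flushes every segment that is still unflushed, and the flush duration is taken to be proportional to the number of segments flushed. Under these assumptions each flush also waits for segments created by the other concurrent loops, so the work per flush grows with the number of threads:

# Toy model only: 2 new segments per loop, each flush covers all currently-unflushed
# segments, flush duration = cost_per_segment * number of segments flushed.
import heapq
import itertools

def simulate(num_threads, loops_per_thread=6, cost_per_segment=1.0):
    """Return the flush durations observed, in order of flush start."""
    next_seg = itertools.count()       # global segment ids
    seq = itertools.count()            # heap tie-breaker so sets are never compared
    unflushed = set()
    durations = []
    # event: (time, seq, thread_id, loop_index, segments flushed by this thread's previous loop)
    events = [(0.0, next(seq), t, 0, frozenset()) for t in range(num_threads)]
    heapq.heapify(events)
    while events:
        now, _, tid, loop, done = heapq.heappop(events)
        unflushed -= done              # the previous flush of this thread just completed
        if loop == loops_per_thread:
            continue
        unflushed |= {next(next_seg), next(next_seg)}   # insert: 2 new segments
        to_flush = frozenset(unflushed)                 # flush waits for every unflushed segment
        duration = cost_per_segment * len(to_flush)
        durations.append(duration)
        heapq.heappush(events, (now + duration, next(seq), tid, loop + 1, to_flush))
    return durations

for n in (1, 2, 3, 4):
    print(n, "threads -> flush durations:", simulate(n))
# 1 thread flushes 2 segments per loop, while n threads settle at about 2*n segments per flush.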

Can you help check if multiple client threads could lead to this issue and suggest how we may resolve this? @wayblink

This sounds reasonable and matches what we observed: the number of segments keeps growing. I also found that the flush time per segment grows as well (about 1 min 30 s at most). I will run some tests to verify this. It would explain everything if flush speed and the number of segments being flushed are inversely related. But what the bottleneck is remains unclear; CPU and memory usage are not very high.

@wayblink any updates on this? @XuanYang-cn might be the right person to help.

Yeah, I ran some tests last week. The results show that this issue no longer exists on our latest master branch. The flush times are now in a reasonable range (around 10 s) and no errors occur. I will double-check it today.