milvus: [Bug]: [benchmark] DISKANN index with 100 million inserted rows: querynode disk usage peaks at over 100 GB

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.2.0-20230626-eac54cbb
- Deployment mode (standalone or cluster): cluster
- MQ type (rocksmq, pulsar or kafka): pulsar
- SDK version (e.g. pymilvus v2.0.0rc2): pymilvus==2.4.0.dev36
- OS (Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:

Current Behavior

argo task: fouramf-concurrent-n5lrq, id: 2, case: test_concurrent_locust_100m_diskann_ddl_dql_filter_cluster. This is a frequently run test case that passed on previous versions.

server:

fouram-45-7069-etcd-0                                             1/1     Running       0               6m4s    10.104.4.106    4am-node11   <none>           <none>
fouram-45-7069-etcd-1                                             1/1     Running       0               6m4s    10.104.20.203   4am-node22   <none>           <none>
fouram-45-7069-etcd-2                                             1/1     Running       0               6m3s    10.104.15.13    4am-node20   <none>           <none>
fouram-45-7069-milvus-datacoord-7685b67fc-pl6r5                   1/1     Running       1 (2m3s ago)    6m4s    10.104.4.87     4am-node11   <none>           <none>
fouram-45-7069-milvus-datanode-f87b86d88-n4xwz                    1/1     Running       1 (2m4s ago)    6m4s    10.104.21.17    4am-node24   <none>           <none>
fouram-45-7069-milvus-indexcoord-79b9795579-jzl68                 1/1     Running       1 (2m3s ago)    6m4s    10.104.9.239    4am-node14   <none>           <none>
fouram-45-7069-milvus-indexnode-86c4d777c4-q4brg                  1/1     Running       0               6m4s    10.104.9.238    4am-node14   <none>           <none>
fouram-45-7069-milvus-proxy-78d5df4cdc-27znx                      1/1     Running       1 (2m3s ago)    6m4s    10.104.4.88     4am-node11   <none>           <none>
fouram-45-7069-milvus-querycoord-7cb6c4ddb8-wstvc                 1/1     Running       1 (2m3s ago)    6m4s    10.104.4.89     4am-node11   <none>           <none>
fouram-45-7069-milvus-querynode-867596d85b-hk6rz                  1/1     Running       0               6m4s    10.104.6.50     4am-node13   <none>           <none>
fouram-45-7069-milvus-rootcoord-d7c486488-kqwhq                   1/1     Running       1 (2m3s ago)    6m4s    10.104.9.240    4am-node14   <none>           <none>
fouram-45-7069-minio-0                                            1/1     Running       0               6m4s    10.104.6.54     4am-node13   <none>           <none>
fouram-45-7069-minio-1                                            1/1     Running       0               6m4s    10.104.4.104    4am-node11   <none>           <none>
fouram-45-7069-minio-2                                            1/1     Running       0               6m4s    10.104.16.227   4am-node21   <none>           <none>
fouram-45-7069-minio-3                                            1/1     Running       0               6m3s    10.104.20.205   4am-node22   <none>           <none>
fouram-45-7069-pulsar-bookie-0                                    1/1     Running       0               6m4s    10.104.4.103    4am-node11   <none>           <none>
fouram-45-7069-pulsar-bookie-1                                    1/1     Running       0               6m4s    10.104.15.11    4am-node20   <none>           <none>
fouram-45-7069-pulsar-bookie-2                                    1/1     Running       0               6m4s    10.104.16.230   4am-node21   <none>           <none>
fouram-45-7069-pulsar-bookie-init-shm2z                           0/1     Completed     0               6m4s    10.104.15.5     4am-node20   <none>           <none>
fouram-45-7069-pulsar-broker-0                                    1/1     Running       0               6m4s    10.104.15.6     4am-node20   <none>           <none>
fouram-45-7069-pulsar-proxy-0                                     1/1     Running       0               6m4s    10.104.16.225   4am-node21   <none>           <none>
fouram-45-7069-pulsar-pulsar-init-8jk8d                           0/1     Completed     0               6m4s    10.104.15.254   4am-node20   <none>           <none>
fouram-45-7069-pulsar-recovery-0                                  1/1     Running       0               6m4s    10.104.4.90     4am-node11   <none>           <none>
fouram-45-7069-pulsar-zookeeper-0                                 1/1     Running       0               6m4s    10.104.21.19    4am-node24   <none>           <none>
fouram-45-7069-pulsar-zookeeper-1                                 1/1     Running       0               4m57s   10.104.6.57     4am-node13   <none>           <none>
fouram-45-7069-pulsar-zookeeper-2                                 1/1     Running       0               4m20s   10.104.5.94     4am-node12   <none>           <none>

client log:

[2023-06-26 12:02:38,308 -  INFO - fouram]: [Base] Number of vectors in the collection(fouram_0dreLHRo): 99900000 (base.py:468)
[2023-06-26 12:02:38,459 -  INFO - fouram]: [Base] Start inserting, ids: 99950000 - 99999999, data size: 100,000,000 (base.py:308)
[2023-06-26 12:02:40,008 -  INFO - fouram]: [Time] Collection.insert run in 1.5493s (api_request.py:45)
[2023-06-26 12:02:40,011 -  INFO - fouram]: [Base] Number of vectors in the collection(fouram_0dreLHRo): 99900000 (base.py:468)
[2023-06-26 12:02:40,062 -  INFO - fouram]: [Base] Total time of insert: 3187.9628s, average number of vector bars inserted per second: 31367.9946, average time to insert 50000 vectors per time: 1.594s (base.py:379)
[2023-06-26 12:02:40,062 -  INFO - fouram]: [Base] Start flush collection fouram_0dreLHRo (base.py:277)
[2023-06-26 12:02:43,125 -  INFO - fouram]: [Base] Params of index: [{'float_vector': {'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}}}] (base.py:441)
[2023-06-26 12:02:43,125 -  INFO - fouram]: [Base] Start release collection fouram_0dreLHRo (base.py:288)
[2023-06-26 12:02:43,127 -  INFO - fouram]: [Base] Start build index of DISKANN for collection fouram_0dreLHRo, params:{'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}} (base.py:427)
[2023-06-26 17:34:27,390 -  INFO - fouram]: [Time] Index run in 19904.2613s (api_request.py:45)
[2023-06-26 17:34:27,391 -  INFO - fouram]: [CommonCases] RT of build index DISKANN: 19904.2613s (common_cases.py:96)
[2023-06-26 17:34:27,416 -  INFO - fouram]: [Base] Params of index: [{'float_vector': {'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}}}] (base.py:441)
[2023-06-26 17:34:27,416 -  INFO - fouram]: [CommonCases] Prepare index DISKANN done. (common_cases.py:99)
[2023-06-26 17:34:27,416 -  INFO - fouram]: [CommonCases] No scalars need to be indexed. (common_cases.py:107)
[2023-06-26 17:34:27,418 -  INFO - fouram]: [Base] Number of vectors in the collection(fouram_0dreLHRo): 100000000 (base.py:468)
[2023-06-26 17:34:27,418 -  INFO - fouram]: [Base] Start load collection fouram_0dreLHRo,replica_number:1,kwargs:{} (base.py:283)
[2023-06-26 18:51:04,491 - ERROR - fouram]: RPC error: [get_loading_progress], <MilvusException: (code=1, message=failed to load segment: follower 12 failed to load segment, reason load segment failed, disk space is not enough, collectionID = 442440457093644855, usedDiskAfterLoad = 100294 MB, totalDisk = 102400 MB, thresholdFactor = 0.950000)>, <Time:{'RPC start': '2023-06-26 18:51:04.489527', 'RPC error': '2023-06-26 18:51:04.491093'}> (decorators.py:108)
[2023-06-26 18:51:04,493 - ERROR - fouram]: RPC error: [wait_for_loading_collection], <MilvusException: (code=1, message=failed to load segment: follower 12 failed to load segment, reason load segment failed, disk space is not enough, collectionID = 442440457093644855, usedDiskAfterLoad = 100294 MB, totalDisk = 102400 MB, thresholdFactor = 0.950000)>, <Time:{'RPC start': '2023-06-26 17:34:27.474903', 'RPC error': '2023-06-26 18:51:04.493072'}> (decorators.py:108)
[2023-06-26 18:51:04,493 - ERROR - fouram]: RPC error: [load_collection], <MilvusException: (code=1, message=failed to load segment: follower 12 failed to load segment, reason load segment failed, disk space is not enough, collectionID = 442440457093644855, usedDiskAfterLoad = 100294 MB, totalDisk = 102400 MB, thresholdFactor = 0.950000)>, <Time:{'RPC start': '2023-06-26 17:34:27.418905', 'RPC error': '2023-06-26 18:51:04.493252'}> (decorators.py:108)
[2023-06-26 18:51:04,494 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=failed to load segment: follower 12 failed to load segment, reason load segment failed, disk space is not enough, collectionID = 442440457093644855, usedDiskAfterLoad = 100294 MB, totalDisk = 102400 MB, thresholdFactor = 0.950000)> (api_request.py:53)
[2023-06-26 18:51:04,495 - ERROR - fouram]: [CheckFunc] load request check failed, response:<MilvusException: (code=1, message=failed to load segment: follower 12 failed to load segment, reason load segment failed, disk space is not enough, collectionID = 442440457093644855, usedDiskAfterLoad = 100294 MB, totalDisk = 102400 MB, thresholdFactor = 0.950000)> (func_check.py:52)
FAILED

client pod: fouramf-concurrent-n5lrq-1120963268
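Note: the load rejection is consistent with the numbers in the error above: the predicted disk usage after loading the segment exceeds 95% of the querynode's 100 GB local disk. A quick check using only the values from the log (this mirrors the reported threshold check, not the actual server code):

# Values copied verbatim from the MilvusException above.
used_disk_after_load_mb = 100294   # usedDiskAfterLoad
total_disk_mb = 102400             # totalDisk (100 GB local disk on the querynode)
threshold_factor = 0.95            # thresholdFactor

allowed_mb = total_disk_mb * threshold_factor   # 97280 MB
print(f"allowed {allowed_mb:.0f} MB, predicted {used_disk_after_load_mb} MB")
print("load rejected" if used_disk_after_load_mb > allowed_mb else "load allowed")
# -> allowed 97280 MB, predicted 100294 MB  => load rejected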

Expected Behavior

Load succeeds.

Steps To Reproduce

1. create a collection or use an existing collection
2. build index on the vector column  => DISKANN
3. insert a certain number of vectors  => 100M
4. flush the collection
5. build index on the vector column with the same parameters
6. build index on scalar columns or not
7. count the total number of rows
8. load the collection  ==> failed (see the pymilvus sketch below)
# 9. perform concurrent operations
# 10. clean all collections or not
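For context, a minimal pymilvus sketch of steps 1-8 under assumed parameters: the field name float_vector and the DISKANN index params come from the log above, while the dimension, collection name, host, and batch size are placeholders and not taken from the issue.

# Minimal reproduction sketch (assumed schema; not the actual benchmark code).
import numpy as np
from pymilvus import (
    connections, Collection, CollectionSchema, FieldSchema, DataType,
)

connections.connect(host="127.0.0.1", port="19530")  # placeholder endpoint

dim = 128  # assumed dimension, not stated in the issue
fields = [
    FieldSchema("id", DataType.INT64, is_primary=True),
    FieldSchema("float_vector", DataType.FLOAT_VECTOR, dim=dim),
]
collection = Collection("fouram_repro", CollectionSchema(fields))      # step 1

index_params = {"index_type": "DISKANN", "metric_type": "L2", "params": {}}
collection.create_index("float_vector", index_params)                  # step 2

batch = 50_000
for start in range(0, 100_000_000, batch):                             # step 3: 100M rows
    ids = list(range(start, start + batch))
    vectors = np.random.random((batch, dim)).tolist()
    collection.insert([ids, vectors])

collection.flush()                                                      # step 4
collection.create_index("float_vector", index_params)                  # step 5
# step 6 skipped: the log reports "No scalars need to be indexed"
print(collection.num_entities)                                          # step 7
collection.load(replica_number=1)                                       # step 8: fails with "disk space is not enough"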

Milvus Log

No response

Anything else?

No response

About this issue

  • State: closed
  • Created a year ago
  • Comments: 23 (21 by maintainers)

Most upvoted comments

@xige-16 pls take a glance at it