milvus: [Bug]: [benchmark] diskann index inserts 100 million data, querynode disk usage peaks at over 100G
Is there an existing issue for this?
- I have searched the existing issues
Environment
- Milvus version: 2.2.0-20230626-eac54cbb
- Deployment mode (standalone or cluster): cluster
- MQ type (rocksmq, pulsar or kafka): pulsar
- SDK version (e.g. pymilvus v2.0.0rc2): pymilvus==2.4.0.dev36
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
argo task: fouramf-concurrent-n5lrq, id: 2, case: test_concurrent_locust_100m_diskann_ddl_dql_filter_cluster. This is a frequently run test case that passed in previous versions.
server:
fouram-45-7069-etcd-0 1/1 Running 0 6m4s 10.104.4.106 4am-node11 <none> <none>
fouram-45-7069-etcd-1 1/1 Running 0 6m4s 10.104.20.203 4am-node22 <none> <none>
fouram-45-7069-etcd-2 1/1 Running 0 6m3s 10.104.15.13 4am-node20 <none> <none>
fouram-45-7069-milvus-datacoord-7685b67fc-pl6r5 1/1 Running 1 (2m3s ago) 6m4s 10.104.4.87 4am-node11 <none> <none>
fouram-45-7069-milvus-datanode-f87b86d88-n4xwz 1/1 Running 1 (2m4s ago) 6m4s 10.104.21.17 4am-node24 <none> <none>
fouram-45-7069-milvus-indexcoord-79b9795579-jzl68 1/1 Running 1 (2m3s ago) 6m4s 10.104.9.239 4am-node14 <none> <none>
fouram-45-7069-milvus-indexnode-86c4d777c4-q4brg 1/1 Running 0 6m4s 10.104.9.238 4am-node14 <none> <none>
fouram-45-7069-milvus-proxy-78d5df4cdc-27znx 1/1 Running 1 (2m3s ago) 6m4s 10.104.4.88 4am-node11 <none> <none>
fouram-45-7069-milvus-querycoord-7cb6c4ddb8-wstvc 1/1 Running 1 (2m3s ago) 6m4s 10.104.4.89 4am-node11 <none> <none>
fouram-45-7069-milvus-querynode-867596d85b-hk6rz 1/1 Running 0 6m4s 10.104.6.50 4am-node13 <none> <none>
fouram-45-7069-milvus-rootcoord-d7c486488-kqwhq 1/1 Running 1 (2m3s ago) 6m4s 10.104.9.240 4am-node14 <none> <none>
fouram-45-7069-minio-0 1/1 Running 0 6m4s 10.104.6.54 4am-node13 <none> <none>
fouram-45-7069-minio-1 1/1 Running 0 6m4s 10.104.4.104 4am-node11 <none> <none>
fouram-45-7069-minio-2 1/1 Running 0 6m4s 10.104.16.227 4am-node21 <none> <none>
fouram-45-7069-minio-3 1/1 Running 0 6m3s 10.104.20.205 4am-node22 <none> <none>
fouram-45-7069-pulsar-bookie-0 1/1 Running 0 6m4s 10.104.4.103 4am-node11 <none> <none>
fouram-45-7069-pulsar-bookie-1 1/1 Running 0 6m4s 10.104.15.11 4am-node20 <none> <none>
fouram-45-7069-pulsar-bookie-2 1/1 Running 0 6m4s 10.104.16.230 4am-node21 <none> <none>
fouram-45-7069-pulsar-bookie-init-shm2z 0/1 Completed 0 6m4s 10.104.15.5 4am-node20 <none> <none>
fouram-45-7069-pulsar-broker-0 1/1 Running 0 6m4s 10.104.15.6 4am-node20 <none> <none>
fouram-45-7069-pulsar-proxy-0 1/1 Running 0 6m4s 10.104.16.225 4am-node21 <none> <none>
fouram-45-7069-pulsar-pulsar-init-8jk8d 0/1 Completed 0 6m4s 10.104.15.254 4am-node20 <none> <none>
fouram-45-7069-pulsar-recovery-0 1/1 Running 0 6m4s 10.104.4.90 4am-node11 <none> <none>
fouram-45-7069-pulsar-zookeeper-0 1/1 Running 0 6m4s 10.104.21.19 4am-node24 <none> <none>
fouram-45-7069-pulsar-zookeeper-1 1/1 Running 0 4m57s 10.104.6.57 4am-node13 <none> <none>
fouram-45-7069-pulsar-zookeeper-2 1/1 Running 0 4m20s 10.104.5.94 4am-node12 <none> <none>
client log:
[2023-06-26 12:02:38,308 - INFO - fouram]: [Base] Number of vectors in the collection(fouram_0dreLHRo): 99900000 (base.py:468)
[2023-06-26 12:02:38,459 - INFO - fouram]: [Base] Start inserting, ids: 99950000 - 99999999, data size: 100,000,000 (base.py:308)
[2023-06-26 12:02:40,008 - INFO - fouram]: [Time] Collection.insert run in 1.5493s (api_request.py:45)
[2023-06-26 12:02:40,011 - INFO - fouram]: [Base] Number of vectors in the collection(fouram_0dreLHRo): 99900000 (base.py:468)
[2023-06-26 12:02:40,062 - INFO - fouram]: [Base] Total time of insert: 3187.9628s, average number of vector bars inserted per second: 31367.9946, average time to insert 50000 vectors per time: 1.594s (base.py:379)
[2023-06-26 12:02:40,062 - INFO - fouram]: [Base] Start flush collection fouram_0dreLHRo (base.py:277)
[2023-06-26 12:02:43,125 - INFO - fouram]: [Base] Params of index: [{'float_vector': {'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}}}] (base.py:441)
[2023-06-26 12:02:43,125 - INFO - fouram]: [Base] Start release collection fouram_0dreLHRo (base.py:288)
[2023-06-26 12:02:43,127 - INFO - fouram]: [Base] Start build index of DISKANN for collection fouram_0dreLHRo, params:{'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}} (base.py:427)
[2023-06-26 17:34:27,390 - INFO - fouram]: [Time] Index run in 19904.2613s (api_request.py:45)
[2023-06-26 17:34:27,391 - INFO - fouram]: [CommonCases] RT of build index DISKANN: 19904.2613s (common_cases.py:96)
[2023-06-26 17:34:27,416 - INFO - fouram]: [Base] Params of index: [{'float_vector': {'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}}}] (base.py:441)
[2023-06-26 17:34:27,416 - INFO - fouram]: [CommonCases] Prepare index DISKANN done. (common_cases.py:99)
[2023-06-26 17:34:27,416 - INFO - fouram]: [CommonCases] No scalars need to be indexed. (common_cases.py:107)
[2023-06-26 17:34:27,418 - INFO - fouram]: [Base] Number of vectors in the collection(fouram_0dreLHRo): 100000000 (base.py:468)
[2023-06-26 17:34:27,418 - INFO - fouram]: [Base] Start load collection fouram_0dreLHRo,replica_number:1,kwargs:{} (base.py:283)
[2023-06-26 18:51:04,491 - ERROR - fouram]: RPC error: [get_loading_progress], <MilvusException: (code=1, message=failed to load segment: follower 12 failed to load segment, reason load segment failed, disk space is not enough, collectionID = 442440457093644855, usedDiskAfterLoad = 100294 MB, totalDisk = 102400 MB, thresholdFactor = 0.950000)>, <Time:{'RPC start': '2023-06-26 18:51:04.489527', 'RPC error': '2023-06-26 18:51:04.491093'}> (decorators.py:108)
[2023-06-26 18:51:04,493 - ERROR - fouram]: RPC error: [wait_for_loading_collection], <MilvusException: (code=1, message=failed to load segment: follower 12 failed to load segment, reason load segment failed, disk space is not enough, collectionID = 442440457093644855, usedDiskAfterLoad = 100294 MB, totalDisk = 102400 MB, thresholdFactor = 0.950000)>, <Time:{'RPC start': '2023-06-26 17:34:27.474903', 'RPC error': '2023-06-26 18:51:04.493072'}> (decorators.py:108)
[2023-06-26 18:51:04,493 - ERROR - fouram]: RPC error: [load_collection], <MilvusException: (code=1, message=failed to load segment: follower 12 failed to load segment, reason load segment failed, disk space is not enough, collectionID = 442440457093644855, usedDiskAfterLoad = 100294 MB, totalDisk = 102400 MB, thresholdFactor = 0.950000)>, <Time:{'RPC start': '2023-06-26 17:34:27.418905', 'RPC error': '2023-06-26 18:51:04.493252'}> (decorators.py:108)
[2023-06-26 18:51:04,494 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=failed to load segment: follower 12 failed to load segment, reason load segment failed, disk space is not enough, collectionID = 442440457093644855, usedDiskAfterLoad = 100294 MB, totalDisk = 102400 MB, thresholdFactor = 0.950000)> (api_request.py:53)
[2023-06-26 18:51:04,495 - ERROR - fouram]: [CheckFunc] load request check failed, response:<MilvusException: (code=1, message=failed to load segment: follower 12 failed to load segment, reason load segment failed, disk space is not enough, collectionID = 442440457093644855, usedDiskAfterLoad = 100294 MB, totalDisk = 102400 MB, thresholdFactor = 0.950000)> (func_check.py:52)
FAILED
client pod : fouramf-concurrent-n5lrq-1120963268
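The rejection follows directly from the QueryNode disk threshold check quoted in the error: the projected disk usage after load exceeds thresholdFactor × totalDisk. Plugging in the numbers from the log above (a quick sanity check, not Milvus source code):

```python
# Values taken verbatim from the error message above.
used_disk_after_load_mb = 100294
total_disk_mb = 102400
threshold_factor = 0.95

# The QueryNode rejects the load when projected usage exceeds the threshold.
limit_mb = total_disk_mb * threshold_factor            # 97280.0 MB
projected_ratio = used_disk_after_load_mb / total_disk_mb

print(f"limit: {limit_mb:.0f} MB, "
      f"projected: {used_disk_after_load_mb} MB ({projected_ratio:.2%} of disk)")
assert used_disk_after_load_mb > limit_mb  # -> load is rejected
```

So the QueryNode would need roughly 3 GB more headroom (or a lower peak) to pass the 95% threshold on a 100 GB disk.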
Expected Behavior
The collection loads successfully.
Steps To Reproduce
1. create a collection or use an existing collection
2. build index on vector column => diskann
3. insert a certain number of vectors => 100m
4. flush collection
5. build index on vector column with the same parameters
6. build index on scalar columns or not
7. count the total number of rows
8. load collection ==> failed
# 9. perform concurrent operations
# 10. clean all collections or not
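The steps above can be sketched with pymilvus. This is a minimal illustration, not the benchmark's actual client: the host, collection name, dimension, and data generation are assumptions (the report does not state them), while the 100m row count, 50k batch size, and DISKANN index params come from the client log. It requires a running Milvus cluster to actually execute.

```python
N_ROWS = 100_000_000   # step 3: 100m rows, as in the test case
BATCH = 50_000         # batch size seen in the client log

def insert_batches(total_rows, batch_size):
    """Yield (start_id, end_id) ranges, mirroring the client log's id batches."""
    for start in range(0, total_rows, batch_size):
        yield start, min(start + batch_size, total_rows) - 1

def main():
    import random
    from pymilvus import (
        connections, Collection, CollectionSchema, FieldSchema, DataType,
    )
    connections.connect(host="localhost", port="19530")  # assumed endpoint
    dim = 128  # assumed vector dimension
    schema = CollectionSchema([
        FieldSchema("id", DataType.INT64, is_primary=True),
        FieldSchema("float_vector", DataType.FLOAT_VECTOR, dim=dim),
    ])
    index_params = {"index_type": "DISKANN", "metric_type": "L2", "params": {}}

    coll = Collection("fouram_repro", schema)            # step 1
    coll.create_index("float_vector", index_params)      # step 2
    for start, end in insert_batches(N_ROWS, BATCH):     # step 3
        ids = list(range(start, end + 1))
        vecs = [[random.random() for _ in range(dim)] for _ in ids]
        coll.insert([ids, vecs])
    coll.flush()                                         # step 4
    coll.create_index("float_vector", index_params)      # step 5
    print(coll.num_entities)                             # step 7
    coll.load(replica_number=1)  # step 8: fails with "disk space is not enough"

if __name__ == "__main__":
    main()
```

Note that the last id range produced by `insert_batches(N_ROWS, BATCH)` is (99950000, 99999999), matching the final insert line in the client log.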
Milvus Log
No response
Anything else?
No response
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 23 (21 by maintainers)
@xige-16 please take a look at it