milvus: [Bug]: [laion-1b] Searching a 100m-768d collection with 100 concurrent clients only reaches 43 vps
Is there an existing issue for this?
- I have searched the existing issues
Environment
- Milvus version: master-20230818-74fb244b
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus 2.4.0.dev109
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
- resources and config (a back-of-the-envelope sizing estimate follows the config block)
components:
  proxy:
    paused: false
    replicas: 3
    resources:
      limits:
        cpu: "4"
        memory: 16Gi
      requests:
        cpu: "2"
        memory: 8Gi
    serviceType: ClusterIP
  queryNode:
    paused: false
    replicas: 5
    resources:
      limits:
        cpu: "16"
        memory: 128Gi
      requests:
        cpu: 15500m
        memory: 127Gi
config:
  dataCoord:
    segment:
      expansionRate: 1.15
      maxSize: 4096
      sealProportion: 0.08
  log:
    level: debug
  rootCoord:
    dmlChannelNum: 16
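These segment settings can be sanity-checked with a rough, back-of-the-envelope estimate. This is only a sketch under the assumption that the 768-d float32 vector dominates the per-row size (the scalar fields add a little on top) and that sealProportion is applied against maxSize:

# rough sizing estimate; assumptions as stated above
DIM = 768
BYTES_PER_ROW = DIM * 4                    # float32 vector, ~3 KB per row
MAX_SIZE_MB = 4096                         # dataCoord.segment.maxSize
SEAL_PROPORTION = 0.08                     # dataCoord.segment.sealProportion

rows_at_seal = MAX_SIZE_MB * SEAL_PROPORTION * 1024 * 1024 / BYTES_PER_ROW
rows_at_max = MAX_SIZE_MB * 1024 * 1024 / BYTES_PER_ROW
print(f"growing segment seals at ~{rows_at_seal:,.0f} rows")             # ~111,848
print(f"a fully compacted segment could hold ~{rows_at_max:,.0f} rows")  # ~1,398,101

If this estimate holds, the ~95k-row average segment size reported further below means most sealed segments stayed near the seal threshold and were never merged anywhere close to maxSize, which is what the compaction complaint below refers to.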
- Test steps (a minimal pymilvus sketch of these steps follows the parameter listing below):
- create a collection with an int64 primary key field, a 768-d float vector field, and other scalar fields
Collection schema: {'auto_id': False, 'description': '', 'fields': [{'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float_vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 768}}, {'name': 'varchar_caption', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 256}}, {'name': 'varchar_NSFW', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 256}}, {'name': 'float64_similarity', 'description': '', 'type': <DataType.FLOAT: 10>}, {'name': 'int64_width', 'description': '', 'type': <DataType.INT64: 5>}, {'name': 'int64_height', 'description': '', 'type': <DataType.INT64: 5>}, {'name': 'int64_original_width', 'description': '', 'type': <DataType.INT64: 5>}, {'name': 'int64_original_height', 'description': '', 'type': <DataType.INT64: 5>}, {'name': 'varchar_md5', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 256}}]}
- create HNSW index with params: {'index_type': 'HNSW', 'metric_type': 'COSINE', 'params': {'M': 30, 'efConstruction': 360}}
- insert 100m-768d data
- create the same index again
- load collection with 1 replica
- concurrent search with params
'concurrent_params': {'concurrent_number': 100,
'during_time': '2h',
'interval': 60,
'spawn_rate': None},
'concurrent_tasks': [{'type': 'search',
'weight': 10,
'params': {'nq': 10,
'top_k': 100,
'search_param': {'ef': 100},
'timeout': 6000}}]},
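A minimal pymilvus sketch of the steps above; the collection name, connection endpoint, and random query vectors are assumptions, while the schema, index, load, and search parameters are taken from this report:

import random
from pymilvus import (
    connections, Collection, CollectionSchema, FieldSchema, DataType,
)

connections.connect(host="127.0.0.1", port="19530")      # assumed endpoint

fields = [
    FieldSchema("id", DataType.INT64, is_primary=True),
    FieldSchema("float_vector", DataType.FLOAT_VECTOR, dim=768),
    FieldSchema("varchar_caption", DataType.VARCHAR, max_length=256),
    FieldSchema("varchar_NSFW", DataType.VARCHAR, max_length=256),
    FieldSchema("float64_similarity", DataType.FLOAT),
    FieldSchema("int64_width", DataType.INT64),
    FieldSchema("int64_height", DataType.INT64),
    FieldSchema("int64_original_width", DataType.INT64),
    FieldSchema("int64_original_height", DataType.INT64),
    FieldSchema("varchar_md5", DataType.VARCHAR, max_length=256),
]
collection = Collection("laion_100m", CollectionSchema(fields))   # name is an assumption

# HNSW index with the parameters above
collection.create_index("float_vector", {
    "index_type": "HNSW", "metric_type": "COSINE",
    "params": {"M": 30, "efConstruction": 360},
})

# ... insert the 100m-768d dataset in batches, create the same index again ...

collection.load(replica_number=1)

# one search request as issued by each of the 100 concurrent clients
query_vectors = [[random.random() for _ in range(768)] for _ in range(10)]   # nq = 10
results = collection.search(
    data=query_vectors,
    anns_field="float_vector",
    param={"metric_type": "COSINE", "params": {"ef": 100}},
    limit=100,        # top_k
    timeout=6000,
)
# in the real test, 100 such clients issue this request in a loop for 2 h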
- client test result (the relation to the 43 vps in the title is shown below)
'search': {'Requests': 31053,
'Fails': 0,
'RPS': 4.32,
'fail_s': 0.0,
'RT_max': 44285.58,
'RT_avg': 22961.36,
'TP50': 22000.0,
'TP99': 39000.0}
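The 43 vps in the title follows directly from this result: each request carries nq = 10 query vectors, so vector throughput ≈ RPS × nq = 4.32 × 10 ≈ 43.2 vectors per second.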
- server proxy vps
- segment config: segment.maxSize is 4096 and expansionRate is 1.15, but the compaction result does not look good (see the segment-distribution sketch after the parsed segment info below)
[2023-08-22 02:54:51,807 - INFO - fouram]: [Base] Parser segment info:
{'segment_counts': 1053,
'segment_total_vectors': 100000000,
'max_segment_raw_count': 990077,
'min_segment_raw_count': 45173,
'avg_segment_raw_count': 94966.8,
'std_segment_raw_count': 59884.9,
'shards_num': 2,
'truncated_avg_segment_raw_count': 95057.1,
'truncated_std_segment_raw_count': 59905.9,
'top_percentile': [{'TP_10': 89727.0},
{'TP_20': 89817.0},
{'TP_30': 89891.0},
{'TP_40': 89950.0},
{'TP_50': 90001.0},
{'TP_60': 90054.2},
{'TP_70': 90115.0},
{'TP_80': 90190.0},
{'TP_90': 90281.0}]} (base.py:670)
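As a cross-check, the same distribution can be recomputed from the client side with pymilvus; this is only a sketch, and the collection name and endpoint are assumptions (the loaded-segment row counts it returns should roughly match the parsed numbers above):

from statistics import mean, pstdev
from pymilvus import connections, utility

connections.connect(host="127.0.0.1", port="19530")        # assumed endpoint
segs = utility.get_query_segment_info("laion_100m")        # collection name is an assumption
rows = [s.num_rows for s in segs]
print(f"segments={len(rows)}, total_rows={sum(rows):,}")
print(f"max={max(rows):,}  min={min(rows):,}  "
      f"avg={mean(rows):,.1f}  std={pstdev(rows):,.1f}")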
- server metrics
Expected Behavior
No response
Steps To Reproduce
4am argo: https://argo-workflows.zilliz.cc/archived-workflows/qa/3f250c89-4cd4-4d8d-bc10-1d60fd980327?nodeId=laion-test-100m-4
grafana link: https://grafana-4am.zilliz.cc/d/uLf5cJ3Ga/milvus2-0?orgId=1&var-datasource=prometheus&var-cluster=&var-namespace=qa-milvus&var-instance=laion-test-3&var-collection=All&var-app_name=milvus&from=1692599896954&to=1692741479885
Milvus Log
pods:
laion-test-3-etcd-0 1/1 Running 0 42h 10.104.14.95 4am-node18 <none> <none>
laion-test-3-etcd-1 1/1 Running 0 42h 10.104.12.76 4am-node17 <none> <none>
laion-test-3-etcd-2 1/1 Running 0 42h 10.104.24.195 4am-node29 <none> <none>
laion-test-3-milvus-datanode-79bdc9c6c-4bs55 1/1 Running 0 42h 10.104.20.62 4am-node22 <none> <none>
laion-test-3-milvus-datanode-79bdc9c6c-b8scm 1/1 Running 0 42h 10.104.16.169 4am-node21 <none> <none>
laion-test-3-milvus-indexnode-f4c7dd98c-6lxc9 1/1 Running 0 42h 10.104.16.170 4am-node21 <none> <none>
laion-test-3-milvus-indexnode-f4c7dd98c-bd7pl 1/1 Running 0 42h 10.104.20.63 4am-node22 <none> <none>
laion-test-3-milvus-indexnode-f4c7dd98c-ftwxk 1/1 Running 0 42h 10.104.20.64 4am-node22 <none> <none>
laion-test-3-milvus-indexnode-f4c7dd98c-gctdb 1/1 Running 0 42h 10.104.15.122 4am-node20 <none> <none>
laion-test-3-milvus-mixcoord-d795878f7-m4k9h 1/1 Running 0 42h 10.104.16.171 4am-node21 <none> <none>
laion-test-3-milvus-proxy-646b4484fd-flr2s 1/1 Running 0 42h 10.104.16.168 4am-node21 <none> <none>
laion-test-3-milvus-proxy-646b4484fd-hrfc7 1/1 Running 0 42h 10.104.15.125 4am-node20 <none> <none>
laion-test-3-milvus-proxy-646b4484fd-qwc85 1/1 Running 0 41h 10.104.20.66 4am-node22 <none> <none>
laion-test-3-milvus-querynode-778d649d78-6dntl 1/1 Running 0 42h 10.104.15.123 4am-node20 <none> <none>
laion-test-3-milvus-querynode-778d649d78-886rt 1/1 Running 0 42h 10.104.15.124 4am-node20 <none> <none>
laion-test-3-milvus-querynode-778d649d78-shdx9 1/1 Running 0 42h 10.104.16.172 4am-node21 <none> <none>
laion-test-3-milvus-querynode-778d649d78-wln95 1/1 Running 0 42h 10.104.20.65 4am-node22 <none> <none>
laion-test-3-milvus-querynode-778d649d78-zxvth 1/1 Running 0 42h 10.104.16.174 4am-node21 <none> <none>
laion-test-3-minio-0 1/1 Running 0 42h 10.104.13.182 4am-node16 <none> <none>
laion-test-3-minio-1 1/1 Running 0 42h 10.104.14.97 4am-node18 <none> <none>
laion-test-3-minio-2 1/1 Running 0 42h 10.104.12.78 4am-node17 <none> <none>
laion-test-3-minio-3 1/1 Running 0 42h 10.104.1.2 4am-node10 <none> <none>
laion-test-3-pulsar-bookie-0 1/1 Running 0 42h 10.104.24.199 4am-node29 <none> <none>
laion-test-3-pulsar-bookie-1 1/1 Running 0 42h 10.104.1.3 4am-node10 <none> <none>
laion-test-3-pulsar-bookie-2 1/1 Running 0 42h 10.104.12.82 4am-node17 <none> <none>
laion-test-3-pulsar-bookie-init-jkh2k 0/1 Completed 0 42h 10.104.24.193 4am-node29 <none> <none>
laion-test-3-pulsar-broker-0 1/1 Running 0 42h 10.104.23.166 4am-node27 <none> <none>
laion-test-3-pulsar-proxy-0 1/1 Running 0 42h 10.104.13.180 4am-node16 <none> <none>
laion-test-3-pulsar-pulsar-init-wr8kc 0/1 Completed 0 42h 10.104.14.91 4am-node18 <none> <none>
laion-test-3-pulsar-recovery-0 1/1 Running 0 42h 10.104.14.93 4am-node18 <none> <none>
laion-test-3-pulsar-zookeeper-0 1/1 Running 0 42h 10.104.14.98 4am-node18 <none> <none>
laion-test-3-pulsar-zookeeper-1 1/1 Running 0 42h 10.104.12.89 4am-node17 <none> <none>
laion-test-3-pulsar-zookeeper-2 1/1 Running 1 (15h ago) 42h 10.104.13.185 4am-node16 <none> <none>
Anything else?
No response
About this issue
- Original URL
- State: closed
- Created 10 months ago
- Comments: 15 (14 by maintainers)
Commits related to this issue
- Fix timeout task never release queue See also: #26413, #26566 Signed-off-by: yangxuan <xuan.yang@zilliz.com> — committed to XuanYang-cn/milvus by XuanYang-cn 10 months ago
- Fix timeout task never release queue See also: #26413, #26566 pr: #26593 Signed-off-by: yangxuan <xuan.yang@zilliz.com> — committed to XuanYang-cn/milvus by XuanYang-cn 10 months ago
- Fix timeout task never release queue See also: #26413, #26566 Signed-off-by: yangxuan <xuan.yang@zilliz.com> — committed to XuanYang-cn/milvus by XuanYang-cn 10 months ago
- Fix timeout task never release queue See also: #26413, #26566 pr: #26593 Signed-off-by: yangxuan <xuan.yang@zilliz.com> — committed to XuanYang-cn/milvus by XuanYang-cn 10 months ago
- Fix timeout task never release queue See also: #26413, #26566 Signed-off-by: yangxuan <xuan.yang@zilliz.com> — committed to XuanYang-cn/milvus by XuanYang-cn 10 months ago
- Fix timeout task never release queue (#26593) See also: #26413, #26566 Signed-off-by: yangxuan <xuan.yang@zilliz.com> — committed to milvus-io/milvus by XuanYang-cn 10 months ago
- Fix timeout task never release queue (#26594) See also: #26413, #26566 pr: #26593 Signed-off-by: yangxuan <xuan.yang@zilliz.com> — committed to milvus-io/milvus by XuanYang-cn 10 months ago
- Change default compaction timeout to 15mins See also: #26566 Signed-off-by: yangxuan <xuan.yang@zilliz.com> — committed to XuanYang-cn/milvus by XuanYang-cn 10 months ago
- Change default compaction timeout to 15mins (#26757) See also: #26566 Signed-off-by: yangxuan <xuan.yang@zilliz.com> — committed to milvus-io/milvus by XuanYang-cn 10 months ago
What about setting the default timeout to 30 minutes?