milvus: [Bug]: Flush performance degrades for a collection created during chaos after the datacoord pod recovers from pod kill
Is there an existing issue for this?
- I have searched the existing issues
Environment
- Milvus version: master-20220321-2078b24d
- Deployment mode(standalone or cluster):cluster
- SDK version(e.g. pymilvus v2.0.0rc2):2.0.2.dev5
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
Get collection entities cost 15.4309 seconds
check collection CreateChecker__Yfx24DUW
collection exists
Create collection...
Insert 3000 vectors cost 0.5203 seconds
Get collection entities...
3000
Get collection entities cost 15.4309 seconds
Expected Behavior
Get collection entities cost 3.7377 seconds
check collection Checker__m9FOw5zU
collection exists
Create collection...
Insert 3000 vectors cost 0.4555 seconds
Get collection entities...
8200
Get collection entities cost 3.7377 seconds
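For anyone trying to reproduce the timing locally, the "cost ... seconds" lines above can be produced by a small wrapper like the one below. This is only a sketch, not the actual chaos test code: `timed` and `fake_num_entities` are hypothetical names, and the stand-in function replaces the real call. In the real test the wrapped call would presumably be reading `Collection.num_entities`, which, as far as I know, triggers a flush in pymilvus 2.0.x before returning the row count.

```python
import time


def timed(label, fn):
    """Run fn(), print a log line in the style of the report, return (result, seconds)."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    print(f"{label} cost {elapsed:.4f} seconds")
    return result, elapsed


def fake_num_entities():
    # Hypothetical stand-in for the real call. With pymilvus this would be
    # something like `lambda: collection.num_entities` (assumption).
    time.sleep(0.01)
    return 3000


count, seconds = timed("Get collection entities", fake_num_entities)
print(count)
```

Passing the call as a function keeps the timing code separate from the SDK, so the same wrapper can be reused for the insert step.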
Steps To Reproduce
see https://github.com/milvus-io/milvus/runs/5632714245?check_suite_focus=true
Anything else?
failed job: https://github.com/milvus-io/milvus/runs/5632714245?check_suite_focus=true
logs: https://github.com/milvus-io/milvus/suites/5742511778/artifacts/190470457
About this issue
- State: closed
- Created 2 years ago
- Comments: 27 (23 by maintainers)
chaos type: pod-failure
image tag: 2.1.0-20220921-a0ab90ea
target pod: datacoord
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-kafka-for-release/detail/chaos-test-kafka-for-release/596/pipeline
logs: artifacts-datacoord-pod-failure-596-pytest-logs.tar.gz, artifacts-datacoord-pod-failure-596-server-logs.tar.gz
This issue still exists for datacoord pod kill.
Kafka as MQ
chaos type: pod-kill
image tag: 2.1.0-20220913-3c3ba55
target pod: datacoord
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-kafka-for-release/detail/chaos-test-kafka-for-release/426/pipeline
logs: artifacts-datacoord-pod-kill-426-server-logs.tar.gz, artifacts-datacoord-pod-kill-426-pytest-logs.tar.gz
For the collection with the CreateChecker prefix, the flush time is much longer than for the other collections.
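To compare flush times per collection from the pytest logs, something like the sketch below could work. It assumes the log keeps the format quoted in the issue body ("check collection NAME" followed later by "Get collection entities cost X seconds"). `flush_costs` is a hypothetical helper name, and the sample text is just the two excerpts from the report.

```python
import re

NAME_RE = re.compile(r"check collection (\S+)")
COST_RE = re.compile(r"Get collection entities cost ([0-9.]+) seconds")


def flush_costs(log_text):
    """Map each collection name to the 'Get collection entities' cost that follows it."""
    costs = {}
    current = None
    for line in log_text.splitlines():
        name_match = NAME_RE.search(line)
        if name_match:
            current = name_match.group(1)
            continue
        cost_match = COST_RE.search(line)
        if cost_match and current is not None:
            costs[current] = float(cost_match.group(1))
    return costs


# Sample taken from the excerpts quoted in this issue.
sample = """\
check collection CreateChecker__Yfx24DUW
collection exists
Create collection...
Insert 3000 vectors cost 0.5203 seconds
Get collection entities...
3000
Get collection entities cost 15.4309 seconds
check collection Checker__m9FOw5zU
collection exists
Create collection...
Insert 3000 vectors cost 0.4555 seconds
Get collection entities...
8200
Get collection entities cost 3.7377 seconds
"""

costs = flush_costs(sample)
ratio = costs["CreateChecker__Yfx24DUW"] / costs["Checker__m9FOw5zU"]
print(costs)
print(f"CreateChecker is {ratio:.1f}x slower")
```

On the two excerpts from the report, the CreateChecker collection comes out roughly four times slower than the other one.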
The same happens with the Pulsar version.
Pulsar as MQ
chaos type: pod-kill
image tag: 2.1.0-20220913-3c3ba55
target pod: datacoord
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-for-release/detail/chaos-test-for-release/540/pipeline