milvus: [Bug]: Flush performance degrades for collections created during chaos after the datacoord pod recovers from pod kill

Is there an existing issue for this?

I have searched the existing issues

Environment

- Milvus version: master-20220321-2078b24d
- Deployment mode (standalone or cluster): cluster
- SDK version (e.g. pymilvus v2.0.0rc2): 2.0.2.dev5
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

After the datacoord pod recovers, getting the entity count of a collection created during chaos takes 15.4309 seconds:

check collection CreateChecker__Yfx24DUW
collection exists

Create collection...

Insert 3000 vectors cost 0.5203 seconds

Get collection entities...
3000

Get collection entities cost 15.4309 seconds

Expected Behavior

Getting the entity count should take about as long as it does for other collections, for example 3.7377 seconds:

check collection Checker__m9FOw5zU
collection exists

Create collection...
Insert 3000 vectors cost 0.4555 seconds

Get collection entities...
8200

Get collection entities cost 3.7377 seconds
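
To put a number on the gap between the two logs above, a small parser can pull the per-collection cost out of the checker output and compute the slowdown. This is only a sketch: the function name `flush_costs` and the regular expressions are illustrative and assume the log lines look exactly like the excerpts in this report.

```python
import re
from typing import Dict

# Patterns assume the exact wording of the log excerpts in this issue.
NAME_RE = re.compile(r"^check collection (\S+)$")
COST_RE = re.compile(r"^Get collection entities cost ([0-9.]+) seconds$")


def flush_costs(log: str) -> Dict[str, float]:
    """Map each collection name to its 'Get collection entities' cost in seconds."""
    costs: Dict[str, float] = {}
    current = None  # collection named by the most recent "check collection" line
    for raw in log.splitlines():
        line = raw.strip()
        name = NAME_RE.match(line)
        if name:
            current = name.group(1)
            continue
        cost = COST_RE.match(line)
        if cost and current is not None:
            costs[current] = float(cost.group(1))
    return costs


# Log lines copied from the Current Behavior and Expected Behavior sections.
LOG = """
check collection CreateChecker__Yfx24DUW
collection exists
Get collection entities cost 15.4309 seconds
check collection Checker__m9FOw5zU
collection exists
Get collection entities cost 3.7377 seconds
"""

costs = flush_costs(LOG)
slowdown = costs["CreateChecker__Yfx24DUW"] / costs["Checker__m9FOw5zU"]
print(f"slowdown: {slowdown:.2f}x")
```

For the two collections in this report the ratio comes out at roughly 4x.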

Steps To Reproduce

see https://github.com/milvus-io/milvus/runs/5632714245?check_suite_focus=true
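
The linked job runs the full chaos test. To take the same kind of measurement by hand, a timing wrapper along these lines may be enough. It is a sketch, not the test suite's code: `timed` and `report_entities` are hypothetical helpers, and the commented pymilvus usage assumes a reachable cluster and that reading `Collection.num_entities` is what triggers the flush in this SDK version.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Call fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def report_entities(get_entities: Callable[[], int]) -> str:
    """Format an entity count and its cost like the log lines in this issue."""
    count, cost = timed(get_entities)
    return f"{count}\nGet collection entities cost {cost:.4f} seconds"


# Hypothetical usage against a live cluster (host and port are assumptions):
#   from pymilvus import connections, Collection
#   connections.connect(host="127.0.0.1", port="19530")
#   collection = Collection("CreateChecker__Yfx24DUW")
#   print(report_entities(lambda: collection.num_entities))
```

Running it once for a `CreateChecker` collection and once for another checker's collection after datacoord recovers should show whether the gap reproduces.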

Anything else?

failed job: https://github.com/milvus-io/milvus/runs/5632714245?check_suite_focus=true
logs: https://github.com/milvus-io/milvus/suites/5742511778/artifacts/190470457

About this issue

State: closed
Created 2 years ago
Comments: 27 (23 by maintainers)

Most upvoted comments

chaos type: pod-failure
image tag: 2.1.0-20220921-a0ab90ea
target pod: datacoord
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-kafka-for-release/detail/chaos-test-kafka-for-release/596/pipeline
log:
artifacts-datacoord-pod-failure-596-pytest-logs.tar.gz
artifacts-datacoord-pod-failure-596-server-logs.tar.gz

zhuwenxing on Sep 22, 2022

This issue still exists for datacoord pod kill.

Kafka as MQ:
chaos type: pod-kill
image tag: 2.1.0-20220913-3c3ba55
target pod: datacoord
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-kafka-for-release/detail/chaos-test-kafka-for-release/426/pipeline
log:
artifacts-datacoord-pod-kill-426-server-logs.tar.gz
artifacts-datacoord-pod-kill-426-pytest-logs.tar.gz

For collections with the CreateChecker prefix, the flush time is much longer than for other collections.

The same happens with Pulsar as MQ:
chaos type: pod-kill
image tag: 2.1.0-20220913-3c3ba55
target pod: datacoord
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-for-release/detail/chaos-test-for-release/540/pipeline

zhuwenxing on Sep 15, 2022