milvus: [Bug]: [chaos][rootcoord]Flush hang when running hello_milvus.py after rootcoord pod recovered from pod kill
Is there an existing issue for this?
- I have searched the existing issues
Environment
- Milvus version: master-20220309-5fdef607
- Deployment mode(standalone or cluster): cluster
- SDK version(e.g. pymilvus v2.0.0rc2): 2.0.2.dev5
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
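hello_milvus.py hangs at flush. The log shows repeated get flush state requests for the same two segment IDs, each answered with an empty status (status:<>):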
[2022/03/09 18:50:47.084 +00:00] [INFO] [impl.go:3925] ["received get flush state response"] [response="status:<> "]
[2022/03/09 18:50:47.587 +00:00] [INFO] [impl.go:3911] ["received get flush state request"] [request="segmentIDs:431712294019792897 segmentIDs:431712294019792898 "]
[2022/03/09 18:50:47.589 +00:00] [INFO] [impl.go:3925] ["received get flush state response"] [response="status:<> "]
[2022/03/09 18:50:48.095 +00:00] [INFO] [impl.go:3911] ["received get flush state request"] [request="segmentIDs:431712294019792897 segmentIDs:431712294019792898 "]
[2022/03/09 18:50:48.098 +00:00] [INFO] [impl.go:3925] ["received get flush state response"] [response="status:<> "]
[2022/03/09 18:50:48.604 +00:00] [INFO] [impl.go:3911] ["received get flush state request"] [request="segmentIDs:431712294019792897 segmentIDs:431712294019792898 "]
[2022/03/09 18:50:48.608 +00:00] [INFO] [impl.go:3925] ["received get flush state response"] [response="status:<> "]
[2022/03/09 18:50:49.111 +00:00] [INFO] [impl.go:3911] ["received get flush state request"] [request="segmentIDs:431712294019792897 segmentIDs:431712294019792898 "]
[2022/03/09 18:50:49.114 +00:00] [INFO] [impl.go:3925] ["received get flush state response"] [response="status:<> "]
[2022/03/09 18:50:49.622 +00:00] [INFO] [impl.go:3911] ["received get flush state request"] [request="segmentIDs:431712294019792897 segmentIDs:431712294019792898 "]
[2022/03/09 18:50:49.625 +00:00] [INFO] [impl.go:3925] ["received get flush state response"] [response="status:<> "]
[2022/03/09 18:50:50.128 +00:00] [INFO] [impl.go:3911] ["received get flush state request"] [request="segmentIDs:431712294019792897 segmentIDs:431712294019792898 "]
[2022/03/09 18:50:50.130 +00:00] [INFO] [impl.go:3925] ["received get flush state response"] [response="status:<> "]
[2022/03/09 18:50:50.637 +00:00] [INFO] [impl.go:3911] ["received get flush state request"] [request="segmentIDs:431712294019792897 segmentIDs:431712294019792898 "]
[2022/03/09 18:50:50.644 +00:00] [INFO] [impl.go:3925] ["received get flush state response"] [response="status:<> "]
[2022/03/09 18:50:51.147 +00:00] [INFO] [impl.go:3911] ["received get flush state request"] [request="segmentIDs:431712294019792897 segmentIDs:431712294019792898 "]
[2022/03/09 18:50:51.148 +00:00] [INFO] [impl.go:3925] ["received get flush state response"] [response="status:<> "]
[2022/03/09 18:50:51.653 +00:00] [INFO] [impl.go:3911] ["received get flush state request"] [request="segmentIDs:431712294019792897 segmentIDs:431712294019792898 "]
[2022/03/09 18:50:51.654 +00:00] [INFO] [impl.go:3925] ["received get flush state response"] [response="status:<> "]
Expected Behavior
hello_milvus.py runs to completion without hanging at flush.
Steps To Reproduce
see https://github.com/milvus-io/milvus/runs/5485150266?check_suite_focus=true
Anything else?
Job link: https://github.com/milvus-io/milvus/runs/5485150266?check_suite_focus=true
Logs: https://github.com/milvus-io/milvus/suites/5597828790/artifacts/181714111
About this issue
- State: closed
- Created 2 years ago
- Comments: 31 (29 by maintainers)
Commits related to this issue
- Fix DataNode processes event out of order. The probability is low, so it is very unlikely to reproduce. See also: #15966. Signed-off-by: yangxuan <xuan.yang@zilliz.com> (committed to XuanYang-cn/milvus by XuanYang-cn 2 years ago)
- Fix DataNode processes event out of order (#17440). The probability is low, so it is very unlikely to reproduce. See also: #15966. Signed-off-by: yangxuan <xuan.yang@zilliz.com> (committed to milvus-io/milvus by XuanYang-cn 2 years ago)
- Skip remove if reassigns to the original node. See also: #15966, #17432. Signed-off-by: yangxuan <xuan.yang@zilliz.com> (committed to XuanYang-cn/milvus by XuanYang-cn 2 years ago)
- Skip remove if reassigns to the original node. Fix ut race. See also: #15966, #17432. Signed-off-by: yangxuan <xuan.yang@zilliz.com> (committed to XuanYang-cn/milvus by XuanYang-cn 2 years ago)
- Skip remove if reassigns to the original node (#17450). Fix ut race. See also: #15966, #17432. Signed-off-by: yangxuan <xuan.yang@zilliz.com> (committed to milvus-io/milvus by XuanYang-cn 2 years ago)
Goroutines aren’t guaranteed to be executed in order.
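To illustrate the point about goroutine ordering, here is a minimal, self-contained Go sketch. It is not the DataNode code and not the fix from the commits above; the `event` type and the functions `processConcurrently` and `processSerially` are hypothetical names made up for this example. It only shows that starting one goroutine per event gives no ordering guarantee, while draining a single channel with one worker handles events in the order they were sent.

```go
package main

import (
	"fmt"
	"sync"
)

// event is a hypothetical stand-in for an event a node has to handle.
type event struct {
	seq  int
	kind string // illustrative label only, e.g. "release" or "watch"
}

// processConcurrently starts one goroutine per event. The Go scheduler may
// run these goroutines in any order, so the recorded sequence can differ
// from the order of the input slice.
func processConcurrently(events []event) []int {
	var (
		mu    sync.Mutex
		wg    sync.WaitGroup
		order []int
	)
	for _, ev := range events {
		wg.Add(1)
		go func(ev event) {
			defer wg.Done()
			mu.Lock()
			order = append(order, ev.seq)
			mu.Unlock()
		}(ev)
	}
	wg.Wait()
	return order
}

// processSerially sends events through one channel that is drained by a
// single worker goroutine, so events are handled in send order.
func processSerially(events []event) []int {
	ch := make(chan event)
	done := make(chan []int)
	go func() {
		var order []int
		for ev := range ch {
			order = append(order, ev.seq)
		}
		done <- order
	}()
	for _, ev := range events {
		ch <- ev
	}
	close(ch)
	return <-done
}

func main() {
	events := []event{{1, "release"}, {2, "watch"}, {3, "release"}, {4, "watch"}}
	// The concurrent result may vary between runs.
	fmt.Println("concurrent:", processConcurrently(events))
	// The serial result always matches the input order.
	fmt.Println("serial:    ", processSerially(events))
}
```

The mutex in `processConcurrently` only protects the shared slice; it does not impose any order on when each goroutine runs. Serializing through one consumer trades parallelism for a deterministic handling order.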
@XuanYang-cn
The issue still reproduces.
Failed job: https://github.com/milvus-io/milvus/runs/6760970487?check_suite_focus=true
Logs: https://github.com/milvus-io/milvus/suites/6814256315/artifacts/262399001