milvus: [Bug]: Load failed with error message `collection xxx has not been loaded to memory or load failed` after `allcluster` pod kill chaos test

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: master-20220619-516d9147
- Deployment mode(standalone or cluster): cluster
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus==2.1.0.dev69
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Traceback (most recent call last):
  File "chaos/scripts/hello_milvus.py", line 116, in <module>
    hello_milvus(args.host)
  File "chaos/scripts/hello_milvus.py", line 79, in hello_milvus
    collection.load()
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/pymilvus/orm/collection.py", line 450, in load
    conn.load_collection(self._name, replica_number=replica_number, timeout=timeout, **kwargs)
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/pymilvus/decorators.py", line 61, in handler
    raise e
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/pymilvus/decorators.py", line 44, in handler
    return func(self, *args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/pymilvus/decorators.py", line 79, in handler
    raise e
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/pymilvus/decorators.py", line 75, in handler
    return func(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/pymilvus/client/grpc_handler.py", line 686, in load_collection
    self.wait_for_loading_collection(collection_name, timeout)
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/pymilvus/decorators.py", line 61, in handler
    raise e
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/pymilvus/decorators.py", line 44, in handler
    return func(self, *args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/pymilvus/decorators.py", line 79, in handler
    raise e
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/pymilvus/decorators.py", line 75, in handler
    return func(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/pymilvus/client/grpc_handler.py", line 697, in wait_for_loading_collection
    return self._wait_for_loading_collection(collection_name, timeout)
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/pymilvus/client/grpc_handler.py", line 726, in _wait_for_loading_collection
    progress = self.get_collection_loading_progress(collection_name, timeout)
  File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/pymilvus/client/grpc_handler.py", line 705, in get_collection_loading_progress
    raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=collection hello_milvus has not been loaded to memory or load failed)>

Expected Behavior

All test cases pass.

Steps To Reproduce

See https://github.com/milvus-io/milvus/runs/6956973459?check_suite_focus=true

Milvus Log

Failed job: https://github.com/milvus-io/milvus/runs/6956973459?check_suite_focus=true
Log: https://github.com/milvus-io/milvus/suites/6997534572/artifacts/274320003

Anything else?

A related issue:

https://github.com/milvus-io/milvus/issues/17607

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 19 (17 by maintainers)

Most upvoted comments

According to the offline discussion with @letian-jiang @xiaocai2333 @xige-16, we shall:

  • Add a reference lock that is held during the whole hand-off procedure
  • Let compaction respect reference lock (Done by #17649)
  • Leave LoadCollection & LoadPartition as they are, since retrying with the new meta should be all right. @letian-jiang is fixing this problem
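
To make the first two points concrete, here is a minimal sketch of the idea in Python. It is not Milvus code: `SegmentRefLock`, `hold`, `is_referenced`, `pick_compaction_candidates`, and the task ID `"handoff-1"` are hypothetical names made up for illustration, and the lock is a plain in-memory structure. It only shows the intended behavior: hand-off keeps a reference on its segments for the whole procedure, and compaction skips any segment that is still referenced.

```python
import threading
from contextlib import contextmanager


class SegmentRefLock:
    """Hypothetical in-memory reference lock keyed by segment ID."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._refs = {}  # segment_id -> set of task IDs holding a reference

    @contextmanager
    def hold(self, task_id, segment_ids):
        # Take the references before the task starts and drop them only
        # when the whole procedure has finished, on success or failure.
        with self._mutex:
            for seg in segment_ids:
                self._refs.setdefault(seg, set()).add(task_id)
        try:
            yield
        finally:
            with self._mutex:
                for seg in segment_ids:
                    holders = self._refs.get(seg, set())
                    holders.discard(task_id)
                    if not holders:
                        self._refs.pop(seg, None)

    def is_referenced(self, segment_id):
        with self._mutex:
            return segment_id in self._refs


def pick_compaction_candidates(segment_ids, ref_lock):
    # Compaction "respects" the lock by skipping referenced segments.
    return [s for s in segment_ids if not ref_lock.is_referenced(s)]


lock = SegmentRefLock()
segments = [1, 2, 3]

# Hand-off of segment 2 holds its reference for the whole procedure.
with lock.hold("handoff-1", [2]):
    during = pick_compaction_candidates(segments, lock)

# Once hand-off has finished, segment 2 is eligible for compaction again.
after = pick_compaction_candidates(segments, lock)
```

In this sketch the reference is released in a `finally` clause, so a hand-off that fails midway does not leave a segment permanently excluded from compaction.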