milvus: [Bug]: [Memory]Release and load collection get exception no queryNode to allocate

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: master-20211123-73f18c5
- Deployment mode(standalone or cluster): cluster
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus 2.0.0rc9.dev11
- OS(Ubuntu or CentOS): 
- CPU/Memory: limit querynode memory 2Gi
- GPU: 
- Others:

Current Behavior

  1. Deploy a Milvus cluster (master-20211123-73f18c5) with querynode memory limited to 2 GiB.
  2. Load a collection (dim=512, num_entities=399360, about 780 MiB).
  3. Search: succeeds.
  4. Release the collection: no exception, but the querynode's memory usage does not drop.
  5. Re-load: fails with the following error.
[2021-11-23 04:13:43,651 - INFO - ci_test]: ################################################################################ (conftest.py:162)
[2021-11-23 04:13:43,651 - INFO - ci_test]: [initialize_milvus] Log cleaned up, start testing... (conftest.py:163)
[2021-11-23 04:13:44,673 - DEBUG - ci_test]: {
  auto_id: False
  description: 
  fields: [{
    name: int64
    description: 
    type: 5
    is_primary: True
    auto_id: False
  }, {
    name: float
    description: 
    type: 10
  }, {
    name: float_vector
    description: 
    type: 101
    params: {'dim': 512}
  }]
}
 (test_chaos_memory_stress.py:63)
[2021-11-23 04:13:44,673 - DEBUG - ci_test]: 2 (test_chaos_memory_stress.py:64)
[2021-11-23 04:13:48,047 - ERROR - ci_test]: Traceback (most recent call last):
  File "/root/milvus/tests/python_client/utils/api_request.py", line 18, in inner_wrapper
    res = func(*args, **kwargs)
  File "/root/milvus/tests/python_client/utils/api_request.py", line 45, in api_request
    return func(*arg, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/pymilvus/orm/collection.py", line 449, in load
    conn.load_collection(self._name, timeout=timeout, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/pymilvus/client/stub.py", line 58, in handler
    raise e
  File "/usr/local/lib/python3.6/site-packages/pymilvus/client/stub.py", line 42, in handler
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/pymilvus/client/stub.py", line 322, in load_collection
    return handler.load_collection("", collection_name=collection_name, timeout=timeout, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/pymilvus/client/grpc_handler.py", line 75, in handler
    raise e
  File "/usr/local/lib/python3.6/site-packages/pymilvus/client/grpc_handler.py", line 67, in handler
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/pymilvus/client/grpc_handler.py", line 820, in load_collection
    raise BaseException(response.error_code, response.reason)
pymilvus.client.exceptions.BaseException: <BaseException: (code=1, message=call query coordinator LoadCollection: rpc error: code = Unknown desc = , no queryNode to allocate)>
 (api_request.py:26)
[2021-11-23 04:13:48,047 - ERROR - ci_test]: (api_response) : <BaseException: (code=1, message=call query coordinator LoadCollection: rpc error: code = Unknown desc = , no queryNode to allocate)> (api_request.py:27)
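As a sanity check, the ~780 MiB estimate in step 2 is exactly the raw float32 vector size, which is close to half the 2 GiB querynode limit (a quick back-of-the-envelope calculation, not from the logs):

```python
# Raw size of the loaded vectors: num_entities x dim x 4 bytes (float32).
num_entities = 399360
dim = 512
raw_bytes = num_entities * dim * 4
raw_mib = raw_bytes / (1024 ** 2)
print(raw_mib)  # -> 780.0 (MiB)
```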

Server log: milvus_logs.tar.gz

Memory usage: image

Expected Behavior

No response

Steps To Reproduce

No response

Anything else?

No response

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 18 (18 by maintainers)

Most upvoted comments

This issue exposed many problems.

  1. The Go runtime only returns freed memory to the OS after about two minutes. I tested this on my local machine with Go 1.16: insert 500K rows with dim 128 into Milvus, then load and release 100 times. Memory usage rose from 58 MB to 1.5 GB and fell back to 700 MB after two minutes. So the problem reported in this issue is unavoidable: when memory headroom is small, we cannot re-load immediately after a release but must wait for the Go runtime to return the memory first, otherwise the re-load will inevitably fail with insufficient memory.
  2. The same experiment as in 1, run on @bigsheeper's machine with Go 1.15, behaves worse: memory never drops after rising to 1.5 GB. This is a Go bug found in @czs007's earlier survey; we need to upgrade Go from 1.15 to 1.16. https://github.com/milvus-io/milvus/blob/a03bbbfddc7e81a5318519f16bdb4a422b2def8e/go.mod#L3
  3. At the end of the experiment in 1, more than 700 MB remained resident, so there must be a memory leak. Checking with pprof shows that the Go side does not hold much memory, so the leak is likely in the C++ layer. I commented out segcore's loadFieldData function (i.e., did not load data into segcore at all); memory still stayed above 700 MB, so the leak is not in segcore. I then also commented out the deserialization function; memory rose to just over 200 MB and finally settled at just over 100 MB (including timetick, which occupies tens of megabytes). From this we can infer that the leak is probably in the Parquet deserialization path.
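Given point 1, a practical client-side workaround is to retry the load with a delay after a release, so the Go runtime has time to return memory to the OS. A minimal sketch (`load_with_retry` and its parameters are an illustrative helper, not part of pymilvus):

```python
import time


def load_with_retry(load_fn, retries=5, delay_s=30):
    """Call load_fn, retrying with a fixed delay so the Go runtime
    has time to return released memory to the OS (~2 minutes on Go 1.16)."""
    for attempt in range(retries):
        try:
            return load_fn()
        except Exception as exc:  # e.g. "no queryNode to allocate"
            if attempt == retries - 1:
                raise
            print(f"load failed ({exc}); retrying in {delay_s}s")
            time.sleep(delay_s)
```

Usage with pymilvus would be `load_with_retry(collection.load)` in place of a bare `collection.load()`.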

I will open two more issues to track problems 2 and 3. This issue, which covers problem 1, can be closed now.

Experiment codes:

import random

from pymilvus import (
    connections, list_collections,
    FieldSchema, CollectionSchema, DataType,
    Collection,
)


def hello_milvus():
    # create connection
    connections.connect(host="127.0.0.1", port="19530")

    print(f"\nList collections...")
    print(list_collections())

    # create collection
    dim = 128
    default_fields = [
        FieldSchema(name="count", dtype=DataType.INT64, is_primary=True),
        FieldSchema(name="random_value", dtype=DataType.DOUBLE),
        FieldSchema(name="float_vector", dtype=DataType.FLOAT_VECTOR, dim=dim)
    ]
    default_schema = CollectionSchema(fields=default_fields, description="test collection")

    print(f"\nCreate collection...")
    collection = Collection(name="hello_milvus", schema=default_schema)

    print(f"\nList collections...")
    print(list_collections())

    #  insert data
    nb = 500000
    vectors = [[random.random() for _ in range(dim)] for _ in range(nb)]
    collection.insert(
        [
            [i for i in range(nb)],
            [float(random.randrange(-20, -10)) for _ in range(nb)],
            vectors
        ]
    )

    print(f"\nGet collection entities...")
    print(collection.num_entities)

    # load and release repeatedly to observe memory behavior
    for try_time in range(100, 0, -1):
        print(f"\nload collection...", try_time)
        collection.load()

        print(f"\nrelease collection...")
        collection.release()

hello_milvus()
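For scale, the raw float32 payload inserted by the experiment above is much smaller than the 1.5 GB peak and the ~700 MB residue described in point 1 (a rough calculation, ignoring index and runtime overhead):

```python
# Raw float32 payload of the experiment: nb x dim x 4 bytes.
nb = 500000
dim = 128
raw_mib = nb * dim * 4 / (1024 ** 2)
print(round(raw_mib, 1))  # -> 244.1 (MiB)
```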