lmdeploy: [Bug] Memory leak for api_server

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.

Describe the bug

When the api_server is started and a client sends requests, the memory usage of the api_server keeps increasing. If the client is killed, the server's memory usage does not drop.

In our test environment, the memory usage grows by about 0.1% of the machine's 116 GB of total memory (roughly 116 MB) for every 1000 prompts.

Reproduction

  1. Start the server:
lmdeploy serve api_server ./workspace --server-name 0.0.0.0 --server-port 23333 --tp 1
  2. Start the profiling script:
python benchmark/profile_restful_api.py --server_addr 0.0.0.0:23333 --tokenizer_path /path/to/tokenizer --dataset /path/to/ShareGPT_V3_unfiltered_cleaned_split.json --concurrency 128 --num_prompts 50000
  3. Observe the memory usage of the api_server process with htop (a psutil-based sketch for logging the process RSS is shown below).
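
As an alternative to watching htop by hand, a small psutil-based script (a hypothetical helper, not part of lmdeploy; the PID and interval below are placeholders) can log the resident set size of the api_server process over time:

import time

import psutil  # third-party: pip install psutil

API_SERVER_PID = 12345  # placeholder: PID of the lmdeploy api_server process
INTERVAL_S = 10         # sampling interval in seconds

proc = psutil.Process(API_SERVER_PID)
while True:
    rss_mb = proc.memory_info().rss / 1024 / 1024
    print(f'{time.strftime("%H:%M:%S")}  RSS: {rss_mb:.1f} MB')
    time.sleep(INTERVAL_S)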

Environment

sys.platform: linux
Python: 3.9.16 (main, Aug 15 2023, 19:38:56) [GCC 8.3.1 20190311 (Red Hat 8.3.1-3)]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA A100-SXM4-80GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.8, V11.8.89
GCC: gcc (GCC) 10.2.1 20210130 (Red Hat 10.2.1-11)
PyTorch: 2.1.2+cu118
TorchVision: 0.16.2+cu121
LMDeploy: 0.2.5+
transformers: 4.37.1
gradio: 3.50.2
fastapi: 0.104.1
pydantic: 2.6.0

About this issue

  • State: closed
  • Created 3 months ago
  • Comments: 22 (12 by maintainers)

Most upvoted comments

Hi @AllentDan, even after the fix in #1344, the memory leak has not been fully resolved. Could you please revisit this problem when you have a moment?

Yeah, I’ve got time for this now. After some research and testing, it seems to me that we have to collect garbage in the server process at regular intervals.

I will open a pull request ASAP.
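
For context, a minimal sketch of what interval-based collection could look like in the server process (the interval, the task registration, and the app variable are assumptions, not the actual patch):

import asyncio
import gc

from fastapi import FastAPI

app = FastAPI()  # stands in for the api_server's existing FastAPI app

GC_INTERVAL_S = 60  # assumed interval; the real value would need tuning

async def _gc_loop():
    # Periodically force a full collection so per-request cyclic garbage
    # does not accumulate in the long-running server process.
    while True:
        await asyncio.sleep(GC_INTERVAL_S)
        collected = gc.collect()
        print(f'[gc] collected {collected} unreachable objects')

@app.on_event('startup')
async def _start_gc_task():
    asyncio.create_task(_gc_loop())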

@zhulinJulia24 I mean the memory leak concerns CPU memory, not GPU memory. You can use htop or other tools to check the change in CPU memory usage.

Thanks, I can reproduce it!

Maybe we could call show_memory in the while loop at intervals. Something like this:

import gc
import sys

def show_memory():
    # Collect every object tracked by the garbage collector together with its
    # shallow size, then print the ten largest to spot what is accumulating.
    print("*" * 60)
    objects_list = []
    for obj in gc.get_objects():
        size = sys.getsizeof(obj)
        objects_list.append((obj, size))

    sorted_values = sorted(objects_list,
                           key=lambda x: x[1],
                           reverse=True)

    for obj, size in sorted_values[:10]:
        print(f"OBJ: {id(obj)}, "
              f"TYPE: {type(obj)}, "
              f"SIZE: {size/1024/1024:.2f}MB, "
              f"REPR: {str(obj)[:100]}")

@AllentDan I tried to add the following code to debug:

import tracemalloc

tracemalloc.start()
lis = []  # holds the baseline snapshot taken on the first call

@app.get('/v1/mem_info', dependencies=[Depends(check_api_key)])
def get_info():
    snapshot = tracemalloc.take_snapshot()
    if len(lis) == 0:
        # First call: record the baseline snapshot.
        lis.append(snapshot)
        return "N/A"
    last = lis[-1]
    # Compare the current snapshot against the baseline, grouped by source line.
    top_stats = snapshot.compare_to(last, 'lineno')
    return ' \n '.join([str(stat) for stat in top_stats[:10]])
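
With the server from the reproduction step running, the endpoint can then be polled while the benchmark is in flight (assuming no API key is configured; otherwise the usual Authorization header is needed):

import requests

# The first call records the baseline snapshot and returns "N/A";
# later calls return the top allocation diffs relative to that baseline.
resp = requests.get('http://0.0.0.0:23333/v1/mem_info')
print(resp.json())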

I found that the memory growth is mainly attributed to dlpack:

 /workdir/miniconda3/lib/python3.11/site-packages/torch/utils/dlpack.py:117: size=74.3 MiB (+74.3 MiB), count=1623178 (+1623178), average=48 B 
 <unknown>:0: size=1415 KiB (+1415 KiB), count=25860 (+25860), average=56 B 
 /workdir/lmdeploy/lmdeploy/turbomind/turbomind.py:516: size=1404 KiB (+1404 KiB), count=25681 (+25681), average=56 B 
 /workdir/lmdeploy/lmdeploy/serve/async_engine.py:533: size=583 KiB (+583 KiB), count=7372 (+7372), average=81 B 
 /workdir/lmdeploy/lmdeploy/serve/async_engine.py:347: size=583 KiB (+583 KiB), count=7372 (+7372), average=81 B 
 /workdir/miniconda3/lib/python3.11/threading.py:258: size=652 KiB (+364 KiB), count=1758 (+982), average=380 B 
 /workdir/miniconda3/lib/python3.11/queue.py:252: size=217 KiB (+217 KiB), count=121 (+121), average=1834 B 
 /workdir/miniconda3/lib/python3.11/tracemalloc.py:115: size=156 KiB (+156 KiB), count=1998 (+1998), average=80 B 
 /workdir/miniconda3/lib/python3.11/tracemalloc.py:193: size=93.4 KiB (+93.4 KiB), count=1993 (+1993), average=48 B 
 /workdir/miniconda3/lib/python3.11/_weakrefset.py:88: size=68.5 KiB (+65.8 KiB), count=139 (+130), average=505 B