fastapi: Gunicorn Workers Hang And Consume Memory Forever

Describe the bug: I have deployed a FastAPI app that queries the database and returns the results, and I made sure to close the DB connection afterwards. I'm running gunicorn with this command: gunicorn -w 8 -k uvicorn.workers.UvicornH11Worker -b 0.0.0.0 app:app --timeout 10. After exposing it to the web, I ran a load test that makes 30-40 requests in parallel to the FastAPI app, and that is where the problem starts. Watching htop in the meantime, I see that RAM usage keeps growing, as if no task releases its memory after completing its job. The task count behaves the same way; it looks like gunicorn workers never get killed. After some time RAM usage reaches its maximum and the app starts throwing errors. I then killed the gunicorn app, but the processes spawned by the main gunicorn process did not get killed and kept using all the memory.

Environment:

  • OS: Ubuntu 18.04

  • FastAPI Version : 0.38.1

  • Python version : 3.7.4

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 63 (6 by maintainers)

Most upvoted comments

Hi everyone,

I just read the source code of FastAPI and tested it myself. First of all, this should not be a memory leak; the problem is that if your machine has a lot of CPUs, the app will occupy a lot of memory.

The key difference between async and non-async endpoints is in FastAPI's run_endpoint_function() together with Starlette's run_in_threadpool() (reached via request_response() in starlette/routing.py):

async def run_endpoint_function(
    *, dependant: Dependant, values: Dict[str, Any], is_coroutine: bool
) -> Any:
    # Only called by get_request_handler. Has been split into its own function to
    # facilitate profiling endpoints, since inner functions are harder to profile.
    assert dependant.call is not None, "dependant.call must be a function"

    if is_coroutine:
        return await dependant.call(**values)
    else:
        return await run_in_threadpool(dependant.call, **values)
  
 
async def run_in_threadpool(
    func: typing.Callable[..., T], *args: typing.Any, **kwargs: typing.Any
) -> T:
    loop = asyncio.get_event_loop()
    if contextvars is not None:  # pragma: no cover
        # Ensure we run in the same context
        child = functools.partial(func, *args, **kwargs)
        context = contextvars.copy_context()
        func = context.run
        args = (child,)
    elif kwargs:  # pragma: no cover
        # loop.run_in_executor doesn't accept 'kwargs', so bind them in here
        func = functools.partial(func, **kwargs)
    return await loop.run_in_executor(None, func, *args)

If your REST endpoint is not async, it runs via loop.run_in_executor, but Starlette does not specify an executor here, so the default thread pool size is os.cpu_count() * 5 (before Python 3.8). My test machine has 40 CPUs, so I end up with 200 threads in the pool. After each request, the objects referenced by those threads are not released until a thread is reused by a later request, which can occupy a lot of memory, but in the end it is not a memory leak.
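A quick way to check what the default pool size would be on a given machine, a small sketch of the sizing rules described here (nothing FastAPI-specific):

import os

cpus = os.cpu_count() or 1
print("Python 3.5-3.7 default:", cpus * 5)
print("Python 3.8+ default:   ", min(32, cpus + 4))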

Below is my test code if you want to reproduce it:

import asyncio

import cv2 as cv
import gc
from pympler import tracker
from concurrent import futures

# you can change worker number here
executor = futures.ThreadPoolExecutor(max_workers=1)

memory_tracker = tracker.SummaryTracker()

def mm():
    # simulate a memory-heavy sync endpoint: load an image and compute AKAZE features
    img = cv.imread("cap.jpg", 0)
    detector = cv.AKAZE_create()
    kpts, desc = detector.detectAndCompute(img, None)
    gc.collect()
    memory_tracker.print_diff()
    return None

async def main():
    while True:
        loop = asyncio.get_event_loop()
        await loop.run_in_executor(executor, mm)


if __name__=='__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())

Even though it's not a memory leak, I still think this is not a good implementation, because it's sensitive to your CPU count: when you run a large deep learning model in FastAPI, you will find it occupies a ton of memory. So I suggest we make the thread pool size configurable.
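One workaround until the pool size is configurable upstream, a minimal sketch assuming a Starlette version from this era that calls loop.run_in_executor(None, ...) as quoted above, is to swap the event loop's default executor for a bounded one at startup; the worker count of 4 is just an example value:

import asyncio
from concurrent.futures import ThreadPoolExecutor

from fastapi import FastAPI

app = FastAPI()

@app.on_event("startup")
async def limit_default_threadpool() -> None:
    # replace the loop's default executor (used by run_in_executor(None, ...))
    # with a small, bounded pool
    loop = asyncio.get_running_loop()
    loop.set_default_executor(ThreadPoolExecutor(max_workers=4))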

If you are interested in how I worked through the source code, please refer to my blog and give it a like (https://www.jianshu.com/p/e4595c48d091).

Sorry, I only write blogs in Chinese 😃

Current Solution

  1. Python 3.8+ already limits the number of threads in the default thread pool, as shown below:

         if max_workers is None:
             # ThreadPoolExecutor is often used to:
             # * CPU bound task which releases GIL
             # * I/O bound task (which releases GIL, of course)
             #
             # We use cpu_count + 4 for both types of tasks.
             # But we limit it to 32 to avoid consuming surprisingly large resource
             # on many core machine.
             max_workers = min(32, (os.cpu_count() or 1) + 4)
         if max_workers <= 0:
             raise ValueError("max_workers must be greater than 0")
    

    If 32 threads is not too many for your program, you can upgrade to Python 3.8+ to avoid this issue.

  2. Define your endpoint with async def; then the request runs on the event loop instead of the thread pool, though throughput may be affected (see the sketch below).
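A minimal sketch of option 2 (the route and handler names are illustrative): the async def handler stays on the event loop instead of going through the thread pool.

from fastapi import FastAPI

app = FastAPI()

@app.get("/items")
async def read_items():
    # runs directly on the event loop, so no ThreadPoolExecutor thread is used;
    # any blocking work here would block other requests, so keep it non-blocking
    return {"items": []}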

Some statistics for python3.7, python3.8, and async.

Initial Mem Usage
==========================================
fastapi-py37: 76.21MiB / 7.353GiB
fastapi-py38: 75.86MiB / 7.353GiB
fastapi-py37-async: 75.44MiB / 7.353GiB
fastapi-py38-async: 75.62MiB / 7.353GiB
==========================================
Run 1000 Requests....
==========================================
Run fastapi-py37
real: 0m16.632s; user 0m4.748s; system 0m2.855s
Run fastapi-py38
real: 0m15.319s; user 0m4.750s; system 0m2.722s
Run fastapi-py37-async
real: 0m21.276s; user 0m4.877s; system 0m2.823s
Run fastapi-py38-async
real: 0m22.568s; user 0m5.218s; system 0m2.935s
==========================================
After 1000 Requests Mem Usage
==========================================
fastapi-py37: 1.266GiB / 7.353GiB
fastapi-py38: 144.8MiB / 7.353GiB
fastapi-py37-async: 84.07MiB / 7.353GiB
fastapi-py38-async: 83.63MiB / 7.353GiB
==========================================

Hi all, I can reproduce the memory leak:

from fastapi import FastAPI
import uvicorn

app = FastAPI()

@app.get("/")
def mm():
    # allocate two ~8 MB lists per request just to exercise the allocator
    data = [0] * 1000000
    data2 = [0] * 1000000
    return {"message": "Hello World"}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8080)

Memory usage rises to about 700 MB by request 1000.

Just for info: I ran the same function under Flask and the memory stays constant (!), so for me something is wrong on the async side…

Hi, I have written a simple test to validate this issue. It seems that Python 3.8 fixes the problem.

Initial Mem Usage
==========================================
fastapi-py37: 183.6MiB / 31.16GiB
fastapi-py38: 187.6MiB / 31.16GiB
==========================================
After 1000 Requests Mem Usage
==========================================
fastapi-py37: 6.943GiB / 31.16GiB
fastapi-py38: 386.4MiB / 31.16GiB
==========================================

Sample Code: https://github.com/kevchentw/fastapi-memory-leak

I still get this problem when using FastAPI with a router function that runs a model inside it. The RAM usage keeps climbing.

The solution here may be:

  1. try Python 3.8 (in which ThreadPoolExecutor has a bounded default worker count)
  2. use async def and loop.run_in_executor with a global ThreadPoolExecutor to run the model function (see the sketch below)
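A minimal sketch of option 2, assuming a hypothetical blocking run_model() function and illustrative route names; the key point is a single, explicitly sized ThreadPoolExecutor shared by all requests:

import asyncio
from concurrent.futures import ThreadPoolExecutor

from fastapi import FastAPI

app = FastAPI()

# one explicitly sized pool shared by all requests; 2 workers is an example value
model_executor = ThreadPoolExecutor(max_workers=2)

def run_model(payload: dict) -> dict:
    # placeholder for the blocking model call
    return {"result": len(payload)}

@app.post("/predict")
async def predict(payload: dict):
    # the endpoint itself is async, so FastAPI does not push it into the
    # default thread pool; the blocking work goes to our bounded executor
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(model_executor, run_model, payload)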

Update: RAM consumption changed after switching to Python 3.8 (screenshots comparing python3.7 and python3.8 omitted).

I also found this problem. When I use gunicorn + Flask, memory increases quickly, and my application on the k8s platform has to handle 1,000,000 requests. How can I solve this?

You're using the PyPy-compatible uvicorn worker class (UvicornH11Worker) - is your system based on PyPy? If you're running on CPython, I suggest you try the CPython-oriented implementation, uvicorn.workers.UvicornWorker.

I have noticed that too: under high load memory is left allocated, but for single requests it gets cleared up. I already tried making it async, but that doesn't deallocate the memory either.

@wanaryytel

This is probably an issue with Starlette's run_in_threadpool, or maybe even Python's ThreadPoolExecutor. If you port that endpoint to Starlette, I expect you'll get the same behavior.

Recently the Starlette and uvicorn teams have been pretty good about addressing issues; if you can reproduce the memory leak in Starlette, I'd recommend creating an issue demonstrating it in the Starlette (and possibly uvicorn) repos.

Hmm, reproducing it in Starlette makes sense. I will reproduce the issue and open an issue on the Starlette repo. Thanks for the idea.


I still get this problem; memory does not get deallocated. My setup:

python 3.8.9

fastapi==0.63.0
gunicorn==20.0.4
uvicorn==0.11.8

["gunicorn", "-b", "0.0.0.0:8080", "-w", "3",'-k', 'uvicorn.workers.UvicornWorker', "palette.main:app", '--timeout', '0', "--graceful-timeout", "5", '--access-logfile', '-', '--error-logfile', '-', '--log-level', 'error']

events

def create_start_app_handler(app: FastAPI) -> Callable:  # type: ignore
    async def start_app() -> None:
        app.state.executor = ProcessPoolExecutor(max_workers=max(cpu_count()-1, 1))

    return start_app


def create_stop_app_handler(app: FastAPI) -> Callable:  # type: ignore
    @logger.catch
    async def stop_app() -> None:
        app.state.executor.shutdown()

    return stop_app

executor

async def run_fn_in_executor(request: Request, fn, *args):
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(request.app.state.executor, fn, *args)

routes

@router.post("/image", status_code=200, response_model=ImageResponse)
async def extract_color_palette(request: Request, image: ImageRequest):
    file = await fetch_content(image.url)
    colors = await run_fn_in_executor(request, process_image_to_palette, file)
    return ImageResponse(url=image.url, colors=colors)

I don't think this is solved, but it seems like we all found some workarounds. So I think it's up to you whether to close it, @tiangolo.

I'm seeing the same issue with:

fastapi==0.74.1
uvicorn[standard]==0.17.5
isort==5.10.1
black==22.3.0
flake8==4.0.1
pandas==1.4.1
numpy==1.22.2
pymysql==1.0.2
pytest==7.0.1
requests==2.27.1
coverage==6.3.2
pytest-cov==3.0.0
python-configuration==0.8.2
google-cloud-secret-manager==2.9.2
SQLAlchemy==1.4.32
cryptography==36.0.2
tenacity==8.0.0
httpx==0.22.0
pytest-asyncio==0.18.3
ddtrace==0.60.1
alembic==1.7.7
sqlalchemy[asyncio]==1.4.32
aiomysql==0.1.0
pytest-env==0.6.2
pytest-mock==3.7.0
datadog>=0.42.0
pyhumps==3.5.3
dave-metrics>=0.5.1
hypercorn[uvloop]==0.13.2
gunicorn==20.1.0

Running python:3.10.1-slim-buster in k8s with gunicorn -k uvicorn.workers.UvicornWorker app.api.server:app --workers 1 --bind 0.0.0.0:8000

I have solved this issue with the following settings:

  • python=3.8.9
  • fastapi=0.63.0
  • uvicorn=0.17.6
  • uvloop=0.16.0

@tiangolo would it be appropriate to close this issue as a wont-fix? It seems like the memory leak arises from uvloop on Python versions older than 3.8, so it is unlikely to be fixed. Regardless, it is external to FastAPI.

Anecdotally, I have encountered memory leaks when mixing uvloop and code with a lot of C extensions on Python 3.8+, outside the context of a web service like fastapi/uvicorn/gunicorn. In light of this, perhaps an example of how to run FastAPI without uvloop would be appropriate.
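For anyone who wants to rule uvloop out, here is a minimal sketch of running uvicorn on the standard asyncio loop (the app path and port are illustrative):

import uvicorn
from fastapi import FastAPI

app = FastAPI()

if __name__ == "__main__":
    # loop="asyncio" selects the standard-library event loop instead of uvloop
    uvicorn.run(app, host="0.0.0.0", port=8000, loop="asyncio")

The CLI equivalent is uvicorn app:app --loop asyncio, and under gunicorn the pure-Python uvicorn.workers.UvicornH11Worker also selects the standard asyncio loop.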

Great, it seems the issue was solved!

@MeteHanC may we close it now?

If anyone else is having other problems, please create a new issue so we can keep the conversation focused 🤓

Yeah, in this case I suggested it specifically because it looked like the function involved would be compatible with a subprocess call.

There is a ProcessPoolExecutor interface, analogous to ThreadPoolExecutor, that you can use very similarly to how run_in_threadpool works in Starlette. I think that mostly solves the problems related to managing the worker processes. But yes, the arguments and return types need to be picklable, so there are many cases where you wouldn't want to take this approach.

Would be happy to take a look at it @madkote if you get a simple reproducible example.

Hi all

In my Flask-RESTful web application I tried setting the max_requests option in the gunicorn config file to 500, so after every 500 requests the worker reboots. This reduced memory usage somewhat, but I still see memory increasing.

Hi @HsengivS, you can pass threaded=False to Flask, e.g. app.run(host='0.0.0.0', port=9333, threaded=False). I used it successfully to avoid the Flask memory leak issue.
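For reference, the worker-recycling approach mentioned a couple of comments up can live in a gunicorn config file; a minimal sketch (file name and numbers are illustrative):

# gunicorn_conf.py: recycle each worker after roughly 500 requests so retained
# memory is returned to the OS; the jitter spreads restarts out so all workers
# don't recycle at the same moment.
max_requests = 500
max_requests_jitter = 50

Load it with gunicorn -c gunicorn_conf.py -k uvicorn.workers.UvicornWorker app:app.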


I'm using Python 3.9 and still have the issue. The memory consumption is always >90%.

After further investigation, the memory consumption in my case was normal. For testing, I added more RAM and saw that memory consumption stayed pretty stable. I ended up migrating to plain uvicorn and lowering the number of workers from 4 to 3, which in my case was enough.

Thanks for the help!

Similar to @madkote (and I'm using no ctypes): with gunicorn & Flask I see no issue.

Hey @madkote, thanks for asking. That involves a request via requests and some scikit-learn / numpy operations on it, basically. I'll dig into them if they appear to be the problem!

I just stopped using Python's multiprocessing pool. I merged the queries I wanted to execute concurrently and let my DB parallelize them. Thanks @tiangolo and everyone here.

I encountered the same problem: with an endpoint defined with def, hitting it again and again leaks memory, but with async def it does not. Looking forward to the outcome of the discussion.

@dmontagu @euri10 I still suspect two things:

  • my model library - I will do even more testing and profiling
  • I found that blocking code performs better in terms of memory. With async, every request increases memory by ~5 MB (some model operation and data processing), but in blocking mode it is about ~1 MB every 2-3 requests…

Everywhere I use the same model; I even tried with many (custom and freely available)… the result is always the same.

So, please give me some more time to make a reasonable example to reproduce the issue.

@madkote this starlette issue includes a good discussion of the problem (and some patterns to avoid that might be exacerbating it): https://github.com/encode/starlette/issues/667

@madkote I’m not 100% confident this explains the issue, but I think this may actually just be how python works. I think this article does a good job explaining it:

You’ll notice that I’ve been saying “free” in quotes quite a bit. The reason is that when a block is deemed “free”, that memory is not actually freed back to the operating system. The Python process keeps it allocated and will use it later for new data. Truly freeing memory returns it to the operating system to use.

Edit: just realized I already replied in this issue with a link to similar discussions 😄.
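A small way to observe this behaviour outside of any web framework, a sketch assuming psutil is installed and with arbitrary object counts:

import gc
import psutil  # assumption: psutil is installed (pip install psutil)

proc = psutil.Process()
print("baseline RSS:", proc.memory_info().rss // 2**20, "MiB")

# millions of small objects end up in pymalloc arenas
data = [str(i) for i in range(3_000_000)]
print("after allocating:", proc.memory_info().rss // 2**20, "MiB")

del data
gc.collect()
# the Python objects are gone, but RSS often stays well above the baseline;
# how much memory is actually returned depends on allocator fragmentation
print("after del + gc.collect():", proc.memory_info().rss // 2**20, "MiB")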

Hi all, I have noticed the same issue with FastAPI. The function is async def; inside I load a ~200 MB resource, do something with it, and return the response. The memory is not freed.

Example:

import gc

@router_dev.post(...)
async def endpoint(...):
    model = Model(...)  # load model from file
    results = []
    try:
        ...  # do something with the model
        ...  # alternative: also do something with the model in a thread pool
        ...  # the computation increases memory by about ~1 MB per step (this is expected, and it is freed once done)
        ...  # the above is tested in the library itself (normal function, exact same way) and there are no memory leaks - memory is freed as expected
        results.append(...)  # append a string here
    finally:
        # model = None  # this also does not help
        del model
        gc.collect()
    return dict(results=results)

This occurs with:

  • gunicorn with uvicorn.workers.UvicornWorker
  • uvicorn
  • hypercorn
  • a simple sequence of requests, one after another… and the memory keeps growing

So to me it seems to be a bug in starlette or uvicorn…

@wanaryytel It looks like this might actually be related to how python manages memory – it’s not guaranteed to release memory back to the os.

The top two answers to this stack overflow question have a lot of good info on this topic, and might point in the right direction.

That said, given you are just executing the same call over and over, it’s not clear to me why it wouldn’t reuse the memory – there could be something leaking here (possibly related to the ThreadPoolExecutor…). You could check if it was related to the ThreadPoolExecutor by checking if you got the same behavior with an async def endpoint, which would not run in the ThreadPoolExecutor.

If the requests were being made concurrently, I suppose that could explain the use of more memory, which would then feed into the above stack overflow answer’s explanation. But if you just kept calling the endpoint one call at a time in sequence, I think it’s harder to explain.

If you really wanted to dig into this, it might be worth looking at the gc module and seeing if manually calling the garbage collector helps at all.
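A quick sketch of that comparison (route paths and allocation size are arbitrary): expose the same allocation as both a def and an async def endpoint, load-test each, and watch whether only the def variant shows the growth.

from fastapi import FastAPI

app = FastAPI()

@app.get("/sync")
def sync_endpoint():
    # plain def: FastAPI runs this in the ThreadPoolExecutor via run_in_threadpool
    data = [0] * 1_000_000
    return {"allocated": len(data)}

@app.get("/async")
async def async_endpoint():
    # async def: runs directly on the event loop, bypassing the thread pool
    data = [0] * 1_000_000
    return {"allocated": len(data)}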

@MeteHanC whether this explains/addresses your issue or not definitely depends on what your endpoints are doing.

You're using the PyPy-compatible uvicorn worker class (UvicornH11Worker) - is your system based on PyPy? If you're running on CPython, I suggest you try the CPython-oriented implementation, uvicorn.workers.UvicornWorker.

But in other news, I'm seeing something similar. I just run uvicorn with uvicorn --host 0.0.0.0 --port 7001 app:api --reload, but in some cases the memory is never freed.

For example, this function:

from fastapi import FastAPI, File, UploadFile

api = FastAPI()

@api.post("/randompath")
def get_xoxo(file: UploadFile = File(...)):
    # allocate ~300 MB of strings per request to stress the allocator
    k = []
    for i in range(10):
        k.append('gab' * 9999999)

When I hit the endpoint once, the memory is cleared up, but when I hit it 10x, some of the memory is left allocated and when I hit it another 10x, even more memory is left allocated. This continues until I run out of memory or restart the process. If I change the get_xoxo function to be async, then the memory is always cleared up, but the function also blocks much more (which makes sense since I’m not taking advantage of any awaits in there).

So - is there a memory leak? I’m not sure, but something is handled incorrectly.

My system runs in a python:3.7 Docker container. Basically the same problem occurs in production, where uvicorn is run with uvicorn --host 0.0.0.0 --port %(ENV_UVICORN_PORT)s --workers %(ENV_UVICORN_WORKERS)s --timeout-keep-alive %(ENV_UVICORN_KEEPELIVE)s --log-level %(ENV_UVICORN_LOGLEVEL)s app:api.