fastapi: Gunicorn Workers Hang And Consume Memory Forever

Describe the bug: I have deployed a FastAPI app that queries the database and returns the results, and I made sure to close the DB connection afterwards. I'm running gunicorn with this command: gunicorn -w 8 -k uvicorn.workers.UvicornH11Worker -b 0.0.0.0 app:app --timeout 10. After exposing it to the web, I ran a load test that makes 30-40 requests in parallel to the FastAPI app, and that is where the problem starts. Watching htop in the meantime, I see that RAM usage keeps growing, as if no task releases its memory after completing its job. The task count behaves the same way; it looks like gunicorn workers never get killed. After some time RAM usage reaches its maximum and the app starts throwing errors. I then killed the gunicorn app, but the processes spawned by the main gunicorn process did not get killed and kept using all the memory.

Environment:

  • OS: Ubuntu 18.04

  • FastAPI Version : 0.38.1

  • Python version : 3.7.4

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 63 (6 by maintainers)

Most upvoted comments

Hi everyone,

I just read the source code of FastAPI and tested it myself. First of all, this should not be a memory leak; the problem is that if your machine has a lot of CPUs, the app will occupy a lot of memory.

The key difference between async and non-async endpoints is in FastAPI's run_endpoint_function() together with Starlette's run_in_threadpool() (reached via request_response() in starlette/routing.py):

async def run_endpoint_function(
    *, dependant: Dependant, values: Dict[str, Any], is_coroutine: bool
) -> Any:
    # Only called by get_request_handler. Has been split into its own function to
    # facilitate profiling endpoints, since inner functions are harder to profile.
    assert dependant.call is not None, "dependant.call must be a function"

    if is_coroutine:
        return await dependant.call(**values)
    else:
        return await run_in_threadpool(dependant.call, **values)
  
 
async def run_in_threadpool(
    func: typing.Callable[..., T], *args: typing.Any, **kwargs: typing.Any
) -> T:
    loop = asyncio.get_event_loop()
    if contextvars is not None:  # pragma: no cover
        # Ensure we run in the same context
        child = functools.partial(func, *args, **kwargs)
        context = contextvars.copy_context()
        func = context.run
        args = (child,)
    elif kwargs:  # pragma: no cover
        # loop.run_in_executor doesn't accept 'kwargs', so bind them in here
        func = functools.partial(func, **kwargs)
    return await loop.run_in_executor(None, func, *args)

If your REST endpoint is not async, it runs via loop.run_in_executor, but Starlette does not specify an executor here, so the default thread pool size is os.cpu_count() * 5 (before Python 3.8). My test machine has 40 CPUs, so I end up with 200 threads in the pool. After each request, the objects referenced by those threads are not released until a thread is reused by a later request, which can occupy a lot of memory, but in the end it is not a memory leak.
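A quick way to check what the default pool size would be on a given machine, a small sketch of the sizing rules described here (nothing FastAPI-specific):

import os

cpus = os.cpu_count() or 1
print("Python 3.5-3.7 default:", cpus * 5)
print("Python 3.8+ default:   ", min(32, cpus + 4))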

Below is my test code if you want to reproduce it:

import asyncio

import cv2 as cv
import gc
from pympler import tracker
from concurrent import futures

# you can change worker number here
executor = futures.ThreadPoolExecutor(max_workers=1)

memory_tracker = tracker.SummaryTracker()

def mm():
    # simulate a memory-heavy sync endpoint: load an image and compute AKAZE features
    img = cv.imread("cap.jpg", 0)
    detector = cv.AKAZE_create()
    kpts, desc = detector.detectAndCompute(img, None)
    gc.collect()
    memory_tracker.print_diff()
    return None

async def main():
    while True:
        loop = asyncio.get_event_loop()
        await loop.run_in_executor(executor, mm)


if __name__=='__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())

Even though it's not a memory leak, I still think this is not a good implementation, because it's sensitive to your CPU count: when you run a large deep learning model in FastAPI, you will find it occupies a ton of memory. So I suggest we make the thread pool size configurable.
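One workaround until the pool size is configurable upstream, a minimal sketch assuming a Starlette version from this era that calls loop.run_in_executor(None, ...) as quoted above, is to swap the event loop's default executor for a bounded one at startup; the worker count of 4 is just an example value:

import asyncio
from concurrent.futures import ThreadPoolExecutor

from fastapi import FastAPI

app = FastAPI()

@app.on_event("startup")
async def limit_default_threadpool() -> None:
    # replace the loop's default executor (used by run_in_executor(None, ...))
    # with a small, bounded pool
    loop = asyncio.get_running_loop()
    loop.set_default_executor(ThreadPoolExecutor(max_workers=4))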

If you are interested in how I worked through the source code, please refer to my blog and give it a like (https://www.jianshu.com/p/e4595c48d091).

Sorry, I only write blogs in Chinese 😃

Current Solution

  1. Python 3.8+ already limits the number of threads in the default thread pool, as shown below:

         if max_workers is None:
             # ThreadPoolExecutor is often used to:
             # * CPU bound task which releases GIL
             # * I/O bound task (which releases GIL, of course)
             #
             # We use cpu_count + 4 for both types of tasks.
             # But we limit it to 32 to avoid consuming surprisingly large resource
             # on many core machine.
             max_workers = min(32, (os.cpu_count() or 1) + 4)
         if max_workers <= 0:
             raise ValueError("max_workers must be greater than 0")
    

    If 32 threads is not too many for your program, you can upgrade to Python 3.8+ to avoid this issue.

  2. Define your endpoint with async def; then the request runs on the event loop instead of the thread pool, though throughput may be affected (see the sketch below).
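A minimal sketch of option 2 (the route and handler names are illustrative): the async def handler stays on the event loop instead of going through the thread pool.

from fastapi import FastAPI

app = FastAPI()

@app.get("/items")
async def read_items():
    # runs directly on the event loop, so no ThreadPoolExecutor thread is used;
    # any blocking work here would block other requests, so keep it non-blocking
    return {"items": []}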

Some statistics for python3.7, python3.8, and async.

Initial Mem Usage
==========================================
fastapi-py37: 76.21MiB / 7.353GiB
fastapi-py38: 75.86MiB / 7.353GiB
fastapi-py37-async: 75.44MiB / 7.353GiB
fastapi-py38-async: 75.62MiB / 7.353GiB
==========================================
Run 1000 Requests....
==========================================
Run fastapi-py37
real: 0m16.632s; user 0m4.748s; system 0m2.855s
Run fastapi-py38
real: 0m15.319s; user 0m4.750s; system 0m2.722s
Run fastapi-py37-async
real: 0m21.276s; user 0m4.877s; system 0m2.823s
Run fastapi-py38-async
real: 0m22.568s; user 0m5.218s; system 0m2.935s
==========================================
After 1000 Requests Mem Usage
==========================================
fastapi-py37: 1.266GiB / 7.353GiB
fastapi-py38: 144.8MiB / 7.353GiB
fastapi-py37-async: 84.07MiB / 7.353GiB
fastapi-py38-async: 83.63MiB / 7.353GiB
==========================================

Hi all, I can reproduce the memory leak:

from fastapi import FastAPI
import uvicorn

app = FastAPI()

@app.get("/")
def mm():
    # allocate two ~8 MB lists per request just to exercise the allocator
    data = [0] * 1000000
    data2 = [0] * 1000000
    return {"message": "Hello World"}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8080)

Memory usage rises to about 700 MB by request 1000.

Just for info: I ran the same function under Flask and the memory stays constant (!), so for me something is wrong on the async side…

Hi, I have written a simple test to validate this issue. It seems that Python 3.8 fixes the problem.

Initial Mem Usage
==========================================
fastapi-py37: 183.6MiB / 31.16GiB
fastapi-py38: 187.6MiB / 31.16GiB
==========================================
After 1000 Requests Mem Usage
==========================================
fastapi-py37: 6.943GiB / 31.16GiB
fastapi-py38: 386.4MiB / 31.16GiB
==========================================

Sample Code: https://github.com/kevchentw/fastapi-memory-leak

I still get this problem when using FastAPI with a router function that runs a model inside it. The RAM usage keeps climbing.

The solution here may be:

  1. try Python 3.8 (in which ThreadPoolExecutor has a bounded default worker count)
  2. use async def and loop.run_in_executor with a global ThreadPoolExecutor to run the model function (see the sketch below)
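A minimal sketch of option 2, assuming a hypothetical blocking run_model() function and illustrative route names; the key point is a single, explicitly sized ThreadPoolExecutor shared by all requests:

import asyncio
from concurrent.futures import ThreadPoolExecutor

from fastapi import FastAPI

app = FastAPI()

# one explicitly sized pool shared by all requests; 2 workers is an example value
model_executor = ThreadPoolExecutor(max_workers=2)

def run_model(payload: dict) -> dict:
    # placeholder for the blocking model call
    return {"result": len(payload)}

@app.post("/predict")
async def predict(payload: dict):
    # the endpoint itself is async, so FastAPI does not push it into the
    # default thread pool; the blocking work goes to our bounded executor
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(model_executor, run_model, payload)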

Update: RAM consumption changed after switching to Python 3.8 (screenshots comparing python3.7 and python3.8 omitted).

I also found this problem. When I use gunicorn + Flask, memory increases quickly, and my application on the k8s platform has to handle 1,000,000 requests. How can I solve this?

You're using the PyPy-compatible uvicorn worker class (UvicornH11Worker) - is your system based on PyPy? If you're running on CPython, I suggest you try the CPython-oriented implementation, uvicorn.workers.UvicornWorker.

I have noticed that too: under high load memory is left allocated, but for single requests it gets cleared up. I already tried making it async, but that doesn't deallocate the memory either.

@wanaryytel

This is probably an issue with Starlette's run_in_threadpool, or maybe even Python's ThreadPoolExecutor. If you port that endpoint to Starlette, I expect you'll get the same behavior.

Recently the Starlette and uvicorn teams have been pretty good about addressing issues; if you can reproduce the memory leak in Starlette, I'd recommend creating an issue demonstrating it in the Starlette (and possibly uvicorn) repos.

Hmm, reproducing it in Starlette makes sense. I will reproduce the issue and open an issue on the Starlette repo. Thanks for the idea.


I still get this problem; memory does not get deallocated. My setup:

python 3.8.9

fastapi==0.63.0
gunicorn==20.0.4
uvicorn==0.11.8

["gunicorn", "-b", "0.0.0.0:8080", "-w", "3",'-k', 'uvicorn.workers.UvicornWorker', "palette.main:app", '--timeout', '0', "--graceful-timeout", "5", '--access-logfile', '-', '--error-logfile', '-', '--log-level', 'error']

events

def create_start_app_handler(app: FastAPI) -> Callable:  # type: ignore
    async def start_app() -> None:
        app.state.executor = ProcessPoolExecutor(max_workers=max(cpu_count()-1, 1))

    return start_app


def create_stop_app_handler(app: FastAPI) -> Callable:  # type: ignore
    @logger.catch
    async def stop_app() -> None:
        app.state.executor.shutdown()

    return stop_app

executor

async def run_fn_in_executor(request: Request, fn, *args):
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(request.app.state.executor, fn, *args)

routes

@router.post("/image", status_code=200, response_model=ImageResponse)
async def extract_color_palette(request: Request, image: ImageRequest):
    file = await fetch_content(image.url)
    colors = await run_fn_in_executor(request, process_image_to_palette, file)
    return ImageResponse(url=image.url, colors=colors)

I don't think this is solved, but it seems like we all found some workarounds. So I think it's up to you whether to close it, @tiangolo.

I'm seeing the same issue with:

fastapi==0.74.1
uvicorn[standard]==0.17.5
isort==5.10.1
black==22.3.0
flake8==4.0.1
pandas==1.4.1
numpy==1.22.2
pymysql==1.0.2
pytest==7.0.1
requests==2.27.1
coverage==6.3.2
pytest-cov==3.0.0
python-configuration==0.8.2
google-cloud-secret-manager==2.9.2
SQLAlchemy==1.4.32
cryptography==36.0.2
tenacity==8.0.0
httpx==0.22.0
pytest-asyncio==0.18.3
ddtrace==0.60.1
alembic==1.7.7
sqlalchemy[asyncio]==1.4.32
aiomysql==0.1.0
pytest-env==0.6.2
pytest-mock==3.7.0
datadog>=0.42.0
pyhumps==3.5.3
dave-metrics>=0.5.1
hypercorn[uvloop]==0.13.2
gunicorn==20.1.0

Running python:3.10.1-slim-buster in k8s with gunicorn -k uvicorn.workers.UvicornWorker app.api.server:app --workers 1 --bind 0.0.0.0:8000

I have solved this issue with the following settings:

  • python=3.8.9
  • fastapi=0.63.0
  • uvicorn=0.17.6
  • uvloop=0.16.0

@tiangolo would it be appropriate to close this issue as a wont-fix? It seems like the memory leak arises from uvloop on Python versions older than 3.8, so it is unlikely to be fixed. Regardless, it is external to FastAPI.

Anecdotally, I have encountered memory leaks when mixing uvloop and code with a lot of C extensions on Python 3.8+, outside the context of a web service like fastapi/uvicorn/gunicorn. In light of this, perhaps an example of how to run FastAPI without uvloop would be appropriate.
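For anyone who wants to rule uvloop out, here is a minimal sketch of running uvicorn on the standard asyncio loop (the app path and port are illustrative):

import uvicorn
from fastapi import FastAPI

app = FastAPI()

if __name__ == "__main__":
    # loop="asyncio" selects the standard-library event loop instead of uvloop
    uvicorn.run(app, host="0.0.0.0", port=8000, loop="asyncio")

The CLI equivalent is uvicorn app:app --loop asyncio, and under gunicorn the pure-Python uvicorn.workers.UvicornH11Worker also selects the standard asyncio loop.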

Great, it seems the issue was solved!

@MeteHanC may we close it now?

If anyone else is having other problems, please create a new issue so we can keep the conversation focused 🤓

Yeah, in this case I suggested it specifically because it looked like the function involved would be compatible with a subprocess call.

There is a ProcessPoolExecutor interface, analogous to ThreadPoolExecutor, that you can use very similarly to how run_in_threadpool works in Starlette. I think that mostly solves the problems related to managing the worker processes. But yes, the arguments and return types need to be picklable, so there are many cases where you wouldn't want to take this approach.

Would be happy to take a look at it @madkote if you get a simple reproducible example.

Hi all

In my Flask-RESTful web application I tried setting the max_requests option in the gunicorn config file to 500, so after every 500 requests the worker reboots. This reduced memory usage somewhat, but I still see memory increasing.

Hi @HsengivS, you can pass threaded=False to Flask, e.g. app.run(host='0.0.0.0', port=9333, threaded=False). I used it successfully to avoid the Flask memory leak issue.
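For reference, the worker-recycling approach mentioned a couple of comments up can live in a gunicorn config file; a minimal sketch (file name and numbers are illustrative):

# gunicorn_conf.py: recycle each worker after roughly 500 requests so retained
# memory is returned to the OS; the jitter spreads restarts out so all workers
# don't recycle at the same moment.
max_requests = 500
max_requests_jitter = 50

Load it with gunicorn -c gunicorn_conf.py -k uvicorn.workers.UvicornWorker app:app.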


I'm using Python 3.9 and still have the issue. The memory consumption is always >90%.

After further investigation, the memory consumption in my case was normal. For testing, I added more RAM and saw that memory consumption stayed pretty stable. I ended up migrating to plain uvicorn and lowering the number of workers from 4 to 3, which in my case was enough.

Thanks for the help!

Similar to @madkote (and I'm using no ctypes): with gunicorn & Flask I see no issue.

Hey @madkote, thanks for asking. That involves a request via requests and some scikit-learn / numpy operations on it, basically. I'll dig into them if they appear to be the problem!

I just stopped using Python's multiprocessing pool. I merged the queries I wanted to execute concurrently and let my DB parallelize them. Thanks @tiangolo and everyone here.

I encountered the same problem: with an endpoint defined with def, hitting it again and again leaks memory, but with async def it does not. Looking forward to the outcome of the discussion.

@dmontagu @euri10 I still suspect two things:

  • my model library - I will do even more testing and profiling
  • I found that blocking code performs better in terms of memory. With async, every request increases memory by ~5 MB (some model operation and data processing), but in blocking mode it is about ~1 MB every 2-3 requests…

Everywhere I use the same model; I even tried with many (custom and freely available)… the result is always the same.

So, please give me some more time to make a reasonable example to reproduce the issue.

@madkote this starlette issue includes a good discussion of the problem (and some patterns to avoid that might be exacerbating it): https://github.com/encode/starlette/issues/667

@madkote I’m not 100% confident this explains the issue, but I think this may actually just be how python works. I think this article does a good job explaining it:

You’ll notice that I’ve been saying “free” in quotes quite a bit. The reason is that when a block is deemed “free”, that memory is not actually freed back to the operating system. The Python process keeps it allocated and will use it later for new data. Truly freeing memory returns it to the operating system to use.

Edit: just realized I already replied in this issue with a link to similar discussions 😄.
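A small way to observe this behaviour outside of any web framework, a sketch assuming psutil is installed and with arbitrary object counts:

import gc
import psutil  # assumption: psutil is installed (pip install psutil)

proc = psutil.Process()
print("baseline RSS:", proc.memory_info().rss // 2**20, "MiB")

# millions of small objects end up in pymalloc arenas
data = [str(i) for i in range(3_000_000)]
print("after allocating:", proc.memory_info().rss // 2**20, "MiB")

del data
gc.collect()
# the Python objects are gone, but RSS often stays well above the baseline;
# how much memory is actually returned depends on allocator fragmentation
print("after del + gc.collect():", proc.memory_info().rss // 2**20, "MiB")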

Hi all, I have noticed the same issue with FastAPI. The function is async def; inside I load a ~200 MB resource, do something with it, and return the response. The memory is not freed.

Example:

import gc

@router_dev.post(...)
async def endpoint(...):
    model = Model(...)  # load model from file
    results = []
    try:
        ...  # do something with the model
        ...  # alternative: also do something with the model in a thread pool
        ...  # the computation increases memory by about ~1 MB per step (this is expected, and it is freed once done)
        ...  # the above is tested in the library itself (normal function, exact same way) and there are no memory leaks - memory is freed as expected
        results.append(...)  # append a string here
    finally:
        # model = None  # this also does not help
        del model
        gc.collect()
    return dict(results=results)

This occurs with:

  • gunicorn with uvicorn.workers.UvicornWorker
  • uvicorn
  • hypercorn
  • a simple sequence of requests, one after another… and the memory keeps growing

So to me it seems to be a bug in starlette or uvicorn…

@wanaryytel It looks like this might actually be related to how python manages memory – it’s not guaranteed to release memory back to the os.

The top two answers to this stack overflow question have a lot of good info on this topic, and might point in the right direction.

That said, given you are just executing the same call over and over, it’s not clear to me why it wouldn’t reuse the memory – there could be something leaking here (possibly related to the ThreadPoolExecutor…). You could check if it was related to the ThreadPoolExecutor by checking if you got the same behavior with an async def endpoint, which would not run in the ThreadPoolExecutor.

If the requests were being made concurrently, I suppose that could explain the use of more memory, which would then feed into the above stack overflow answer’s explanation. But if you just kept calling the endpoint one call at a time in sequence, I think it’s harder to explain.

If you really wanted to dig into this, it might be worth looking at the gc module and seeing if manually calling the garbage collector helps at all.
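A quick sketch of that comparison (route paths and allocation size are arbitrary): expose the same allocation as both a def and an async def endpoint, load-test each, and watch whether only the def variant shows the growth.

from fastapi import FastAPI

app = FastAPI()

@app.get("/sync")
def sync_endpoint():
    # plain def: FastAPI runs this in the ThreadPoolExecutor via run_in_threadpool
    data = [0] * 1_000_000
    return {"allocated": len(data)}

@app.get("/async")
async def async_endpoint():
    # async def: runs directly on the event loop, bypassing the thread pool
    data = [0] * 1_000_000
    return {"allocated": len(data)}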

@MeteHanC whether this explains/addresses your issue or not definitely depends on what your endpoints are doing.

You're using the PyPy-compatible uvicorn worker class (UvicornH11Worker) - is your system based on PyPy? If you're running on CPython, I suggest you try the CPython-oriented implementation, uvicorn.workers.UvicornWorker.

But in other news, I'm seeing something similar. I just run uvicorn with uvicorn --host 0.0.0.0 --port 7001 app:api --reload, but in some cases the memory is never freed.

For example, this function:

from fastapi import FastAPI, File, UploadFile

api = FastAPI()

@api.post("/randompath")
def get_xoxo(file: UploadFile = File(...)):
    # allocate ~300 MB of strings per request to stress the allocator
    k = []
    for i in range(10):
        k.append('gab' * 9999999)

When I hit the endpoint once, the memory is cleared up, but when I hit it 10x, some of the memory is left allocated and when I hit it another 10x, even more memory is left allocated. This continues until I run out of memory or restart the process. If I change the get_xoxo function to be async, then the memory is always cleared up, but the function also blocks much more (which makes sense since I’m not taking advantage of any awaits in there).

So - is there a memory leak? I’m not sure, but something is handled incorrectly.

My system runs in a python:3.7 Docker container. Basically the same problem occurs in production, where uvicorn is run with uvicorn --host 0.0.0.0 --port %(ENV_UVICORN_PORT)s --workers %(ENV_UVICORN_WORKERS)s --timeout-keep-alive %(ENV_UVICORN_KEEPELIVE)s --log-level %(ENV_UVICORN_LOGLEVEL)s app:api.