fastapi: Memory usage piles up over time and leads to OOM

First check

  • I added a very descriptive title to this issue.
  • I used the GitHub search to find a similar issue and didn’t find it.
  • I searched the FastAPI documentation, with the integrated search.
  • I already searched in Google “How to X in FastAPI” and didn’t find any information.
  • I already read and followed all the tutorials in the docs and didn’t find an answer.
  • I already checked if it is not related to FastAPI but to Pydantic.
  • I already checked if it is not related to FastAPI but to Swagger UI.
  • I already checked if it is not related to FastAPI but to ReDoc.
  • After submitting this, I commit to one of:
    • Read open issues with questions until I find 2 issues where I can help someone and add a comment to help there.
    • I already hit the “watch” button in this repository to receive notifications and I commit to help at least 2 people that ask questions in the future.
    • Implement a Pull Request for a confirmed bug.

Example

Here’s a self-contained, minimal, reproducible example with my use case:

from fastapi import FastAPI

app = FastAPI()


@app.get("/")
def read_root():
    return {"Hello": "World"}

Description

  • Open the browser and call the endpoint /.
  • It returns a JSON with {"Hello": "World"}.
  • But I expected it to return {"Hello": "Sara"}.

Environment

  • OS: [e.g. Linux / Windows / macOS]:
  • FastAPI Version [e.g. 0.3.0]:

To know the FastAPI version use:

python -c "import fastapi; print(fastapi.__version__)"
  • Python version:

To know the Python version use:

python --version

Additional context

Tracemalloc gave insight into the lines that are the top consumers of memory (the top one seems to be the line below, in uvicorn): /usr/local/lib/python3.6/site-packages/uvicorn/main.py:305: loop.run_until_complete(self.serve(sockets=sockets))

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 25
  • Comments: 86 (3 by maintainers)

Most upvoted comments

I’m running into the same issue - memory usage slowly builds over time, running on gunicorn with 4 uvicorn workers.

Many folks are affected by this issue, so something is definitely happening, but it could just as well be that the problem is in the user code and not in FastAPI. So I suggest that, to make things easier for the maintainers, if you’re affected by this issue:

  1. Try to give as many details as possible. What is your setup? What versions/Docker images are you using? What Python version, operating system, gunicorn version, etc.?
  2. Detail the timeframe in which your issue appears - weeks, days, hours? Does making more requests to the server accelerate the issue? Does the issue still appear if zero requests are made?
  3. Have a look at memray to better understand your program, or use @Apiens’ method above (a minimal sketch follows below). Share your results in a Gist or on a similar platform.

Memory issues are tricky but without a good reproducer, it will be impossible for the maintainers to declare whether this is still a problem or not, and if it is, to fix it.
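
As a starting point for item 3, here is a minimal sketch (my own, not from the comment above) of capturing an allocation profile with memray’s Tracker API; the file name and the profiled function are placeholders. memray can also be run from the command line as memray run -o output.bin app.py, and the capture rendered with memray flamegraph.

# Hedged sketch: wrap a suspect code path in memray's Tracker to record
# allocations to a capture file, then inspect it with `memray flamegraph`.
import memray


def suspect_code_path():
    # Placeholder for the code you actually want to profile.
    return [bytes(1024) for _ in range(10_000)]


with memray.Tracker("memray-capture.bin"):
    suspect_code_path()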

Hello guys, my colleagues and I had a similar issue and we solved it.

After profiling, we found that the coroutines created by uvicorn did not disappear but remained in memory (a health-check request, which basically does nothing, could increase the memory usage). This phenomenon was only observed in the microservices that were using tiangolo/uvicorn-gunicorn-fastapi:python3.9-slim-2021-10-02 as the base image. After changing the image, the memory did not increase anymore.

  • if you are using tiangolo/uvicorn-gunicorn-fastapi as the base Docker image, try building from the official Python image instead. [it worked for us]
  • if that doesn’t work, profile to find your own cause; the script below may help you.
# [Memory Leak Profiler]
# REF: https://tech.gadventures.com/hunting-for-memory-leaks-in-asyncio-applications-3614182efaf7

import asyncio
import json
import tracemalloc
from collections import OrderedDict
from datetime import datetime


def format_frame(f):
    # Summarize a stack frame as its code object and line number.
    keys = ["f_code", "f_lineno"]
    return OrderedDict([(k, str(getattr(f, k))) for k in keys])

def show_coro(c):
    # Summarize a task's status: done/cancelled flags, current stack, and exception.
    data = OrderedDict(
        [
            ("txt", str(c)),
            ("type", str(type(c))),
            ("done", c.done()),
            ("cancelled", False),
            ("stack", None),
            ("exception", None),
        ]
    )
    if not c.done():
        data["stack"] = [format_frame(x) for x in c.get_stack()]
    else:
        if c.cancelled():
            data["cancelled"] = True
        else:
            data["exception"] = str(c.exception())
    return data

async def trace_top20_mallocs(sleep_time=300):
    """
    Periodically print the 20 allocation sites that grew the most.
    See https://docs.python.org/3/library/tracemalloc.html
    """
    if not tracemalloc.is_tracing():
        tracemalloc.start()

    initial_snapshot = tracemalloc.take_snapshot()
    while True:
        await asyncio.sleep(sleep_time)
        snapshot = tracemalloc.take_snapshot()
        # Compare against the initial snapshot to see which lines grew the most.
        top_stats = snapshot.compare_to(initial_snapshot, "lineno")
        print(f"[ TOP 20 ] diff {datetime.now()}")
        for stat in top_stats[:20]:
            print(stat)


async def show_all_unfinished_coroutine_status(sleep_time=200):
    # Periodically dump the status of all pending tasks whenever their count changes.
    cnt = 0
    while True:
        await asyncio.sleep(sleep_time)
        tasks = asyncio.all_tasks()
        if len(tasks) != cnt:
            for task in tasks:
                formatted = show_coro(task)
                print(json.dumps(formatted, indent=2))
            cnt = len(tasks)
        print(len(tasks))


# Schedule the profiling coroutines; this must run inside an already-running
# event loop (e.g. from an application startup hook), otherwise
# asyncio.get_running_loop() raises RuntimeError.
loop = asyncio.get_running_loop()
loop.create_task(trace_top20_mallocs())
loop.create_task(show_all_unfinished_coroutine_status())
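
For context, here is a minimal sketch of how the profiler above could be wired into a FastAPI app so the coroutines are scheduled from inside a running event loop; the app and handler names are illustrative, not from the original comment.

import asyncio
import tracemalloc

from fastapi import FastAPI

app = FastAPI()


@app.on_event("startup")
async def start_memory_profiling():
    # Start tracemalloc and schedule the profiling loops defined above.
    tracemalloc.start()
    loop = asyncio.get_running_loop()
    loop.create_task(trace_top20_mallocs())
    loop.create_task(show_all_unfinished_coroutine_status())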

I did encounter this issue last week; the root cause looked to be mismatched Pydantic types. For instance, we had an int defined in a response model that was actually a float when returned from our database. We also had an int that was actually a str. Cleaning up the types solved the issue for us. This was a high-traffic endpoint at < 100 rps.

I’m not sure of the root cause, but I suspect that the errors are caught and recorded somewhere in Pydantic, presumably where FastAPI validates the returned responses.
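
As an illustration only (the field names and values are hypothetical, not from the comment above), the kind of response_model type mismatch being described looks roughly like this:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class Item(BaseModel):
    price: int  # declared as int in the response model...


@app.get("/item", response_model=Item)
def read_item():
    # ...but the value coming back from the "database" is actually a float,
    # so every response goes through Pydantic's coercion/validation path.
    return {"price": 10.5}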

I see the same thing across all my services using different FastAPI/Uvicorn/Gunicorn and Sentry SDK versions. This particular one is running:

  • Python 3.10
  • starlette 0.17.1
  • fastapi 0.75.1
  • sentry-sdk 1.5.0 (maybe it’s causing issues)
  • gunicorn 20.1.0
  • uvicorn 0.17.6

And it is receiving short-lived requests that trigger longer but still relatively short background tasks (it’s not like I made myself a Celery out of it). Another thing I can think of is my ignorance around the subject of database connections, where we do something like:

from asyncio import current_task
from contextlib import asynccontextmanager

from sqlalchemy.ext.asyncio import AsyncSession, async_scoped_session, create_async_engine
from sqlalchemy.orm import declarative_base, sessionmaker

engine = create_async_engine(settings.database_dsn)
session_factory = sessionmaker(engine, expire_on_commit=False, class_=AsyncSession)
Session = async_scoped_session(session_factory, scopefunc=current_task)
Base = declarative_base()


@asynccontextmanager
async def get_db():
    session = Session()
    try:
        yield session
        await session.commit()
    except:
        await session.rollback()
        raise
    finally:
        await session.close()

and then use get_db with a with block or Depends(get_db) (I’m very unsure how to work with a DB and FastAPI, but that’s another thing). When I had this service on sync SQLAlchemy, I was running into many issues where the connections were not going back to the connection pool, resulting in timeouts when waiting for a DB connection.
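
For reference, a minimal sketch of consuming the get_db context manager above as a FastAPI dependency; the wrapper and route names are illustrative, not from the comment, and the get_db/Session objects are the ones defined in the snippet above.

from fastapi import Depends, FastAPI
from sqlalchemy import text

app = FastAPI()


async def db_session():
    # Adapt the asynccontextmanager above into a FastAPI dependency.
    async with get_db() as session:
        yield session


@app.get("/ping-db")
async def ping_db(session=Depends(db_session)):
    result = await session.execute(text("SELECT 1"))
    return {"db": result.scalar()}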

Using Gunicorn with the uvicorn.workers.UvicornWorker worker class, with workers set to 4.

Here’s the memory usage (the spike on 10/21 is where I increased a replica count and it immediately hogged a lot of memory): [graph: green is memory %, red is CPU %]

To put it in traffic context (not that any correlation can be seen): [traffic graph]

I wanted to limit requests for gunicorn so it refreshes the workers, but I’m getting "Error while closing socket [Errno 9] Bad file descriptor", which seems related to https://github.com/benoitc/gunicorn/issues/1877.

Please tell me if I can help, e.g. by providing more data.

What is the deal with the original issue of not returning {"Hello": "Sara"}? Was the original issue edited and now doesn’t make sense in this regard?

EDIT: oh, okay. I see. The author included the bug report template without editing it. Sorry for the noise.

  • it is not the fault of pydantic validation
  • it is not the fault of uvicorn

I don’t think the arguments used there are enough to discard those.

Thanks for the MRE. 👍

EDIT: I still cannot reproduce it: https://github.com/lorosanu/simple-api/issues/1#issue-1474238426. EDIT2: I can see the increase.

Would this simple-api sample help?

As far as I can tell

  • the memory increase is most visible on invalid requests (not exclusively though, but maybe this can point you in the right direction?)
  • it is not the fault of pydantic validation
  • it is not the fault of uvicorn

Related to #596, that issue already contains a lot of workarounds and information, please follow the updates over there.

+1

[memory usage graph]

python 3.6
fastapi==0.60.1
uvicorn==0.11.3

uvicorn main:app --host 0.0.0.0 --port 8101 --workers 4
Docker: 2 cores, 2 GB memory, CentOS Linux release 7.8.2003 (Core)

The client calls the endpoint below once per minute, and server memory usage slowly builds over time.

...
from fastapi import BackgroundTasks, Header
...

@router.get('/tsp/crontab')
def tsp_crontab_schedule(topic: schemas.AgentPushParamsEnum,
                         background_tasks: BackgroundTasks,
                         api_key: str = Header(...)):
    crontab = CrontabMain()

    if topic == topic.schedule_per_minute:
        background_tasks.add_task(crontab.schedule_per_minute)

I have implemented a hacky way to restart workers, but I don’t think it is a good idea to restart services. Waiting for a solution, so that I can remove the restart logic and not have to care about this weird OOM issue.

I’m also having this issue and I’ve attributed it to the uvicorn workers, so I opened this issue: https://github.com/encode/uvicorn/issues/1226

@munjalpatel In [1] you can see that every route that is not async will be executed in a separate thread … the problem is that it used to use a thread pool by default, and this pool uses up to min(32, os.cpu_count() + 4) workers [2], so I assume that on some Python versions these workers are not reused or released and you end up with increasing memory. I wrote a little test app to demonstrate that. [3]

The run_in_threadpool implementation from [1] comes from starlette 0.14.2 [4] (fastapi is pinned to that version), but they changed their code to anyio [5]. I only looked briefly into the anyio code, but to me it looks like an update to the new starlette/anyio version could fix that memory problem. But 🤷‍♂️ whether fastapi will update to the new starlette.

[1] https://github.com/tiangolo/fastapi/blob/master/fastapi/routing.py#L144 [2] https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor [3] https://github.com/tiangolo/fastapi/issues/596#issuecomment-734880855 [4] https://github.com/encode/starlette/blob/0.14.2/starlette/concurrency.py#L27 [5] https://github.com/encode/starlette/blob/master/starlette/concurrency.py#L27
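
To make the distinction concrete, here is a minimal sketch (my own, not from the comment above) contrasting the two dispatch paths being described: a plain def endpoint is run via Starlette’s run_in_threadpool in a worker thread, while an async def endpoint runs directly on the event loop.

from fastapi import FastAPI

app = FastAPI()


@app.get("/sync")
def sync_endpoint():
    # Dispatched through run_in_threadpool, i.e. executed in a worker thread
    # from the pool discussed above.
    return {"path": "threadpool"}


@app.get("/async")
async def async_endpoint():
    # Awaited directly on the asyncio event loop; no thread pool involved.
    return {"path": "event loop"}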

Are you able to share a minimal, reproducible, example?

I’m wondering if the run_in_threadpool(field.validate, ...) call here in combination with the global EXC_TYPE_CACHE used by Pydantic’s validation could be contributing.

@lamnguyenx After converting the async http_exception_handler function to a normal function, the issue remains.

[memory usage graph]

Could you try removing the custom http exception handling for now, and then re-run the load testing?

Anyway, I did see that the graph looks a bit less steep (the orange region) after you applied the workaround. Maybe some other things caused the leak too?

Related to #596, that issue already contains a lot of workarounds and information, please follow the updates over there.

It looks like #596 is due to using def instead of async def? I’ve seen this with async def.

Can confirm I am experiencing the same issue. Using Python 3.10 + FastAPI + Hypercorn[uvloop] with 2 workers. The FastAPI project is brand new, so there isn’t any tech debt that could possibly be the cause - no models, schemas or anything fancy being done here.

[tool.poetry.dependencies]
python = "^3.9"
celery = "^5.2.3"
neo4j = "^4.4.3"
hypercorn = {extras = ["uvloop"], version = "^0.13.2"}
fastapi = "^0.77.1"

The Docker container starts at around 105.8 MiB of RAM usage when fresh.

After running a Locust swarm (40 Users) all hitting an endpoint that returns data ranging from 200KB to 8MB - the RAM usage of the Docker container grows (and sometimes shrinks, but mostly grows) until I get an OOM exception. The endpoint retrieves data from the Neo4J database and closes the driver connection cleanly each time.

I had some success making the function async def even though there was nothing to await on. But it seems that FastAPI is still holding onto some memory somewhere… caching?

I’m curious why this topic isn’t more popular; surely everyone would be experiencing this. Perhaps we only notice it because our endpoints return enough data for the increase in usage to be visible, whereas the typical user only returns a few KB at a time.

Additional details: Docker CMD

CMD ["hypercorn", "app.main:app", "--bind", "0.0.0.0:8001", "--worker-class", "uvloop", "--workers", "2"]

@Xcompanygames Consider using ONNX instead of TF, as it’s usually faster and more reliable.

I’m having a memory leak, but I think it’s because the inference data stays in memory / gets duplicated at some point. I’ll update later if the issue turns out not to be related to the inference process.

Update: I wasn’t closing the ONNX inference session correctly. The memory accumulation is almost unnoticeable now!

I am also facing a similar issue. All my API endpoints are defined with async def.

One thing I have observed, though: when I comment out my background tasks (which mainly consist of database update queries), a consistent increase in RAM is not observed, even after load testing with Locust at 300 RPS.

If it helps, the database I am using is Postgres.

Versions: Python 3.8.10, FastAPI 0.63.0, Uvicorn 0.13.4

In my case:

  • I tried multiple memory profiling tools, but they didn’t work well with such a complex application as one built on FastAPI.
  • tracemalloc did partly work, but it didn’t report the memory correctly. I was able to track down another memory leak that was due to not closing a tempfile correctly, but this time it’s different. I guess it had trouble working with asyncio and ThreadPoolExecutor.

Finally, I was able to track down the memory leak by simply commenting out my code block by block. It turns out it was due to this snippet at the top of main.py:

from starlette.exceptions import HTTPException as StarletteHTTPException
from fastapi.responses import PlainTextResponse
@app.exception_handler(StarletteHTTPException)
async def http_exception_handler(request, exc):
    return PlainTextResponse(str(exc.detail), status_code=exc.status_code)

By converting that async function to a normal function, the memory stopped leaking (I used Locust to spawn 4000 users uploading a 20-second audio file):

from starlette.exceptions import HTTPException as StarletteHTTPException
from fastapi.responses import PlainTextResponse
@app.exception_handler(StarletteHTTPException)
def http_exception_handler(request, exc):
    return PlainTextResponse(str(exc.detail), status_code=exc.status_code)

Python Version: 3.8.9, FastAPI Version: 0.67.0, Environment: Linux 5.12.7 x86_64

We are able to consistently produce a memory leak by using a synchronous Depends:

from fastapi import FastAPI, Body, Depends
import typing
import requests

app = FastAPI()

def req() -> bool:
    # Synchronous dependency that performs a blocking HTTP request.
    r = requests.get("https://google.com")
    return True

@app.post("/")
def root(payload: list = Body(...), got: bool = Depends(req)):
    return payload

This is resolved by switching both endpoint and depends to async def. This took us a while to hunt down. At first we also thought it only occurred on EC2, but that’s because we were disabling our authentication routines for local testing, which is where the issue was located. For those struggling here: check your depends, if you’ve got them.
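
For anyone wanting to see the shape of the fix described, here is my sketch (not the commenter’s code): both the dependency and the endpoint become async def, and the blocking requests call is swapped for an async HTTP client such as httpx so the request doesn’t block the event loop.

from fastapi import FastAPI, Body, Depends
import httpx

app = FastAPI()


async def req() -> bool:
    # Async dependency; the HTTP call no longer ties up a threadpool worker.
    async with httpx.AsyncClient() as client:
        await client.get("https://google.com")
    return True


@app.post("/")
async def root(payload: list = Body(...), got: bool = Depends(req)):
    return payload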

What version of Python are you using @lmssdd ?

@ycd It’s 3.8.5, running on a DigitalOcean droplet, on gunicorn with Uvicorn workers.

Same issue here. Did anybody find a good workaround in the meantime?

Ran into the same issue:
def -> memory leak
async def -> no memory leak
Thanks @curtiscook

Ah. I’ll try and do some profiling. Unfortunately my time is pretty scarce these days with the number of different projects I’m working on but fingers crossed.

@curtiscook The max-requests option restarts the service completely, so we need to configure the workers so one is always running while another restarts. That solved the memory issue, but it got me into another one: now I sometimes get multiple requests to different workers with the same data, and each worker creates a new entry in the database.