uvicorn: Using UvicornWorker in Gunicorn causes OOM on K8s

Checklist

  • The bug is reproducible against the latest release and/or master.
  • There are no similar issues or pull requests to fix it yet.

Describe the bug

I’m developing a FastAPI application deployed on a Kubernetes cluster, using gunicorn as the process manager.

I’m also using UvicornWorker, because of the async nature of FastAPI.

After deploying the application I can see the memory growing while the app is at rest, until it gets OOM-killed.

This happens only when I use UvicornWorker.

Tests I made:

  • Commented out all my code, to make sure the leak is not in my application (leak present);
  • Started the application using uvicorn instead of gunicorn (no leak);
  • Started the application using gunicorn with sync workers (no leak);
  • Started the application using gunicorn + UvicornWorker (leak present);
  • Started the application using gunicorn + UvicornWorker + max_requests (leak present);

Moreover, this happens only on the Kubernetes cluster: when I run my application locally (MacBook Pro 16, with the same Docker image used on K8s) the leak is not present.
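For reference, a minimal sketch of the kind of Gunicorn configuration involved (the bind address and worker count here are placeholders, not my exact deployment values):

# gunicorn.conf.py (illustrative sketch; values are placeholders)
bind = "0.0.0.0:8000"
workers = 4
worker_class = "uvicorn.workers.UvicornWorker"  # the worker class that shows the leak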

Anyone else had a similar problem?


About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 8
  • Comments: 41 (20 by maintainers)

Most upvoted comments

EDIT: it seems that I cannot reproduce the numbers.

Hey, I’m encountering the same issue. Here is my code snippet to reproduce it:
async def app(scope, receive, send):
    assert scope['type'] == 'http'

    data = [0] * 10_000_000

    await send({
        'type': 'http.response.start',
        'status': 200,
        'headers': [
            [b'content-type', b'text/plain'],
        ],
    })
    await send({
        'type': 'http.response.body',
        'body': b'Hello, world!',
    })

Launched using uvicorn main:app --host 0.0.0.0 --port 8000. After sending ~10k requests to the API, memory usage goes from ~18 MB to 39 MB. The same thing happens with starlette and apidaora (launched with uvicorn), and the memory usage patterns look quite similar. I also tested flask with gunicorn and sanic, and their memory usage stayed ± the same. I also checked PR #1244, but the issue persists.

| Library            | Server    | Memory (start) | Memory (end) | Requests |
|--------------------|-----------|----------------|--------------|----------|
| uvicorn            | uvicorn   | 18 MB          | 38 MB        | 11k      |
| starlette          | uvicorn   | 19 MB          | 39 MB        | 10k      |
| apidaora           | uvicorn   | 18 MB          | 39 MB        | 12k      |
| uvicorn (PR #1244) | uvicorn   | 18 MB          | 39 MB        | 10k      |
| quart              | hypercorn | 26 MB          | 63 MB        | 10k      |
| apidaora           | hypercorn | 16 MB          | 42 MB        | 10k      |
| starlette          | daphne    | 34 MB          | 46 MB        | 10k      |
| apidaora           | daphne    | 32 MB          | 45 MB        | 10k      |
| flask              | gunicorn  | 35 MB          | 35 MB        | 10k      |
| sanic              | sanic     | 17 MB          | 22 MB        | 10k      |
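A rough sketch of how numbers like the ones above can be collected (the URL, request count, concurrency, and the use of psutil are my assumptions; any process-memory tool works):

# measure_rss.py - illustrative sketch, not the exact harness behind the table above
import asyncio
import httpx
import psutil

async def hammer(url: str, total: int = 10_000, concurrency: int = 20) -> None:
    sem = asyncio.Semaphore(concurrency)
    async with httpx.AsyncClient() as client:
        async def one() -> None:
            async with sem:
                await client.get(url)
        await asyncio.gather(*(one() for _ in range(total)))

def rss_mb(pid: int) -> float:
    # resident set size of the server process, in MB
    return psutil.Process(pid).memory_info().rss / 1024 / 1024

if __name__ == "__main__":
    server_pid = 12345  # placeholder: PID of the server process being measured
    print(f"before: {rss_mb(server_pid):.0f} MB")
    asyncio.run(hammer("http://127.0.0.1:8000/"))
    print(f"after:  {rss_mb(server_pid):.0f} MB")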

For the record: the issue was solved by uvicorn 0.17.1.

It seems that I cannot reproduce the memory increase anymore 🤔 I ran my experiments in Docker and took notes about the usage, but as of today the memory stays ± the same after 10k requests. I guess the base Docker image or one of the Python dependencies changed, but I have no idea what happened.

@KiraPC @adamantike Thanks so much for the detailed reports and debugging so far.

IIUC this seems to be related to #869 introducing some kind of connection leak that’s visible if we let a server hit by health checks run for a few minutes.

The keepalive timeout exploration in #1192 is interesting. I wonder if the keepalive task is being properly cleaned up too? We use a separate connection_lost future now. Maybe we need to pass it to the keepalive task?
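As a generic illustration of the suspected failure mode (not uvicorn’s actual code): if a protocol schedules a per-connection keepalive timer but never cancels it when the transport closes, every short-lived connection leaves a live handle behind that keeps the protocol object referenced.

# Generic asyncio sketch of the cleanup pattern under discussion; not uvicorn's implementation.
import asyncio

class KeepAliveProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport
        loop = asyncio.get_running_loop()
        # per-connection keepalive timer
        self.keepalive_handle = loop.call_later(5.0, self._timeout_keep_alive)

    def connection_lost(self, exc):
        # without this cancel, every closed connection leaves a pending timer
        # that keeps referencing the protocol object until it fires
        self.keepalive_handle.cancel()

    def _timeout_keep_alive(self):
        if not self.transport.is_closing():
            self.transport.close()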

I have created a gist with the configuration I used to bisect the memory leak, based on what @KiraPC provided, but using tcping within Docker, and docker stats to avoid any other dependencies that could affect the memory measurements.

https://gist.github.com/adamantike/d2af0f0fda5893789d0a1ab71565de48

A few stats that could allow a faster review and merge for PR #1192. All scenarios running for 10 minutes in the same testing environment:

  • Uvicorn 0.13.4: 0 MiB memory increase (initial: 22 MiB, after tested period: 22 MiB)
  • Uvicorn on current master: 78 MiB memory increase (initial: 22 MiB, after tested period: 100 MiB)
  • Uvicorn on current master, with PR #1192 applied: 5 MiB memory increase (initial: 22 MiB, after tested period: 27 MiB)
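For anyone who prefers not to install tcping, the same traffic pattern (open a TCP connection and close it without sending a request) can be emulated with a short asyncio script; the host, port, concurrency and round count below are placeholders:

# tcp_ping.py - open and immediately close TCP connections, like tcping does.
import asyncio

async def ping_once(host: str, port: int) -> None:
    reader, writer = await asyncio.open_connection(host, port)
    writer.close()  # close without sending any HTTP request
    await writer.wait_closed()

async def main(host: str = "127.0.0.1", port: int = 8000,
               concurrency: int = 20, rounds: int = 1_000) -> None:
    for _ in range(rounds):
        await asyncio.gather(*(ping_once(host, port) for _ in range(concurrency)))

if __name__ == "__main__":
    asyncio.run(main())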

@KiraPC, thanks for the detailed explanation on how to reproduce it! I was able to test it locally, and bisected the library to find the commit where Uvicorn started leaking memory.

The memory leak starts being reproducible on this commit: https://github.com/encode/uvicorn/commit/960d4650db0259b64bc41f69bc7cdcdb1fdbcbf3 (#869).

I tested #1192 applied to the current master branch, and the memory leak seems to still be happening, but it is noticeably slower. I will keep researching to find the root cause.

I tested again using the instructions from https://github.com/encode/uvicorn/issues/1226#issuecomment-962644922, with 20 concurrent tcping invocations over 10 minutes.

  • Uvicorn 0.17.0: 184 MiB memory increase (initial: 22 MiB, after tested period: 206 MiB)
  • Uvicorn 0.17.1: 0 MiB memory increase (initial: 22 MiB, after tested period: 22 MiB) 🎉

I ran some more tests with images to grow memory usage faster. I arrived at 2 conclusions:

  • the issue is with asyncio and not uvicorn;
  • the memory stops growing after a while. I cannot tell exactly when, but I saw that the memory growth stabilized after some time.

Code to reproduce the memory consumption with asyncio only:

import asyncio
import httpx

async def main(host, port):
    async def handler(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
        # download an image on every request to make the memory growth visible faster
        async with httpx.AsyncClient() as client:
            r = await client.get('https://raw.githubusercontent.com/tomchristie/uvicorn/master/docs/uvicorn.png')
            data = await r.aread()  # read the body and discard it

        body = "Hello World!"
        response = "HTTP/1.1 200 OK\r\n"
        response += f"Content-Length: {len(body)}\r\n"
        response += "Content-Type: text/html; charset=utf-8\r\n"
        response += "Connection: close\r\n"
        response += "\r\n"
        response += body
        writer.write(response.encode("utf-8"))
        await writer.drain()
        writer.close()

    server = await asyncio.start_server(handler, host=host, port=port)

    async with server:
        await server.serve_forever()

asyncio.run(main("0.0.0.0", 8000))

This sample code started at ~20 MB of RAM usage and, after 30k requests, used ~280 MB.

Tangentially related: if you’re already using Kubernetes, there’s no reason (unless you have something specific to your project) to use Gunicorn as a process manager. Kubernetes should be running your workers directly. Fewer layers, less complexity, and your readiness/liveness probes will be more accurate.

Opened #1244 with a possible fix — at least on my machine. Happy for you to try it out @KiraPC @adamantike.

Edit: meh, #1244 seems to break a bunch of fundamental functionality, at least with the test HTTPX client. Needs refining…

@adamantike I am very happy that I was able to help. It’s the least I can do for the community.

In the next few days, if I’m able to find some free time, I’ll try to have a look.

Next step would be to play with --loop and --http CLI parameters.
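For example, the loop and HTTP protocol implementations can be pinned explicitly, either via the CLI flags or programmatically (the "main:app" path below is a placeholder):

# Sketch: equivalent to `uvicorn main:app --loop asyncio --http h11`.
import uvicorn

uvicorn.run("main:app", host="0.0.0.0", port=8000, loop="asyncio", http="h11")
# other combinations worth comparing: loop="uvloop", http="httptools"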

I guess I figured it out. Thanks to @Reikun85.

The leak should be caused by the TCP connections. That is why I did not have the problem locally but did on K8s: on the cluster there are TCP pings from the load balancer and so on.

So I replicated it locally with tcping, and the leak appeared on my PC as well.

I also tested the problem with different uvicorn versions, and the leak appears from uvicorn>=0.14.0; there is no problem with 0.13.4.

I also noticed that the leak is present only when using the “standard” version of uvicorn and not the full one. NB: the standard version is the one most commonly used when running uvicorn workers under gunicorn as a process manager.
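A quick way to see which optional speedups are present in a given image (roughly what distinguishes the plain install from uvicorn[standard]); as far as I understand, these modules are what the --loop/--http "auto" settings pick up when available:

# Illustrative check: report whether the optional C-based implementations are installed.
import importlib.util

for mod in ("uvloop", "httptools"):
    status = "installed" if importlib.util.find_spec(mod) else "missing"
    print(mod, status)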

To replicate the problem I created an example gist: build the image for both versions using the two provided Dockerfiles, then ping the application with a TCP ping tool, and you can verify that the app memory keeps growing and never stops.

Here is the gist: https://gist.github.com/KiraPC/5016ecee2ae1dd6e860b4494415dbd49

Let me know if you need more information or if something is not clear.