openai-python: Memory leak in the chat completion `create` method

Confirm this is an issue with the Python library and not an underlying OpenAI API

  • This is an issue with the Python library

Describe the bug

Calling the `create` method on chat completions introduces a memory leak, according to tracemalloc.

Example:

from openai import OpenAI

client = OpenAI(api_key=MY_KEY)
client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Can you write a poem?",
        }
    ],
    model="gpt-3.5-turbo",
)

How did I determine it’s a memory leak?

I use tracemalloc in my Flask application:

import tracemalloc

from openai import OpenAI

tracemalloc.start()  # tracing must be started before snapshots can be taken

@blueprint.route("/admin/sys/stats")
def admin_sys_stats():
    snapshot = tracemalloc.take_snapshot()
    top_stats = snapshot.statistics('lineno')

    client = OpenAI(api_key=KEY)
    client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "Can you write a poem?",
            }
        ],
        model="gpt-3.5-turbo",
    )

    stats = ""
    for stat in top_stats[:1000]:
        if "openai" in str(stat):  # keep only allocations from the openai package
            stats += str(stat) + "\n"

    return f"<pre>{stats}</pre>", 200

When running this endpoint multiple times, one line consistently sits at the very top (meaning it is the largest allocation site):

\venv\Lib\site-packages\openai\_response.py:227: size=103 KiB, count=1050, average=100 B

When I refresh, the size increases. In a production environment, of course, the numbers climb much faster.

To Reproduce

There is no single way to prove a memory leak, but here is what I did:

  1. Set up a Flask application
  2. Create the route provided in the bug description
  3. Hit the route multiple times; you will see the reported size keep increasing (a standalone sketch of the same pattern, without Flask, follows below)
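
For reference, here is a standalone sketch of the same pattern without Flask; this is my own illustration, not code from the report. The placeholder API key, loop count, and the "openai" filter string are assumptions, and a dummy key will make the request fail with an auth error, which the sketch simply ignores.

import tracemalloc

from openai import OpenAI

tracemalloc.start()

for i in range(10):
    # A fresh client per iteration, mirroring one client per request.
    client = OpenAI(api_key="sk-placeholder")
    try:
        client.chat.completions.create(
            messages=[{"role": "user", "content": "Can you write a poem?"}],
            model="gpt-3.5-turbo",
        )
    except Exception:
        pass  # ignore API/auth errors; the point is the allocation growth across iterations

    snapshot = tracemalloc.take_snapshot()
    openai_stats = [s for s in snapshot.statistics("lineno") if "openai" in str(s)]
    if openai_stats:
        print(f"iteration {i}: {openai_stats[0]}")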

Code snippets

No response

OS

Linux, macOS

Python version

Python v3.11.2

Library version

openai v1.2.4

About this issue

  • Original URL
  • State: closed
  • Created 8 months ago
  • Reactions: 3
  • Comments: 30

Most upvoted comments

Wouldn’t “moving the client outside of the request handler” be the same thing as a module level client?

From the README, I figured the network resources are getting cleaned up when the OpenAI client gets garbage collected. Sounds like this isn’t necessarily true?

This might be outside the scope of this library, but I’m also trying to understand how this fits into a typical backend app.

It seems like using a module-scoped client is heavily discouraged, but so is instantiating a new client with every request (i.e. a FastAPI dependency). What is the other option here?

You shouldn’t be instantiating a new client for every request; if you move the client outside of the request handler you shouldn’t see any memory leaks.

I did manage to replicate the memory leak you’re seeing with your example, and moving the client instantiation outside of the function resulted in stable memory usage.
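
To make that concrete, here is a rough sketch of the Flask route from the report with the client moved to module scope. This is my own illustration, not code from the thread: the blueprint setup and route name are placeholders, and the client reads its API key from the OPENAI_API_KEY environment variable.

from flask import Blueprint
from openai import OpenAI

blueprint = Blueprint("poems", __name__)
client = OpenAI()  # created once at import time and reused across requests

@blueprint.route("/poem")
def poem():
    completion = client.chat.completions.create(
        messages=[{"role": "user", "content": "Can you write a poem?"}],
        model="gpt-3.5-turbo",
    )
    return completion.choices[0].message.content, 200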

@RobertCraigie

How can we implement an async client with FastAPI correctly?

Is it safe to implement it like this (singleton approach)? https://github.com/tiangolo/fastapi/discussions/8301#discussioncomment-5150759

Another problem is that we have multiple API keys, like below. How can we handle this nicely if we have to instantiate the client only once, as mentioned in https://github.com/openai/openai-python/issues/874?

    client = AsyncAzureOpenAI(
        api_key=api_keys_list[random_index],
        ...

@brian-goo you can create a single client, use FastAPI app events to close it when the server is stopped, and then use client.with_options(api_key='foo') to use different API keys for different requests.
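
A rough sketch of that suggestion, assuming FastAPI’s shutdown event and a single shared AsyncOpenAI client; the route, the per-request key parameter, and the dummy default key are my own placeholders:

from fastapi import FastAPI
from openai import AsyncOpenAI

app = FastAPI()
client = AsyncOpenAI(api_key="<overridden per request>")

@app.on_event("shutdown")
async def close_openai_client():
    await client.close()  # release the underlying httpx resources when the server stops

@app.post("/poem")
async def poem(api_key: str):
    # with_options() applies a per-request API key while reusing the shared connection pool
    completion = await client.with_options(api_key=api_key).chat.completions.create(
        messages=[{"role": "user", "content": "Can you write a poem?"}],
        model="gpt-3.5-turbo",
    )
    return {"poem": completion.choices[0].message.content}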

Anyway, it does not really matter, as it’s not in the scope of this issue. The focus is that there seem to be memory leaks, regardless of how the library is used.

@RobertCraigie that’s super helpful, both of those patterns. Didn’t realize you could pass in the http_client for the client.

I pushed client.close() a couple of hours ago BTW, and that solved my memory leak issues. Thanks for your help on this!
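
For anyone else reading along, a minimal sketch of what an explicit client.close() can look like when the client is still created per request (the function name and key handling here are hypothetical):

from openai import OpenAI

def generate_poem(api_key: str) -> str:
    client = OpenAI(api_key=api_key)
    try:
        completion = client.chat.completions.create(
            messages=[{"role": "user", "content": "Can you write a poem?"}],
            model="gpt-3.5-turbo",
        )
        return completion.choices[0].message.content
    finally:
        client.close()  # free network resources instead of waiting for garbage collection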


It’s a bit scary to potentially serve 1 user’s request with another user’s api key, even if that can only happen due to a bug in my code

This is a fair point, though I do think there are ways you could structure your code to make this practically impossible, even for someone making changes to your codebase for the first time.

For example, instead of using .with_options() yourself, you could define a helper function that you use whenever you need to make an OpenAI request, e.g.

# utils/openai_client.py
from openai import AsyncOpenAI

_client = AsyncOpenAI(
    # provide a dummy API key so that requests made directly will always fail
    api_key='<this client should never be used directly!>',
)

def get_openai(user: User) -> AsyncOpenAI:
    return _client.with_options(api_key=user.openai_api_key)

# backend.py

async def handler():
    user = get_user()

    completion = await get_openai(user).chat.completions.create(...)
    # ...
It’s also worth noting that you could keep your existing pattern and avoid memory leaks entirely by sharing the http client instance, if you really don’t want to use with_options. The only risk here is that you could instantiate a client and forget to pass in your http client, which then gives you memory leaks again.

import httpx
from openai import AsyncOpenAI

http_client = httpx.AsyncClient()

async def handler(api_key):
    openai_client = AsyncOpenAI(api_key=api_key, http_client=http_client)

I would argue, though, that the first snippet in this comment is less likely to result in accidentally using the incorrect API key, because with the shared-http-client pattern you could accidentally write just AsyncOpenAI() and it would then use the API key from your environment.

However, with a helper function similar to get_openai() you’re encoding the setup of your system directly into it: instead of having to remember to pass api_key=user_api_key, you’re forced to by your own function definition.

The only way you’ll accidentally use the wrong API key is by forgetting to use this helper function, and it may be possible to define lint rules for that (I don’t know of an easy way to do this off the top of my head).

Hi!

I started noticing weird behaviour in our API server, and after digging a bit deeper I think I found the underlying issue behind the memory leaks: the way the classes hold references to self prevents resources from being freed, because the classes own objects that in turn hold references back to the class itself.

After testing a bit with forced close functions that free the resources, deinitialization runs and memory is freed as it should.
