openai-python: Memory leak in the chat completion `create` method
Confirm this is an issue with the Python library and not an underlying OpenAI API
- This is an issue with the Python library
Describe the bug
Calling the create method on completions introduces a memory leak, according to tracemalloc.
Example:
from openai import OpenAI

client = OpenAI(api_key=MY_KEY)
client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Can you write a poem?",
        }
    ],
    model="gpt-3.5-turbo",
)
How did I determine it’s a memory leak?
I use tracemalloc in my Flask application:
import tracemalloc  # tracemalloc.start() is assumed to run once at app startup

@blueprint.route("/admin/sys/stats")
def admin_sys_stats():
    # Snapshot current allocations and group them by source line.
    snapshot = tracemalloc.take_snapshot()
    top_stats = snapshot.statistics('lineno')

    from openai import OpenAI
    client = OpenAI(api_key=KEY)
    client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "Can you write a poem?",
            }
        ],
        model="gpt-3.5-turbo",
    )

    grep = "openai"  # assumed filter string; defined elsewhere in my app
    stats = ""
    for stat in top_stats[:1000]:
        if grep in str(stat):
            stats += str(stat) + "\n"
    return f"<pre>{stats}</pre>", 200
When running this endpoint multiple times, one line is at the very top (which means it’s the most expensive one):
\venv\Lib\site-packages\openai\_response.py:227: size=103 KiB, count=1050, average=100 B
When I refresh, the size increases. Of course, in a production environment, the numbers grow much faster.
To Reproduce
There’s no single way to prove a memory leak, but here’s what I did:
- Set up a Flask application
- Create the route provided in the bug description
- Hit the route multiple times; you’ll see the reported size of the objects increase (see the sketch below)
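A minimal way to drive the route repeatedly (the host/port and the requests dependency are my own assumptions, not part of the original report):

import requests

# Hit the stats endpoint over and over; the size reported for
# openai/_response.py keeps growing between calls.
for _ in range(20):
    resp = requests.get("http://127.0.0.1:5000/admin/sys/stats")
    print(resp.text)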
Code snippets
No response
OS
Linux, macOS
Python version
Python v3.11.2
Library version
openai v1.2.4
About this issue
- Original URL
- State: closed
- Created 8 months ago
- Reactions: 3
- Comments: 30
Wouldn’t “moving the client outside of the request handler” be the same thing as a module-level client?
From the README, I figured the network resources are getting cleaned up when the OpenAI client gets garbage collected. Sounds like this isn’t necessarily true?
This might be outside of the scope of this library, but I’m also trying to understand how this fits into your typical backend app.
It seems like using a module-scoped client is heavily discouraged, but so is instantiating a new client with every request (i.e. a FastAPI dependency). What is the other option here?
You shouldn’t be instantiating a new client for every request; if you move the client outside of the request handler, you shouldn’t see any memory leaks.
I did manage to replicate the memory leak you’re seeing with your example and moving the client instantiation outside of the function resulted in stable memory usage.
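For reference, a minimal sketch of what “moving the client outside of the request handler” looks like with the Flask example from the bug description (the route name and the environment variable are my assumptions):

import os

from flask import Blueprint
from openai import OpenAI

blueprint = Blueprint("admin", __name__)

# Created once at import time and reused by every request handler,
# so the underlying connection pool is not re-allocated per request.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])


@blueprint.route("/poem")
def poem():
    completion = client.chat.completions.create(
        messages=[{"role": "user", "content": "Can you write a poem?"}],
        model="gpt-3.5-turbo",
    )
    return completion.choices[0].message.content or "", 200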
@RobertCraigie
How can we implement an async client with FastAPI correctly?
Is it safe to implement it like this (singleton approach)? https://github.com/tiangolo/fastapi/discussions/8301#discussioncomment-5150759
Another problem is that we have multiple API keys, as below. How can we handle this nicely if we have to instantiate the client only once, as mentioned in this issue: https://github.com/openai/openai-python/issues/874?
@brian-goo you can create a single client and use FastAPI app events to close it when the server is stopped, and then use client.with_options(api_key='foo') to use different API keys for different requests.

Anyway, it really does not matter, as it’s not in the scope of this issue. The focus is that there seem to be memory leaks, regardless of how the library is used.
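A hedged sketch of that suggestion, using FastAPI’s lifespan hook in place of the older startup/shutdown events; the /poem route and the per-request user_api_key lookup are hypothetical:

from contextlib import asynccontextmanager

from fastapi import FastAPI
from openai import AsyncOpenAI

# One shared client for the whole process.
client = AsyncOpenAI()


@asynccontextmanager
async def lifespan(app: FastAPI):
    yield
    # Release the underlying httpx resources when the server shuts down.
    await client.close()


app = FastAPI(lifespan=lifespan)


@app.get("/poem")
async def poem(user_api_key: str):
    # with_options() returns a copy of the client that reuses the same
    # connection pool but sends a different API key.
    completion = await client.with_options(api_key=user_api_key).chat.completions.create(
        messages=[{"role": "user", "content": "Can you write a poem?"}],
        model="gpt-3.5-turbo",
    )
    return {"poem": completion.choices[0].message.content}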
@RobertCraigie that’s super helpful, both of those patterns. Didn’t realize you could pass in the http_client for the client.
I pushed client.close() a couple of hours ago BTW, and that solved my memory leak issues. Thanks for your help on this!

This is a fair point; I do think there are ways you could structure your code to make this practically impossible to happen, though, even for anyone making changes to your codebase for the first time.

For example, instead of using .with_options() yourself, you could define a helper function that you use whenever you need to make an OpenAI request, e.g. get_openai().

It’s also worth noting that you could use your existing pattern and avoid memory leaks entirely by sharing the http client instance if you really don’t want to use with_options; the only risk here is that you could instantiate a client and forget to pass in your http client, which then gives you memory leaks again.

I would argue, though, that the first snippet in this comment is less likely to result in accidentally using the incorrect API key, because you could accidentally just write AsyncOpenAI() and it will then use the API key from your environment. However, with a helper function similar to get_openai() you’re encoding the setup of your system directly into it, so instead of having to know to pass api_key=user_api_key you’re forced to by your own function definition. The only way you’ll accidentally use the wrong API key is by forgetting to use this helper function, which it may be possible to define lint rules for (I don’t know of an easy way to do this off the top of my head).
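The helper snippets themselves didn’t survive into this copy of the thread, but a minimal sketch of the idea described above, with get_openai() and user_api_key as the names the comment refers to, could look like this:

from openai import AsyncOpenAI

# One shared client (and therefore one shared connection pool) for the
# whole process; per-request keys are layered on top with with_options().
_base_client = AsyncOpenAI()  # falls back to OPENAI_API_KEY from the environment


def get_openai(user_api_key: str) -> AsyncOpenAI:
    # Every request path goes through this helper, so passing the
    # user's key is enforced by the function signature itself.
    return _base_client.with_options(api_key=user_api_key)

The shared-http_client variant mentioned above would instead create one httpx.AsyncClient and pass it as http_client= to every AsyncOpenAI(...) you construct.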
Hi!
Started noticing some weird behaviour in our API server, and after digging a bit deeper I think I found the underlying issue behind the memory leaks. The way the classes hold references to self prevents resources from being freed, because the classes own objects that in turn hold references back to the class itself.
After testing a bit with forced close functions that free the resources, deinitialization runs and memory is freed as it should be.
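For anyone who really does need a short-lived client, a sketch of the explicit-close workaround the thread converged on (the helper name is mine):

from openai import OpenAI


def one_off_completion(api_key: str, prompt: str) -> str:
    # Short-lived client: close it explicitly so the underlying httpx
    # resources are released instead of waiting for the garbage collector
    # to break the reference cycle described above.
    client = OpenAI(api_key=api_key)
    try:
        completion = client.chat.completions.create(
            messages=[{"role": "user", "content": prompt}],
            model="gpt-3.5-turbo",
        )
        return completion.choices[0].message.content or ""
    finally:
        client.close()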