channels_redis: Memory leak when using the channels-redis PubSub layer
Hello! On our production service, we use channels and channels-redis to deliver heavy updates to WebSocket clients, and it has worked fine for us for the last six months.

However, since the day we switched to the new PubSubChannelLayer, we have faced constant, unbounded growth in memory consumption, and the only workaround we have found is restarting the server process at a regular interval to release the memory.
Here is some metadata about the system we are running:
Socket updates are mostly generated by management commands that run outside the context of the server processes; they call group_send to provide consumers with new updates.
channels==3.0.4
channels-redis==3.3.0
The server runs behind Nginx with uvicorn==0.12.1, using supervisor as the process manager; however, we have tested gunicorn and daphne as well, and the memory problem stayed the same.
OS: Ubuntu Server 18
About this issue
- State: open
- Created 3 years ago
- Reactions: 10
- Comments: 24 (12 by maintainers)
bump
@Sharpek @pbylina how about you guys spend an ounce of energy on this, instead of posting unhelpful comments.
Any progress here, maybe?
Yes, I’ll put some time on my calendar in the next few weeks to create a PR to django/channels.
@fosterseth Thanks for investigating this. Your discoveries got me thinking, and now I have some ideas on what is going wrong here.
First, some context:
The giant comment I wrote here is because django/channels calls `new_channel()` (here), but it doesn't offer an obvious way to "clean up" that new channel… thus the only way I could figure out to clean up (i.e. call `del self.channels[channel]`) was to wait for a `CancelledError` in `receive()` (here) and use that opportunity to clean stuff up.

So, what's going wrong:
@fosterseth I think you've found an execution path where `CancelledError` is never thrown into `receive()`, so the PubSub channel layer never gets to clean up. Looking at how that might happen… perhaps if an exception is thrown here, then the channel layer's `receive()` is never rescheduled, and thus never canceled. But whatever the reason, I think we cannot solve this without changing django/channels, which is not too surprising and was the reason I wrote that original big comment. Basically, I believe we're doing the best we can in PubSub to clean up, given that django/channels doesn't give us a chance to.

How to fix:
I think we should add code (perhaps here) … something like:
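A minimal sketch of the idea (the class layout and hook name here are illustrative, not the actual channels API):

```python
# Sketch: give channel layers a guaranteed counterpart to new_channel().
# new_channel() allocates per-consumer state; clean_channel() releases it.

class BaseChannelLayer:
    async def clean_channel(self, channel):
        # Default no-op, so layers without per-channel state keep working.
        pass


class PubSubChannelLayer(BaseChannelLayer):
    def __init__(self):
        self.channels = {}  # per-channel queues, populated by new_channel()

    async def clean_channel(self, channel):
        # Do here what the CancelledError path in receive() does today.
        self.channels.pop(channel, None)
```

django/channels would then await `clean_channel(channel)` whenever a consumer exits, so cleanup no longer depends on `receive()` being cancelled.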
That is, `clean_channel()` will clean up anything done by `new_channel()`, giving us a cleanup opportunity that is guaranteed to be called. Then of course we'd change the PubSub implementation to clean up in `clean_channel()` instead of how it does it now.

Memory going down…
Memory management at the OS level is whack enough (i.e. calling `free()` in most real-world applications doesn't cause a drop in the memory consumption reported by the OS, due to fragmentation). Then you add CPython's memory manager / garbage collector on top of that… and you just cannot expect memory to go down even at times you might expect it to. (Similar to what @qeternity is saying as well.)
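To see this effect in isolation (a small demo, assuming psutil is installed; whether the reported memory drops depends on the allocator and the allocation pattern):

```python
# Allocate and free ~1 GiB of small objects, printing the RSS the OS
# reports at each step. Even after everything is freed and collected,
# the reported RSS frequently does not return to the starting value.
import gc
import os

import psutil  # assumption: pip install psutil

proc = psutil.Process(os.getpid())
print("start:     %d MiB" % (proc.memory_info().rss // 2**20))

blobs = [bytes(1024) for _ in range(1_000_000)]  # ~1 GiB of 1 KiB objects
print("allocated: %d MiB" % (proc.memory_info().rss // 2**20))

del blobs
gc.collect()
print("freed:     %d MiB" % (proc.memory_info().rss // 2**20))
```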
My use case…

Again, like @qeternity's case, we cycle containers in and out often enough (usually because we're deploying new versions of our app) that this leak isn't hurting us in production. That said, it would certainly be great to fix it.
Another thing to note – when I Ctrl+C my client, I see this traceback logged in the `runserver` terminal. To get around the traceback, I added some try/except in my consumers.py.
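Something like this (a hypothetical reconstruction; the group name "updates" and the handler name are illustrative):

```python
# consumers.py -- swallow send errors on sockets whose client is already gone,
# so runserver stops logging a traceback on hard disconnects.
from channels.generic.websocket import AsyncJsonWebsocketConsumer


class UpdateConsumer(AsyncJsonWebsocketConsumer):
    async def connect(self):
        await self.channel_layer.group_add("updates", self.channel_name)
        await self.accept()

    async def disconnect(self, close_code):
        await self.channel_layer.group_discard("updates", self.channel_name)

    async def stream_update(self, event):
        # Handles {"type": "stream.update", ...} messages from group_send.
        try:
            await self.send_json(event["payload"])
        except Exception:
            # Client may have disconnected mid-send; drop the message.
            pass
```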
Now when my client disconnects, I no longer see memory growth, which is good. However, I expected the memory to go down (i.e. all of those buffered messages should be released), but it does not. I don’t believe that queue is being cleaned up.
If there are messages in the channel Queue and the consumer disconnects (I Ctrl+C my client), I am not hitting https://github.com/django/channels_redis/blob/bba93196d8fe5e5fbfc470350c1f3da168c56739/channels_redis/pubsub.py#L191

But if there are no messages in the channel Queue and the consumer disconnects, then I do hit that `del self.channels[channel]`. I can reproduce this reliably each time.
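For context, that cleanup lives in `receive()`; its shape is roughly this (paraphrased, not the exact upstream source):

```python
import asyncio

# Paraphrased shape of the PubSub layer's receive(): the per-channel queue
# is only deleted when CancelledError reaches the coroutine while it is
# awaiting the queue.
async def receive(self, channel):
    q = self.channels[channel]
    try:
        message = await q.get()
    except asyncio.CancelledError:
        # If the consumer dies while messages are still buffered, or
        # receive() is never rescheduled, this never runs and the queue
        # (backed by a collections.deque) leaks.
        del self.channels[channel]
        raise
    return self.deserialize(message)
```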
This article might help some of you:
https://www.paradigm.co/blog/anatomy-of-a-python-memory-leak
Ha! I just saw that @acu192 is already on it about 20 min ago!
@fosterseth I’m having a quick look now, just in case the far more capable @acu192 is busy at the moment
put up a PR ^
hopefully someone can help test this out
@fosterseth given CPython’s memory allocator, is this really to be expected?
There is a memory leak somewhere; unfortunately, we end up cycling k8s containers often enough that it's not much of an issue, so we haven't investigated further.
I set up a basic channels app to explore this a bit
My generator:
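Something along these lines (a reconstruction, not the exact script; the group name, message size, and count are illustrative):

```python
# generate.py -- a management command that floods a group via group_send,
# run outside the server process, mirroring the setup described above.
import asyncio

from channels.layers import get_channel_layer
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = "Pump a large number of messages at the 'updates' group."

    def handle(self, *args, **options):
        asyncio.run(self.flood())

    async def flood(self):
        layer = get_channel_layer()
        for i in range(100_000):
            await layer.group_send(
                "updates",
                {"type": "stream.update", "payload": {"seq": i, "data": "x" * 1024}},
            )
```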
I have a client that reads these (from the consumer) as fast as it can. Importantly, I let the client run for a bit, then hard-Ctrl+C the client to disconnect it. I see the server detect this disconnect too.
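The client is essentially this (a reconstruction, assuming the websockets package; the URL is illustrative):

```python
# client.py -- drain messages as fast as possible, then hard-Ctrl+C to
# simulate an abrupt disconnect.
import asyncio

import websockets  # assumption: pip install websockets


async def main():
    async with websockets.connect("ws://localhost:8000/ws/updates/") as ws:
        while True:
            await ws.recv()


asyncio.run(main())
```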
Now, the generator is still pumping out tons of messages, and I notice the memory of my “python manage.py runserver” process is growing a lot. After all the messages are sent, the memory usage remains high indefinitely (does not go back down).
Note: I did not notice this growth when using `channels_redis.core.RedisChannelLayer`, only when using PubSub.

Toward the end of sending the messages, I reconnect my client. I have a pdb breakpoint set up on the consumer to trigger after all of the messages have been sent, and I use guppy3 to print a heap:
import guppy  # guppy3

hp = guppy.hpy()
hp.heap()
Okay, so 68% of that memory is `bytes`; let's zoom in:

hp.heap()[0].byrcs
All of that data is in `collections.deque` objects.
Hopefully this sheds some light on this issue