python-socketio: Emitting events to clients stops working

Describe the bug

This is a bit of a nasty bug I’ve been trying to diagnose over the last month or two. I’ve been desperately trying to come up with reproduction steps, but so far haven’t had any luck. What I do have, though, is a great deal of information about the symptoms. I’m hoping you may be able to help point me in the right direction so I can provide more information to you.

For a bit of context about the application: I have several python-socketio instances deployed that push out events to clients throughout the day. The events are generated from a Django app that utilizes a RedisManager in write_only mode. The python-socketio side is fully async and uses an AsyncRedisManager. The python-socketio instances are deployed in ASGI mode, running via the UvicornWorker class on gunicorn.
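For reference, the setup described above can be sketched roughly as follows. This is a minimal sketch, not my actual application code; the Redis URL, event name, and payload are placeholders:

```python
import socketio

# Async python-socketio side (ASGI), e.g. run under
#   gunicorn -k uvicorn.workers.UvicornWorker module:app
mgr = socketio.AsyncRedisManager('redis://localhost:6379/0')
sio = socketio.AsyncServer(async_mode='asgi', client_manager=mgr)
app = socketio.ASGIApp(sio)

# Django side: a write-only manager that only publishes events to
# Redis and never manages client connections itself.
external_sio = socketio.RedisManager('redis://localhost:6379/0',
                                     write_only=True)
external_sio.emit('some_event', data={'hello': 'world'})
```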

After a few weeks, I started observing that certain connected clients were not receiving messages. After a bit of diagnosing, I identified that, of the three python-socketio instances running, one was causing the issue. Clients connecting to that instance could still connect, emit events (and receive responses), and send/receive heartbeat messages, but they were not receiving events emitted from the server itself.

I initially blamed it on a possible disconnection between that python-socketio instance and Redis, but further diagnostics showed that no calls to emit were being delivered, not even local ones from within the namespace. To illustrate this point, I added this method to my namespace:

async def on_log(self, sid, msg):
    logger.warn(f'[Message from "{sid}"] {msg}')
    await self.emit(Event.LOG, data=msg, room=sid)

On clients connected to a functioning python-socketio instance, emitting a log received a response from the server as expected.

However, on the python-socketio instance experiencing this issue, nothing was sent back to the client.

I can confirm that the instance did indeed get the event from the client, though, because the logger.warn() was being called (see the attached screenshot).

The thing that is particularly interesting about this is that returning data from an event on the server does still work. When the frontend clients connect, they emit an event which always receives a response from the server. Even when emitting an event to a python-socketio instance with this issue, that always gets delivered. So it seems to specifically be something about the emit process that is breaking down.
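If I’m reading python-socketio’s pub/sub managers correctly, this split in symptoms would be consistent with a dead Redis listener: a handler’s return value is written straight back over the client’s own socket as the acknowledgement of its packet, while a server-initiated emit, even one addressed to a locally connected sid, is routed through the client manager. A hedged sketch of the two paths (the event and class names are illustrative, not from my application):

```python
import socketio

class DemoNamespace(socketio.AsyncNamespace):
    async def on_ping(self, sid, data):
        # Path 1 (kept working on the broken instance): the return
        # value goes back as the ack of the client's own emit,
        # directly over the client's socket.
        return {'pong': True}

    async def on_log(self, sid, msg):
        # Path 2 (stopped working): a server-side emit is dispatched
        # through the configured client manager (AsyncRedisManager
        # here), so a broken Redis subscription silently drops it.
        await self.emit('log', data=msg, room=sid)
```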

This issue generally starts happening to an instance after a few days. I’ve seen it happen as soon as one day after a container was created, and as long as a few weeks.

This is everything I can think of right now. If you have any insight at all as to what I could do to further investigate, please let me know. I know this is a difficult ask because there aren’t any repro steps, but I’m hoping with your full understanding of the system you may have some ideas. Thank you so much!

To Reproduce

None yet; see above.

Expected behavior

Messages emitted from the server should continue to be delivered to connected clients.

Logs

I am going to redeploy with additional logging enabled. Nothing was in the normal logs.

Additional context

I’m using uvloop, and I initially thought an outdated version of it could be causing issues. I’ve since upgraded uvloop to 0.16.0 and the issue still happens.

Full versions are below:

python-socketio==5.4.1
aioredis==1.3.1
hiredis==2.0.0
websockets==9.1  # Temporarily pinned to 9.1 until the closing issue is fixed (https://github.com/aaugustin/websockets/issues/1072)
gunicorn==20.1.0
uvicorn[standard]==0.15.0
uvloop==0.16.0

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 24 (11 by maintainers)

Most upvoted comments

Closing as this appears to be an issue with the aioredis client. The issue popped up again so I killed the connection from redis, and it didn’t reconnect. Here’s hoping aioredis 2 fixes the issue. Thank you so much for all of your help! I’ll follow up once I get some data from aioredis 2 just in case anyone stumbles across this in the future.
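Until the aioredis reconnection behavior is sorted out, one pragmatic mitigation is a watchdog that periodically pings Redis and hands off to a restart hook when the connection appears dead. This is a hedged sketch under the assumption that a failed PING means the pub/sub connection is gone; the `on_dead` hook (e.g. `os._exit(1)` so the orchestrator recreates the worker) is a placeholder, not part of the original report:

```python
import asyncio

async def check_redis(redis, timeout=5):
    """Return True if the Redis client answers a PING within `timeout` seconds."""
    try:
        await asyncio.wait_for(redis.ping(), timeout=timeout)
        return True
    except Exception:
        return False

async def redis_watchdog(redis, on_dead, interval=30):
    """Ping Redis every `interval` seconds; call on_dead() once it stops answering."""
    while True:
        if not await check_redis(redis):
            # Connection presumed dead; better to restart the worker than
            # to keep serving clients whose emits are silently dropped.
            on_dead()
            return
        await asyncio.sleep(interval)
```

The same coroutine could be scheduled with `sio.start_background_task()` so it runs inside the worker’s event loop alongside the pub/sub listener.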

@miguelgrinberg ahh okay, thank you for clarifying that! I’ll do more digging and will report back when I have better logs. Hopefully soon but it could be a few weeks. Thanks again!