godot: Websocket server crashes when client disconnects (50% of the time).
Godot version: Latest 3.2alpha3 and master
OS/device including version: Happens on Win10 and Ubuntu 18.x
Issue description:
I get the following error log, I suspect it could be because the server is trying to send an rpc call to a client that just left?
ERROR: _get_socket_error: Socket error: 10054
At: drivers/unix/net_socket_posix.cpp:202
ERROR: put_packet: Condition ' !is_connected_to_host() ' is true. returned: FAILED
At: modules/websocket/wsl_peer.cpp:238
Client 1252280632 was unregistered
ERROR: put_packet: Condition ' !is_connected_to_host() ' is true. returned: FAILED
At: modules/websocket/wsl_peer.cpp:238
Steps to reproduce: Occasionally (around 50% of the time), the server crashes when the client disconnects. I’m using the same code from the websockets demo projects from the demos repo so there is nothing custom going on in that area.
Minimal reproduction project: The websockets projects from the demos repository.
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 33 (19 by maintainers)
Yeah, the culprit doesn’t seem related to the websocket implementation at all. Even from the stack trace, it seems a GDScript call to a freed object
So maybe I found something that may help. Digging a bit into the websocket code (mainly lws_server in 3.1 and wsl_server in 3.2) I found what I think might be causing my error in the
disconnect_peer
function around line 286https://github.com/godotengine/godot/blob/cc3b7d2ee2bcd0a4f8f88421fcdca6436b2416b1/modules/websocket/wsl_server.cpp#L285-L289
From this code you can see that an error check is being done to check if the passed peer id is in the
_peer_map
. This check is being repeated again though inget_peer
https://github.com/godotengine/godot/blob/cc3b7d2ee2bcd0a4f8f88421fcdca6436b2416b1/modules/websocket/wsl_server.cpp#L268-L271
If this one fails though NULL is returned. This means if anything modifies that map between the first and second check and removes that peer the
close
method will be called on a NULL pointer causing the segfault. While this could be intentional I assume this method should actually just be logging the error and continuing on its way.Due to this I think the way to fix it is similar to the fix that was applied to the
websocket_multiplayer_peer
by @Faless on PR #31482 and checking if the peer is null instead of just checking if the peer id exists.I do not know if this would fix the issue that @asheraryam is encountering but there seems to be a fair number of instances of
get_peer
being used directly without null checks withinwebsocket_multiplayer_peer
so maybe one of those is failing under similar circumstances.This is all just a theory right now though so I could be wrong. I have the changes implemented for 3.1 and plan on testing it tomorrow so maybe I might be able to get more useful information then.
Agreed. This was just an observation on my journey… Well… FYI, I’m really close. I decided to work my way backwards from my Asteroids project towards my PlatformBuddies project, and I fixed the websocket server issue. I’m in the process of trying to inject the code that causes the issue in my Asteroids project into Platform Buddies and I’ll paste that up here when I can crash the server! 😛 Hopefully soon, but I am leaving for the next couple hours. Be back later!
I think we are hitting a similar situation. At a glance I think it might be a race condition where a user disconnects in the middle of an RPC call. Upping the packet and buffer size seemed to help some but we still get it quite frequently if we have a fair number of players coming in and out.
Here is our log and stack dump: