godot: WebSocket server crashes when a client disconnects (50% of the time).

Godot version: Latest 3.2alpha3 and master

OS/device including version: Happens on Win10 and Ubuntu 18.x

Issue description:

I get the following error log. I suspect it could be because the server is trying to send an RPC call to a client that has just left:

ERROR: _get_socket_error: Socket error: 10054
   At: drivers/unix/net_socket_posix.cpp:202
ERROR: put_packet: Condition ' !is_connected_to_host() ' is true. returned: FAILED
   At: modules/websocket/wsl_peer.cpp:238
Client 1252280632 was unregistered
ERROR: put_packet: Condition ' !is_connected_to_host() ' is true. returned: FAILED
   At: modules/websocket/wsl_peer.cpp:238
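For context, the "Condition ' ... ' is true. returned: ..." lines appear to come from Godot's ERR_FAIL_COND_V-style guard macros. Those macros print the failed condition and return early from the function. That suggests the put_packet lines above are refused sends rather than the crash itself. The sketch below re-creates that guard shape in plain C++. SKETCH_FAIL_COND_V, SketchPeer and the OK/FAILED enum are hypothetical stand-ins, not Godot's actual definitions.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for Godot's Error enum; only two values are needed.
enum SketchError { OK = 0, FAILED = 1 };

// Hypothetical guard modelled on the log format above: if `cond` is true,
// print the stringified condition and return `retval` from the caller.
#define SKETCH_FAIL_COND_V(cond, retval)                                       \
    do {                                                                       \
        if (cond) {                                                            \
            std::fprintf(stderr,                                               \
                         "ERROR: %s: Condition ' %s ' is true. returned: %s\n", \
                         __func__, #cond, #retval);                            \
            return retval;                                                     \
        }                                                                      \
    } while (0)

// Hypothetical peer that only tracks whether its socket is still open.
struct SketchPeer {
    bool connected = true;

    bool is_connected_to_host() const { return connected; }

    SketchError put_packet(const unsigned char *data, int size) {
        // Refuse (log and return FAILED) instead of touching a dead socket.
        SKETCH_FAIL_COND_V(!is_connected_to_host(), FAILED);
        (void)data;  // a real peer would queue `size` bytes from `data` here
        (void)size;
        return OK;
    }
};
```

With connected set to false, put_packet returns FAILED and prints a line shaped like the put_packet errors in the log. Nothing is dereferenced, so this guard on its own does not bring the process down.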

Steps to reproduce: Occasionally (around 50% of the time), the server crashes when a client disconnects. I'm using the same code as the WebSocket demo projects in the demos repository, so there is nothing custom going on in that area.

Minimal reproduction project: The WebSocket projects from the demos repository.

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 33 (19 by maintainers)

Most upvoted comments

TL;DR: It looks like a questionable script that prints out stats caused errors during the WebSocket connection process.

Yeah, the culprit doesn't seem related to the WebSocket implementation at all. Even from the stack trace, it looks like a GDScript call on a freed object.

So maybe I found something that may help. Digging a bit into the WebSocket code (mainly lws_server in 3.1 and wsl_server in 3.2), I found what I think might be causing my error: the disconnect_peer function, around line 286.

https://github.com/godotengine/godot/blob/cc3b7d2ee2bcd0a4f8f88421fcdca6436b2416b1/modules/websocket/wsl_server.cpp#L285-L289

This code checks whether the passed peer ID is in _peer_map. The same check is then repeated in get_peer:

https://github.com/godotengine/godot/blob/cc3b7d2ee2bcd0a4f8f88421fcdca6436b2416b1/modules/websocket/wsl_server.cpp#L268-L271

If that second check fails, however, get_peer returns NULL. So if anything removes the peer from the map between the first and second check, close() will be called on a NULL pointer, causing the segfault. This could be intentional, but I assume the method should just log the error and carry on.
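The check-then-act pattern described above can be sketched in C++, assuming a std::map-backed peer table. SketchServer, SketchPeer, lookup_for_disconnect and the between_checks hook are all hypothetical. The hook stands in for whatever might erase the peer between the two lookups, which is the still-unconfirmed part of the theory.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>

// Hypothetical peer object; only close() matters for this sketch.
struct SketchPeer {
    bool closed = false;
    void close() { closed = true; }
};

// Hypothetical server mirroring the has_peer -> get_peer shape.
class SketchServer {
public:
    std::map<int, std::shared_ptr<SketchPeer>> peer_map;
    // Runs between the two lookups; models any code that erases a peer.
    std::function<void()> between_checks;

    bool has_peer(int id) const { return peer_map.count(id) > 0; }

    // Like get_peer in the issue: returns null if the id is unknown.
    std::shared_ptr<SketchPeer> get_peer(int id) const {
        if (!has_peer(id)) return nullptr;
        return peer_map.at(id);
    }

    // Unsafe pattern: the first check passes, but the peer may be gone by
    // the second lookup. The real code would call close() on the result.
    std::shared_ptr<SketchPeer> lookup_for_disconnect(int id) {
        if (!has_peer(id)) return nullptr;  // first check
        if (between_checks) between_checks();
        return get_peer(id);                // second check may now fail
    }
};
```

Setting between_checks to erase the peer makes the second lookup return nullptr even though has_peer() just succeeded. Calling close() on that result would be the NULL dereference described above.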

Because of this, I think the fix should be similar to the one @Faless applied to websocket_multiplayer_peer in PR #31482: check whether the returned peer is null instead of only checking whether the peer ID exists.
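Here is a sketch of that kind of fix, again using a hypothetical std::map-backed server rather than Godot's real WSLServer classes. It looks the peer up once and null-checks the pointer that will actually be used. If the peer is missing, it logs and returns. This illustrates the approach described above; it is not the actual diff from PR #31482.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <memory>

// Hypothetical peer object; only close() matters for this sketch.
struct SketchPeer {
    bool closed = false;
    void close() { closed = true; }
};

class SketchServer {
public:
    std::map<int, std::shared_ptr<SketchPeer>> peer_map;

    // Single lookup; returns null when the id is unknown.
    std::shared_ptr<SketchPeer> get_peer(int id) const {
        auto it = peer_map.find(id);
        return it == peer_map.end() ? nullptr : it->second;
    }

    // Guard on the pointer we are about to use, not on a separate
    // has_peer() call, so a vanished peer is logged and skipped.
    bool disconnect_peer(int id) {
        std::shared_ptr<SketchPeer> peer = get_peer(id);
        if (!peer) {
            std::fprintf(stderr, "disconnect_peer: unknown peer %d\n", id);
            return false;
        }
        peer->close();
        return true;
    }
};
```

Because the null check and the close() call use the same pointer, there is no window between "the peer exists" and "the peer is used".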

I don't know whether this would fix the issue @asheraryam is encountering, but there seem to be a fair number of places in websocket_multiplayer_peer where get_peer is used directly without a null check, so maybe one of those fails under similar circumstances.

This is all just a theory for now, so I could be wrong. I have the changes implemented for 3.1 and plan to test them tomorrow, so I may have more useful information then.

This doesn’t seem related to the issue at hand.

Agreed. This was just an observation along the way… Well… FYI, I'm really close. I decided to work my way backwards from my Asteroids project towards my PlatformBuddies project, and I fixed the WebSocket server issue. I'm now trying to inject the code that causes the issue in my Asteroids project into PlatformBuddies, and I'll paste it here once I can crash the server! 😛 Hopefully soon, but I'm away for the next couple of hours. Be back later!

I think we are hitting a similar situation. At a glance, it might be a race condition where a user disconnects in the middle of an RPC call. Increasing the packet and buffer sizes seemed to help somewhat, but we still hit it quite frequently when a fair number of players are joining and leaving.
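Suppose the race is a peer disconnecting while the server relays an RPC. One defensive shape is to re-resolve each destination at send time and drop the packet when it is gone. The sketch below assumes a std::map of peers and treats target 0 as broadcast. relay, PeerMap and SketchPeer are hypothetical names, not WebSocketMultiplayerPeer's actual _server_relay.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <vector>

// Hypothetical peer that just records what it was sent.
struct SketchPeer {
    std::vector<std::vector<unsigned char>> sent;
    void put_packet(const unsigned char *buf, unsigned size) {
        sent.emplace_back(buf, buf + size);
    }
};

using PeerMap = std::map<int, std::shared_ptr<SketchPeer>>;

// Forward `buf` from `from` to `to`, or to every peer except the sender
// when `to` is 0 (an assumption for this sketch). Destinations that are
// missing or null are skipped instead of dereferenced.
int relay(const PeerMap &peers, int from, int to,
          const unsigned char *buf, unsigned size) {
    if (to != 0) {
        auto it = peers.find(to);
        if (it == peers.end() || !it->second) return 0;  // target left: drop
        it->second->put_packet(buf, size);
        return 1;
    }
    int delivered = 0;
    for (const auto &entry : peers) {
        if (entry.first == from || !entry.second) continue;
        entry.second->put_packet(buf, size);
        ++delivered;
    }
    return delivered;
}
```

The function returns how many peers received the packet. A caller can therefore tell a dropped relay (0) from a delivered one without any peer pointer being dereferenced after its owner has left.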

Here is our log and stack dump:

Current player count: 5
Game Resumed
User: 1591479155 disconnected
Current player count: 4
Game Resumed
ERROR: get_peer: Condition ' !has_peer(p_id) ' is true. returned: __null
   At: modules/websocket/lws_server.cpp:178.

=================================================================
        Native Crash Reporting
=================================================================
Got a SIGSEGV while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries
used by your application.
=================================================================
/proc/self/maps:
402a9000-402b9000 rwxp 00000000 00:00 0
414bb000-414cb000 rwxp 00000000 00:00 0
41d21000-41d51000 rwxp 00000000 00:00 0
55c51d9fd000-55c52033c000 r-xp 00000000 fc:01 1553231                    /root/vsm-godot-server/bin/TestHeadlessServer/vsm_test_server
55c52053b000-55c5205c8000 r--p 0293e000 fc:01 1553231                    /root/vsm-godot-server/bin/TestHeadlessServer/vsm_test_server
55c5205c8000-55c5205d2000 rw-p 029cb000 fc:01 1553231                    /root/vsm-godot-server/bin/TestHeadlessServer/vsm_test_server
55c5205d2000-55c520600000 rw-p 00000000 00:00 0
55c520bf1000-55c52364c000 rw-p 00000000 00:00 0                          [heap]
7f5c60000000-7f5c60021000 rw-p 00000000 00:00 0
7f5c60021000-7f5c64000000 ---p 00000000 00:00 0
7f5c667c0000-7f5c66840000 rw-p 00000000 00:00 0
7f5c66844000-7f5c668c4000 rw-p 00000000 00:00 0
7f5c668c8000-7f5c66948000 rw-p 00000000 00:00 0
7f5c6694c000-7f5c669cc000 rw-p 00000000 00:00 0
7f5c669d0000-7f5c66a50000 rw-p 00000000 00:00 0
7f5c66a54000-7f5c66ad4000 rw-p 00000000 00:00 0
7f5c66ad8000-7f5c66b58000 rw-p 00000000 00:00 0
7f5c66b5c000-7f5c66bdc000 rw-p 00000000 00:00 0
7f5c66be0000-7f5c66c60000 rw-p 00000000 00:00 0
7f5c66c64000-7f5c66ce4000 rw-p 00000000 00:00 0
7f5c66ce8000-7f5c66d68000 rw-p 00000000 00:00 0
7f5c66d6c000-7f5c66dec000 rw-p 00000000 00:00 0
7f5c66df0000-7f5c66e70000 rw-p 00000000 00:00 0
7f5c66e74000-7f5c66ef4000 rw-p 00000000 00:00 0
7f5c66ef8000-7f5c66f78000 rw-p 00000000 00:00 0

=================================================================
        Native stacktrace:
=================================================================
        0x55c51e128be5 - ./vsm_test_server : (null)
        0x55c51e128f81 - ./vsm_test_server : (null)
        0x55c51e11b4d1 - ./vsm_test_server : (null)
        0x55c51e099ea1 - ./vsm_test_server : (null)
        0x55c51e4ea302 - ./vsm_test_server : _ZN24WebSocketMultiplayerPeer13_server_relayEiiPKhj
        0x7ffce2eb2120 - Unknown

=================================================================
        Telemetry Dumper:
=================================================================
Pkilling 0x7f5c7575f700 from 0x7f5c77150780
Entering thread summarizer pause from 0x7f5c77150780
Finished thread summarizer pause from 0x7f5c77150780.

Waiting for dumping threads to resume

=================================================================
        External Debugger Dump:
=================================================================
mono_gdb_render_native_backtraces not supported on this platform, unable to find gdb or lldb

=================================================================
        Basic Fault Address Reporting
=================================================================
Memory around native instruction pointer (0x55c51e4ea302):0x55c51e4ea2f2  ff 90 38 01 00 00 48 8b 3c 24 44 89 ea 4c 89 e6  ..8...H.<$D..L..
0x55c51e4ea302  48 8b 07 ff 90 b8 00 00 00 48 8b 3c 24 89 c3 48  H........H.<$..H
0x55c51e4ea312  85 ff 74 5c e8 45 14 3d 01 84 c0 74 53 48 8b 2c  ..t\.E.=...tSH.,
0x55c51e4ea322  24 48 89 ef e8 e5 6f 3a 01 84 c0 74 43 48 8b 45  $H....o:...tCH.E

=================================================================
        Managed Stacktrace:
=================================================================
=================================================================
Aborted (core dumped)
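The only symbolized frame in the native stack trace is an Itanium-ABI mangled C++ name. On Linux with GCC or Clang, it can be decoded with abi::__cxa_demangle from <cxxabi.h>. A small helper is sketched below, assuming that toolchain.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Demangle an Itanium-ABI symbol; return the input unchanged if it is not
// a valid mangled name (status != 0). The buffer from __cxa_demangle is
// malloc'd, so it is released with std::free.
std::string demangle(const char *mangled) {
    int status = 0;
    char *out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) return mangled;
    std::string result(out);
    std::free(out);
    return result;
}
```

Applied to _ZN24WebSocketMultiplayerPeer13_server_relayEiiPKhj, it yields WebSocketMultiplayerPeer::_server_relay taking (int, int, pointer to const unsigned char, unsigned int). The frames above it are unsymbolized. This suggests the fault happened in code reached from the server relay path, not in disconnect_peer.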