socket.io: "Session ID unknown" after handshake on high server load [Socket.io 1.0.6]
I am running a multi-node server (16 workers running Socket.io 1.0.6, accessed via Nginx configured as a reverse proxy with sticky sessions) for ~5k users. While the server load is low (2~3 on a 20-core server, 2k users), everyone is able to connect instantly. When the load gets higher (5~6, 5k users), new users are not able to connect and receive data instantly. In this case, it takes 2~4 handshakes for a user to connect successfully.
This is what happens (high load):
- User opens the website; receives HTML and JS
- User's browser attempts to initialize a socket.io connection to the server (`io.connect(...)`)
- A handshake request is sent to the server, the server responds with a SID and other information (`{"sid":"f-re6ABU3Si4pmyWADCx","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":60000}`)
- The client initiates a polling-request, including this SID: `GET .../socket.io/?EIO=2&transport=polling&t=1408648886249-1&sid=f-re6ABU3Si4pmyWADCx`
- Instead of sending data, the server responds with 400 Bad Request: `{"code":1,"message":"Session ID unknown"}`
- The client performs a new handshake (`GET .../socket.io/?EIO=2&transport=polling&t=1408648888050-3`, notice the previously received SID is omitted)
- The server responds with new connection data, including a new SID: (`{"sid":"DdRxn2gv6vrtZOBiAEAS","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":60000}`)
- The client performs a new polling request, including the new SID: `GET .../socket.io/?EIO=2&transport=polling&t=1408648888097-4&sid=DdRxn2gv6vrtZOBiAEAS`
- The server responds with the data that is emitted in the worker source code.
Depending on the load of the server, it may happen 1~3 times that the server responds with "Session ID unknown" and the client needs to perform a new handshake before data is actually received.
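The only difference between a handshake and a follow-up polling request in the sequence above is the `sid` query parameter. A minimal sketch of that, assuming the query parameters are exactly the ones quoted in the requests above; `buildPollingQuery` is a hypothetical helper for illustration, not part of socket.io:

```javascript
// Hypothetical helper: builds the path and query string of a polling request.
// The parameter names (EIO, transport, t, sid) are copied from the requests
// quoted in the steps above.
function buildPollingQuery({ t, sid }) {
  const params = new URLSearchParams({ EIO: '2', transport: 'polling', t });
  // `sid` is only sent once the server has issued one in a handshake response.
  if (sid) params.set('sid', sid);
  return '/socket.io/?' + params.toString();
}

// Polling request that carries the SID from the first handshake.
const firstPoll = buildPollingQuery({
  t: '1408648886249-1',
  sid: 'f-re6ABU3Si4pmyWADCx',
});

// After "Session ID unknown", the client starts over without a SID.
const retryHandshake = buildPollingQuery({ t: '1408648888050-3' });

console.log(firstPoll);
console.log(retryHandshake);
```

Comparing the two strings in a proxy access log is a quick way to see how many fresh handshakes a client needed before a poll succeeded.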
About this issue
- State: closed
- Created 10 years ago
- Reactions: 9
- Comments: 30 (7 by maintainers)
For me, it happened with nginx (SSL, HTTP/2) and the polling transport, so the good config is:
DON'T FORGET TO CONFIGURE THE CLIENT AS WELL
Making just the nodejs backend use `transport` as websocket protocol won't do much. socket.io clients also need to be set up with the same configuration. So, in my opinion, the below should work:
in nodejs:
and in js client:
`['websocket', 'polling']` will force socket.io to try websocket as the first protocol to connect, otherwise fall back to polling (just in case some browsers/clients do not support websockets). For a cluster environment, it is better to use `['websocket']` only.
You are using polling. You can't have a sticky session with polling, or only with an overly complex setup. With a websocket connection, the connection is opened and kept open. If you have a cluster, the socket does not know which worker it is on, so it creates a new connection every time or fails.
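The transport settings discussed in the comments above can be sketched as plain option objects. This is an illustration, not the commenter's original snippet: it assumes the `transports` option on both the socket.io server and client, and `unsupportedTransports` is a hypothetical helper that is not part of socket.io.

```javascript
// WebSocket only on the server: there are no separate HTTP polling requests,
// so a SID never has to be found again by a different worker.
const serverOptions = { transports: ['websocket'] };

// WebSocket first, long-polling as a fallback for clients that cannot open
// a WebSocket connection.
const clientOptions = { transports: ['websocket', 'polling'] };

// Hypothetical helper: lists the client transports the server would not
// accept, so a mismatched pair of configurations is easy to spot.
function unsupportedTransports(client, server) {
  return client.transports.filter((t) => !server.transports.includes(t));
}

console.log(unsupportedTransports(clientOptions, serverOptions)); // [ 'polling' ]

// Usage with the real libraries (assumes they are installed; the URL is a
// placeholder):
//   const io = require('socket.io')(httpServer, serverOptions);
//   const socket = io('https://example.com', clientOptions);
```

With this pair, the polling fallback listed on the client would be rejected by the server, which is why both sides are usually configured together.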
We finally figured this out. The root cause in our case:
- `upstream-A` is unavailable, `ip_hash` will route all of `A`'s requests instead to `upstream-B`
- `upstream-B` gets the new requests, it spits out 5xx errors (correctly) because the SID is not found in `this.clients`
- `upstream-C`
We solved it by changing the nginx `max_fails` to something more reasonable (and upping the open file-descriptor limit for our app's user, which was a secondary failure point, exacerbated by the constant reconnects).
Hi, I am having this same problem with nginx, node and socket.io. There is a way for nginx to use 'sticky' session IDs passed along in the HTTP cookie that would solve it, but it's part of their commercial offering. I was hoping the socket.io redis would address this by storing the session ID in redis and using it from another socket.io-redis enabled node, but it doesn't work. Maybe this is something that could be made to work using the redis adaptor?
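The rerouting described in the root-cause comment above (an upstream marked unavailable, so `ip_hash` sends its clients to a server that never issued their SID) can be simulated in a few lines. This is a sketch under loud assumptions: `pickUpstream` is a simplified stand-in and not nginx's real `ip_hash` algorithm, the upstream names and the IP address are made up, and availability is toggled by hand rather than by `max_fails`.

```javascript
// Simplified stand-in for nginx `ip_hash` (NOT the real algorithm): hash the
// IP to a starting upstream, then skip upstreams marked unavailable.
function pickUpstream(ip, upstreams) {
  const hash = ip.split('.').reduce((acc, octet) => acc + Number(octet), 0);
  for (let i = 0; i < upstreams.length; i++) {
    const candidate = upstreams[(hash + i) % upstreams.length];
    if (candidate.available) return candidate;
  }
  return null; // every upstream is marked unavailable
}

// Each upstream only knows the SIDs it issued itself.
const upstreams = ['A', 'B', 'C'].map((name) => ({
  name,
  available: true,
  sids: new Set(),
}));

// A polling request succeeds only if the chosen upstream knows the SID.
function poll(ip, sid) {
  const upstream = pickUpstream(ip, upstreams);
  if (!upstream) return { upstream: null, sidKnown: false };
  return { upstream: upstream.name, sidKnown: upstream.sids.has(sid) };
}

const clientIp = '10.0.0.2'; // hypothetical client, hashes to upstream A here
pickUpstream(clientIp, upstreams).sids.add('sid-1'); // handshake on A

const beforeFailure = poll(clientIp, 'sid-1');
upstreams[0].available = false; // A gets marked unavailable by the proxy
const afterFailure = poll(clientIp, 'sid-1');

console.log(beforeFailure, afterFailure);
```

Once the client is rerouted, its SID is unknown to the new upstream, so every affected client has to handshake again. That burst of reconnects is the extra load the comment mentions.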
I had this problem hosting my project with Heroku when I switched to multiple dynos. I solved it by enabling sticky sessions with `heroku features:enable http-session-affinity`.
I have the same problem… Any solution?
For future readers:
Please note that using `transports: ['websocket']` disables HTTP long-polling, so there's no fallback if the WebSocket connection cannot be established (which might be acceptable or not, depending on your use case).
Reference: https://socket.io/docs/v4/client-options/#transports
@over2000 if HTTP long-polling makes too many requests, then that surely means something is wrong with the setup, like CORS. Please check our troubleshooting guide: https://socket.io/docs/v4/troubleshooting-connection-issues/
Do you guys have any further debugging information? Could it be a problem in the stickiness logic? The only way for `{"code":1,"message":"Session ID unknown"}` to be returned is if the SID is simply not in the in-memory data structure.
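To make the in-memory lookup concrete, here is a minimal sketch of two workers that each keep their own session map. It assumes nothing about the real engine.io internals beyond what this thread states (a per-process `clients` store and the quoted error body); the `Worker` class, its method names and the SID format are hypothetical.

```javascript
// Each worker process keeps its sessions in its own memory; nothing is shared.
class Worker {
  constructor(id) {
    this.id = id;
    this.clients = new Map(); // sid -> session state
    this.nextSid = 0;
  }

  // Handshake: create a session and hand its SID to the client.
  handshake() {
    const sid = `w${this.id}-${this.nextSid++}`;
    this.clients.set(sid, { createdAt: Date.now() });
    return { status: 200, body: { sid } };
  }

  // Polling request: only succeeds if THIS worker issued the SID.
  poll(sid) {
    if (!this.clients.has(sid)) {
      return { status: 400, body: { code: 1, message: 'Session ID unknown' } };
    }
    return { status: 200, body: { data: 'ok' } };
  }
}

const workers = [new Worker(0), new Worker(1)];
const { body: { sid } } = workers[0].handshake();

const sticky = workers[0].poll(sid);    // routed back to the issuing worker
const misrouted = workers[1].poll(sid); // routed to a different worker

console.log(sticky.status, misrouted.status, misrouted.body.message);
```

If the second request of a real client can ever reach a different worker than the handshake did, the stickiness layer in front of the workers is the first place to look.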