node-slack-sdk: Socket mode: Bolt doesn't handle WebSocket disconnections gracefully
Description
Bolt crashes when it loses its WebSocket connection in socket mode, and there doesn’t seem to be a great way for the calling code to handle the situation. Disconnections from an EC2 instance in us-east-1 are happening frequently enough for this to be a significant issue. This happens even when a bot is idle.
What type of issue is this? (place an x
in one of the [ ]
)
- bug
- enhancement (feature request)
- question
- documentation related
- example code related
- testing related
- discussion
Requirements (place an x
in each of the [ ]
)
- I’ve read and understood the Contributing guidelines and have done my best effort to follow them.
- I’ve read and agree to the Code of Conduct.
- I’ve searched for any related issues and avoided creating a duplicate issue.
Bug Report
Filling out the following details about bugs will help us solve your issue sooner.
Reproducible in:
package version: 3.3.0
node version: 16.1.0
OS version(s): Ubuntu 20.04.2 LTS (focal)
Steps to reproduce:
I have a fairly simple bot in the following form:
const app = new App({
token: "...",
appToken: "...",
socketMode: true,
});
app.message(/^foobar$/, async ({ message, say, client }) => {
// Doesn't matter what's here; it doesn't need to ever trigger for the bug to manifest
});
await app.start();
The crash happens if I:
- Start the bot on an EC2 us-east-1 instance.
- Leave it running for an arbitrary period of time–sometimes it takes an hour, sometimes a few days.
- At some point, the WebSocket will close. I’m not sure why, but I assume it’s something on Slack’s end, as the EC2 instance is left untouched.
Expected result:
The bot will either:
- Handle connectivity interruptions gracefully, without the need for the programmer to manually reconnect or instantiate a new
App
. - Throw an exception or trigger an event that can be handled by the calling code in a trivial manner–one that doesn’t require instantiating a new
App
. This would probably need to be documented and demonstrated in the examples, since a socket mode Bolt bot isn’t really viable without some mechanism for handling connectivity interruptions.
Actual result:
[ERROR] socket-mode:SocketModeClient:0 failed to send message on websocket: WebSocket is not open: readyState 3 (CLOSED)
/git/slackbot/node_modules/@slack/socket-mode/dist/errors.js:29
const error = errorWithCode(new Error(`Failed to send message on websocket: ${original.message}`), ErrorCode.WebsocketError);
^
Error: Failed to send message on websocket: WebSocket is not open: readyState 3 (CLOSED)
at Object.websocketErrorWithOriginal (/git/slackbot/node_modules/@slack/socket-mode/dist/errors.js:29:33)
at /git/slackbot/node_modules/@slack/socket-mode/dist/SocketModeClient.js:347:48
at sendAfterClose (/git/slackbot/node_modules/ws/lib/websocket.js:758:5)
at WebSocket.send (/git/slackbot/node_modules/ws/lib/websocket.js:345:7)
at /git/slackbot/node_modules/@slack/socket-mode/dist/SocketModeClient.js:344:32
at new Promise (<anonymous>)
at SocketModeClient.send (/git/slackbot/node_modules/@slack/socket-mode/dist/SocketModeClient.js:329:16)
at ack (/git/slackbot/node_modules/@slack/socket-mode/dist/SocketModeClient.js:469:24)
at App.processEvent (/git/slackbot/node_modules/@slack/bolt/dist/App.js:428:19)
at runMicrotasks (<anonymous>) {
code: 'slack_socket_mode_websocket_error',
original: Error: WebSocket is not open: readyState 3 (CLOSED)
at sendAfterClose (/git/slackbot/node_modules/ws/lib/websocket.js:754:17)
at WebSocket.send (/git/slackbot/node_modules/ws/lib/websocket.js:345:7)
at /git/slackbot/node_modules/@slack/socket-mode/dist/SocketModeClient.js:344:32
at new Promise (<anonymous>)
at SocketModeClient.send (/git/slackbot/node_modules/@slack/socket-mode/dist/SocketModeClient.js:329:16)
at ack (/git/slackbot/node_modules/@slack/socket-mode/dist/SocketModeClient.js:469:24)
at App.processEvent (/git/slackbot/node_modules/@slack/bolt/dist/App.js:428:19)
at runMicrotasks (<anonymous>)
at processTicksAndRejections (node:internal/process/task_queues:96:5)
at async SocketModeClient.<anonymous> (/git/slackbot/node_modules/@slack/bolt/dist/receivers/SocketModeReceiver.js:88:13)
}
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 7
- Comments: 21 (8 by maintainers)
Commits related to this issue
- Fix #1243 Socket Mode reconnection issue — committed to seratch/node-slack-sdk by seratch 2 years ago
- Fix #1243 Socket Mode reconnection issue — committed to seratch/node-slack-sdk by seratch 2 years ago
- Fix #1243 Socket Mode reconnection issue (#1465) * Fix #1243 Socket Mode reconnection issue Co-authored-by: Fil Maj <maj.fil@gmail.com> — committed to slackapi/node-slack-sdk by seratch 2 years ago
- upgrade @slack/socket-mode to vnext see https://github.com/slackapi/node-slack-sdk/issues/1243#issuecomment-1099845452 — committed to balpha/wheelofblame by balpha 2 years ago
- upgrade @slack/socket-mode to vnext in a way it actually works it's a dependency of bolt, not a top-level dependency ref: dc1d8bc4b8817344b437c7fef6b57dc554376677, https://github.com/slackapi/node-s... — committed to balpha/wheelofblame by balpha 2 years ago
👋 Hi, all the developers watching this thread!
We’ve improved the
SocketModeClient
internals a lot to be more robust. You can try the updated one by installing the RC version by running the following command:If you are interested in all the changes included in the release, refer to https://github.com/slackapi/node-slack-sdk/releases/tag/%40slack%2Fsocket-mode%401.3.0-rc.0 for the details.
If everything goes well, we will release v1.3.0 next week!
I also notice connection failures lately. Twice today actually. Here are the debug logs. App is stuck at this state until i manually intervene and do an app restart. Though stuck at this state, it is still able to receive events but eventually it errors before it reaches the code
app.event('app_mention', handler)
version 1.3.0 is now available. Please upgrade to the latest version 👋
Anyone trying this, note my mistake in the first commit ⬆️ – if you’re using bolt, then socket-mode is a transitive dependency.
We encountered a very similar problem. But WebSocket disconnections occur constantly.
Meanwhile, i did this as a workaround. Not sure if it is the right approach to depend on
app.receiver.client.badConnection
, but based on my testing seems to works well. Alternatively, would be great for the library to support a health check ability for the bot. My deploy setup would restart the bot if/health
fails with500
We encountered this exact issue as well. WebSocket disconnections seem to happen pretty rarely but still, I think that this is a pretty major issue for socket mode bots.
Something I thought I would point out is that the default handling of unhandled promise rejections has changed between Node 14.x and 16.x. In 14.x (latest LTS) you just get
UnhandledPromiseRejectionWarning
but in 16.x the application will be terminated. This probably explains the different behavior reported by @Zenexer and @usubram .We’re running in 14.x and decided that the easiest fix for now was to add a handler for
unhandledRejection
that forces the application to crash (so our process manager could then restart it):But anyhow, restarting the whole app just to reconnect lost WebSocket connection doesn’t seem like a good long term solution…