node-datachannel: Crash in Node server after browser client closes RTCPeerConnection
After a successful WebRTC connection (offer created by Node server) and verifying that messages were sent and received over a RTCDataChannel, the browser client runs RTCPeerConnection.close(). This crashed the Node server with:
FATAL ERROR: Error::New napi_get_last_error_info
1: 0xb7a940 node::Abort() [node]
2: 0xa8e72f node::FatalError(char const*, char const*) [node]
3: 0xa8e738 node::OOMErrorHandler(char const*, bool) [node]
4: 0xb40ac3 napi_fatal_error [node]
5: 0x7fbd2ce055f0 Napi::CallbackScope::~CallbackScope() [<REPO_PATH>/node_modules/node-datachannel/build/Release/node_datachannel.node]
6: 0x7fbd2ce05918 Napi::Error::New(napi_env__*) [<REPO_PATH>/node_modules/node-datachannel/build/Release/node_datachannel.node]
7: 0x7fbd2cdeb5d3 [<REPO_PATH>/node_modules/node-datachannel/build/Release/node_datachannel.node]
8: 0x7fbd2ce41255 ThreadSafeCallback::callbackFunc(Napi::Env, Napi::Function, Napi::Reference<Napi::Value>*, ThreadSafeCallback::CallbackData*) [<REPO_PATH>/node_modules/node-datachannel/build/Release/node_datachannel.node]
9: 0xb4083c [node]
10: 0x165f601 uv_run [node]
11: 0xabda6d node::SpinEventLoop(node::Environment*) [node]
12: 0xbc1164 node::NodeMainInstance::Run() [node]
13: 0xb35bc8 node::LoadSnapshotDataAndRun(node::SnapshotData const**, node::InitializationResult const*) [node]
14: 0xb3976f node::Start(int, char**) [node]
15: 0x7fbd34d00d0a __libc_start_main [/lib/x86_64-linux-gnu/libc.so.6]
16: 0xabbdee _start [node]
Aborted (core dumped)
Environment:
node-datachannelversion:0.4.3- Docker image
node:18.16.1-bullseye-slimrunning on DigitalOcean droplet uname -ainside Docker container:Linux do-sfo3-0-dev-swarm-drone-demand-node-pool-f9ed0 6.1.0-0.deb11.6-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.15-1~bpo11+1 (2023-03-16) x86_64 GNU/Linux
I can swap between node-datachannel and wrtc for debugging purposes. When using wrtc, I don’t see a crash like this on the Node server after running RTCPeerConnection.close() on the browser client
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 19 (12 by maintainers)
@murat-dogan Thanks to your help, I was able to test the change and could not reproduce the crash with the change
How I tested:
node-datachannelversion:0.4.3. I was able to repro after 2 attemptsnode_modules/node-datachannel/build/Release/node_datachannel.nodewith the one from https://github.com/murat-dogan/node-datachannel/releases/download/v0.4.4-dev/node-datachannel-v0.4.4-dev-node-v108-linux-x64.tar.gznode-datachannelversion:0.4.3and was able to repro the crash after 3 attemptsOh, I misunderstood, sorry. Thank you for working on a fix!
I didn’t mean that either. I meant if we were deleting from instance list then cb will not be called. The solution is not related to that.
Thank you @regnaio for detailed instructions.
This problem is related to https://github.com/murat-dogan/node-datachannel/issues/87 In order to receive close events we need to keep callbacks, but that has side effects.
I added log outputs for this issue and for future issues. I got this log;
So the problem is you are calling
peerConnection.close();beforemOnStateChangeCallback callmRtcPeerConnPtris already destroyed but we didn’t have time to call callbacks yet. Here we don’t reset callbacks for events, so it is still in the instances list. Found a solution, and I am testing it.Yes I could reproduce it with this example code. I need some time to investigate more.
Just a note, in all cases you should not need reset callbacks manually.
For reference; Crashed log;
NOT crashed;
The issue here seems to be that access to the callbacks is not synchronized in the wrapper, so this code ends up assigning
PeerConnectionWrapper::mOnSignalingStateChangeCallbackat the same time as it is called in thertc::PeerConnection::onSignalingStateChangecallback: