mediasoup: Segmentation fault in DTLS procedures
Bug Report
Your environment
- Operating system: Ubuntu 20.04.4 LTS
- Node version: v16.15.0
- npm version: 8.5.5
- gcc/clang version: 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
- mediasoup version: 3.10.0
- mediasoup-client version: n/a
Issue description
We’re seeing sporadic worker crashes (SIGSEGV) after trying mediasoup 3.9.12 or later that seem related to DTLS.
- I’ve tried it with 3.9.12, 3.9.13 and 3.10.0 in some of our demo servers and could see crashes pop up in monitoring with all of them.
- It doesn’t seem to be happening in another demo server with similar user load and control plane that’s still running mediasoup 3.9.10.
I’ve set workers to be “long lived” in those demo servers (ie I usually rotate them every ~3 to ~5 days) and the crash seems to occur every ~two days (with slight variations to that - ie it’s not that frequent).
For the sake of consistency, the info I’m attaching is related to a 3.10.0 occurrence.
Core dump
The attached tarball includes a core dump, logs from the worker and the worker binary file.
- Since not every log is annotated with the pid and the server is running multiple workers, some log entries (e.g. transport creation) may be missing because I couldn’t isolate them reliably.
- ms3100c.tar.gz
Backtrace snippet
#0 0x000055d775fdbc6a in dtls1_get_timeout ()
No symbol table info available.
#1 0x000055d775fdbd80 in dtls1_is_timer_expired ()
No symbol table info available.
#2 0x000055d775fdbf22 in dtls1_handle_timeout ()
No symbol table info available.
#3 0x000055d775fdbff5 in dtls1_ctrl ()
No symbol table info available.
#4 0x000055d775c9e962 in RTC::DtlsTransport::SetTimeout() ()
No symbol table info available.
#5 0x000055d775c9e9d4 in RTC::DtlsTransport::SetTimeout() ()
No symbol table info available.
#6 0x000055d775c9e9d4 in RTC::DtlsTransport::SetTimeout() ()
No symbol table info available.
[... goes on with thousands of similar entries ...]
#74693 0x000055d775c9e9d4 in RTC::DtlsTransport::SetTimeout() ()
No symbol table info available.
#74694 0x000055d775c9e9d4 in RTC::DtlsTransport::SetTimeout() ()
No symbol table info available.
#74695 0x000055d775c9e9d4 in RTC::DtlsTransport::SetTimeout() ()
No symbol table info available.
#74696 0x000055d775c9d671 in RTC::DtlsTransport::ProcessDtlsData(unsigned char const*, unsigned long) ()
No symbol table info available.
#74697 0x000055d775d4295f in non-virtual thunk to RTC::WebRtcTransport::OnUdpSocketPacketReceived(RTC::UdpSocket*, unsigned char const*, unsigned long, sockaddr const*) ()
No symbol table info available.
#74698 0x000055d77603d978 in uv.udp_recvmmsg ()
No symbol table info available.
#74699 0x000055d77603e6e3 in uv.udp_io ()
No symbol table info available.
#74700 0x000055d7760421d5 in uv.io_poll ()
No symbol table info available.
#74701 0x000055d776034e5a in uv_run ()
No symbol table info available.
#74702 0x000055d775c4490d in DepLibUV::RunLoop() ()
No symbol table info available.
#74703 0x000055d775c4f8eb in Worker::Worker(Channel::ChannelSocket*, PayloadChannel::PayloadChannelSocket*) ()
No symbol table info available.
#74704 0x000055d775c43707 in mediasoup_worker_run ()
No symbol table info available.
#74705 0x000055d775c423cf in main ()
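The tens of thousands of identical `RTC::DtlsTransport::SetTimeout()` frames above suggest that the function kept re-entering itself until the stack ran out, which would be consistent with the SIGSEGV. Below is a minimal C++ sketch of that failure pattern next to a bounded, iterative alternative. It is only an illustration under assumptions: `FakeDtlsTimer`, `RearmRecursive` and `RearmBounded` are hypothetical names, and none of this is the actual mediasoup or OpenSSL code, nor the fix that was eventually merged.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for DTLS retransmission timer state.
// Not mediasoup or OpenSSL code.
struct FakeDtlsTimer
{
	// How many consecutive queries will report an already expired timer.
	int expiredReports;

	// Remaining time in milliseconds; 0 means "expired, handle it now".
	uint64_t RemainingMs()
	{
		if (expiredReports > 0)
		{
			--expiredReports;
			return 0;
		}
		return 1000;
	}
};

// Recursive re-arm: each expired report triggers one more nested call
// (one more stack frame unless the compiler turns it into a loop).
// Returns the nesting depth reached before a timer could be armed.
int RearmRecursive(FakeDtlsTimer& timer, int depth = 1)
{
	if (timer.RemainingMs() == 0)
		return RearmRecursive(timer, depth + 1);

	return depth;
}

// Iterative re-arm with a retry cap: constant stack usage, and it gives up
// instead of spinning if the timer keeps reporting expiry.
// Returns true if a timer could be armed; `attempts` receives the query count.
bool RearmBounded(FakeDtlsTimer& timer, int maxRetries, int& attempts)
{
	assert(maxRetries > 0);

	for (attempts = 1; attempts <= maxRetries; ++attempts)
	{
		if (timer.RemainingMs() != 0)
			return true; // A real implementation would start its timer here.
	}

	attempts = maxRetries;
	return false;
}
```

With a timer that reports expiry five times, `RearmRecursive` returns a depth of 6; if the timer never stops reporting expiry, the recursive version nests without limit, while `RearmBounded` stops after `maxRetries` queries and lets the caller decide what to do (for example, reset the transport).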
Partial worker logs
Jul 3 04:48:26 <some-server> sfu[2547297]: 2022-07-03T04:48:26.335Z mediasoup:Channel [pid:2547387] RTC::Transport::HandleRequest() | enabling TransportCongestionControlServer with transport-cc
Jul 3 04:48:26 <some-server> sfu[2547297]: 2022-07-03T04:48:26.600Z mediasoup:Channel [pid:2547387] RTC::IceServer::HandleTuple() | transition from state 'new' to 'connected' [hasUseCandidate:false, hasNomination:false, nomination:0]
Jul 3 04:48:26 <some-server> sfu[2547297]: 2022-07-03T04:48:26.600Z mediasoup:Channel [pid:2547387] RTC::WebRtcTransport::OnIceServerSelectedTuple() | ICE selected tuple
Jul 3 04:48:26 <some-server> sfu[2547297]: 2022-07-03T04:48:26.600Z mediasoup:Channel [pid:2547387] RTC::WebRtcTransport::OnIceServerConnected() | ICE connected
Jul 3 04:48:26 <some-server> sfu[2547297]: 2022-07-03T04:48:26.600Z mediasoup:Channel [pid:2547387] RTC::WebRtcTransport::MayRunDtlsTransport() | running DTLS transport in local role 'client'
Jul 3 04:48:26 <some-server> sfu[2547297]: 2022-07-03T04:48:26.600Z mediasoup:Channel [pid:2547387] RTC::WebRtcTransport::OnDtlsTransportConnecting() | DTLS connecting
Jul 3 04:48:26 <some-server> sfu[2547297]: 2022-07-03T04:48:26.600Z mediasoup:Channel [pid:2547387] RTC::DtlsTransport::Run() | running [role:client]
Jul 3 04:48:26 <some-server> sfu[2547297]: 2022-07-03T04:48:26.600Z mediasoup:Channel [pid:2547387] RTC::DtlsTransport::OnSslInfo() | DTLS handshake start
Jul 3 04:48:26 <some-server> sfu[2547297]: 2022-07-03T04:48:26.600Z mediasoup:Channel [pid:2547387] RTC::DtlsTransport::OnSslInfo() | [role:client, action:'before SSL initialization']
Jul 3 04:48:26 <some-server> sfu[2547297]: 2022-07-03T04:48:26.600Z mediasoup:Channel [pid:2547387] RTC::DtlsTransport::OnSslInfo() | [role:client, action:'SSLv3/TLS write client hello']
Jul 3 04:48:26 <some-server> sfu[2547297]: 2022-07-03T04:48:26.600Z mediasoup:Channel [pid:2547387] RTC::DtlsTransport::OnSslInfo() | role: client, waiting:'SSLv3/TLS write client hello']
Jul 3 04:48:26 <some-server> sfu[2547297]: 2022-07-03T04:48:26.931Z mediasoup:Channel [pid:2547387] RTC::DtlsTransport::OnSslInfo() | [role:client, action:'SSLv3/TLS write client hello']
Jul 3 04:48:26 <some-server> sfu[2547297]: 2022-07-03T04:48:26.931Z mediasoup:Channel [pid:2547387] RTC::DtlsTransport::OnSslInfo() | [role:client, action:'SSLv3/TLS read server hello']
Jul 3 04:48:26 <some-server> sfu[2547297]: 2022-07-03T04:48:26.931Z mediasoup:Channel [pid:2547387] RTC::DtlsTransport::OnSslInfo() | [role:client, action:'SSLv3/TLS read server certificate']
Jul 3 04:48:26 <some-server> sfu[2547297]: 2022-07-03T04:48:26.932Z mediasoup:Channel [pid:2547387] RTC::DtlsTransport::OnSslInfo() | [role:client, action:'SSLv3/TLS read server key exchange']
Jul 3 04:48:26 <some-server> sfu[2547297]: 2022-07-03T04:48:26.932Z mediasoup:Channel [pid:2547387] RTC::DtlsTransport::OnSslInfo() | [role:client, action:'SSLv3/TLS read server certificate request']
Jul 3 04:48:26 <some-server> sfu[2547297]: 2022-07-03T04:48:26.932Z mediasoup:Channel [pid:2547387] RTC::DtlsTransport::OnSslInfo() | [role:client, action:'SSLv3/TLS read server done']
Jul 3 04:48:26 <some-server> sfu[2547297]: 2022-07-03T04:48:26.932Z mediasoup:Channel [pid:2547387] RTC::DtlsTransport::OnSslInfo() | [role:client, action:'SSLv3/TLS write client certificate']
Jul 3 04:48:26 <some-server> sfu[2547297]: 2022-07-03T04:48:26.932Z mediasoup:Channel [pid:2547387] RTC::DtlsTransport::OnSslInfo() | [role:client, action:'SSLv3/TLS write client key exchange']
Jul 3 04:48:26 <some-server> sfu[2547297]: 2022-07-03T04:48:26.932Z mediasoup:Channel [pid:2547387] RTC::DtlsTransport::OnSslInfo() | [role:client, action:'SSLv3/TLS write certificate verify']
Jul 3 04:48:26 <some-server> sfu[2547297]: 2022-07-03T04:48:26.932Z mediasoup:Channel [pid:2547387] RTC::DtlsTransport::OnSslInfo() | [role:client, action:'SSLv3/TLS write change cipher spec']
Jul 3 04:48:26 <some-server> sfu[2547297]: 2022-07-03T04:48:26.932Z mediasoup:Channel [pid:2547387] RTC::DtlsTransport::OnSslInfo() | [role:client, action:'SSLv3/TLS write finished']
Jul 3 04:48:26 <some-server> sfu[2547297]: 2022-07-03T04:48:26.932Z mediasoup:Channel [pid:2547387] RTC::DtlsTransport::OnSslInfo() | role: client, waiting:'SSLv3/TLS write finished']
Jul 3 04:48:27 <some-server> sfu[2547297]: 2022-07-03T04:48:27.175Z mediasoup:Channel [pid:2547387] RTC::DtlsTransport::OnSslInfo() | [role:client, action:'SSLv3/TLS write finished']
Jul 3 04:48:27 <some-server> sfu[2547297]: 2022-07-03T04:48:27.175Z mediasoup:Channel [pid:2547387] RTC::DtlsTransport::OnSslInfo() | [role:client, action:'SSLv3/TLS read change cipher spec']
Jul 3 04:48:27 <some-server> sfu[2547297]: 2022-07-03T04:48:27.175Z mediasoup:Channel [pid:2547387] RTC::DtlsTransport::OnSslInfo() | [role:client, action:'SSLv3/TLS read finished']
Jul 3 04:48:27 <some-server> sfu[2547297]: 2022-07-03T04:48:27.175Z mediasoup:Channel [pid:2547387] RTC::DtlsTransport::OnSslInfo() | DTLS handshake done
Jul 3 04:48:27 <some-server> sfu[2547297]: 2022-07-03T04:48:27.175Z mediasoup:Channel [pid:2547387] RTC::DtlsTransport::CheckRemoteFingerprint() | valid remote fingerprint
Jul 3 04:48:27 <some-server> sfu[2547297]: 2022-07-03T04:48:27.175Z mediasoup:Channel [pid:2547387] RTC::DtlsTransport::GetNegotiatedSrtpCryptoSuite() | chosen SRTP crypto suite: SRTP_AES128_CM_SHA1_80
Jul 3 04:48:27 <some-server> sfu[2547297]: 2022-07-03T04:48:27.175Z mediasoup:Channel [pid:2547387] RTC::WebRtcTransport::OnDtlsTransportConnected() | DTLS connected
Jul 3 04:48:27 <some-server> sfu[2547297]: 2022-07-03T04:48:27.708Z mediasoup:Channel [pid:2547387] RTC::Producer::CreateRtpStream() | [encodingIdx:0, ssrc:2625057736, rid:, payloadType:96]
Jul 3 04:48:27 <some-server> sfu[2547297]: 2022-07-03T04:48:27.708Z mediasoup:Channel [pid:2547387] RTC::Producer::CreateRtpStream() | FIR supported
Jul 3 04:48:27 <some-server> sfu[2547297]: 2022-07-03T04:48:27.708Z mediasoup:Channel [pid:2547387] RTC::Producer::CreateRtpStream() | NACK supported
Jul 3 04:48:27 <some-server> sfu[2547297]: 2022-07-03T04:48:27.708Z mediasoup:Channel [pid:2547387] RTC::Producer::CreateRtpStream() | PLI supported
Jul 3 04:48:27 <some-server> sfu[2547297]: 2022-07-03T04:48:27.708Z mediasoup:Channel [pid:2547387] RTC::Producer::ReceiveRtpPacket() | key frame received [ssrc:2625057736, seq:3161]
Jul 3 04:48:29 <some-server> sfu[2547297]: 2022-07-03T04:48:29.432Z mediasoup:Channel [pid:2547387] RTC::IceServer::HandleTuple() | transition from state 'connected' to 'completed' [hasUseCandidate:true, hasNomination:false, nomination:0]
Jul 3 04:48:29 <some-server> sfu[2547297]: 2022-07-03T04:48:29.432Z mediasoup:Channel [pid:2547387] RTC::WebRtcTransport::OnIceServerCompleted() | ICE completed
Jul 3 04:48:30 <some-server> sfu[2547297]: 2022-07-03T04:48:30.418Z mediasoup:Channel [pid:2547387] RTC::SimpleConsumer::CreateRtpStream() | [ssrc:972893934, payloadType:96]
Jul 3 04:48:30 <some-server> sfu[2547297]: 2022-07-03T04:48:30.418Z mediasoup:Channel [pid:2547387] RTC::SimpleConsumer::CreateRtpStream() | FIR supported
Jul 3 04:48:30 <some-server> sfu[2547297]: 2022-07-03T04:48:30.418Z mediasoup:Channel [pid:2547387] RTC::SimpleConsumer::CreateRtpStream() | NACK supported
Jul 3 04:48:30 <some-server> sfu[2547297]: 2022-07-03T04:48:30.418Z mediasoup:Channel [pid:2547387] RTC::SimpleConsumer::CreateRtpStream() | PLI supported
Jul 3 04:48:30 <some-server> sfu[2547297]: 2022-07-03T04:48:30.668Z mediasoup:Channel [pid:2547387] RTC::IceServer::HandleTuple() | transition from state 'new' to 'completed' [hasUseCandidate:true, hasNomination:false, nomination:0]
Jul 3 04:48:30 <some-server> sfu[2547297]: 2022-07-03T04:48:30.668Z mediasoup:Channel [pid:2547387] RTC::WebRtcTransport::OnIceServerSelectedTuple() | ICE selected tuple
Jul 3 04:48:30 <some-server> sfu[2547297]: 2022-07-03T04:48:30.668Z mediasoup:Channel [pid:2547387] RTC::WebRtcTransport::OnIceServerCompleted() | ICE completed
Jul 3 04:48:30 <some-server> sfu[2547297]: 2022-07-03T04:48:30.669Z mediasoup:Channel [pid:2547387] RTC::WebRtcTransport::MayRunDtlsTransport() | transition from DTLS local role 'auto' to 'server' and running DTLS transport
Jul 3 04:48:30 <some-server> sfu[2547297]: 2022-07-03T04:48:30.669Z mediasoup:Channel [pid:2547387] RTC::WebRtcTransport::OnDtlsTransportConnecting() | DTLS connecting
Jul 3 04:48:30 <some-server> sfu[2547297]: 2022-07-03T04:48:30.669Z mediasoup:Channel [pid:2547387] RTC::DtlsTransport::Run() | running [role:server]
Jul 3 04:48:30 <some-server> sfu[2547297]: 2022-07-03T04:48:30.669Z mediasoup:Channel [pid:2547387] RTC::DtlsTransport::OnSslInfo() | DTLS handshake start
Jul 3 04:48:30 <some-server> sfu[2547297]: 2022-07-03T04:48:30.669Z mediasoup:Channel [pid:2547387] RTC::DtlsTransport::OnSslInfo() | [role:server, action:'before SSL initialization']
Jul 3 04:48:30 <some-server> sfu[2547297]: 2022-07-03T04:48:30.669Z mediasoup:Channel [pid:2547387] RTC::DtlsTransport::OnSslInfo() | role: server, waiting:'before SSL initialization']
Jul 3 04:48:32 <some-server> sfu[2547297]: 2022-07-03T04:48:32.453Z mediasoup:Channel [pid:2547387] RTC::DtlsTransport::OnSslInfo() | role: server, waiting:'before SSL initialization']
Jul 3 04:48:32 <some-server> sfu[2547297]: 2022-07-03T04:48:32.531Z mediasoup:ERROR:Worker worker process died unexpectedly [pid:2547387, code:null, signal:SIGSEGV]
Jul 3 04:48:32 <some-server> sfu[2547297]: 2022-07-03T04:48:32.531Z mediasoup:Worker died() [error:Error: [pid:2547387, code:null, signal:SIGSEGV]]
Additional info
The last thing to happen in that worker prior to the crash seems to be an attempted negotiation of a WebRTC transport with the sole purpose of consuming from an existing producer (also WebRTC). The requester was a Firefox 102 instance on a Windows 7 (NT 6.1) box.
About this issue
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 23 (13 by maintainers)
Commits related to this issue
- May fix DTLS related crash (issue #861) It may fix #861. - `DtlsTransporet.cpp': If `DTLSv1_get_timeout()` doesn't return 0 then computed `timeoutMs` will never be 0, so don't handle that case that ... — committed to versatica/mediasoup by ibc 2 years ago
- May fix DTLS related crash (issue #861) (#867) * Fix DTLS related crash (issue #861) — committed to versatica/mediasoup by ibc 2 years ago
- Update mediasoup-sys for fixed crash https://github.com/versatica/mediasoup/issues/861 — committed to oviceinc/mediasoup-elixir by satoren 2 years ago
- Update mediasoup-sys for fixed crash (#157) * Update mediasoup-sys for fixed crash https://github.com/versatica/mediasoup/issues/861 — committed to oviceinc/mediasoup-elixir by satoren 2 years ago
- May fix DTLS related crash (issue #861) (#867) * Fix DTLS related crash (issue #861) — committed to dyte-in/mediasoup by ibc 2 years ago
Released Rust version with this as well
Amazing. Let’s wait a few more days and will merge and release. Thanks!
@prlanzarin feel free to test this branch now. Yes, run a Debug build instead and, if you get a coredump, could you open it with gdb and print it here? We were not able to open the one you provided above.
@prlanzarin please wait a bit, I’ll add a change in the PR to verify a thing and let you know here once done.