mediasoup: consuming plain producer results in worker crash

I’m pushing video with GStreamer to a mediasoup (3.11.4) producer. There are no problems as long as the producer has no consumers. But after about 80 seconds of consuming (the video plays perfectly the whole time), the mediasoup worker process dies with a failed assertion 'this->buffer.size() <= MaxSeq': StorageItemBuffer contains more than 65535 entries

Core dump:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
        set = {__val = {0, 94619480744673, 28202962353000, 56, 94619497407008, 94619497407720, 140736458774320, 94619497345440, 140283140088888, 94619480744716, 94619497031937, 28202962353000, 56,
            0, 18446744073709551615, 18446744073709551615}}
        pid = <optimized out>
        tid = <optimized out>
        ret = <optimized out>
#1  0x00007f96356aa7f1 in __GI_abort () at abort.c:79
        save_stage = 1
        act = {__sigaction_handler = {sa_handler = 0x1, sa_sigaction = 0x1}, sa_mask = {__val = {0, 0, 511101108348, 395136991342, 140283121853312, 94619497341264, 512, 140283121851456,
              4294934528, 94619569736784, 32769, 32705, 140283118359213, 390842024046, 140283121854080, 140283121836704}}, sa_flags = 896436269, sa_restorer = 0x560e55d448e0}
        sigs = {__val = {32, 0 <repeats 15 times>}}
        __cnt = <optimized out>
        __set = <optimized out>
        __cnt = <optimized out>
        __set = <optimized out>
#2  0x0000560e50497752 in RTC::RtpStreamSend::StorageItemBuffer::Insert(unsigned short, RTC::RtpStreamSend::StorageItem*) ()
No symbol table info available.
#3  0x0000560e50497b1e in RTC::RtpStreamSend::StorePacket(RTC::RtpPacket*, std::shared_ptr<RTC::RtpPacket>&) ()
No symbol table info available.
#4  0x0000560e50497e1a in RTC::RtpStreamSend::ReceivePacket(RTC::RtpPacket*, std::shared_ptr<RTC::RtpPacket>&) ()
No symbol table info available.
#5  0x0000560e504a9157 in RTC::SimpleConsumer::SendRtpPacket(RTC::RtpPacket*, std::shared_ptr<RTC::RtpPacket>&) ()
No symbol table info available.
#6  0x0000560e50479b3b in RTC::Router::OnTransportProducerRtpPacketReceived(RTC::Transport*, RTC::Producer*, RTC::RtpPacket*) ()
No symbol table info available.
#7  0x0000560e50465f33 in RTC::Producer::ReceiveRtpPacket(RTC::RtpPacket*) ()
No symbol table info available.
#8  0x0000560e504c0a91 in RTC::Transport::ReceiveRtpPacket(RTC::RtpPacket*) ()
No symbol table info available.
#9  0x0000560e50451ffb in RTC::PlainTransport::OnRtpDataReceived(RTC::TransportTuple*, unsigned char const*, unsigned long) ()
No symbol table info available.
#10 0x0000560e5045316e in non-virtual thunk to RTC::PlainTransport::OnUdpSocketPacketReceived(RTC::UdpSocket*, unsigned char const*, unsigned long, sockaddr const*) ()
No symbol table info available.
#11 0x0000560e50820cd3 in uv.udp_recvmmsg ()
No symbol table info available.
#12 0x0000560e50821a43 in uv.udp_io ()
No symbol table info available.
#13 0x0000560e50825286 in uv.io_poll ()
No symbol table info available.
#14 0x0000560e50817a06 in uv_run ()
No symbol table info available.
#15 0x0000560e503acdb9 in DepLibUV::RunLoop() ()
No symbol table info available.
#16 0x0000560e503ba751 in Worker::Worker(Channel::ChannelSocket*, PayloadChannel::PayloadChannelSocket*) ()
No symbol table info available.
#17 0x0000560e503ab2e6 in mediasoup_worker_run ()
No symbol table info available.
#18 0x0000560e503a9e74 in main ()
No symbol table info available.

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 48 (30 by maintainers)

Most upvoted comments

In some cases there is just no way to work around broken clients in a sane way. With more complex logic we can shorten the max buffer size to less than 2^16-1, but I think that should be a follow-up optimization.

I believe there is consensus to first make it not crash, and optimize later.

IMHO ClearOldPackets() should handle this situation and remove the oldest items when the X most recent timestamps are the same.

Honestly, I don’t think this is the way to go. We must not assume a maximum frame size. The encoder may need to send many packets with the same timestamp due to huge video frames. If the client is buggy (as in this issue, AFAIU), then that’s its problem. What we have to fix is the Insert() method, which is the one that should never store more than MaxSeq packets.

idx <= static_cast<uint16_t>(this->buffer.size() - 1) ensures this in one of the branches: it will simply start overwriting old values once the number of elements reaches the buffer size.

The last else branch is probably where this issue happens. I guess there was an incorrect expectation about sequence numbers there, and some packets do get processed far enough out of order for the sequence number to wrap around and fall below this->startSeq. I think a similar buffer-size check needs to be added there, so that it starts overwriting values instead of pushing beyond the buffer size.

It will result in corrupted packet contents when the decoder tries to interpret them, but we certainly must not crash here.
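The Insert() fix discussed above can be pictured with a simplified TypeScript model. Everything in it is hypothetical: `SeqBuffer`, `u16`, `isSeqHigherThan` and the drop-on-overflow policy are illustrative names and choices, not mediasoup's C++ `StorageItemBuffer`. It shows the in-range overwrite branch, an append branch that pads with blank slots, and a prepend branch for a packet that compares as older than `startSeq`, with every branch keeping `size <= capacity`.

```typescript
// Hypothetical, simplified model of a sequence-indexed packet buffer.
// Names and policies are illustrative; this is not mediasoup's C++ code.
const MAX_SEQ = 65535; // 2^16 - 1, the bound checked by the failed assertion

// Equivalent of static_cast<uint16_t>(n) for integer n.
const u16 = (n: number): number => n & 0xffff;

// 16-bit serial-number comparison (RFC 1982 style): true if a comes after b.
function isSeqHigherThan(a: number, b: number): boolean {
  const diff = u16(a - b);
  return diff !== 0 && diff < 0x8000;
}

class SeqBuffer<T> {
  private slots: (T | null)[] = []; // slots[i] holds sequence number startSeq + i
  private startSeq = 0;
  private readonly capacity: number;

  constructor(capacity: number = MAX_SEQ) {
    this.capacity = capacity;
  }

  get size(): number {
    return this.slots.length;
  }

  get(seq: number): T | null {
    const idx = u16(seq - this.startSeq);
    return idx < this.slots.length ? this.slots[idx] : null;
  }

  // Stores the item and returns true, or returns false if it was dropped.
  insert(seq: number, item: T): boolean {
    seq = u16(seq);
    if (this.slots.length === 0) {
      this.startSeq = seq;
      this.slots.push(item);
      return true;
    }
    const idx = u16(seq - this.startSeq);
    if (idx < this.slots.length) {
      // Slot already allocated (e.g. out-of-order arrival): overwrite it.
      this.slots[idx] = item;
      return true;
    }
    if (isSeqHigherThan(seq, this.startSeq)) {
      // Newer packet: pad with blank slots up to idx, then append.
      while (this.slots.length < idx) this.slots.push(null);
      this.slots.push(item);
      // Keep the invariant by discarding the oldest slots from the front.
      while (this.slots.length > this.capacity) {
        this.slots.shift();
        this.startSeq = u16(this.startSeq + 1);
      }
      return true;
    }
    // Packet compares as older than startSeq: prepending needs this many slots.
    const addToFront = u16(this.startSeq - seq);
    if (this.slots.length + addToFront > this.capacity) {
      return false; // growing here would break size <= capacity, so drop it
    }
    for (let i = 1; i < addToFront; i++) this.slots.unshift(null);
    this.slots.unshift(item);
    this.startSeq = seq;
    return true;
  }
}
```

With a capacity of 5, inserting sequence numbers 65534, 65535, 1, 0 and 3 leaves only the five newest slots, and a late packet with sequence number 65000 is then rejected instead of tripping an assertion. Dropping is only one possible policy; overwriting, as suggested above, would also keep the invariant. A real implementation would use a deque or ring buffer, since `shift`/`unshift` on a JS array can be O(n).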

If it helps, we have a video that does not crash the worker even without do-timestamp: https://codeda.com/data/videoRecordH264.mp4 (you just need to replace the url param in the demo).

It seemed to help on the first try!! We will keep testing and keep you updated. Thanks!

Ah, I understand: it’s the RTP packet timestamps that are all the same, but the codec packets’ PTS inside them are correct, so the stream is playable. We will look into it on our side, thanks.
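Why identical RTP timestamps matter can be sketched under one assumption, which the ClearOldPackets() comment above implies: stored packets expire by RTP timestamp age. The model below is hypothetical (`ptsToRtpTimestamp`, `tsDiff`, `storedAfter`, the 2000 ms window and the fixed `BASE` are all illustrative, not mediasoup's code). It uses the 90 kHz clock that RFC 6184 specifies for H.264, and compares a sender whose timestamps advance per frame with one whose timestamps never change.

```typescript
// Simplified model (assumption): stored packets are evicted once their RTP
// timestamp is more than a retention window older than the newest one.
const VIDEO_CLOCK_RATE = 90_000; // H.264 RTP clock rate (RFC 6184)
const WINDOW_MS = 2_000;         // hypothetical retention window
const WINDOW_TICKS = (VIDEO_CLOCK_RATE * WINDOW_MS) / 1000;

// Map a presentation time in seconds to a 32-bit RTP timestamp.
// `base` stands in for the random initial offset an RTP sender picks (RFC 3550).
function ptsToRtpTimestamp(ptsSeconds: number, base: number): number {
  return (base + Math.round(ptsSeconds * VIDEO_CLOCK_RATE)) >>> 0; // modulo 2^32
}

// Forward distance from `older` to `newer`, accounting for 32-bit wraparound.
function tsDiff(newer: number, older: number): number {
  return (newer - older) >>> 0;
}

// Store each packet's timestamp and evict from the front like a time-based
// cleanup; returns how many packets are still stored at the end.
function storedAfter(timestamps: number[]): number {
  const stored: number[] = [];
  for (const ts of timestamps) {
    stored.push(ts);
    while (stored.length > 0 && tsDiff(ts, stored[0]) > WINDOW_TICKS) {
      stored.shift();
    }
  }
  return stored.length;
}

const PACKETS = 10_000;
const PACKETS_PER_FRAME = 10; // all packets of one frame share a timestamp
const FPS = 30;
const BASE = 0xfff00000;      // close to 2^32, so the healthy run wraps around

// Healthy sender: the timestamp advances by 90000 / 30 = 3000 ticks per frame.
const healthy = Array.from({ length: PACKETS }, (_, i) =>
  ptsToRtpTimestamp(Math.floor(i / PACKETS_PER_FRAME) / FPS, BASE));
// Broken sender: every packet carries the same RTP timestamp.
const frozen = Array.from({ length: PACKETS }, () => BASE);

console.log(storedAfter(healthy)); // bounded by the window
console.log(storedAfter(frozen));  // equals PACKETS: nothing ever expires
```

In this model the frozen stream keeps every packet, so a buffer indexed by 16-bit sequence numbers would eventually hit its 65535-entry limit, which matches the growth the assertion caught.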

Please @angedonik, make this change too. TS in strict mode does not allow legacy octal literals such as 0664, and plain 664 is a decimal number, so the mode has to be written as 0o664.

diff --git a/webrtc-handler/src/api-handler.ts b/webrtc-handler/src/api-handler.ts
index a004a2a..a97c25f 100644
--- a/webrtc-handler/src/api-handler.ts
+++ b/webrtc-handler/src/api-handler.ts
@@ -131,7 +131,7 @@ export class ApiHandler {
         this.gst?.kill()
         const tmpDir=await dir();
         const fifoPath=join(tmpDir.path,'video');
-        mkfifoSync(fifoPath, 664);
+        mkfifoSync(fifoPath, 0o664);
         const {ssrc, listenIp, rtpPort, rtcpPort, payloadType} = await this.plainProduce();
         this.ffmpeg = spawn('ffmpeg', ['-analyzeduration', '20M', '-probesize', '20M', '-re', '-i', url, '-map', '0:v:0', '-c:v', 'copy',
             '-async', '10000', '-f', 'tee', `[select=v:f=h264]${fifoPath}`], {detached: false});
@@ -143,7 +143,8 @@ export class ApiHandler {
             delete this.ffmpeg;
             this.gst?.kill()
         })
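The difference between the two `mkfifoSync` arguments in the diff can be checked directly. This is a small illustration that only uses number literals and `Number.prototype.toString`; `asOctal` is a hypothetical helper name.

```typescript
// `664` without a prefix is a decimal number; `0o664` is the ES2015 octal
// literal for the intended rw-rw-r-- mode (owner rw, group rw, others r).
const decimalMode = 664;
const octalMode = 0o664;

// Render a numeric mode as octal digits, the way permissions are usually written.
function asOctal(mode: number): string {
  return mode.toString(8);
}

console.log(asOctal(decimalMode)); // 1230 (not the permissions that were meant)
console.log(asOctal(octalMode));   // 664
console.log(octalMode);            // 436 in decimal
```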

BTW: I have it working, waiting for it to crash.

Done

With this demo the crash happens every single time: https://github.com/angedonik/plain-producer-demo